starting your devops journey – practical tips for ops

Starting your DevOps Journey: Practical Tips for Ops. http://dynatrace.com/trial. Brian Chandler, Systems Engineer @ Raymond James (@Channer531). Andreas Grabner, Chief DevOps Activist @ Dynatrace (@grabnerandi).

Upload: dynatrace

Post on 17-Jan-2017

Category: Technology


TRANSCRIPT

Page 1: Starting Your DevOps Journey – Practical Tips for Ops

Starting your DevOps Journey: Practical Tips for Ops

http://dynatrace.com/trial

Brian Chandler, Systems Engineer @ Raymond James, @Channer531

Andreas Grabner, Chief DevOps Activist @ Dynatrace, @grabnerandi

Page 2: Starting Your DevOps Journey – Practical Tips for Ops

Promise of DevOps: Faster & Efficient Innovation

Smaller Apps, Micro-Services More Deployments

App-, Service- & End-User Feedback Loops

Happy Users

Lower Costs

Page 3: Starting Your DevOps Journey – Practical Tips for Ops

Proof: DevOps Adopters Are …

More Agile: 200x more frequent deployments and 2,555x faster lead times than their peers

More Reliable: 3x lower change failure rate and 24x faster Mean Time to Recover

More Successful: 2x more likely to exceed market expectations and 50% higher market cap growth over 3 years

Source: Puppet Labs 2016 State Of DevOps Report: https://puppet.com/resources/white-paper/2016-state-of-devops-report

Page 4: Starting Your DevOps Journey – Practical Tips for Ops

Dynatrace Transformation by the numbers

23x more releases, 170 deployments / day

More Quality: 31,000 unit + integration tests / hour, 60h of UI tests per build

More Agile: ~200 code commits / day, 340 stories per sprint

More Stability: 93% of production bugs found by Dev, 450 global EC2 instances, 99.998% global availability

Webinar @ https://info.dynatrace.com/17q3_wc_from_agile_to_cloudy_devops_na_registration.html

Page 5: Starting Your DevOps Journey – Practical Tips for Ops

YET: "DevOps Adoption is only 2%" (Gene Kim, Nov 2016)

Page 6: Starting Your DevOps Journey – Practical Tips for Ops

Interesting Ops Learnings from Adopters

New Technology Stack

New Architectural Patterns

End User Focused

New Deployment Models

Page 7: Starting Your DevOps Journey – Practical Tips for Ops

DevOps Requirements and Engagement Options for Ops

Feedback through High Quality App & User Data

Ops as a Service: "Self-Service for Application Teams"

Bridge the Gap between Enterprise Stack and New Stack

Shift-Left: (No)Ops as "Part of Application Delivery"

(Vertical slide labels: Requirements, Engagement Options)

Page 8: Starting Your DevOps Journey – Practical Tips for Ops

Closing the Ops to Dev Feedback Loop: One Step at a Time!

Ops: Need answers to these questions! Closing the gap to AppBizDev

1. Basic App Monitoring
  Ops: Are our applications up and running? What load patterns do we have per application? What is the resource consumption per application?
  AppBizDev: Where are the performance / resource hotspots? When and where do applications break?

2. App Dependencies
  Ops: What are the dependencies between apps, services, DB and infra? How to monitor "non custom app" tiers? Where are the dependency bottlenecks? Where is the weakest link?
  AppBizDev: Do we have bad dependencies through code or config? How does the system really behave in production? What to learn for future architectures?

3. End User Monitoring
  Ops: How to monitor mobile vs desktop vs tablet vs service endpoints? How much network bandwidth is required per app, service and feature? Where to start optimizing bandwidth: CDNs, Caching, Compression?
  AppBizDev: Who is using our apps? Geo? Device? Which features are used? What's the behavior? Where to start optimizing? App Flow? Page Size? Conversion Rates? Bounce Rates?

4. "Soft-Launch" Support
  Ops: How to deploy and monitor multiple versions of the same app / service? What and how to baseline? Do we have a better or worse version of an app/service/feature?
  AppBizDev: What are the usage patterns for A/B or Green/Blue? Difference between different versions and features?

5. Virtualization Monitoring
  Ops: How to automatically monitor virtual and container instances? What to monitor when deploying into public or private clouds?
  AppBizDev: Does the architecture work in these dynamic environments? Does scale up/down work as expected?

6. Provide "Monitoring as a Service" for Cloud Native Application Teams
  Ready for "Cloud Native": How to alert on real problems and not architectural patterns? How to consolidate monitoring between Cloud Native and Enterprise?

Today

Page 9: Starting Your DevOps Journey – Practical Tips for Ops

Questions to Answer!

Are our applications up & running?

What are the real load patterns?

What is the resource consumption?

Where to start optimizing?

Page 10: Starting Your DevOps Journey – Practical Tips for Ops

Are our Apps Up, Running & Accessible?

Time of Deployment

Availability dropped to 0%

Page 11: Starting Your DevOps Journey – Practical Tips for Ops

Early Warning SLA Monitoring!

Quality of Connectivity & DNS

Quality of Content Delivery

3rd Party Impact

Delivery by Geo

Page 12: Starting Your DevOps Journey – Practical Tips for Ops

Client Center Daily Traffic Pattern

Page 13: Starting Your DevOps Journey – Practical Tips for Ops

Client Center sees a peak of about 3,800 Req/min against its API.

Client Center Daily Traffic Pattern

Page 14: Starting Your DevOps Journey – Practical Tips for Ops

Client Center sees a peak of about 3,800 Req/min against its API.

60 unique calls/functions that make up the Client Center API

Client Center Daily Traffic Pattern

Page 15: Starting Your DevOps Journey – Practical Tips for Ops

~20% of that traffic is ClientCenter/API/Holdings

Client Center Daily Traffic Pattern

Page 16: Starting Your DevOps Journey – Practical Tips for Ops

~20% of that traffic is ClientCenter/API/Holdings

~20% of that traffic is ClientCenter/API/ClientDetails

Client Center Daily Traffic Pattern

Page 17: Starting Your DevOps Journey – Practical Tips for Ops

~20% of that traffic is ClientCenter/API/Holdings

~20% of that traffic is ClientCenter/API/ClientDetails

~20% of that traffic is ClientCenter/API/RecentSearch

Client Center Daily Traffic Pattern

Page 18: Starting Your DevOps Journey – Practical Tips for Ops

Leveraging PRD data to tune QA Load Tests

Typical Peak Hour

If you're not careful, it could look like this…

Rhythmic peaks and valleys suggest "lock-step" scripts (all virtual users start and end at the same time).

PRD usage is much more "fluid": a steady stream, balanced across transaction usage.

The total traffic load was met. However, the correct ratio of key transactions was not.
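One way to carry the production transaction ratio into a load test is to derive per-call target rates from a sample of production traffic. The sketch below is only an illustration: the function names and the sample counts are made up, and only the 3,800 Req/min peak and the roughly 20% shares come from the slides.

```python
from collections import Counter

def transaction_mix(requests):
    """Return each API call's share of total traffic (0..1)."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {call: n / total for call, n in counts.items()}

def load_test_targets(mix, peak_req_per_min):
    """Scale the production mix to per-call request rates for a load test."""
    return {call: share * peak_req_per_min for call, share in mix.items()}

def mix_deviation(prod_mix, test_mix):
    """Largest absolute difference in share between production and a test run."""
    calls = set(prod_mix) | set(test_mix)
    return max(abs(prod_mix.get(c, 0.0) - test_mix.get(c, 0.0)) for c in calls)

# Hypothetical production sample: three calls at ~20% each, the rest lumped together.
prod_sample = (["Holdings"] * 20 + ["ClientDetails"] * 20 +
               ["RecentSearch"] * 20 + ["Other"] * 40)
prod_mix = transaction_mix(prod_sample)
targets = load_test_targets(prod_mix, peak_req_per_min=3800)

# Hypothetical load test that met the total volume but over-weighted one call.
test_mix = transaction_mix(["Holdings"] * 50 + ["Other"] * 50)
deviation = mix_deviation(prod_mix, test_mix)
```

A large `deviation` is the situation described above: the total load was met, but the ratio of key transactions was not.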

Page 19: Starting Your DevOps Journey – Practical Tips for Ops

Performance Differences Before and After Release

Normal Production Distribution vs. Failed Load Test Distribution

Black: overall application load and peak volume

Percentile breakdown of fast, warning and slow transactions

Page 20: Starting Your DevOps Journey – Practical Tips for Ops

What is making up all that yellow?

Occurrences of slow AccountList transactions from load testing

Distribution of "yellow" transactions for that time: AccountList makes up most of these transactions.

Normal distribution of "expected" slow transactions for this API function.

Distribution generated from load test. New code would greatly increase the occurrences of slow transactions in production!

Page 21: Starting Your DevOps Journey – Practical Tips for Ops

Detecting Load Distribution and Deployment Hotspots

Overall Load Distribution by SLA: Very Slow, Slow, Med, Fast
  Tip: Logarithmic Y-Axis

Finding #1: Response time spikes at certain times, not related to load!
Finding #2a: Server #1 was put back in rotation HERE
Finding #2b: Server #2 saw fewer errors once #1 was up
Finding #3: Server #3 only gets load at certain times!

Validate Load Balancing
  Tip: Load per Server!
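The "load per server" tip can be checked numerically as well as visually. The following is a minimal sketch, assuming you can export one server name per handled request; the server names, counts and the 50% tolerance are hypothetical.

```python
from collections import Counter

def load_share_per_server(request_log):
    """request_log: iterable of server names, one entry per handled request."""
    counts = Counter(request_log)
    total = sum(counts.values())
    return {server: n / total for server, n in counts.items()}

def unbalanced_servers(shares, expected_servers, tolerance=0.5):
    """Servers whose share differs from an even split by more than
    `tolerance` (relative), including servers that got no traffic at all."""
    fair = 1 / len(expected_servers)
    return sorted(s for s in expected_servers
                  if abs(shares.get(s, 0.0) - fair) > tolerance * fair)

# Hypothetical log: srv3 receives far less traffic than its peers.
log = ["srv1"] * 40 + ["srv2"] * 45 + ["srv3"] * 15
shares = load_share_per_server(log)
suspects = unbalanced_servers(shares, ["srv1", "srv2", "srv3"])
```

Running the same check per time window would surface cases like Finding #3, where a server only receives load at certain times.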

Page 22: Starting Your DevOps Journey – Practical Tips for Ops

Detecting Load Distribution and Deployment Hotspots

Requests by App Server
  Tip: Percentage Bar Chart
  Same for Web Server

Thread Usage
  Tip: Pool Size + Actual Use
  Same for Web Server

Transfer Rate
  Identify "heavy hitters"

Resource Utilization
  Tip: CPU, Memory, I/O …

Page 23: Starting Your DevOps Journey – Practical Tips for Ops

Detecting Resource Regression Hotspots

Time of Deployment

Other Resources: Bytes Transferred, Disk I/O, # of Log Messages, # of Open Connections, # of Calls …

Page 24: Starting Your DevOps Journey – Practical Tips for Ops

Detecting Error Hotspots under Load

# of Exceptions

Response Time

Load by Endpoints

Page 25: Starting Your DevOps Journey – Practical Tips for Ops
Page 26: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Hotspot Detection under Load

My Favorite: Layer Breakdown Chart

With increasing load: Which LAYER doesn't SCALE?

Load (= # of Requests)

Page 27: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Availability Root Cause Detection

Web Performance Optimization Automated

List of root cause explanations for SLA violations

Page 28: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Baselining per Business Transaction

Response Time Baselines based on 50th & 90th Percentile

Smart Alerting based on Significant Measurement Violation

Direct link to Layer Breakdown and Method Hotspot!
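To illustrate percentile-based baselining, here is a minimal sketch that derives 50th and 90th percentile baselines from historical response times and flags a recent window that exceeds them. It is not how the product computes baselines; the 20% tolerance factor and all names are assumptions chosen for the example.

```python
import statistics

def percentile_baseline(response_times_ms):
    """Return the 50th and 90th percentile of a list of response times."""
    deciles = statistics.quantiles(response_times_ms, n=10, method="inclusive")
    return {"p50": statistics.median(response_times_ms), "p90": deciles[8]}

def violates_baseline(recent_ms, baseline, tolerance=1.2):
    """True when the recent p50 or p90 exceeds its baseline by the tolerance factor."""
    recent = percentile_baseline(recent_ms)
    return any(recent[key] > baseline[key] * tolerance for key in ("p50", "p90"))

# Hypothetical history for one business transaction: 100 ms to 199 ms.
history = list(range(100, 200))
baseline = percentile_baseline(history)

# A recent window where every request took twice as long.
degraded = [ms * 2 for ms in history]
alert = violates_baseline(degraded, baseline)
```

Tracking both percentiles matters because a regression that only hits the slowest 10% of requests moves the 90th percentile while leaving the median untouched.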

Page 29: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Anomaly and Root Cause Detection

Automatic Anomaly Detection

Automatic Root Cause Information

Automatic Impact Details

Page 30: Starting Your DevOps Journey – Practical Tips for Ops

Summary: Capabilities to Get Answers

Through Synthetic Monitoring: Are our applications up & running?
  Availability, Response Time, CDN, Geo, …
  Content Size and Content Validation

Through Endpoint Monitoring: What are the real load patterns?
  Bucket by Response Time (Fast, Medium, Slow, Very Slow ...)
  Bucket by Status Code (HTTP 2xx, 3xx, 4xx, 5xx, ...)

Through System Monitoring: What is the resource consumption?
  CPU, Memory, Network and I/O

Through Basic Application Monitoring: Where to start optimizing?
  Top Exceptions & Log Messages; # Threads (Idle, Busy)
  Memory by Heap Space, Garbage Collection Activity
  Execution Hotspots by Component
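The two bucketing ideas from endpoint monitoring can be sketched in a few lines. The thresholds below are hypothetical and should be replaced with your own SLA limits; the input is assumed to be (response time in ms, HTTP status) pairs taken from access logs or a monitoring export.

```python
from collections import Counter

# Hypothetical upper bounds in milliseconds; anything above the last is "Very Slow".
BUCKETS = [(500, "Fast"), (1000, "Medium"), (3000, "Slow")]

def response_time_bucket(ms):
    """Map a response time to Fast / Medium / Slow / Very Slow."""
    for upper, name in BUCKETS:
        if ms < upper:
            return name
    return "Very Slow"

def status_class(code):
    """Map an HTTP status code to its class, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"

def summarize(requests):
    """requests: iterable of (response_time_ms, http_status) tuples."""
    requests = list(requests)  # allow generators, since we iterate twice
    by_time = Counter(response_time_bucket(ms) for ms, _ in requests)
    by_status = Counter(status_class(code) for _, code in requests)
    return by_time, by_status

sample = [(120, 200), (700, 200), (2500, 302), (4000, 500), (90, 404)]
by_time, by_status = summarize(sample)
```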

Page 31: Starting Your DevOps Journey – Practical Tips for Ops

Which services do we actually host?

What is the health state of every component?

What are the dependencies?

What impacts the interconnected system health?

Questions to Answer!

Page 32: Starting Your DevOps Journey – Practical Tips for Ops

Agent-Based Monitoring & Tracing: Bridging Enterprise and New Stack

From Mobile, via Middleware, to Mainframe and Services, to SQL / NoSQL, to External Services

Page 33: Starting Your DevOps Journey – Practical Tips for Ops

Analyzing Inter Tier Impact

#1: Load Spike. Direct correlation with # of SQL queries -> OK!

#2: Same Load Spike. Direct correlation with # of Exceptions -> OK!

#3: Starting with Load Spike. Time spent in JDBC (blue) stays very high -> NOT OK!

#4: Problem Solved. Issue on Oracle Server caused all SQL to be slow

Page 34: Starting Your DevOps Journey – Practical Tips for Ops

Health State and Impact of Database!

DB-Related Blogs from Sonja: https://www.dynatrace.com/blog/author/sonja-chevre/

Page 35: Starting Your DevOps Journey – Practical Tips for Ops

Proper Connection Pool Sizing!

Do we have enough DB CONNECTIONS per pool?
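A simple way to answer the pool sizing question is to compare the configured pool size with the peak number of connections in use over a period. This is a minimal sketch with hypothetical sample values and an assumed 80% warning threshold.

```python
def pool_headroom(pool_size, in_use_samples, warn_at=0.8):
    """Summarize how close a connection pool came to its configured limit.

    in_use_samples: periodic readings of connections in use for one pool.
    """
    peak = max(in_use_samples)
    utilization = peak / pool_size
    return {
        "peak": peak,
        "utilization": utilization,
        "warn": utilization >= warn_at,   # getting close to the limit
        "exhausted": peak >= pool_size,   # requests likely waited for a connection
    }

# Hypothetical readings for a pool configured with 20 connections.
busy_pool = pool_headroom(20, [5, 12, 18, 20, 9])
quiet_pool = pool_headroom(20, [5, 12, 15])
```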

Page 36: Starting Your DevOps Journey – Practical Tips for Ops

Detecting Database Impact on Message Processing

#1: Cluster Failover Event

#2: System Struggled but managed load

#3: DB Index Job with MAJOR impact on End Users

Page 37: Starting Your DevOps Journey – Practical Tips for Ops

@ Dynatrace: Service Tier Monitoring

#1: Overall Tier Health

#2: Cassandra Health

#3: Queue Sizes

#4: Error States

Page 38: Starting Your DevOps Journey – Practical Tips for Ops

What’s lurking under the water of the iceberg?

Page 39: Starting Your DevOps Journey – Practical Tips for Ops

What is the cause of all performance problems?

Page 40: Starting Your DevOps Journey – Practical Tips for Ops

Red wave of death appears on dashboard.

Conference Bridge/Crisis Center call with lots of “Smart Guy Correlation”

Application recovers.

Triaging w/o anomaly detection on app dependencies

Page 41: Starting Your DevOps Journey – Practical Tips for Ops

[Diagram: five separate application stacks, App1 to App5, each drawn with its own Web, AppSvc, MB, EntSvc and DB tiers]

DCRUM – True enterprise monitoring

Page 42: Starting Your DevOps Journey – Practical Tips for Ops

[Diagram: five separate application stacks, App1 to App5, each drawn with its own Web, AppSvc, MB, EntSvc and DB tiers]

DCRUM – True enterprise monitoring

Page 43: Starting Your DevOps Journey – Practical Tips for Ops

DCRUM – True enterprise monitoring

Page 44: Starting Your DevOps Journey – Practical Tips for Ops

[Diagram: App1 to App5, each with a Web tier, on top of shared tiers Svc1, Svc2, Svc3, Svc4, MB, ENTSvc1, EntSvc2, DB1 and DB2]

DCRUM – True enterprise monitoring

Page 45: Starting Your DevOps Journey – Practical Tips for Ops

[Diagram: shared tiers only: Svc2, Svc3, Svc4, MB, ENTSvc1, EntSvc2, DB1 and DB2]

DCRUM – True enterprise monitoring

Page 46: Starting Your DevOps Journey – Practical Tips for Ops

[Diagram: shared tiers only: Svc2, Svc3, Svc4, MB, ENTSvc1, EntSvc2, DB1 and DB2]

DCRUM – True enterprise monitoring

Successful application dependency monitoring will allow you to take a "bottom-up" approach to monitoring your enterprise.

Page 47: Starting Your DevOps Journey – Practical Tips for Ops

“Bottom-up” Service View

Client Group 1, Servers A-D

Client Group 2, Servers E-H

Client Group 3, Servers I-L

Client Group 4, Servers M-Q

Client Group 5, Servers R-S

Different Apps and services exercise enterprise services and databases in varying ways!

Lack of load from these peers against this service

Poorly performing node in this client group

Page 48: Starting Your DevOps Journey – Practical Tips for Ops

Link to the appropriate heat map

Alert sent based on deviation of calculated baseline

Baseline alerting granularity down to the operation level, not just the Software Service

Delivering this data as actionable alerts

Page 49: Starting Your DevOps Journey – Practical Tips for Ops

Usage and application behavior vary day-to-day. A rolling average of services is not good enough

One week application usage trend: Monday, Tuesday, Wednesday, Thursday, Friday

The need for seasonal baselining

Page 50: Starting Your DevOps Journey – Practical Tips for Ops

To achieve deeper statistical capabilities, we use a combination of the PureLytics stream and DCRUM REST interface to pour data into analysis tools.

This allows us to reach back several weeks, on a single minute for the given day (e.g. Monday at 10:03am compared to the last 5 Mondays at 10:03am), to calculate our baselines. We do this for every unique operation in our enterprise (25k+ recorded). That is a great deal of data!

Dynatrace performance metrics streaming
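The seasonal comparison described above (this Monday at 10:03 against the last five Mondays at 10:03) can be sketched as follows. This is an illustration only: it assumes the per-minute measurements have already been exported as (timestamp, response time) pairs, and the three-sigma rule, the minimum spread and all names are assumptions rather than the presenters' actual analysis.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

def slot(ts):
    """Seasonal key: (weekday, hour, minute), e.g. Monday 10:03 -> (0, 10, 3)."""
    return (ts.weekday(), ts.hour, ts.minute)

def build_baselines(samples, weeks=5):
    """samples: (datetime, response_time_ms) pairs, oldest first.
    Keeps the most recent `weeks` values for each seasonal slot."""
    history = defaultdict(list)
    for ts, value in samples:
        history[slot(ts)].append(value)
    return {key: values[-weeks:] for key, values in history.items()}

def is_anomaly(baselines, ts, value, sigmas=3.0, min_spread_ms=1.0):
    """Compare a new measurement with the same minute on previous weeks."""
    past = baselines.get(slot(ts), [])
    if len(past) < 2:
        return False  # not enough history to judge
    center = mean(past)
    spread = max(stdev(past), min_spread_ms)  # floor avoids a zero-width band
    return abs(value - center) > sigmas * spread

# Hypothetical data: five consecutive Mondays at 10:03 for one operation.
first_monday = datetime(2017, 1, 2, 10, 3)
samples = [(first_monday + timedelta(weeks=i), ms)
           for i, ms in enumerate([200, 205, 195, 202, 198])]
baselines = build_baselines(samples)
next_monday = first_monday + timedelta(weeks=5)
```

With 25k+ operations and one slot per minute of the week, the baseline table becomes large, which is why the data is streamed into dedicated analysis tools.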

Page 51: Starting Your DevOps Journey – Practical Tips for Ops

By reaching that far back at granular 1-minute intervals, you can be very confident in the validity of your baseline values.

A 50ms-150ms deviation may not seem like a huge deal – but in the world of app dependency monitoring, it truly is!

Graphical View of deep seasonal baselining

Page 52: Starting Your DevOps Journey – Practical Tips for Ops

Service 1 needs to call Service 2 multiple times. If Service 2 slows down, it has an enormous impact on all upstream services.

A 150ms shift in Service 2 causes Service 1 to shift from 200ms to 2s

Service 1

Service 2

Upstream impact of dependencies
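The amplification is simple arithmetic once the number of calls is known. The slides do not state how many calls Service 1 makes or whether they run sequentially; the sketch below assumes 12 sequential calls and 20 ms of own work, values chosen only because they reproduce the 200ms to 2s shift.

```python
def upstream_latency_ms(own_work_ms, downstream_ms, calls):
    """Latency of a service that calls one dependency `calls` times in sequence."""
    return own_work_ms + calls * downstream_ms

# Assumed values: 12 sequential calls, 20 ms spent in Service 1 itself.
before = upstream_latency_ms(own_work_ms=20, downstream_ms=15, calls=12)
after = upstream_latency_ms(own_work_ms=20, downstream_ms=165, calls=12)
added = after - before  # 12 calls x 150 ms shift
```

Under these assumptions `before` is 200 ms and `after` is 2000 ms: every extra millisecond in the dependency is multiplied by the call count before it reaches the upstream service.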

Page 53: Starting Your DevOps Journey – Practical Tips for Ops
Page 54: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Full Stack Monitoring

#1: All your Technologies

#2: All Key Metrics

#3: Physical, Virtual, Containers or Cloud

Page 55: Starting Your DevOps Journey – Practical Tips for Ops

Smartscape: Real Time Service-Oriented CMDB

#1: Understand WHO talks with WHOM?

#2: Where are tiers deployed?

#3: WHO might be impacted by a failure?

Page 56: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Service Flow Tracing

#1: Understanding Flow

#2: Dependencies between Services

#3: Service Clustering

Page 57: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Architectural Pattern Detection

#1: Action initiated by the SPA (Single Page App)

#2: SPA was making 3 AJAX calls in total!

#3: One of the calls makes 13 (!) backend REST calls to an external system on 13 asynchronous threads

Page 58: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Problem Pattern Detection

#1: Select Top Common Problem Patterns

#2: Explore which transactions have this and other problems

Page 59: Starting Your DevOps Journey – Practical Tips for Ops

Automating Anomaly Detection

#1: All Root Cause Information "encapsulated" into a single Problem

#2: "Time-Lapse" of Problem Evolution

#3: All relevant Events: Infra, Logging, App, Service, End User …

Page 60: Starting Your DevOps Journey – Practical Tips for Ops

Automatic Integration with ChatOps

Page 61: Starting Your DevOps Journey – Practical Tips for Ops

Summary: Capabilities to Get Answers

Through Automatic Dependency Detection
  Which services are hosted by which processes?
  Where do these processes run?

Through Component Monitoring
  Key metrics from Oracle, SQL, DB2, MySQL, Postgres
  Throughput on your Message Broker / Bus, Firewalls / Proxies

Through End-to-End Tracing
  Which services do end-to-end use cases depend on?
  Where are our bottlenecks? How to optimize deployment and architecture?

Through Anomaly Detection
  Which tiers are acting out of the norm after an update or under certain load?
  Who is impacted when one tier has an issue?
  Where to look for the real root cause when a service goes down?

Page 62: Starting Your DevOps Journey – Practical Tips for Ops

Promise of DevOps: Faster & Efficient Innovation

Smaller Apps, Micro-Services More Deployments

App-, Service- & End-User Feedback Loops

Happy Users

Lower Costs

Page 63: Starting Your DevOps Journey – Practical Tips for Ops

DevOps Monitoring Maturity: What we covered today?

Ops: Need answers to these questions! Closing the gap to AppBizDev

1. Basic App Monitoring
  Ops: Are our applications up and running? What load patterns do we have per application? What is the resource consumption per application?
  AppBizDev: Where are the performance / resource hotspots? When and where do applications break?

2. App Dependencies
  Ops: What are the dependencies between apps, services, DB and infra? How to monitor "non custom app" tiers? Where are the dependency bottlenecks? Where is the weakest link?
  AppBizDev: Do we have bad dependencies through code or config? How does the system really behave in production? What to learn for future architectures?

3. End User Monitoring
  Ops: How to monitor mobile vs desktop vs tablet vs service endpoints? How much network bandwidth is required per app, service and feature? Where to start optimizing bandwidth: CDNs, Caching, Compression?
  AppBizDev: Who is using our apps? Geo? Device? Which features are used? What's the behavior? Where to start optimizing? App Flow? Page Size? Conversion Rates? Bounce Rates?

4. "Soft-Launch" Support
  Ops: How to deploy and monitor multiple versions of the same app / service? What and how to baseline? Do we have a better or worse version of an app/service/feature?
  AppBizDev: What are the usage patterns for A/B or Green/Blue? Difference between different versions and features?

5. Virtualization Monitoring
  Ops: How to automatically monitor virtual and container instances? What to monitor when deploying into public or private clouds?
  AppBizDev: Does the architecture work in these dynamic environments? Does scale up/down work as expected?

6. Provide "Monitoring as a Service" for Cloud Native Application Teams
  Ready for "Cloud Native": How to alert on real problems and not architectural patterns? How to consolidate monitoring between Cloud Native and Enterprise?

Page 64: Starting Your DevOps Journey – Practical Tips for Ops

We have the experience.

One of the largest health care insurance providers in the nation: to DevOps in two weeks

One of the largest furniture retailers in the United States: to DevOps in two weeks

Page 65: Starting Your DevOps Journey – Practical Tips for Ops

We have a proven approach: the DevOps Xcelerator

Outline your digital performance management (DPM) strategy
Build on what you already have
Implement DPM to support DevOps
Validate your success

DPM Vision & Strategy: Identify DPM goals that guide your implementation strategy in alignment with business objectives.

Discovery & Planning: Ask the right questions. Collect the information. Assemble required resources. Create your implementation plan.

Implementation: Follow the Dynatrace Expert Services (DXS) implementation framework to successfully execute your implementation plan.

Validate Success: Track, measure and report progress towards your DPM goals so that your digital performance investments add increasing value to the business.

Page 66: Starting Your DevOps Journey – Practical Tips for Ops

Q & A

Brian Chandler, Systems Engineer @ Raymond James, @Channer531

Andreas Grabner, Chief DevOps Activist @ Dynatrace, @grabnerandi

Action Items for you!
  Try Dynatrace SaaS: http://bit.ly/dtsaastrial
  Try Dynatrace AppMon On Premise: http://bit.ly/dtpersonal
  Listen to our Podcast: http://bit.ly/pureperf
  Read more on our blog: http://blog.dynatrace.com

Page 67: Starting Your DevOps Journey – Practical Tips for Ops