topic: observability platforms

EMA Top 3 Enterprise Decision Guide 2021 Topic: Observability Platforms Data-Driven Guidance for Product Evaluation in DevOps, SRE, IT Operations, and Business Q3 2021 EMA Top 3 Report By Torsten Volk



Table of Contents

EMA Top 3 Awards
The Five-Step Product Selection Process
Product Selection Criteria
The Cloud-Native Universe: Choice Brings Complexity
CNCF: Metrics
Learning From Real-Life Failure
Failure Impact Across all 52 Cases
Zooming Out: 41,558 Challenges From the Cloud-Native Universe
Observability as the #1 DevOps Challenge
Harnessing the Value of Technology by Mastering Complexity
Five Most Frequently Asked Questions About Observability
Market Analysis: Observability


EMA Top 3 Report | EMA Top 3 Enterprise Decision Guide 2021

EMA Top 3 Awards

EMA Top 3 Products Help Enterprises Shatter the Vicious Cycle Between Speed, Quality, Cost, and Innovation

What are the EMA Top 3 Awards?

Enterprise Management Associates (EMA) presents its EMA Top 3 Awards to software products that help enterprises reach their digital transformation goals by optimizing product quality, time to market, cost, and ability to innovate.

Personas and Perspectives

EMA derives its Top 3 product categories from today’s critical pain points experienced by software developers, DevOps professionals, site reliability engineers (SREs), IT operators, data scientists, and business staff belonging to enterprises of any size and industry.

Data-Driven Research

EMA Top 3 Awards are based on the consolidated analysis of a combination of real-life project data, data obtained through daily briefings and end-user interaction, and additional data collected from public sources to optimally understand customer requirements.

You should read this report if…

…you want to learn from the successes and failures of your peers.

…you require hard data on market trends in DevOps, IT operations, and business technology.

[Figure: cycle diagram with four numbered segments (01 to 04) labeled Innovation, Speed, Quality, and Cost.]


The Five-Step Product Selection Process

The purpose of the EMA Top 3 decision guide is to present the reader with products that address the key business requirements and pain points in 2021. The EMA selection process follows five key steps.

1. Empirical: EMA identifies the specific key customer pain points for each one of the top challenges in DevOps, SRE, IT operations, and business in 2021.

2. Strategic: EMA evaluates how each product addresses the key pain points identified in step 1 and how it aligns with today’s most relevant technology trends.

3. Innovative: This criterion rewards products for breaking with legacy constraints in order to provide customers with truly innovative solutions.

4. Customer-Centric: EMA Top 3 Awards recognize a product’s radical focus on customer requirements instead of marketing an existing product as something new.

5. Specific: EMA Top 3 Award-winning products address quantifiable customer pain points.

Please treat these EMA Top 3 vendor recommendations as a starting point to inform your product selection process and overall digital transformation strategy. While this report can provide valuable data-driven insights, it aims to inform, not replace, your own due diligence process.

[Figure: The Five-Step Product Selection Process]

1. Empirical: How well are the current product capabilities aligned with the requirements derived from the empirical dataset and the recommendations created in dialogue with customers and partners?

2. Strategic: When selecting IT vendors, there are often elements to consider that can dramatically increase the value of an IT solution by aligning with today’s key IT trends.

3. Innovative: The EMA Top 3 report aims to point out true customer-centric innovation that is unique in the marketplace. This can be due to a new architecture that frees customers from legacy constraints, or it could simply be due to a reorientation of a traditional vendor that will benefit customers in the near future.

4. Customer-Centric: EMA awards dramatic strategic realignment of vendors that required a significant leap of faith in return for “doing the right thing” for customers.

5. Specific: EMA Top 3 products must address the critical enterprise customer pain points revealed in this report.


Product Selection Criteria

Real-life use cases are the crucial link between PowerPoint and how products work in the wild. This report relentlessly focuses on identifying and clustering use cases based on direct customer observations and on the analysis of quantitative project data. While EMA aggregates and anonymizes all customer-specific data points, the EMA Top 3 evaluation process is based on actual customer problems.

Instead of exclusively relying on EMA survey data and research notes, EMA created a data framework that enables EMA analysts to directly analyze project bottlenecks and enterprise pain points by looking at real-life project artifacts. The example on the right shows an extract of the analysis of 41,558 Kubernetes-related developer problems posted to the Stack Overflow support forum. From this specific evaluation, EMA received a series of technology clusters with a high probability of being involved in production issues of different types and severity levels. By no means does this result reveal that these technology combinations should be avoided in your next project. Instead, it helps EMA define problem areas that require further examination and deserve some additional questioning by enterprises selecting product vendors.

EMA Top 3 Product Awards - Reward for Addressing the Difficult Problems

Each EMA Top 3 Award-winning product has demonstrated its direct focus on addressing today’s key pain points for software developers, DevOps teams, SREs, IT operators, and business professionals.

Example: Learning From Real-Life Challenges

This simplified heatmap shows the most common problem clusters within Kubernetes-based environments. Starting at the top, we can learn that challenges around Helm, the Kubernetes package manager, occur within the context of “ingress control” (1) and “yaml definitions” (2), with “pod management” (3), “the use of minikube” (4), and “Nginx” (5) playing a significant role. This empirical research approach helps focus the EMA Top 3 product awards on actual real-life customer pain points in 2021.

Clustering Real-Life Cloud Native Issues

This heatmap is based on the 41,558 most recent Kubernetes-related developer posts on the Stack Overflow support forum.

[Heatmap: both axes list topic tags, including Amazon EKS, Amazon Web Services, Azure, Azure AKS, containers, Docker, Google Cloud Platform, Google Kubernetes Engine, Java, Kubernetes Helm, Kubernetes ingress, Kubernetes pod, minikube, Nginx, Python, Spring Boot, and YAML. The color scale runs from 0 to 200, and markers 1 to 5 correspond to the numbered callouts in the example.]
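The report does not describe how the heatmap was computed. As a rough illustration only, one common way to obtain such problem clusters is to count how often two topic tags appear on the same post. The sketch below assumes each post is available as a set of tags; the sample posts, the tag spellings, and the function name `cooccurrence_counts` are all hypothetical and not taken from the EMA dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each post is represented by its set of topic tags.
posts = [
    {"kubernetes-helm", "kubernetes-ingress", "yaml"},
    {"kubernetes-helm", "yaml"},
    {"kubernetes-helm", "minikube", "nginx"},
    {"kubernetes-pod", "docker"},
    {"kubernetes-helm", "kubernetes-ingress", "nginx"},
]


def cooccurrence_counts(tagged_posts):
    """Count how often each unordered pair of tags appears on the same post."""
    counts = Counter()
    for tags in tagged_posts:
        # Sorting makes ("a", "b") and ("b", "a") map to the same key.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(posts)

# The most frequent pairs would be the "hot" cells of a heatmap.
for (tag_a, tag_b), n in pair_counts.most_common(3):
    print(f"{tag_a} + {tag_b}: {n}")
```

Each pair count corresponds to one heatmap cell. With real data, the same counts could be written into a tag-by-tag matrix and rendered with any plotting library.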


The Cloud-Native Universe: Choice Brings Complexity

This chart shows the full extent of cloud-native choice, in the form of all 936 products from the CNCF landscape, available to product teams to assemble their applications. The fact that teams can select one or more components from each product category brings the flexibility needed to optimize developer productivity while providing DevOps engineers with the required governance and control to ensure cost-efficiency and continuous compliance. The fact that different product teams make different choices and often the same team leverages different components to create their stack for different projects creates a level of entropy that is hard to control with traditional skill-sets, processes, and tools.

[Figure: the CNCF landscape grouped into six categories: Security, Compliance, and Automation; Observability; Data, Application, and DevOps; Runtime; Orchestration and Management; Container Platform.]

EMA Top 3 Award-winning products help customers leverage their preferred cloud-native products in a well governed, integrated, and automated manner.


CNCF: Metrics

These metrics belong to the CNCF product overview chart from the previous page and aim to provide readers with a high-level overview of the degree of choice and complexity attached to modern distributed and cloud-native applications.

Total Venture Funding: $601,889,361,536
Total GitHub Stars: 2,571,835
Total Number of Organizations Involved: 779
Total Number of Annual Code Commits: 484,198
Total Individual Contributors: 104,285
Total Number of Products: 936
Number of Development Languages Used: 28


Learning From Real-Life Failure

Failure Impact Across all 52 Cases

Forty-eight percent of this specific set of incidents were full production outages, 18% were less disruptive Kubernetes cluster outages, and the remaining 34% constituted a mix of operational inconsistencies and deployment failures.

Analyzing Failures

Drilling further into the data shows that 45% of the issues are directly related to public cloud technologies (pink), with the remaining 55% shared between security, availability, compute, data, deployment technologies, network, and performance topics.

EMA Top 3 products help enterprises address the underlying challenges of these issues in a cost-efficient manner while optimizing the end-user experience.

Examining the failures of your peers can offer valuable lessons for your own decision-making processes, without having to endure the pain of a real-life product failure. The following chart consists of quotes from 52 cloud-native application failures within an enterprise context. These incidents range from failed code deployments with minimal user impact to major production outages affecting most or all internal and external user groups.

Quotes on the Impact of 52 Kubernetes Production Failures

[Chart: technologies named in the failure reports, grouped by category (availability, compute, data, deployment, network, performance, public cloud, security). Labels include AWS, AWS EKS, AWS IAM, AWS spot instances, public AWS ELB, Azure, GCP, GKE, kops, kube-aws, kiam, kube2iam, Weave Scope, Nginx ingress, HAProxy ingress, DNS, CoreDNS, ndots:5, conntrack, SNAT, CPU limit, CPU throttling, OOM kill, cluster autoscaler, HPA, kubelet, scheduler, API server, liveness probe, configmap change, anti-affinity rules, Helm, and latency; several entries are marked (NA).]

Source: codeberg.org, hjacobs/Kubernetes-failure-stories, Aug 5, 2021

[Bar chart: share of incidents by impact type (deployment, error, high latency, cluster outage, production outage); axis from 0% to 50%.]

Technologies Involved in Kubernetes Failures


Zooming Out: 41,558 Challenges From the Cloud-Native Universe

These challenges are divided into 488 categories and 1,506 subcategories capturing the full spectrum of cloud-native developer challenges.

The tree map is based on 41,558 Kubernetes-related developer posts on the Stack Overflow support forum.


Observability as the #1 DevOps Challenge

Observability, availability, and security are the three primary DevOps challenges today. This is a direct result of the rapidly increasing complexity introduced by the adoption of mostly autonomous product teams, distributed application architecture, and a hybrid multi-cloud operating model. SREs are often scrambling to find the information required to manage operational risk, since organizations typically do not have a unified approach toward collecting, processing, and storing logs, metrics, and traces across the stack in an application-centric manner.

50% Estimated daily time spent by software engineers on overhead tasks.

[Line chart: percentage change for the availability, observability, and security series, 2016 to 2021; y-axis from 0% to 1200%.]


Harnessing the Value of Technology by Mastering Complexity

The fact that simple changes to the application stack are the number-one root cause for the failure of cloud-native applications demonstrates that organizations in 2021 are still struggling to master complexity in software development, DevOps, and IT operations. EMA Top 3 products enable enterprises to better harness existing and new software technologies in a cost-effective and compliant manner. Policy-driven software development, operations management, scalability, and unified operations of application infrastructure across data centers and public clouds are the key requirements for achieving this goal. The bottom chart quantifies the overall complexity increase in cloud-native technologies between 2017 and 2021 by showing how a 343% increase in the number of individual technologies resulted in a 527% increase in the number of technology combinations that were part of application problems.

Cloud-Native Complexity Increase Measured by the Number of Unique Technologies (Orange) and by the Number of Technology Combinations (Blue)

The number of distinct cloud-native technologies (orange) increased from 478 to 1,733 (343%) between 2017 and 2021, while the number of cloud-native technology combinations (blue) increased from 826 to 4,349 (527%) within the same timeframe.

[Line chart: x-axis from Jan 2017 to Jan 2021 in half-year steps; y-axis from 0 to 4,500. Labeled values: 826, 1,678, 2,922, 3,937, 4,349 (upper series) and 464, 768, 1,090, 1,371, 1,485 (lower series).]

Public Cloud Complexity Increase Measured by the Number of Unique Technologies (Orange) and by the Number of Technology Combinations (Blue) Related to AWS, Azure, and GCP

The number of distinct public cloud services (blue) increased from 1,475 to 3,689 (250%) between 2015 and 2021, while the number of combinations of public cloud services (orange) increased from 2,366 to 8,129 (344%) within the same timeframe.

The top chart shows the 250% increase in the number of different public cloud technologies, which corresponds to a 344% increase in the number of overall public cloud technology combinations between June 2015 and June 2021 (source: Stack Overflow).
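As a quick arithmetic check on the growth figures quoted in the two chart captions, the sketch below computes, for each start and end pair, both the end value as a percentage of the start value and the net percentage increase. The helper name `ratio_and_growth` is made up for this illustration. Under this calculation, the quoted 527%, 250%, and 344% appear to correspond to the end value relative to the start value rather than to the net increase, while the pair 478 and 1,733 works out to roughly 363% by the same measure.

```python
def ratio_and_growth(start, end):
    """Return (end as a percentage of start, net percentage increase)."""
    ratio_pct = end / start * 100
    return ratio_pct, ratio_pct - 100


# (label, start, end) as quoted in the chart captions.
series = [
    ("cloud-native technologies", 478, 1_733),
    ("cloud-native combinations", 826, 4_349),
    ("public cloud services", 1_475, 3_689),
    ("public cloud combinations", 2_366, 8_129),
]

for label, start, end in series:
    ratio_pct, growth_pct = ratio_and_growth(start, end)
    print(f"{label}: {ratio_pct:.0f}% of start, +{growth_pct:.0f}% net")
```

The distinction matters when comparing vendors' or analysts' growth claims: a value that quadruples is 400% of its starting point but has increased by 300%.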

In complex distributed environments, any configuration change or change to the software code can lead to downtime and performance degradation. Anytime changes of any kind are made to a complex system, such as a microservices application, there is a risk of immediate or delayed negative impact on the application itself or on other applications.

5 Top Causes of Application Downtime and Performance Degradation

[Chart: the five causes shown are Changes, Code Bugs, Migrations, Security, and Upgrades; the shares shown are 31%, 11%, 37%, 16%, and 5%.]

[Line chart: x-axis from 2015 to 2021. Individual topics: 1,366, 1,646, 1,861, 2,132, 2,196, 2,733, 2,849. Topic combinations: 2,366, 3,270, 4,510, 5,318, 5,730, 8,453, 8,129.]


Five Most Frequently Asked Questions About Observability

Q: What is the difference between monitoring and observability?
A: Observability tracks relationships and dependencies between applications, microservices, and infrastructure in order to enable developers, IT operations, and DevOps to understand and optimize performance, cost, and reliability of production applications.

Q: What is telemetry versus observability?
A: Telemetry data can consist of logs, traces, and metrics emitted from applications. Observability ingests, consolidates, normalizes, and analyzes this telemetry data in order to continuously optimize an application.

Q: Who is responsible for establishing observability?
A: Operators and developers need to collaboratively implement observability into the application stack. This means overcoming the still prevalent practice of both personas using different platforms for gaining application visibility.

Q: What is observability engineering?
A: Observability engineering aims at building software systems that are easy to understand, debug, optimize, operate, and enhance.

Q: How do we achieve full-stack observability?
A: Full-stack observability provides real-time visibility for development and operations personas to understand the impact of any component that is part of the application stack on application performance. Achieving full-stack observability requires the consolidation, normalization, and analysis of all logs, traces, and metrics across the application stack, including their contextual dependencies.

[Diagram: a distributed application (multiple clouds and data centers, multiple microservices, multiple application stacks) emits metrics, logs, traces, and events to the observability platform, which applies consolidation, normalization, analysis, and optimization actions to continuously optimize the system.]
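The consolidation and normalization steps named in the answers above can be illustrated with a small sketch. It is not drawn from any specific observability product: the record schema `TelemetryRecord`, the raw field names (`ts`, `svc`, `msg`, and so on), and the sample signals are assumptions made up for this example. The idea is simply that logs, metrics, and traces arriving in different shapes are mapped onto one common shape and then merged into a per-service timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TelemetryRecord:
    """Common shape for logs, metrics, and traces (hypothetical schema)."""
    timestamp: datetime
    service: str
    kind: str  # "log", "metric", or "trace"
    body: dict


def normalize(raw: dict) -> TelemetryRecord:
    """Map one raw signal onto the common schema.

    The raw field names ("ts", "svc", "msg", ...) are invented for this sketch.
    """
    timestamp = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    if "msg" in raw:
        return TelemetryRecord(timestamp, raw["svc"], "log", {"message": raw["msg"]})
    if "value" in raw:
        return TelemetryRecord(timestamp, raw["svc"], "metric",
                               {"name": raw["name"], "value": raw["value"]})
    return TelemetryRecord(timestamp, raw["svc"], "trace",
                           {"span": raw["span"], "duration_ms": raw["duration_ms"]})


def consolidate(raw_signals):
    """Normalize all signals and group them per service, ordered by time."""
    by_service = {}
    for record in sorted(map(normalize, raw_signals), key=lambda r: r.timestamp):
        by_service.setdefault(record.service, []).append(record)
    return by_service


raw_signals = [
    {"ts": 1_600_000_020, "svc": "checkout", "msg": "payment timeout"},
    {"ts": 1_600_000_000, "svc": "checkout", "name": "latency_ms", "value": 840},
    {"ts": 1_600_000_010, "svc": "cart", "span": "GET /cart", "duration_ms": 35},
]

timeline = consolidate(raw_signals)
for record in timeline["checkout"]:
    print(record.timestamp.isoformat(), record.kind, record.body)
```

Once all signals share one schema and one time axis, the analysis step can relate, for instance, a latency metric to the log line that follows it for the same service.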


Market Analysis: Observability

EMA Research Facts


EMA Quick Take

The term “observability” was coined in the early 1960s by Rudolf Emil Kalman, a Hungarian-born mathematician seeking to explain how automated control systems work, only by observing their output. In 2021, “observability” has become one of the key pain points and requirements in cloud-native computing due to the struggles experienced by organizations when attempting to control and optimize the performance, reliability, and cost of their distributed applications.

Observability platforms enable developers, operators, DevOps engineers, SREs, and security personas to directly connect the dots between end-user experience, application performance, and the underlying code, data, and infrastructure in data centers and the public cloud. The ability to seamlessly drill down from a big picture view of the overall application and infrastructure topology into the details of how individual infrastructure components and code functions impact the end-user experience constitutes the core capability of observability platforms. The rapid adoption of a cloud-native distributed application architecture and the resulting complexity of semi-autonomous teams developing and managing individual microservices across the corporate data center and different clouds have accelerated the growth of the market for observability platforms.

Business Problems Solved

• Developer, DevOps, and SRE productivity
• High reliability
• Cost-performance optimization
• Unified management for cloud-native and monolithic apps
• Rapid recovery
• Minimize release and operations cost

#1 Observability is the number-one DevOps challenge in 2021

100% YoY Increase in observability-related developer challenges

50% Estimated daily time spent by SREs on finding decision-relevant data

Quote From the Trenches

“Our SREs are often spending 50% of their time on fact-finding missions. Addressing this inefficiency is one of our key goals for 2021 and beyond.”

VP Cloud Engineering, Automotive Parts Manufacturer


Market Size

Incumbents: 106

Funding: $78B

VC Funding: $1B

Revenue: $10B

Employees: 125,000

Open Jobs: 4,432

GitHub Stars: 443,890

GitHub Contr.: 12,241

Market Growth

Competition

Outlook

Productivity Impact

DevOps and SRE

• Continuous access to all required context data

• Security and compliance across the DevOps pipeline

Software Developers

• Rapid debugging of distributed apps

• Continuous compliance and security

IT Operations

• Automatic dependency and topology tracking

• Full stack app management across data center and cloud

[Gauge graphics: the ratings shown are Very High, Very High, and High; the scale labels are small; slow, fast, hot; and none, average, heating up, intense.]


Market Segment: Automatic End-to-End Observability

Changes to the application stack, code, and release pipeline are the three key reasons for performance degradation and downtime in cloud-native apps and traditional enterprise applications alike. EMA Top 3 Award-winning applications in the “automatic end-to-end observability” segment capture these changes in near-real time, at full resolution, and without requiring manual instrumentation, in order to provide:

1. Business-driven production insights

2. Targeted alerts with problem context

3. Automatic root cause analysis

4. Monitoring across application environments

These components lead to better alignment between IT and business by making it possible to tune optimization and resolution actions toward specific sets of business KPIs.
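To make the link between captured changes and problem context more concrete, here is a deliberately simple time-window correlation sketch. It is not how any award-winning product performs root cause analysis; it only shows the basic idea of listing the stack, code, and pipeline changes that happened shortly before an incident. The change log entries, the 30-minute default window, and the function name `changes_before` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical change events: (time, kind, description).
changes = [
    (datetime(2021, 9, 1, 9, 0), "config", "raise CPU limit on checkout"),
    (datetime(2021, 9, 1, 11, 40), "code", "deploy checkout v2.4.1"),
    (datetime(2021, 9, 1, 11, 55), "pipeline", "switch base image"),
    (datetime(2021, 9, 1, 13, 0), "code", "deploy cart v1.9.0"),
]


def changes_before(incident_time, change_log, window=timedelta(minutes=30)):
    """Return changes made within `window` before the incident, most recent first."""
    candidates = [
        change for change in change_log
        if timedelta(0) <= incident_time - change[0] <= window
    ]
    return sorted(candidates, key=lambda change: change[0], reverse=True)


incident = datetime(2021, 9, 1, 12, 0)
for when, kind, description in changes_before(incident, changes):
    print(f"{when:%H:%M} [{kind}] {description}")
```

A list like this gives an alert its problem context; a real platform would additionally weigh topology, dependencies, and historical behavior before naming a root cause.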

EMA Top 3 Award: Automatic End-to-End Observability

Machine Learning-driven Instant Application Insights for Dev, Ops, SRE, and Security

Why IBM Observability by Instana Received the EMA Top 3 Award

Instana received the EMA Top 3 Award for the platform’s ability to automatically discover and monitor cloud-native and traditional application stacks within the context of their orchestration platform (typically Kubernetes), and the underlying data center or cloud infrastructure. Instana’s reinforcement learning models continuously learn to watch out for issues similar to the ones that were detected within a comparable context in the past. Instana automatically discovers new applications when developers add standard configuration code to their Git repository; this code enables the platform to automatically place, configure, and manage the required agents in order to ensure comprehensive observability.

“We are set in our ways of guessing what the business needs and how we prioritize developer and operator tasks to get there. Sometimes we are right and sometimes we are wrong, since in the end, we are just taking our best guess.”

Development Lead, Global Financial Institution

Business Impact

• Enhance developer and SRE productivity
• Decrease MTTR
• Lower operational risk by continuously optimizing the application stack
• Proactive issue resolution and application optimization
• Complete observability for traditional and cloud-native apps across data center and cloud
• Empowering developers and operations staff to handle complex cloud-native applications independently of where they currently run


About Enterprise Management Associates, Inc.

Founded in 1996, Enterprise Management Associates (EMA) is a leading industry analyst firm that provides deep insight across the full spectrum of IT and data management technologies. EMA analysts leverage a unique combination of practical experience, insight into industry best practices, and in-depth knowledge of current and planned vendor solutions to help EMA’s clients achieve their goals. Learn more about EMA research, analysis, and consulting services for enterprise line of business users, IT professionals, and IT vendors at www.enterprisemanagement.com. You can also follow EMA on Twitter or LinkedIn.

This report, in whole or in part, may not be duplicated, reproduced, stored in a retrieval system or retransmitted without prior written permission of Enterprise Management Associates, Inc. All opinions and estimates herein constitute our judgement as of this date and are subject to change without notice. Product names mentioned herein may be trademarks and/or registered trademarks of their respective companies. “EMA” and “Enterprise Management Associates” are trademarks of Enterprise Management Associates, Inc. in the United States and other countries.

©2021 Enterprise Management Associates, Inc. All Rights Reserved. EMA™, ENTERPRISE MANAGEMENT ASSOCIATES®, and the mobius symbol are registered trademarks or common law trademarks of Enterprise Management Associates, Inc.

1995 North 57th Court, Suite 120, Boulder, CO 80301 +1 303.543.9500 www.enterprisemanagement.com 4113.091421