Post on 27-Dec-2021


How ServiceNow delivers high-performance business services with AIOps

Cut through the noise and fix service degradations faster with the power of artificial intelligence

Digital transformation demands unprecedented business service quality

As digital transformation accelerates, organizations are becoming increasingly dependent on IT. More than ever before, applications and business services need to deliver performance and high availability. Service degradations and outages have an enormous and unacceptable business impact, compromising customer engagement, business operations, and revenue generation.

IT operations is on the frontline of digital transformation, tasked with the job of meeting demanding availability and performance SLAs for an increasing number of applications and business services. At the same time, IT operations is expected to be an equal partner in business innovation, imposing additional demands on its time. And, IT operations is often asked to do this without growing budgets or resources.

It’s not just about adding more applications and business services. These services are also becoming more distributed and interconnected, leading to an explosion in the amount of data and resources to be managed, increasing management complexity. For example, containerized cloud-based architectures create a significant increase in event volumes, leaving already overloaded IT operations teams drowning in event noise.

IT operations systems need to become intelligent

To meet ever-increasing business and technology demands, IT operations teams can no longer rely on adding more people. Instead, IT operations management systems need to become intelligent, working hand in hand with IT operations staff to pinpoint service and infrastructure issues, accelerate remediation, and drive service quality improvement.

AIOps, or artificial intelligence for IT operations, is about applying machine learning and analytics to IT operations functions, something ServiceNow has been doing for years. In this ebook, we take a detailed look at how ServiceNow uses machine learning, natural language processing, advanced analytics, service context, learnings from historical data and human behavior, and other intelligent techniques to help you:

• Cut through event noise, creating a clean, actionable signal

• Rapidly identify the cause of service outages and degradations

• Remediate service and infrastructure issues quickly and accurately

• Drive continuous improvement in service quality and operational efficiency

Creating a clean signal

Traditional event management solutions rely on user-defined rules to filter, deduplicate, and normalize events, turning a large number of events into a smaller number of alerts. This reduces event noise; however, with increasingly distributed, complex, and dynamic business services, rule-based event processing alone is not enough.

Why? When something goes wrong in your IT infrastructure, there are typically many alerts, even when events have been filtered, deduplicated, and normalized. For example, a storage issue can cause a database issue, which in turn leads to an application issue. If you don’t know that these alerts are related, you end up chasing multiple symptoms, rather than focusing on identifying and remediating the root cause. What’s more, it takes even more time to determine which events need to be addressed first based on the criticality of the impacted business services.

ServiceNow® Event Management with AIOps gives you clarity, creating alert groups that contain related alerts. Instead of seeing individual symptoms, you now see a set of symptoms caused by a single underlying issue. While you can define these alert groups manually, ServiceNow Event Management uses the power of machine learning to create groups automatically, identifying patterns in your historical alert data and then using these patterns to group new alerts in real time.

Let’s look at how ServiceNow does this.

[Figure: external monitoring sources and operational data sources; event correlation using machine learning; Service Health Dashboard; services impacted]

Temporal alert correlation

The first way ServiceNow Event Management creates alert groups is by looking for temporal patterns. When there is an issue in your IT environment, symptoms can appear over time, rather than happening all at once. In some cases, this is because it takes time to detect that upstream and downstream systems are affected, for instance when a network link issue causes an application’s response characteristics to degrade.

In other cases, the underlying issue becomes worse over time, causing additional symptoms. As a simple example, think about an application that has an alert because its processes are consuming too many system resources. Typically, this may start as a low-priority alert; however, if this condition persists and gets worse, it can lead to the host server running out of memory or overloading its CPU, which creates a critical alert.

ServiceNow uses machine learning to identify these temporal alert patterns, analyzing historical alert data to look for repeating sequences. When it sees the same pattern in your current alerts, it groups these alerts and displays them on a timeline. Not only does this reduce noise, it also dramatically simplifies diagnosis since ServiceNow Event Management allows you to see how the pattern unfolds.

By identifying partial patterns, ServiceNow Event Management can also give a proactive indication of alerts that are likely to occur. This gives you advance warning of potentially serious infrastructure and service issues.
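
The temporal grouping described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ServiceNow algorithm, which the ebook does not detail: it counts ordered pairs of alert types that repeatedly occur within a time window in historical data, uses those pairs to group new alerts, and lists the alert types that historically followed a partial pattern. The function names, the 300-second window, and the support threshold are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 300  # hypothetical correlation window


def learn_pairs(history, window=WINDOW_SECONDS, min_support=2):
    """Return ordered alert-type pairs (a, b) where b followed a within
    `window` seconds at least `min_support` times.

    history: list of (timestamp_seconds, alert_type) tuples.
    """
    history = sorted(history)
    counts = Counter()
    for i, (t_a, a) in enumerate(history):
        for t_b, b in history[i + 1:]:
            if t_b - t_a > window:
                break  # sorted by time, so later alerts are further away
            if a != b:
                counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def group_alerts(alerts, pairs, window=WINDOW_SECONDS):
    """Attach each alert to the first group holding an earlier alert that
    forms a learned pair with it; otherwise start a new group."""
    groups = []
    for t, kind in sorted(alerts):
        for group in groups:
            if any(t - t0 <= window and (k0, kind) in pairs for t0, k0 in group):
                group.append((t, kind))
                break
        else:
            groups.append([(t, kind)])
    return groups


def predict_next(group, pairs):
    """Alert types that historically followed the types already in the group."""
    seen = {kind for _, kind in group}
    return sorted({b for a, b in pairs if a in seen and b not in seen})
```

With a history in which a storage latency alert was twice followed by a slow database alert and then an application timeout, a new storage alert followed by a database alert would land in one group, and `predict_next` would list the application timeout as a likely follow-on.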

Topological alert correlation

ServiceNow Event Management uses its knowledge of how your IT environment is interconnected to learn how symptoms, and their corresponding alerts, propagate across your infrastructure and business services.

How does it do this? Once again, it uses machine learning, along with topological constructs such as clique graphs, to analyze upstream and downstream relationships between CIs in your CMDB. It combines this information with your historical alert data to identify repeating topological alert patterns. An example of this is identifying the upstream symptoms of an unresponsive web server supporting a business service. Upstream CIs that depend on the web server are affected and generate alerts. ServiceNow Event Management understands the relationships between these CIs and uses this information in concert with the arrival time of the alerts to group the alerts.

This topological correlation extends across entire business services. By using ServiceNow Service Mapping to create service maps, your alert correlation becomes intelligent and fully service aware. This helps you prioritize business service issues, simplify diagnosis, and accelerate remediation, reducing the duration and impact of business service outages. And, even if you don’t have service maps, ServiceNow Event Management still uses CI dependencies in your CMDB to group related alerts at the infrastructure level.
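
As a rough sketch of the idea, the Python below groups alerts whose CIs are directly related in a dependency map and whose arrival times are close together. It is a deliberate simplification: it uses connected components over direct dependencies rather than the clique-graph constructs and machine learning mentioned above, and the CMDB extract, CI names, function names, and 300-second window are all hypothetical.

```python
from collections import defaultdict

# Hypothetical CMDB extract: each CI maps to the CIs it depends on.
DEPENDS_ON = {
    "checkout-service": ["web-01"],
    "web-01": ["db-01"],
    "db-01": ["san-01"],
    "san-01": [],
    "hr-portal": ["web-02"],
    "web-02": [],
}


def group_by_topology(alerts, depends_on, window=300):
    """Group (timestamp, ci) alerts that are directly related in the
    dependency map and arrive within `window` seconds of each other."""
    neighbours = defaultdict(set)
    for ci, deps in depends_on.items():
        for dep in deps:
            neighbours[ci].add(dep)
            neighbours[dep].add(ci)

    parent = list(range(len(alerts)))  # union-find over alert indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (t_i, ci_i) in enumerate(alerts):
        for j in range(i + 1, len(alerts)):
            t_j, ci_j = alerts[j]
            related = ci_i == ci_j or ci_j in neighbours[ci_i]
            if related and abs(t_i - t_j) <= window:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, alert in enumerate(alerts):
        groups[find(i)].append(alert)
    return [sorted(group) for group in groups.values()]


def probable_roots(group, depends_on):
    """CIs in the group that depend on no other alerting CI in the group."""
    cis = {ci for _, ci in group}
    return sorted(ci for ci in cis if not cis & set(depends_on.get(ci, [])))
```

In this toy data, alerts on `san-01`, `db-01`, and `web-01` that arrive within a few minutes form one group, and `probable_roots` points at `san-01` because nothing else in the group sits beneath it.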

Refining the model with operator feedback

While machine learning excels at identifying patterns, it still needs feedback to improve model accuracy over time. Operators provide feedback to better qualify the alert grouping.

With ServiceNow Event Management, you can easily add or delete alerts in automatically generated alert groups. ServiceNow learns from these changes and uses this knowledge to modify its future alert grouping behavior. The user can also provide feedback on the usefulness of each alert group—and again, the ServiceNow solution learns from this information.

This feedback is called semi-supervised learning, and it doesn’t require a data scientist to understand it. In fact, it doesn’t require any specialized AI knowledge and skills because it taps into the existing native knowledge of IT operations staff, making it easy to adopt AI technology without extensive reskilling or process changes.

Dynamic thresholding and anomaly detection for operational metrics

ServiceNow Operational Intelligence detects anomalous infrastructure behavior that isn’t captured by raw events. Rather than processing events from monitoring tools, Operational Intelligence ingests raw operational metric data from both on-premises and public cloud environments. It then applies machine learning algorithms to this data, automatically setting dynamic upper and lower thresholds that represent normal behavior adjusted for seasonality. This eliminates the need to manually set and manage thousands of monitored thresholds. Users can also add human insights when appropriate by overriding dynamic thresholds with static thresholds.

Operational Intelligence can also apply anomaly scores to individual infrastructure items using these dynamic thresholds. A high anomaly score indicates that an infrastructure item may be at risk of causing a service outage. A qualified anomaly alert generates an IT alert, which is shown on the alert console and event management dashboard to help with root cause analysis.
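
To make the idea of seasonality-adjusted thresholds concrete, here is a minimal Python sketch. It assumes a simple stand-in model, a separate mean plus or minus k standard deviations band for each hour of the day, which is not necessarily what Operational Intelligence uses. The function names, the value of `k`, and the scoring formula are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_thresholds(samples, k=3.0):
    """Learn a (lower, upper) band per hour of day.

    samples: iterable of (hour_of_day, metric_value) pairs.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    thresholds = {}
    for hour, values in by_hour.items():
        centre, spread = mean(values), pstdev(values)
        thresholds[hour] = (centre - k * spread, centre + k * spread)
    return thresholds


def anomaly_score(hour, value, thresholds, static=None):
    """Return 0.0 inside the band, otherwise the distance outside it
    measured in band widths. A static override wins over the learned band."""
    if static and hour in static:
        lower, upper = static[hour]
    else:
        lower, upper = thresholds[hour]
    width = (upper - lower) or 1.0  # avoid dividing by zero on flat metrics
    if value > upper:
        return (value - upper) / width
    if value < lower:
        return (lower - value) / width
    return 0.0
```

Because the band is learned per hour, a CPU reading of 80 percent can score zero during a busy afternoon and score high at 3 a.m., which a single static threshold cannot express.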

Intelligent incident routing accelerates incident response

The faster you respond to service issues, the faster you fix them. But how do you know which service, application, or infrastructure team needs to respond? It’s even more complicated when you have multiple regional support models with a mix of internal resources and external managed service providers. This is a huge issue for many IT operations organizations, with incidents bouncing from team to team until they find the right home. Meanwhile, the clock is ticking.

With ServiceNow, you get issues to the right people at the right time. Simply create an incident from an alert, and ServiceNow does the rest. Because ServiceNow combines IT operations management and IT service management on a single platform, it has full visibility of your incident history and learns from it. Using a combination of machine learning and natural language processing, ServiceNow understands where specific types of incidents were successfully routed in the past and uses this knowledge to automatically route new incidents to the correct support teams.
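
The general technique of learning routing from incident history can be illustrated with a small Python sketch. It assumes a bag-of-words model with cosine similarity, which is far simpler than a production machine learning and natural language processing pipeline and is not taken from ServiceNow. The function names, sample descriptions, and team names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(history):
    """Build one word-count profile per assignment group.

    history: list of (incident_description, assignment_group) pairs
    taken from incidents that were resolved by that group.
    """
    model = defaultdict(Counter)
    for text, group in history:
        model[group].update(tokens(text))
    return model


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(description, model):
    """Pick the group whose past incidents look most like this one."""
    query = Counter(tokens(description))
    return max(model, key=lambda group: cosine(query, model[group]))
```

A new incident described as "database tablespace almost full" would be routed to whichever group resolved similar database incidents before, without anyone writing a routing rule.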

Accelerated root cause analysis and remediation

ServiceNow Alert Intelligence uses AIOps techniques to help you isolate the root cause of service outages and to remediate them faster and more accurately. When a new alert occurs, it analyzes your alert history using natural language processing to identify similar alerts that happened in the past. Rather than starting from scratch, you have real-time, contextual access to your accumulated operational knowledge, helping you pinpoint potential reasons for the current alert. This dramatically accelerates diagnosis and helps you shift resolution to the left, offloading your tier 2 and tier 3 support teams.

And, because ServiceNow Event Management works seamlessly with ServiceNow® ITSM, this contextual information isn’t limited to similar historical alerts. The ServiceNow solution uses your operational “big data” to identify and retrieve relevant incidents, problems, and knowledge base articles. This gives your team instant access to potential remediation steps and workarounds, accelerating resolution of service outages and minimizing their business impact.
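
Retrieving related records for a new alert can be pictured as a ranked lookup. The sketch below is an assumption-laden illustration, not the product's method: it scores past alerts, incidents, and knowledge articles by Jaccard overlap of their words with the new alert and returns the best few. The record kinds, function names, and the 0.2 cut-off are hypothetical.

```python
import heapq
import re


def token_set(text):
    """Set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_records(new_alert, records, k=2, min_score=0.2):
    """Return up to k (score, kind, text) tuples, most similar first.

    records: list of (kind, text), where kind is a label such as
    "alert", "incident", or "kb".
    """
    query = token_set(new_alert)
    scored = (
        (jaccard(query, token_set(text)), kind, text) for kind, text in records
    )
    return heapq.nlargest(k, (item for item in scored if item[0] >= min_score))
```

The `min_score` floor keeps unrelated records, such as a password-reset article, out of the results even when fewer than `k` good matches exist.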

Drive continuous improvement

With ServiceNow Performance Analytics, you can also proactively improve service quality and operational efficiency. Performance Analytics allows you to analyze your historical operational data, identifying areas for improvement. For example, you can:

• Pinpoint recurring service and infrastructure issues, tying these back to factors such as specific applications, time of day, software releases, particular vendors, geographies, and more. By identifying these systemic issues and then resolving them using ServiceNow Problem Management, you prevent future issues and reduce your “keep the lights on” workload.

• Identify process bottlenecks such as long incident resolution times or labor-intensive changes. By uncovering these issues with Performance Analytics, you can take proactive steps to eliminate them—for example, by streamlining process workflows, creating new knowledge articles, or automating specific remediation and change activities.

• Uncover trends before they become major issues. Performance Analytics lets you measure your operational performance by creating key performance indicators (KPIs), and it forecasts your future performance based on current and past data. It even identifies specific actions you can take to improve performance and meet your service-level objectives.
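
Forecasting a KPI from past data can be as simple as extending a fitted trend line. The Python sketch below assumes an ordinary least-squares straight line over equally spaced periods. The ebook does not say which forecasting method Performance Analytics applies, and the function name and sample figures are hypothetical.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line through equally spaced KPI values and
    extend it `steps_ahead` periods past the last observation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum(
        (x - x_mean) * (y - y_mean) for x, y in enumerate(values)
    ) / sum((x - x_mean) ** 2 for x in range(n))
    return y_mean + slope * (n - 1 + steps_ahead - x_mean)


# Hypothetical weekly mean time to resolve, in hours.
weekly_mttr_hours = [10, 12, 14, 16]
next_week = linear_forecast(weekly_mttr_hours)
```

Comparing a forecast like `next_week` against a service-level objective is one way to flag a worsening trend before the objective is actually missed.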

The bottom line

Your business depends on you to keep mission-critical applications and business services up and running. With digital transformation, you’re being asked to manage more and more applications and business services, and these services are becoming more distributed and complex. To rise to this challenge, you need your operations management systems to become intelligent, working hand in hand with your team to deliver high service availability and performance.

ServiceNow delivers that intelligence, using AIOps techniques such as machine learning, natural language processing, and advanced analytics to help you:

• Cut through event noise, creating a clean signal

• Rapidly identify the cause of service outages and degradations

• Remediate service and infrastructure issues quickly and accurately

• Drive continuous improvement in service quality and delivery

The result? Improved service health, increased operational efficiency, and IT operations processes that scale for the digital future. It’s that simple.

SN-EB-Service Health-with-AIOps-022019

© 2019 ServiceNow, Inc. ServiceNow, the ServiceNow logo, Now, Now Platform, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Your journey doesn't end here! Visit our AIOps page to learn even more about how you can get the most from ServiceNow.

ServiceNow was founded on a very simple idea: that work should be easier. ServiceNow is making the world of work, work better for people. Our cloud-based platform and solutions deliver digital experiences that help people do their best work. For more information, visit: www.servicenow.com.
