Cisco Smart Services Automate Network Support and Operations

Upload: cisco-data-center

Post on 28-Nov-2014

Category:

Technology


DESCRIPTION

Cisco Smart Services Automate Network Support and Operations presentation from Cisco Live US 2013

TRANSCRIPT

Cisco Smart Services: Automate Your Network Support and Operations

Mynul Hoda, Sr. Technical Leader

© 2013 Cisco and/or its affiliates. All rights reserved. BRKDCT-1379 Cisco Public

UCS Server

UCS6100

Common Challenges

Our tools are strictly reactive.

Our IT resources are overspent on level 1 incidents.

We want to leverage Cisco’s expertise to improve customer satisfaction.

We want to apply Cisco automations to manage in-house.

We want to apply Cisco automations to co-manage remotely.

Topics

CNOAS

What and Why

Reference Architecture

Demo and Examples

Use Cases: Best Practices

Monday Morning Without CNOAS

CEO Unable to Join a WebEx Meeting

Phone Not Working: Unable to join audio conference

Productivity Slows: Has to call IT but meeting already started

Productivity Stops: Too late to fix the issue

Monday Morning Without CNOAS: The Engineer

Basic Steps: Must follow routine troubleshooting steps over the phone

End User Frustration: Basic troubleshooting feels like a waste of time

Engineer Frustration: Need to connect to a number of systems

Monday Morning Without CNOAS: The CIO

Dismal TTR: Time to resolution is too long

CSAT Low: Customers are dissatisfied

Inefficiency: Operations are slow

Could it be Different?

Could problem-solving be proactive?

Could troubleshooting be automated?

Could resources be better utilized?

Monday Morning With CNOAS: The CEO is Happy

Technology That Works: A well-functioning and healthy network facilitates flexible productivity

Monday Morning With CNOAS: The Engineer

Automations That Liberate: Automated basic troubleshooting frees the human mind to tackle complex challenges

Monday Morning With CNOAS: The CIO

A Solution That Makes Sense: Cisco Network Operations Automation Service gives you a flexible, in-house solution to network management

What Really Happened Here?

During the last change management activity, the network admin forgot to bring up an interface

OR a flapping switch port caused a Spanning Tree Protocol loop

OR a worm-infected machine on the network generated a large volume of traffic

Either way, the problem is in the network, not on the phone or CUCM

What Really Happened Here?

Detected the problem immediately when it happened

Ran multiple workflows simultaneously, such as CPU checks, spanning tree stability checks, and traffic threshold analysis

Connected to the problematic phone and performed triage

Connected to the CUCM and performed triage

Proactively and accurately identified the problem as being in the network

Created an incident record

Proactively alerted the network admin and sought approval to fix the problem

Fixed the problem once the network admin approved

Kept a record of all activities for the audit trail
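The sequence on this slide (run several checks at once, open an incident, ask for approval, remediate, keep an audit trail) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Incident` class, `run_checks`, `handle_event`, and the check names are hypothetical and are not part of CNOAS or Cisco Process Orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    """Hypothetical incident record; a real deployment would use a ticketing system."""
    summary: str
    audit_trail: List[str] = field(default_factory=list)
    resolved: bool = False


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every diagnostic check concurrently; True means the check passed."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        return {name: fut.result() for name, fut in futures.items()}


def handle_event(
    checks: Dict[str, Callable[[], bool]],
    approve: Callable[[Incident], bool],
    remediate: Callable[[List[str]], None],
) -> Incident:
    """Diagnose, open an incident, and remediate only after approval."""
    results = run_checks(checks)
    failed = sorted(name for name, ok in results.items() if not ok)
    incident = Incident(summary="failed checks: " + (", ".join(failed) or "none"))
    incident.audit_trail.append(f"checks run: {sorted(results)}")
    if not failed:
        incident.resolved = True
        return incident
    if approve(incident):
        incident.audit_trail.append("approval granted")
        remediate(failed)
        incident.resolved = True
        incident.audit_trail.append("remediation applied")
    else:
        incident.audit_trail.append("approval denied")
    return incident
```

In this sketch the approval step is an ordinary callable, so it could be backed by an e-mail reply, a portal click, or a test stub without changing the rest of the flow.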

What Really Happened Here?

End User: Freed from downtime

Engineer: Freed from basic troubleshooting tasks

CIO: Freed from firefighting

Drivers for Network Operations Automation

Reduction of Operational Expenses
• Saves time
• Reduces manual operating tasks

Reduction in Mean Time to Resolution (MTTR)
• Reduces diagnostic time
• Reduces downtime

Cisco Value Add
• Domain Expertise
• Cisco Best Practices
• ITIL-based integration expertise
• Faster ROI

Improves Network Operations Support
• Reduces NOC operations costs
• Improves NOC staff productivity and efficiency

Operational Automation Benefits
• Reduce Total Cost of Ownership
• Reduce the risk of human error

Why Cisco Network Operations Automation Service?

Knowledge in documents

Drives Operational Excellence in the Customer Network

Benefits for Network Operations and Engineering

Network Operations Center—instantly perform troubleshooting and diagnostic procedures

Network Operations Center—migrate tasks from level 3 employees to level 1 employees

Network Engineering—automate mundane and repetitive health checks

Service Desk—automate incident response and corrective action

Topics

Cisco RMS for Data Center

CNOAS

Use Case and Automation Backend

What and Why

Reference Architecture

Demo and Examples

Use Cases: Best Practices

Reference Architecture

Architecture and Examples

Use Cases: Portal

What and Why

Cisco Network Operations Automation Service Overview

Cisco Services

Day 0 Service Delivery, Day 1 Service Operations, Day 2 Service Optimization

Automation Packs (based on Cisco best practices): Routing and Switching; Security; Unified Communications; Data Center Networks; Integrated Services Router; Aggregation Services Router

Cisco Process Orchestrator

Adapters for Network Automation: Terminal (SSH/Telnet), Web Services, SNMP, Database, Windows, Remedy

Demo: Auto Response to Network Outage

NOA Deployment Architecture

How to Implement Using a Scaled-Down Version?

Setup – Nexus

Setup – Nexus

E-Mail Notification for Approval

Approval Request

Approval Request – Contd.

Auto Remediation Success E-Mail

Automation Summary

Automation Summary – Contd.

Automation Summary

Cisco Process Orchestrator

Demo: NOS Best Practice Auto Remediation (Fully Automated)

Architecture - NOS Remediation with CNOAS

1 Inventory/Configuration Data

2 Analyze & Persist Data: Network Profiler (NP) / Network Performance Analytics (NPA)

3 Alerts pulled daily via WebSvc API

4 Get Alerts from NP Db

5 Cisco Process Orchestrator (CPO) takes action based on Remediation instruction from the customer

6 MTTR Data

Diagram labels: Customer Environment; Cisco Prime or NCCM; Remedy Ticketing System; Network Performance Analytics (NPA); Audit; Remediation
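Steps 3 to 6 of the diagram amount to filtering the pulled alerts down to the ones the customer has approved for remediation, then recording resolution times. The sketch below shows one way to express that. It assumes a hypothetical `Alert` record and rule names, and it does not reflect the real Network Profiler or CPO data model, which the slides do not show.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; the real NP database schema is not shown in the deck."""
    alert_id: str
    device: str
    rule: str
    opened: datetime


def select_for_remediation(alerts: Iterable[Alert], approved_rules: Set[str]) -> List[Alert]:
    """Keep only alerts whose rule the customer has approved for remediation."""
    return [a for a in alerts if a.rule in approved_rules]


def mean_time_to_resolution(spans: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Average (resolved - opened) over all resolved alerts."""
    durations = [resolved - opened for opened, resolved in spans]
    if not durations:
        raise ValueError("no resolved alerts to average")
    return sum(durations, timedelta()) / len(durations)
```

Keeping the approval list as plain data means the customer's remediation instructions can change without touching the selection logic.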

Automatic E-Mail to NCE of NOS Account

NCE Reviews BP Exceptions, Publishes Important Ones For Customer to Review and Auto-Remediate

End User Receives Automatic E-Mail to Review BP Exceptions to be Remediated

End Customer Reviews BP Exceptions and Selects Exceptions for Auto-Remediation

End User Reviews Exceptions to Auto-Remediate

Exceptions Ready to be Remediated

Approval E-Mail Sent

Approval

Successful Auto-Remediation of BP Exception

Remedy Incident Record Auto Created with Resolution Summary

Automation Summary

Automation Summary

Cisco Process Orchestrator

Health Check Automation – Avoid Outages

Avoid Network Latency Due to Spanning Tree Loop Problem

Before Automation
• Manual checking
• Error prone
• Time intensive
• Repetitive
• Complex (CCIE-level experience)
• Monotonous activity
• Configuration risk

After Automation
• Proactive approach
• Rapid checking (5 mins vs. 45 mins)
• Simultaneous device check
• Easy, consistent, and accurate
• Repeatable
• Fast (5 mins)

Manual
• Execute show log
• Parse HSRP protocol messages
• Check for instability
• Execute show CPU utilization, check for > 50%
• Identify ports causing problem
• Request permission to disable ports
• Disable ports

Automation
• Examine switch log files
• Examine HSRP protocol messages
• Check for instability
• Check CPU utilization
• Check port flapping
• Identify ports causing the problem
• Request permission to disable ports
• Disable ports
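As a rough illustration of the automated checks listed on this slide, the following Python sketch scans switch log lines for HSRP state changes and link transitions and applies the 50% CPU threshold. The flap and HSRP thresholds, the function names, and the sample log lines used to exercise it are assumptions made for the example; the actual automation pack logic is not shown in the deck.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Patterns modeled on common IOS syslog messages; adjust for the platform in use.
LINK_RE = re.compile(r"%LINK-3-UPDOWN: Interface (\S+), changed state to (up|down)")
HSRP_RE = re.compile(r"%HSRP-5-STATECHANGE")

CPU_THRESHOLD_PCT = 50  # threshold quoted on the slide
FLAP_THRESHOLD = 4      # assumption: this many transitions counts as flapping
HSRP_THRESHOLD = 3      # assumption: more state changes than this is unstable


def flapping_ports(log_lines: Iterable[str], min_transitions: int = FLAP_THRESHOLD) -> List[str]:
    """Return interfaces with at least min_transitions up/down messages."""
    transitions: Counter = Counter()
    for line in log_lines:
        match = LINK_RE.search(line)
        if match:
            transitions[match.group(1)] += 1
    return sorted(port for port, n in transitions.items() if n >= min_transitions)


def hsrp_unstable(log_lines: Iterable[str], max_changes: int = HSRP_THRESHOLD) -> bool:
    """True when the log holds more HSRP state changes than max_changes."""
    return sum(1 for line in log_lines if HSRP_RE.search(line)) > max_changes


def health_check(log_lines: Iterable[str], cpu_pct: float) -> Dict[str, object]:
    """Combine the CPU, HSRP, and port-flap checks into one report."""
    lines = list(log_lines)
    return {
        "cpu_high": cpu_pct > CPU_THRESHOLD_PCT,
        "hsrp_unstable": hsrp_unstable(lines),
        "ports_to_disable": flapping_ports(lines),
    }
```

The report only names candidate ports. Disabling them would still go through the permission request described on the slide.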

Use Case: UC Phone Down

Problem: IP phone cannot make any calls due to no dial tone

Solution: allow the service desk to troubleshoot the problem
– Check if the phone in question is registered with the CUCM
– Retrieve the IP address from the CUCM based on the IP phone number
– Perform a connectivity test (ping the phone)
– Identify the switch and switch port number by querying the HTTP web server on the IP phone
– Check the switch port interface for connectivity

Value: improves MTTR for the IP phone down problem from 2 hours to 3 minutes. Migrates IP phone troubleshooting from level 3 personnel to level 1 Service Desk employees.

Automating Troubleshooting Support by the Service Desk
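The triage steps above form a short decision chain that stops at the first step that cannot proceed. A minimal sketch follows. The `Probes` container and every callable in it are hypothetical stand-ins for the CUCM lookup, ping, phone web query, and switch query; none of them are real CUCM or CPO APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Probes:
    """Hypothetical hooks; each would wrap a real CUCM, ICMP, HTTP, or switch query."""
    is_registered: Callable[[str], bool]
    ip_for_number: Callable[[str], Optional[str]]
    ping: Callable[[str], bool]
    switch_port_for: Callable[[str], Tuple[str, str]]
    port_is_up: Callable[[str, str], bool]


def triage_phone(number: str, probes: Probes) -> List[str]:
    """Walk the triage steps in order and return the findings collected so far."""
    findings = [f"registered with CUCM: {probes.is_registered(number)}"]
    ip = probes.ip_for_number(number)
    if ip is None:
        findings.append("no IP address on record; stopping")
        return findings
    findings.append(f"IP address: {ip}")
    if not probes.ping(ip):
        # The phone's web server cannot be queried if the phone is unreachable.
        findings.append("ping failed; phone unreachable")
        return findings
    switch, port = probes.switch_port_for(ip)
    state = "up" if probes.port_is_up(switch, port) else "down"
    findings.append(f"{switch} {port} is {state}")
    return findings
```

Returning the findings as plain text lines keeps the result easy to paste into an incident record for a level 1 operator.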

Use Case: Detecting Branch Failure due to Max Transmission Unit (MTU) in DMVPN network

Problem: packet loss at a branch

Solution: diagnose the problem by varying the MTU size
– Connect to the edge router (Branch/HeadEnd)
– Run the workflow with different MTU sizes
– Identify the correct MTU size, which is one that does not result in dropped packets

Value: reduces MTTR for a branch outage

Diagnose failure at a branch
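The slide does not say how the workflow chooses which MTU sizes to try. One reasonable strategy is a binary search for the largest size that passes, sketched below. The `probe` callable is a hypothetical stand-in for a don't-fragment ping of a given size from the edge router, and the default bounds are assumptions.

```python
from typing import Callable


def find_largest_working_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes that if a size passes, every smaller size also passes.
    """
    if not probe(low):
        raise ValueError(f"even the smallest size ({low}) is dropped")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid       # mid works; look for something larger
        else:
            high = mid - 1  # mid is dropped; look lower
    return low
```

With the default bounds this needs about ten probes, compared with hundreds for a linear sweep one byte at a time.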

Use Case: UC – Proactively Prevent Outage on Unity

Problem: disk space on Unity server exceeded

Solution: respond proactively to disk space usage
– Monitoring system (CA, BMC, etc.) should generate an alert when disk space reaches a threshold of 90%
– Shrink the report databases
– Shrink the Unity databases
– Check the disk utilization and exit if less than 70%
– If necessary, move the oldest and largest files (>2M) to another directory (separate from the Unity databases)
– Create an automation summary

Value: prevents customer satisfaction issues associated with a Unity Server down problem

Prevent Outages by Proactively Addressing Unity Disk Space Utilization
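The thresholds in this use case map directly onto a small control loop. The sketch below is illustrative only: the function names are hypothetical, ">2M" is read as 2 MiB, and "oldest and largest" is interpreted as oldest first with larger files breaking ties. The slides do not confirm either reading.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

ALERT_PCT = 90                     # alert threshold from the slide
TARGET_PCT = 70                    # exit threshold from the slide
MIN_FILE_BYTES = 2 * 1024 * 1024   # assumption: ">2M" means larger than 2 MiB


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    mtime: float  # modification time as a POSIX timestamp


def files_to_move(files: Iterable[FileInfo]) -> List[FileInfo]:
    """Large files only, oldest first; among equally old files, largest first."""
    big = [f for f in files if f.size_bytes > MIN_FILE_BYTES]
    return sorted(big, key=lambda f: (f.mtime, -f.size_bytes))


def remediate_disk(
    utilization: Callable[[], float],
    shrink_steps: List[Tuple[str, Callable[[], None]]],
    files: Iterable[FileInfo],
    move: Callable[[FileInfo], None],
) -> List[str]:
    """Run the shrink steps, then move files until utilization drops below target."""
    if utilization() < ALERT_PCT:
        return ["utilization below alert threshold; nothing to do"]
    summary = []
    for name, step in shrink_steps:
        step()
        summary.append(f"ran {name}")
    if utilization() < TARGET_PCT:
        return summary
    for info in files_to_move(files):
        move(info)
        summary.append(f"moved {info.path}")
        if utilization() < TARGET_PCT:
            break
    return summary
```

The returned list doubles as the automation summary called for in the last step.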

Use Cases – Virtual NOC

• Node/Device Down Troubleshooting
• Troubleshooting performance issue due to high CPU utilization
• Latency troubleshooting across the network
• Large branch outage due to LAN interface failure
• Incorrect MTU detection and remediation in DMVPN network
• Spanning Tree Protocol Loop Detection and Remediation
• Top Talker detection and remediation (performance issue from Branch to Data Center)
• Detect CRC error on switch port
• End-to-end connection check
• Diagnose high availability problem due to HSRP issue
• Troubleshoot performance issue due to high CPU
• Many users cannot connect to the network due to VTP or routing issue
• Diagnose end user cannot connect to the network
• Diagnose multicast issue with user not receiving stream
• Routing Authentication Problem Troubleshooting
• Diagnose slow user response from branch office to server
• Spanning Tree stability check
• End user cannot connect to the network
• Circuit troubleshooting
• EIGRP route missing
• Detect proper MTU size for end-to-end connection in a DMVPN network
• ISDN Troubleshooting
• Troubleshooting Switch Port Problem
• Call Home – ASIC Port Problem
• Call Home – Power Fan Problem
• Slow user response from a branch to a server in a Data Center

Use Cases – Auditing

• Spanning Tree stability check
• ISDN backup testing for routers
• Validate FWSM failover configuration
• Validate HSRP redundancy pairs configuration
• Database comparison between SolarWinds (Orion) and LMS
• Validate spanning tree and HSRP affinity match for redundant switches
• Bulk export of running configuration from LMS
• Database comparison for CA Spectrum and NCM
• Routed HA check
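The database comparison audits in this list come down to a set difference between two device inventories. The sketch below assumes each tool's inventory has already been exported to a hostname-to-management-IP mapping. The export step, the `Diff` structure, and the field choice are all hypothetical and do not represent any vendor API.

```python
from typing import Dict, List, NamedTuple


class Diff(NamedTuple):
    only_in_a: List[str]   # hostnames present only in the first inventory
    only_in_b: List[str]   # hostnames present only in the second inventory
    mismatched: List[str]  # hostnames in both, with different management IPs


def _normalize(inventory: Dict[str, str]) -> Dict[str, str]:
    """Lower-case hostnames and trim whitespace so cosmetic differences do not count."""
    return {host.strip().lower(): ip.strip() for host, ip in inventory.items()}


def compare_inventories(a: Dict[str, str], b: Dict[str, str]) -> Diff:
    """Report devices missing from either side and devices whose IPs disagree."""
    na, nb = _normalize(a), _normalize(b)
    return Diff(
        only_in_a=sorted(na.keys() - nb.keys()),
        only_in_b=sorted(nb.keys() - na.keys()),
        mismatched=sorted(h for h in na.keys() & nb.keys() if na[h] != nb[h]),
    )
```

Sorting the output makes successive audit runs easy to compare line by line.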

Use Cases – Best Practices

• UC Phone Troubleshooting
• List registered/unregistered phones
• Cisco Telepresence call launch and diagnostics
• Cisco Unity Connection server – excessive disk usage detection/reduce space
• Emergency 911 call validation
• Provision a UC phone
• List IP phone by name and class
• Debug voice/video gateway
• Troubleshooting voice/video gateway using show commands
• Troubleshooting CUCM – trace collection
• Troubleshooting Cisco Unity – trace collection
• Troubleshooting Cisco Unity Connection – trace collection
• Troubleshooting Cisco emergency responder – trace collection
• Cisco Telepresence inventory collection for mgmt
• Cisco Telepresence software inventory collection for mgmt and compliance
• Enable IP SLA responder for Cisco Telepresence CTS
• Reset Telepresence peripheral hardware
• Validate NIC is enabled on each Telepresence device
• Monitor alarms on CUOM
• Move/Add/Change/Delete on CUCM via CUPM
• Tandberg videoconference detect & record unit version

DIY with CNOAS added to your existing tools

UCS Server

UCS6100

Final Thoughts: Move From Reactive to Proactive

Service Level Agreement/Objective (SLA/SLO) metrics such as Mean Time to Resolution (MTTR) are key evaluation criteria and can be dramatically improved with Day 2 automated operation services like RMS and CNOAS.

Cisco intellectual capital is captured in both automated services and delivers substantial operational efficiencies.

Auto-Remediations, Auto-Ticket Enhancements, and Auto-Dynamic Checks can help mitigate the majority of level-one NOC incidents.

Reactive to Proactive support is the key to sophisticated network operations… …achieved:
