Cisco Smart Services Automate Network Support and Operations
Presentation from Cisco Live US 2013.
© 2013 Cisco and/or its affiliates. All rights reserved. BRKDCT-1379 Cisco Public
[Diagram: UCS Server, UCS 6100]
Common Challenges
Our tools are strictly reactive.
Our IT resources are overspent on level 1 incidents.
We want to leverage Cisco’s expertise to improve customer satisfaction.
We want to apply Cisco automations to manage in-house.
We want to apply Cisco automations to co-manage remotely.
Topics
CNOAS
• What and Why
• Reference Architecture
• Demo and Examples
• Use Cases: Best Practices
Monday Morning Without CNOAS
CEO Unable to Join a WebEx Meeting
• Phone Not Working: unable to join the audio conference
• Productivity Stops: has to call IT, but the meeting has already started
• Productivity Slows: too late to fix the issue
Monday Morning Without CNOAS: The Engineer
• Basic Steps: must follow routine troubleshooting steps over the phone
• End User Frustration
• Engineer Frustration: basic troubleshooting feels like a waste of time
• Needs to connect to multiple systems
Monday Morning Without CNOAS: The CIO
• Dismal TTR: time to resolution is too long
• Low CSAT: customers are dissatisfied
• Inefficiency: operations are slow
Could it be Different?
Could problem-solving be proactive?
Could troubleshooting be automated?
Could resources be better utilized?
Monday Morning With CNOAS
The CEO is Happy
Technology That Works
A well-functioning and healthy network facilitates flexible productivity
Monday Morning With CNOAS
The Engineer
Automations That Liberate
Automated basic troubleshooting frees the human mind to tackle complex challenges
Monday Morning With CNOAS
The CIO
A Solution That Makes Sense
Cisco Network Operations Automation Service gives you a flexible, in-house solution for network management
What Really Happened Here?
During the last change-management window, the network admin forgot to bring an interface back up
OR a flapping switch port caused a Spanning Tree Protocol loop
OR a worm-infected machine on the network generated excessive traffic
The problem is in the network, not in the phone or CUCM
What Really Happened Here?
Detected the problem immediately when it happened
Ran multiple workflows simultaneously, such as checking CPU utilization and spanning tree stability and performing traffic threshold analysis
Connected to the problematic phone and performed triage
Connected to the CUCM and performed triage
Proactively and accurately identified the problem as being in the network
Created incident record
Proactively alerted network admin and sought approval to fix the problem
Ultimately fixed the problem with approval from network admin
Kept record of all activities for audit trail
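The sequence above (detect, run checks in parallel, triage, open an incident, seek approval, fix, keep an audit trail) can be sketched in Python. This is a minimal illustration under stated assumptions, not CNOAS or Cisco Process Orchestrator code: the three check functions, their canned results, and the `approve` callback are hypothetical stand-ins for real device queries and a real approval step.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Callable

# Hypothetical checks: each returns (ok, detail). Real checks would query devices.
def check_cpu() -> tuple[bool, str]:
    return True, "CPU 12%"

def check_spanning_tree() -> tuple[bool, str]:
    return False, "topology changes on Gi1/0/24"

def check_traffic_threshold() -> tuple[bool, str]:
    return True, "traffic within threshold"

CHECKS: dict[str, Callable[[], tuple[bool, str]]] = {
    "cpu": check_cpu,
    "spanning_tree": check_spanning_tree,
    "traffic": check_traffic_threshold,
}

def run_workflow(approve: Callable[[list[str]], bool]) -> list[str]:
    """Run all checks in parallel, gate any fix on approval, and return the audit trail."""
    audit: list[str] = []

    def log(msg: str) -> None:
        audit.append(f"{datetime.now(timezone.utc).isoformat()} {msg}")

    log("problem detected; starting checks")
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in CHECKS.items()}
        results = {name: f.result() for name, f in futures.items()}

    for name, (ok, detail) in results.items():
        log(f"check {name}: {'ok' if ok else 'FAILED'} ({detail})")
    failed = [name for name, (ok, _) in results.items() if not ok]

    if not failed:
        log("no network fault found")
        return audit

    log(f"incident created for {failed}")
    if approve(failed):
        log("approval granted; remediation executed")  # placeholder for the real fix
    else:
        log("approval denied; no change made")
    return audit

for line in run_workflow(lambda failed: True):
    print(line)
```

Passing the approval decision in as a callback keeps the "no change without the network admin's approval" rule explicit; in practice it would be backed by an e-mail or portal approval.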
What Really Happened Here?
End User: freed from downtime
Engineer: freed from basic troubleshooting tasks
CIO: freed from firefighting
Drivers for Network Operations Automation
Reduction of Operational Expenses
• Saves time
• Reduces manual operating tasks
Reduction in Mean Time to Resolution (MTTR)
• Reduces diagnostic time
• Reduces downtime
Cisco Value Add
• Domain Expertise
• Cisco Best Practices
• ITIL-based integration expertise
• Faster ROI
Improves Network Operations Support
• Reduces NOC operations costs
• Improves NOC staff productivity and efficiency
Operational Automation Benefits
• Reduce Total Cost of Ownership
• Reduce the risk of human error
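MTTR is the average time from when an incident is opened to when it is resolved. A minimal Python sketch, assuming incidents are available as (opened, resolved) timestamp pairs; the sample incidents are illustrative and not taken from the presentation.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - opened) across all incidents."""
    if not incidents:
        raise ValueError("no incidents")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)

def percent_reduction(before: timedelta, after: timedelta) -> float:
    """How much shorter `after` is than `before`, as a percentage."""
    return 100 * (before - after) / before

# Illustrative data: two incidents that took 2 hours and 1 hour to resolve.
incidents = [
    (datetime(2013, 1, 7, 9, 0), datetime(2013, 1, 7, 11, 0)),
    (datetime(2013, 1, 7, 13, 0), datetime(2013, 1, 7, 14, 0)),
]
print(mean_time_to_resolution(incidents))  # 1:30:00
```

For example, the UC Phone Down use case's improvement from 2 hours to 3 minutes is `percent_reduction(timedelta(hours=2), timedelta(minutes=3))`, a 97.5% reduction.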
Why Cisco Network Operations Automation Service?
Knowledge in documents
Drives Operational Excellence in the Customer Network
Benefits for Network Operations and Engineering
Network Operations Center—instantly perform troubleshooting and diagnostic procedures
Network Operations Center—migrate tasks from level 3 employees to level 1 employees
Network Engineering—automate mundane and repetitive health checks
Service Desk—automate incident response and corrective action
Topics
CNOAS
• What and Why
• Reference Architecture
• Demo and Examples
• Use Cases: Best Practices
Cisco RMS for Data Center
• What and Why
• Reference Architecture
• Architecture and Examples
• Use Cases: Portal
• Use Case and Automation Backend
Cisco Network Operations Automation Service Overview
Cisco Services: Day 0 Service Delivery, Day 1 Service Operations, Day 2 Service Optimization
Automation Packs Based on Cisco Best Practices:
• Routing and Switching
• Security
• Unified Communications
• Data Center Networks
• Integrated Services Router
• Aggregation Services Router
Cisco Process Orchestrator
Adapters for Network Automation: Terminal (SSH/Telnet), Web Services, SNMP, Database, Windows, Remedy
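The overview places protocol adapters (Terminal, Web Services, SNMP, Database, Windows, Remedy) between the orchestrator and the automation packs. The general adapter pattern can be illustrated with a small Python sketch; the `Adapter` protocol, its `run` method, the `Orchestrator` class, and the echo-style adapters are all hypothetical and do not reflect Cisco Process Orchestrator's actual adapter API.

```python
from typing import Protocol

class Adapter(Protocol):
    """Hypothetical interface: every adapter executes a request against one kind of target."""
    name: str
    def run(self, target: str, request: str) -> str: ...

class TerminalAdapter:
    name = "terminal"
    def run(self, target: str, request: str) -> str:
        # A real adapter would open an SSH/Telnet session and send the command.
        return f"[ssh {target}] {request}"

class SnmpAdapter:
    name = "snmp"
    def run(self, target: str, request: str) -> str:
        # A real adapter would issue an SNMP GET for the requested OID.
        return f"[snmp {target}] get {request}"

class Orchestrator:
    """Hypothetical router: workflow steps name an adapter, the orchestrator dispatches to it."""
    def __init__(self) -> None:
        self._adapters: dict[str, Adapter] = {}

    def register(self, adapter: Adapter) -> None:
        self._adapters[adapter.name] = adapter

    def step(self, adapter_name: str, target: str, request: str) -> str:
        if adapter_name not in self._adapters:
            raise KeyError(f"no adapter registered for {adapter_name!r}")
        return self._adapters[adapter_name].run(target, request)

orch = Orchestrator()
orch.register(TerminalAdapter())
orch.register(SnmpAdapter())
print(orch.step("terminal", "core-sw1", "show spanning-tree summary"))
```

Because workflows refer to adapters only by name, supporting another system (for example a ticketing tool) means registering one more adapter rather than rewriting the automation packs.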
NOA Deployment Architecture
How to Implement Using a Scaled-Down Version?
E-Mail Notification for Approval
Approval Request
Approval Request – Contd.
Auto Remediation Success E-Mail
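The approval e-mail step in these screenshots can be approximated with Python's standard `email` package. A minimal sketch, assuming a hypothetical approval portal URL, hypothetical addresses, and a hypothetical incident ID format; it only builds the message, and sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

def build_approval_request(incident_id: str, device: str, action: str,
                           approver: str, portal_url: str) -> EmailMessage:
    """Compose an approval-request e-mail for a proposed remediation."""
    msg = EmailMessage()
    msg["Subject"] = f"Approval needed: {action} on {device} ({incident_id})"
    msg["From"] = "automation@example.com"  # hypothetical sender address
    msg["To"] = approver
    msg.set_content(
        f"Automated diagnostics for incident {incident_id} recommend:\n\n"
        f"    {action} on {device}\n\n"
        f"Approve or reject: {portal_url}?incident={incident_id}\n"  # hypothetical portal
        "No change will be made without approval.\n"
    )
    return msg

msg = build_approval_request(
    "INC000123", "access-sw7", "shutdown interface Gi1/0/24",
    "netadmin@example.com", "https://portal.example.com/approve",
)
print(msg["Subject"])
```

A matching "remediation succeeded" notice can reuse the same function shape with a different subject and body.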
Automation Summary
Automation Summary – Contd.
Automation Summary
Cisco Process Orchestrator
Architecture - NOS Remediation with CNOAS
1 Inventory/configuration data
2 Analyze and persist data: Network Profiler (NP) / Network Performance Analytics (NPA)
3 Alerts pulled daily via Web Services API
4 Get alerts from NP database
5 Cisco Process Orchestrator (CPO) takes action based on remediation instructions from the customer
6 MTTR data
Diagram components: Customer Environment, Cisco Prime or NCCM, Remedy Ticketing System, Audit, Remediation
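Steps 3 through 6 of this flow (pull alerts, act only on what the customer chose to remediate, record MTTR data) can be sketched as below. Everything here is an assumption for illustration: the JSON alert schema, the field names, the sample best-practice rules, and the `remediate` callback are hypothetical, and the daily web-service call is replaced by a literal string.

```python
import json
from datetime import datetime, timezone
from typing import Callable

# Stand-in for the daily Web Services API response (hypothetical schema and rules).
ALERTS_JSON = """
[
  {"id": "BP-101", "device": "edge-r1", "rule": "SSH v2 not enforced"},
  {"id": "BP-102", "device": "core-sw1", "rule": "BPDU guard disabled"},
  {"id": "BP-103", "device": "edge-r2", "rule": "NTP not configured"}
]
"""

def process_alerts(raw: str, approved_ids: set[str],
                   remediate: Callable[[dict], bool]) -> list[dict]:
    """Remediate only customer-approved alerts and return one record per attempt."""
    records = []
    for alert in json.loads(raw):
        if alert["id"] not in approved_ids:
            continue  # the customer did not select this exception for auto-remediation
        started = datetime.now(timezone.utc)
        ok = remediate(alert)  # hypothetical hook into the orchestrator
        finished = datetime.now(timezone.utc)
        records.append({
            "id": alert["id"],
            "device": alert["device"],
            "status": "resolved" if ok else "failed",
            # Time spent remediating only; a full MTTR figure would start at alert creation.
            "remediation_seconds": (finished - started).total_seconds(),
        })
    return records

records = process_alerts(ALERTS_JSON, {"BP-101", "BP-103"}, lambda alert: True)
print([r["id"] for r in records])  # ['BP-101', 'BP-103']
```

Filtering on `approved_ids` mirrors step 5: the orchestrator acts on the customer's remediation instructions rather than on every alert it receives.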
Automatic E-Mail to NCE of NOS Account
NCE Reviews BP Exceptions, Publishes Important Ones For Customer to Review and Auto-Remediate
End User Receives Automatic E-Mail to Review BP Exceptions to be Remediated
End Customer Reviews BP Exceptions and Selects Exceptions for Auto-Remediation
End User Reviews Exceptions to Auto-Remediate
Exceptions Ready to be Remediated
Approval E-Mail Sent
Successful Auto-Remediation of BP Exception
Remedy Incident Record Auto Created with Resolution Summary
Automation Summary
Cisco Process Orchestrator
Health Check Automation: Avoid Outages
Avoid Network Latency Due to a Spanning Tree Loop Problem
[Lifecycle graphic: Architect it, Design it (Where can we put it?), Procure it, Install it, Configure it, Secure it, Is it ready?]
Before Automation
• Manual checking
• Error prone
• Time intensive
• Repetitive
• Complex (CCIE-level experience)
• Monotonous activity
• Configuration risk
After Automation
• Proactive approach
• Rapid checking (5 mins vs. 45 mins)
• Simultaneous device check
• Easy, consistent, and accurate
• Repeatable
• Fast (5 mins)
Manual Steps
• Execute show log
• Parse HSRP protocol messages
• Check for instability
• Execute show processes cpu; check for utilization > 50%
• Identify ports causing the problem
• Request permission to disable ports
• Disable ports
Automated Steps
• Examine switch log files
• Examine HSRP protocol messages
• Check for instability
• Check CPU utilization
• Check port flapping
• Identify ports causing the problem
• Request permission to disable ports
• Disable ports
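The automated checks listed above (scan the log for port-flap and HSRP messages, flag CPU above 50%, identify the offending ports) mostly reduce to parsing `show` output. A minimal Python sketch, assuming Cisco IOS-style `%LINK-3-UPDOWN` and `%HSRP-5-STATECHANGE` syslog lines and the `CPU utilization for five seconds: ...` line printed by `show processes cpu`; the flap threshold of 5 transitions is an illustrative assumption, and collecting the output from the switch (and the approval-gated shutdown) is out of scope.

```python
import re
from collections import Counter

LINK_RE = re.compile(r"%LINK-3-UPDOWN: Interface ([^,\s]+), changed state to (?:up|down)")
HSRP_RE = re.compile(r"%HSRP-5-STATECHANGE: (\S+) Grp (\d+) state (\S+) -> (\S+)")
CPU_RE = re.compile(r"CPU utilization for five seconds: \d+%/\d+%; one minute: (\d+)%")

FLAP_THRESHOLD = 5  # assumed: link transitions in the log window that count as flapping
CPU_THRESHOLD = 50  # from the slide: check for > 50%

def flapping_ports(show_log: str) -> list[str]:
    """Interfaces whose up/down message count reaches FLAP_THRESHOLD."""
    counts = Counter(LINK_RE.findall(show_log))
    return sorted(port for port, n in counts.items() if n >= FLAP_THRESHOLD)

def hsrp_changes(show_log: str) -> int:
    """Number of HSRP state changes, a sign of first-hop instability."""
    return len(HSRP_RE.findall(show_log))

def cpu_high(show_cpu: str) -> bool:
    """True if one-minute CPU utilization exceeds CPU_THRESHOLD."""
    m = CPU_RE.search(show_cpu)
    if m is None:
        raise ValueError("CPU utilization line not found")
    return int(m.group(1)) > CPU_THRESHOLD

# Sample output (made up): one port flapping six times, one HSRP change, busy CPU.
link_lines = [
    f"*Mar  1 00:0{i}:00: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/24, "
    f"changed state to {'down' if i % 2 == 0 else 'up'}"
    for i in range(6)
]
hsrp_line = "*Mar  1 00:07:00: %HSRP-5-STATECHANGE: Vlan10 Grp 1 state Standby -> Active"
show_log = "\n".join(link_lines + [hsrp_line])
show_cpu = "CPU utilization for five seconds: 78%/3%; one minute: 72%; five minutes: 40%"
print(flapping_ports(show_log), hsrp_changes(show_log), cpu_high(show_cpu))
# ['GigabitEthernet1/0/24'] 1 True
```

The sketch stops at identifying candidate ports; as on the slide, disabling them should happen only after permission is granted.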
Use Case: UC Phone Down
Problem: IP phone cannot make calls because there is no dial tone
Solution: allow the Service Desk to troubleshoot the problem
– Check whether the phone in question is registered with CUCM
– Retrieve the phone's IP address from CUCM based on its phone number
– Perform a connectivity test (ping the phone)
– Identify the switch and switch port by querying the web server on the IP phone
– Check the switch port interface for connectivity
Value: improves MTTR for the IP phone down problem from 2 hours to 3 minutes, and migrates IP phone troubleshooting from level 3 personnel to level 1 Service Desk employees
Automating Troubleshooting Support by the Service Desk
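The Service Desk procedure above is an ordered checklist in which later steps depend on earlier ones (the phone's web page can only be queried if the phone answers ping). A minimal Python sketch of that control flow; the `PhoneEnv` hooks are hypothetical placeholders for real calls to CUCM, the network, and the phone, and the stubbed results are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PhoneEnv:
    """Hypothetical hooks; a real workflow would call CUCM, the network, and the phone."""
    is_registered: Callable[[str], bool]              # extension -> registered with CUCM?
    ip_for_extension: Callable[[str], Optional[str]]  # extension -> IP address known to CUCM
    ping: Callable[[str], bool]                       # IP -> answers ping?
    switch_port_for_ip: Callable[[str], Optional[tuple[str, str]]]  # IP -> (switch, port) from phone web page
    port_is_up: Callable[[str, str], bool]            # (switch, port) -> interface up?

def diagnose_phone(extension: str, env: PhoneEnv) -> list[str]:
    """Follow the Service Desk steps in order; stop when a later step cannot run."""
    findings = [f"registered with CUCM: {env.is_registered(extension)}"]
    ip = env.ip_for_extension(extension)
    if ip is None:
        findings.append("no IP address on record in CUCM; escalate")
        return findings
    findings.append(f"IP address: {ip}")
    if not env.ping(ip):
        # The phone's web server cannot be queried, so the switch port stays unknown.
        findings.append("phone does not answer ping; check cabling and access port")
        return findings
    findings.append("phone answers ping")
    location = env.switch_port_for_ip(ip)
    if location is None:
        findings.append("switch/port not available from phone web page; escalate")
        return findings
    switch, port = location
    state = "up" if env.port_is_up(switch, port) else "DOWN"
    findings.append(f"{switch} {port}: {state}")
    return findings

# Stubbed example: an unregistered phone that does not answer ping.
env = PhoneEnv(
    is_registered=lambda ext: False,
    ip_for_extension=lambda ext: "10.1.20.35",
    ping=lambda ip: False,
    switch_port_for_ip=lambda ip: None,
    port_is_up=lambda sw, p: False,
)
for line in diagnose_phone("4101", env):
    print(line)
```

Registration status is recorded rather than treated as a stop condition, because an unregistered phone is the symptom being investigated, not a reason to end the workflow.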
Use Case: Detecting Branch Failure Due to Maximum Transmission Unit (MTU) in a DMVPN Network
Problem: packet loss at a branch
Solution: diagnose the problem by varying the MTU size
– Connect to the edge router (branch/headend)
– Run the workflow with different MTU sizes
– Identify the correct MTU size, which does not result in dropped packets
Value: reduces MTTR for branch outages
Diagnose failure at a branch
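Running the workflow "with different MTU sizes" amounts to searching for the largest packet that crosses the tunnel without being dropped. A minimal Python sketch using binary search, assuming a hypothetical `probe(size)` callback that returns True when a do-not-fragment ping of that size succeeds (on Cisco IOS this would be something like `ping <peer> size <n> df-bit`); the 1400-byte path limit in the example is simulated.

```python
from typing import Callable

def largest_passing_size(probe: Callable[[int], bool], low: int = 68, high: int = 1500) -> int:
    """Binary search for the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic (if a size passes, every smaller size passes).
    68 bytes is the minimum MTU every IPv4 link must support.
    """
    if not probe(low):
        raise ValueError(f"even {low}-byte packets fail; the problem is not the MTU")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid is dropped: the answer is below mid
    return low

# Simulated path that drops anything above 1400 bytes.
print(largest_passing_size(lambda size: size <= 1400))  # 1400
```

Binary search needs about 11 probes across the 68 to 1500 range, versus up to 1433 if every size were tried in turn.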
Use Case: UC – Proactively Prevent Outage on Unity
Problem: the Unity server runs out of disk space
Solution: proactively respond to rising disk utilization
– The monitoring system (CA, BMC, etc.) should generate an alert when disk utilization reaches a threshold of 90%
– Shrink the report databases
– Shrink the Unity databases
– Check the disk utilization and exit if it is below 70%
– If necessary, move the oldest and largest files (>2 MB) to another directory (separate from the Unity databases)
– Create an automation summary
Value: prevents the customer satisfaction issues associated with a Unity server outage
Prevent Outages by Proactively Addressing Unity Disk Space Utilization
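The file-selection part of this procedure (act at 90%, stop once utilization is below 70%, otherwise move the oldest and largest files over 2 MB) can be sketched in Python. A minimal sketch under assumptions: the database-shrink steps are omitted, "2M" is read as 2 MiB, files are ordered oldest first with larger files first among equal ages, and the destination is assumed to be on a different volume, since moving files within the same volume frees no space.

```python
import shutil
from dataclasses import dataclass

ALERT_AT = 0.90              # monitoring alert threshold from the slide
TARGET = 0.70                # stop once utilization is below this
MIN_SIZE = 2 * 1024 * 1024   # ">2M", read here as 2 MiB

@dataclass
class FileInfo:
    path: str
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch

def needs_cleanup(used: int, total: int) -> bool:
    return used / total >= ALERT_AT

def files_to_move(files: list[FileInfo], used: int, total: int) -> list[FileInfo]:
    """Pick files over MIN_SIZE, oldest first (largest first among equals), until usage < TARGET."""
    candidates = sorted((f for f in files if f.size > MIN_SIZE), key=lambda f: (f.mtime, -f.size))
    chosen: list[FileInfo] = []
    for f in candidates:
        if used / total < TARGET:
            break
        chosen.append(f)
        used -= f.size
    return chosen

def current_usage(path: str) -> tuple[int, int]:
    """Used and total bytes for the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used, usage.total

def move_files(chosen: list[FileInfo], dest_dir: str) -> None:
    for f in chosen:
        shutil.move(f.path, dest_dir)  # dest_dir assumed to be on another volume

# Simulated volume (no real files touched): 92 GiB used of 100 GiB.
GB = 1024 ** 3
files = [
    FileInfo("old_large.bak", 15 * GB, mtime=100.0),
    FileInfo("old_tiny.log", 1024, mtime=50.0),  # under 2 MiB: never moved
    FileInfo("newer.bak", 10 * GB, mtime=200.0),
    FileInfo("newest.bak", 3 * GB, mtime=300.0),
]
used, total = 92 * GB, 100 * GB
if needs_cleanup(used, total):
    print([f.path for f in files_to_move(files, used, total)])  # ['old_large.bak', 'newer.bak']
```

Separating the selection (`files_to_move`) from the side effects (`move_files`) lets the plan be logged in the automation summary, or reviewed, before anything is moved.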
Use Cases – Virtual NOC
• Node/Device Down Troubleshooting
• Troubleshooting performance issue due to high CPU utilization
• Latency troubleshooting across the network
• Large branch outage due to LAN interface failure
• Incorrect MTU detection and remediation in DMVPN network
• Spanning Tree Protocol Loop Detection and Remediation
• Top Talker detection and remediation (performance issue from Branch to Data Center)
• Detect CRC error on switch port
• End-to-end connection check
• Diagnose high availability problem due to HSRP issue
• Troubleshoot performance issue due to high CPU
• Many users do not connect to the network due to VTP or routing issue
• Diagnose end user cannot connect to the network
• Diagnose multicast issue with user not receiving stream
• Routing Authentication Problem Troubleshooting
• Diagnose slow user response from branch office to server
• Spanning Tree stability check
• End user cannot connect to the network
• Circuit troubleshooting
• EIGRP route missing
• Detect proper MTU size for end-to-end connection in a DMVPN network
• ISDN Troubleshooting
• Troubleshooting Switch Port Problem
• Call Home – ASIC Port Problem
• Call Home – Power Fan Problem
• Slow user response from a branch to a server in a Data Center
Use Cases – Auditing
• Spanning Tree stability check
• ISDN backup testing for routers
• Validate FWSM failover configuration
• Validate HSRP redundancy pairs configuration
• Database comparison between SolarWinds (Orion) and LMS
• Validate spanning tree and HSRP affinity match for redundant switches
• Bulk export of running configuration from LMS
• Database comparison for CA Spectrum and NCM
• Routed HA check
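The two database-comparison audits (SolarWinds Orion vs. LMS, CA Spectrum vs. NCM) boil down to diffing the device inventories held by two systems. A minimal Python sketch, assuming each system can export a mapping of hostname to management IP address; the export step and the sample records are hypothetical.

```python
def compare_inventories(a: dict[str, str], b: dict[str, str]) -> dict[str, list]:
    """Report devices missing from either system and devices whose IP addresses disagree."""
    a_names, b_names = set(a), set(b)
    return {
        "only_in_a": sorted(a_names - b_names),
        "only_in_b": sorted(b_names - a_names),
        "ip_mismatch": sorted(
            (name, a[name], b[name]) for name in a_names & b_names if a[name] != b[name]
        ),
    }

# Hypothetical exports from two management systems.
orion = {"core-sw1": "10.0.0.1", "edge-r1": "10.0.0.2", "access-sw7": "10.0.1.7"}
lms = {"core-sw1": "10.0.0.1", "edge-r1": "10.0.0.20", "branch-r4": "10.4.0.1"}
print(compare_inventories(orion, lms))
```

Real exports often differ in hostname case or domain suffix, so normalizing names before the comparison would likely be needed.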
Use Cases – Best Practices
• UC Phone Troubleshooting
• List registered/unregistered phones
• Cisco Telepresence call launch and diagnostics
• Cisco Unity Connection server – excessive disk usage detection/reduce space
• Emergency 911 call validation
• Provision a UC phone
• List IP phones by name and class
• Debug voice/video gateway
• Troubleshooting voice/video gateway using show commands
• Troubleshooting CUCM – trace collection
• Troubleshooting Cisco Unity – trace collection
• Troubleshooting Cisco Unity Connection – trace collection
• Troubleshooting Cisco Emergency Responder – trace collection
• Cisco Telepresence inventory collection for mgmt
• Cisco Telepresence software inventory collection for mgmt and compliance
• Enable IP SLA responder for Cisco Telepresence CTS
• Reset Telepresence peripheral hardware
• Validate NIC is enabled on each Telepresence device
• Monitor alarms on CUOM
• Move/Add/Change/Delete on CUCM via CUPM
• Tandberg videoconference detect & record unit version
DIY with CNOAS added to your existing tools
Final Thoughts: Move From Reactive to Proactive
Service Level Agreement/Objective (SLA/SLO) metrics such as Mean Time To Restore (MTTR) are key evaluation criteria and can be dramatically improved with Day 2 automated operation services such as RMS and CNOAS.
Cisco intellectual capital is captured in both automated services and delivers significant operational efficiencies.
Auto-Remediations, Auto-Ticket Enhancements, and Auto-Dynamic Checks can help mitigate the majority of level-one NOC incidents.
Moving from reactive to proactive support is the key to achieving sophisticated network operations.