Root Cause Analysis - Updated practices for new problems

Upload: michael-cote

Post on 29-Jan-2018

TRANSCRIPT

Root Cause Analysis

Updated practices for new problems

Michael Coté / [email protected]

a persistent problem

“We also know that the increasing use of larger and more complex systems potentially results in a greater number of problems. Furthermore, many of these problems have a more serious impact on the business and on the use of systems than ever before, and they are also in many cases much more difficult to solve.”

--A Management System for the Information Business 

same RCA, new technologies

• Virtualization

• Multi-tier

• Cloud Computing

virtualization

• More to manage - virtualization layer, virtual networks, virtual storage, etc.

• Tracking transients - removing the constraints of physical reality

common virtualization problems

• Correlating virtual instances to physical problems 

• Configuration drift

• Network configuration

• VM access management

• Capacity management 
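
Configuration drift is the easiest item on this list to illustrate. The sketch below is an assumption-laden example, not something from the slides: it treats a VM's configuration as a flat dictionary and reports every key whose value differs from a recorded baseline. The function name `config_drift` and the sample keys are hypothetical.

```python
def config_drift(baseline: dict, current: dict) -> dict:
    """Return keys whose values differ, mapped to (baseline, current) pairs."""
    missing = object()  # sentinel for keys that exist on only one side
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        old = baseline.get(key, missing)
        new = current.get(key, missing)
        if old != new:
            # Report None for the side where the key does not exist.
            drift[key] = (
                None if old is missing else old,
                None if new is missing else new,
            )
    return drift


# Hypothetical VM settings, for illustration only.
baseline = {"vcpus": 2, "memory_mb": 4096, "vlan": 110}
current = {"vcpus": 2, "memory_mb": 8192, "vlan": 110, "snapshot": "pre-upgrade"}
drift = config_drift(baseline, current)
print(drift)
```

Real configurations are usually nested and come from a configuration management tool, so a production check would likely need a recursive comparison.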

multi-tier applications

• Dividing monolithic applications

• More parts, more connections to break

• Inter-dependencies cause weird failures

• Cascading problems well suited for RCA
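
One way to picture why cascading problems suit RCA: if the dependencies between tiers are known, the unhealthy components whose own dependencies are all healthy are the likeliest starting points. The sketch below assumes a hand-written dependency map and a set of unhealthy components; the names `root_cause_candidates`, `depends_on`, and the tier names are hypothetical.

```python
def root_cause_candidates(depends_on, unhealthy):
    """Return unhealthy components that have no unhealthy dependency."""
    candidates = set()
    for component in unhealthy:
        upstream = depends_on.get(component, ())
        # If something this component relies on is also unhealthy,
        # the failure probably cascaded up from there.
        if not any(dep in unhealthy for dep in upstream):
            candidates.add(component)
    return candidates


# Hypothetical multi-tier application: each key depends on the listed parts.
depends_on = {
    "web": ["app"],
    "app": ["db", "cache"],
    "db": ["storage"],
    "cache": [],
    "storage": [],
}
unhealthy = {"web", "app", "db", "storage"}
candidates = root_cause_candidates(depends_on, unhealthy)
print(candidates)
```

This only narrows the search. It assumes the dependency map is complete and that health signals are accurate, which is rarely fully true in practice.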

common multi-tier problems

• Blizzard of logs, events, and messages

• Memory leaks and resource limits cascade upwards

• Network connection failure

• Storage issues - network drives and shared disks

• Cache updates and gardening
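
A common first step against the blizzard of logs is to collapse messages that differ only in numeric details and count what remains. The following is a minimal sketch under that assumption; the function name `summarize` and the sample log lines are invented for illustration.

```python
import re
from collections import Counter


def summarize(lines):
    """Group log lines that differ only in digits and count each group."""
    # Replace every run of digits so IDs, ports, and counters do not
    # split otherwise identical messages into separate groups.
    normalized = (re.sub(r"\d+", "<n>", line.strip()) for line in lines)
    return Counter(normalized).most_common()


# Hypothetical log lines, for illustration only.
lines = [
    "conn 17 timed out",
    "conn 42 timed out",
    "disk /dev/sda1 full",
    "conn 9 timed out",
]
summary = summarize(lines)
for message, count in summary:
    print(count, message)
```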

cloud computing

• Public vs. private cloud

• Virtualization problems & more

• Managing cloud provider relationship

common cloud problems

• Remote access & instrumentation

• Configuration drift, resource degradation, networking, etc.

• Provider escalation policy & process

• Start with “status pages”

Root Cause Analysis in

• Relationship tracking

• Event & change tracking

• Tickets & system of record

• Business impact
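
Event and change tracking pays off when an incident can be lined up against what changed shortly before it. The sketch below assumes change records are available as dictionaries with a timestamp; the field names, the two-hour window, and the function name `changes_before` are all hypothetical choices.

```python
from datetime import datetime, timedelta


def changes_before(incident_time, changes, window=timedelta(hours=2)):
    """Return changes recorded within `window` before the incident, newest first."""
    start = incident_time - window
    recent = [c for c in changes if start <= c["time"] <= incident_time]
    return sorted(recent, key=lambda c: c["time"], reverse=True)


# Hypothetical change records, for illustration only.
changes = [
    {"time": datetime(2020, 1, 1, 7, 0), "item": "db-01", "what": "schema migration"},
    {"time": datetime(2020, 1, 1, 9, 15), "item": "lb-01", "what": "firewall rule update"},
    {"time": datetime(2020, 1, 1, 9, 50), "item": "app-02", "what": "config push"},
]
incident = datetime(2020, 1, 1, 10, 0)
suspects = changes_before(incident, changes)
for change in suspects:
    print(change["time"], change["item"], change["what"])
```

A recent change is a lead, not a verdict. The list is a place to start looking, ordered so the most recent change is checked first.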

RCA metrics

• Mean time to isolate

• SLAs

• Costs - how much does RCA cost, save, make

• Revenue generation
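
Mean time to isolate can be computed once each incident has a detection time and a time at which the cause was isolated. The sketch below assumes incidents are recorded as (detected, isolated) timestamp pairs, which is an assumption about the ticketing data rather than anything the slides specify; `mean_time_to_isolate` is a hypothetical name.

```python
from datetime import datetime, timedelta


def mean_time_to_isolate(incidents):
    """Average the time between detection and isolation across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [isolated - detected for detected, isolated in incidents]
    # sum() needs a timedelta start value because the default start is the int 0.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents as (detected, isolated) pairs.
incidents = [
    (datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30)),
    (datetime(2020, 1, 2, 14, 0), datetime(2020, 1, 2, 15, 30)),
]
mtti = mean_time_to_isolate(incidents)
print(mtti)
```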

go ahead, be a firefighter

Credits & Co.

• Virtualization image: Chen Zhao, http://www.flickr.com/photos/livepine/177132253/

• Russian Dolls: http://www.flickr.com/photos/barteverts/2481199686/

• Typing: http://www.flickr.com/photos/indi/113183027