
T08RVA-StudentTextbookA.doc Root Cause Analysis

2007 Duke Okes and J.P. Russell & Assoc.

Student Textbook Desk Reference

Root Cause Analysis: Fixing Problems for Good

By: Duke Okes and QualityWBT Center for Education

This is a long document. If printing is a problem, we suggest printing only the sections you are interested in. Printing is enabled, but cutting and pasting is not.

Note: The student textbook contains the text content of the class without interactive exercises, activities, glossary links, images, examples, key points, tips, tests, reviews or handouts. The student textbook can be used for off-line refresher and future reference after the class. The student textbook should not be used in place of the web-based training program.



Table of Contents

INTRODUCTION AND CLASS RULES

LESSON 01: WHAT IS ROOT CAUSE ANALYSIS

LESSON 02: DEFINE THE PROBLEM

LESSON 03: UNDERSTAND THE PROCESS

LESSON 04: IDENTIFY POSSIBLE CAUSES

LESSON 05: COLLECT DATA

LESSON 06: STEP 5: ANALYZE DATA

LESSON 07: EXAMPLE PROJECTS

LESSON 08: TESTING CAUSE AND EFFECT RELATIONSHIPS

LESSON 09: COMING UP WITH THE SOLUTIONS

LESSON 10: IMPLEMENTATION AND FOLLOW-UP

LESSON 11: SUMMARY OF RCA



Introduction and Class Rules

Root Cause Analysis: Fixing Problems for Good

Welcome to the class provided by QualityWBT Center for Education.

Class Rules, navigation and features

Content Provider: Duke Okes

Instructional Design: J.P. Russell

Service Provider: QualityWBT Center for Education

Web Design: Wes Whalen, Kinetik

Recommended prerequisite: To get the full benefit of the class we recommend that you be familiar with the Basic Seven QC Tools. A short assessment quiz must be taken before lesson one to help you determine if you may need additional references, training or guide books.

General Communication:

Note: Navigation and design features are not included in the student textbook.

Next, you can start the first lesson.



Lesson 01: What is Root Cause Analysis

What is root cause analysis (RCA)?

• RCA is an analytical process that allows you to break a process or system down into component parts in order to understand cause and effect relationships between them.

• It is a critical part of the problem solving process as it is required in order to find out what caused a particular problem.

• Knowing the root cause of a problem will lead to a permanent solution and its implementation.

Q: How is creative problem solving different?

A: Root cause analysis is quite different from creative problem solving, which you might use when, for example, you lock your keys in your car. You don't really care what the root cause is; you just want to find a way to get into your vehicle.

Isn’t RCA just basic common sense?

• Well, yes and no. If you have been troubleshooting for much of your life, as many maintenance or information technology (IT) people have, it may seem like common sense. But that’s because they’ve learned through repetition which thinking processes work best to solve problems.

• Most of us are not that focused on problem solving and therefore can benefit by developing our deductive thinking skills.

Topics to be discussed: In this class you will learn how to:

- enhance your problem solving effectiveness by using the given model for in-depth analysis of problem situations

- differentiate between analytical and creative thinking

- facilitate problem solving situations where you are not an expert in the process or technology involved

- expand the range of tools that you have available for analysis of problem situations.

Reminder: This course assumes that you are familiar with the Basic Seven QC Tools. Therefore we do not teach how to create them, but instead how to use them at the right time in an RCA project. If you are not familiar with those tools, it is recommended that you have access to a reference during the course (examples are “Memory Jogger II,” “The Quality Toolbox”) or that you take the online class “Improvement Tools and Techniques.” (You can view the class syllabus in the catalog.)



Please note that some lessons have a test at the end (such as this one) that you must pass to continue the class.

Instructor Notes:

Instructor Note 1: This course teaches a rigorous, logical, data driven approach to finding and fixing root causes of problems. It can be applied to problems of all types.

Instructor Note 2: Not all problems call for the level of detail presented in this course. For some problems, the cause can be found easily and quickly by one or more individuals who take the time to think it through.

Instructor Note 3: This course teaches a tried and true problem solving model and methodology, but its application depends on each specific situation. Be willing to skip steps or add steps depending on the simplicity or complexity of the problem or undesirable situation.

Instructor Note 4: All organizations solve problems. However, some problems recur all too often due to: wrong assumptions; wrong conclusions based on opinions; the analysis not going deep enough to identify the real root cause.

Why Do RCA?

• Many problems in your organization seem to recur, even after corrective action was thought to have been taken.

• Many of the “low hanging fruit” problems have been solved, leaving more difficult ones to be dealt with now.

• Standards for quality and risk have gotten much tighter in recent years--the bar has been raised!

Are you aware of problems that seem to recur, even though they were thought to have been solved?

Are there performance standards where you work or in your life where expectations are much greater than they were a few years ago?

Topics to be discussed: In this medium-long lesson we’re:

• introducing ways to categorize and think about problems

• exploring common terms and common problem solving methods

• analyzing problems from different perspectives

In this introductory lesson we want to lay the foundation for defining the problem needing root cause analysis.

It’s important to understand the types of problems for which RCA is appropriate. Here are two different categorizations of problems as they relate to RCA. Analytical problems and creative problems:

Analytical problems are situations that call for finding the root cause. This means we need to understand how the system works (i.e., the cause & effect relationships within it that apply at this point in time). An example is food poisoning that occurs at a restaurant.

Creative problems are those that have many possible solutions, and where the root cause is not a concern. For example, if you lock yourself out of your car, you don’t care how or why you did it, you just want to find a solution that will solve the problem. (Note, however, that if you did this several times, root cause analysis may become useful!)

[Exercise] Which type of problem is appropriate for root cause analysis? Analytical.

The other categorization is technical problems versus organizational / personal problems:

• Technical problems are problems based on the laws of science, and which usually have scientific solutions that will be known to work

• Organizational/personal problems are less predictable and subject to differences in values or opinions, making them more difficult to solve

Note that RCA can be applied to either technical or organizational problem types, and often it will be found that the root causes of technical problems are actually organizational issues.

(For example: Personal problems, not technical problems, are subject to differences in values or opinions.)

Symptoms –> Physical Causes –> System Causes

• One of the reasons RCA is important is that for any problem (observed symptom) there can be several contributing causes. We don't want to spend a lot of time “shot-gunning” (using a creative approach for an analytical problem). Instead we want to use a focused “rifle” approach for identification of causes.

• Note that there are two different types of root causes, which we’ll call physical and system causes. Sometimes we only find one, yet other times we need to find both.

Examples:

Problem: Your color printer stopped printing blue tones. Physical cause: The cyan (blue tone) cartridge has run out of ink. System cause: Previous users ignored the message on the printer display saying the cartridge needed to be replaced.

Problem: An employee fell and broke a leg. Physical cause: Water coming from a downspout froze on the walkway at the corner of the building just outside the cafeteria door. System cause: Designers for the downspout installation did not provide adequate water discharge containment.

Problem: Several vendors complain that they've not received payment. Physical cause: Accounts payable does not have sufficient personnel to handle the number of invoices received per month. System cause: When the company started placing smaller, more frequent orders (to reduce inventory) they forgot to consider the impact on other departments.

Notice how the system cause occurs separate from the physical cause in both time and space.

Two terms & concepts we need to be familiar with:

• Correction makes the physical problem go away. For example, replacing the cyan cartridge that ran out of ink.

• Corrective action changes a process so the system will not produce the same problem again. For example, making a single individual responsible for monitoring cartridge status and replacing any cartridges that are nearly empty.

Finding the physical cause and finding the system root cause both require the same type of thinking, but the second one can only be done after the first.

[Interactive exercise here with case study]

When do you need RCA?

• Once the physical cause is found, a key issue is deciding whether to find and correct the system cause.



• This is an important issue because not all problems are serious enough to warrant a full-blown RCA effort. Some reasons for choosing not to investigate the system causes might be:

• The problem has never occurred before and is unlikely to again

• The cost to do the investigation would be more than the cost of dealing with the problem if it did recur

• The system that produced the problem is not under our control

• So for each problem someone in the organization needs to decide (based on issues such as frequency, cost and risk) whether to stop after correcting the problem or whether to also take corrective action.

You should know when to take corrective action to find and correct the system cause (i.e., when someone in the organization decides it is appropriate).
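The reasons listed above for stopping after correction can be sketched as a simple check. This is a minimal illustration, not part of the course model: the `ProblemAssessment` fields and the function name are hypothetical, and the cost figures are invented. In practice the decision also depends on judgment about frequency, cost and risk.

```python
from dataclasses import dataclass


@dataclass
class ProblemAssessment:
    """Hypothetical inputs for the correction vs. corrective action decision."""
    has_occurred_before: bool
    likely_to_recur: bool
    investigation_cost: float        # estimated cost of finding the system cause
    recurrence_cost: float           # estimated cost of handling one recurrence
    system_under_our_control: bool


def reasons_to_stop_at_correction(p: ProblemAssessment) -> list[str]:
    """Return which of the three listed reasons apply to this problem."""
    reasons = []
    if not p.has_occurred_before and not p.likely_to_recur:
        reasons.append("never occurred before and unlikely to recur")
    if p.investigation_cost > p.recurrence_cost:
        reasons.append("investigation costs more than a recurrence")
    if not p.system_under_our_control:
        reasons.append("system is not under our control")
    return reasons


# Invented examples: a one-off nuisance and a costly repeating problem.
one_off = ProblemAssessment(False, False, 5000.0, 200.0, True)
recurring = ProblemAssessment(True, True, 1000.0, 20000.0, True)

print(reasons_to_stop_at_correction(one_off))
print(reasons_to_stop_at_correction(recurring))
```

An empty list means none of the listed reasons applies, which suggests the system cause is worth investigating.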

[Interactive exercise]

One of the problems with RCA is the lack of a detailed process. For example, the following is a typical model used to guide an organization’s Corrective Action Process:

1. Problem is identified and documented

2. Corrective action request is assigned

3. Root cause is determined

4. Action is taken to eliminate root cause

5. Effect of actions taken is evaluated

Note that there is only one step (step 3) for finding the root cause. If it were this easy, we’d be much more effective at it.

This is a more detailed model, the Eight Discipline (8-D) problem solving model used by many organizations.

1. Use a Team Approach

2. Describe the Problem

3. Implement Containment Actions

4. Define and Verify Root Causes

5. Verify Corrective Actions

6. Implement Permanent Corrective Action

7. Prevent Recurrence

8. Congratulate the Team

While this model has some useful additions over the previous one, it still is very weak relative to finding the root cause since, again, only one step (step 4) deals with it.

This is the Six Sigma DMAIC model, which can be very effective for RCA.

1. Define the project to work on

2. Measure the process

3. Analyze the process to understand important contributors

4. Improve the process by implementing solutions

5. Control the process to maintain the gain

However, a Six Sigma Black Belt who uses it typically receives 4 weeks of classroom training, plus real practice with a project, in order to learn all the details behind the model.

The following is a model that provides a level of detail that nearly anyone can use to effectively find and eliminate root causes of problems. I call it the DO IT2 model.

• It has two major phases:

• Diagnostic phase - Find the root cause

• Solution phase - Fix the root cause

• Each phase consists of 5 steps

A key difference from most models is that the Diagnostic Phase is an iterative loop … that is, the 5 steps are repeated until one gets to the level of root cause that is desired for a specific problem.

The DO IT* Problem Solving Model

Find IT* Diagnostic Phase:

1. Define the Problem

2. Understand the Process

3. Identify Possible Causes

4. Collect Data

5. Analyze the Data

Fix IT* Solution Phase:

6. Identify Possible Solutions

7. Select Solution(s) to be Implemented

8. Implement the Solution(s)

9. Evaluate the Effect(s)

10. Standardize the Process

* = IT means root cause of the problem! [handout]
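One way to picture the repeated diagnostic loop is as a short program. This is only a sketch under stated assumptions: the function name and the lookup table are hypothetical, the cause chain is the printer example from earlier in this lesson, and a dictionary lookup stands in for a full pass through steps 1 to 5.

```python
# Cause chain from the printer example: each key is an effect,
# and its value is the cause found for it by one diagnostic pass.
KNOWN_CAUSES = {
    "printer stopped printing blue tones": "cyan cartridge ran out of ink",
    "cyan cartridge ran out of ink": "users ignored the replace-cartridge message",
}


def run_diagnostic_phase(problem: str, max_passes: int) -> list[str]:
    """Repeat the diagnostic steps, going one cause level deeper per pass."""
    chain = [problem]
    for _ in range(max_passes):
        # A real pass would define, flowchart, list causes, collect and analyze data.
        cause = KNOWN_CAUSES.get(chain[-1])
        if cause is None:
            break  # no deeper cause was found
        chain.append(cause)
    return chain


# One pass stops at the physical cause; a second pass reaches the system cause.
physical_only = run_diagnostic_phase("printer stopped printing blue tones", 1)
to_system_cause = run_diagnostic_phase("printer stopped printing blue tones", 5)
print(physical_only)
print(to_system_cause)
```

The `max_passes` argument plays the role of deciding how deep a level of root cause is wanted for a specific problem.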



Steps of the RCA Model: You may ask, why so many steps? Let’s take an example and discuss the value of each of the first five steps of the Diagnostic phase: [Practice and handout of case study]

This is a pictorial view of the model. Note that some steps are convergent thinking (taking a lot of information and interpreting it, as with step 1) and others are divergent thinking (taking a bit of information and expanding it, as with steps 2 & 3). We get into trouble making assumptions or jumping to conclusions when we use the wrong thinking process at the wrong time.

RCA can also be applied at 3 different time perspective levels: 1) single event, 2) repetitive problem, 3) repetitive cause.

1. Single (or low frequency) event RCA: Responding to problems immediately when they occur or after only a few occurrences (viewed as minor). This level of RCA is unlikely to identify the root causes, and instead deals only with the physical cause.

2. Repetitive problem RCA: When a problem occurs repeatedly over time, RCA can be used not only to deal with each occurrence (e.g., JIT basis), but can also be used to identify the root cause so the problem will not occur again.



3. Repetitive cause RCA (Meta-RCA): Once organizations have become effective at finding root causes, they can begin looking at repetitive causes in different processes, which will allow detection of even deeper organizational issues that affect the organization’s ability to define and execute good processes. Sometimes a single cause will surface as several different problems. For example, material that did not ship on time, arrived late, was returned, or did not work could all be the result of poor package markings.

All of these require logic, process thinking, and incremental cause & effect analysis.

There are two different types of knowledge required for effective RCA / Problem Solving:

Content knowledge is understanding the technology of the system one is having the problem with. For a medical problem it would be medical technology. For an automotive problem it would be automotive technology. For services it may be banking technology.

Process knowledge is an understanding of the generic process for RCA / problem solving (e.g., the steps of the RCA model presented earlier).

Neither type of knowledge alone will suffice to solve a problem. Effective problem solving requires the integration of both, which is one of the key reasons for the use of task forces, action teams, etc., since different individuals will bring different types of knowledge to the analysis.

[exercise] examples:

1. Tools for analyzing data equals process knowledge.

2. How library books are categorized equals content knowledge.

Visual tools such as flowcharts or run charts are used to support RCA. There are several important reasons for using them:

- Picture the idea: They allow one to provide a clearer description of what one is thinking to other people.

- Picture is worth a thousand words: They help everyone have the same perspective, improving communications and decision making.

- Organize thoughts: They develop people’s mental models and thinking processes.

- Reference: They provide a record of what was done, which could have learning potential for others who were not involved.



If we could see what we know, we'd understand more than we think! However, it is not the tools themselves that are important ... it is the thinking and decision making processes that the tools enable. [example handout]

Next we will start the process by defining the problem. This is Diagnostic Step 1.

End of Lesson.



Lesson 02: Define The Problem

Find IT* - Diagnostic Phase

Step 1 – Define the Problem

In this short lesson we’re:

• deciding what problem to work on

• scoping the problem to a manageable size

• writing a problem statement that provides sufficient information to allow us to have some idea of:

• what type of error, defect or situation [characterize it]

• where the problem might be occurring [point in process, geographical, location on product]

• when did it happen [point in time]

• how big of a problem it is, frequency, process history [risk to organization]

Which Problem? How Simple is the Problem?

Sometimes the decision of which problem to work on is simple.

• A customer calls and complains

• We find a nonconformity during an audit

Even in cases such as these, we need to be careful since we always have limited time and resources. So whether or not to do RCA still requires thought and analysis.

Why do you believe the Pareto principle is so important for organizations to use when working on RCA? Think about it.

Your answer should include something like: Every organization has limited resources (people, money, time), and must make sure it expends energy where it will be most useful.

A more systemic approach is to look at the range of problems that are being encountered (e.g., all customer complaints over several months’ time) and look for repetitive problems. The Pareto Diagram is often used for this purpose. For example, the Pareto diagram below might be:

• the number of hours of downtime for equipment by department

• amount of defective product by part number

• number of customer complaints by type of complaint.



The Pareto analysis identifies what problems are more significant, and therefore are likely viable candidates for RCA. Note that the Pareto should be done multiple times, using frequency / count as one measure, cost as another, and perhaps risk as another. Just because something happens a lot doesn’t mean it is really significant from an organizational performance standpoint.
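Because the ranking can change with the measure, it may help to see the same data ranked two ways. The sketch below is illustrative only: `pareto_table` is a hypothetical helper, and the complaint categories, counts and costs are invented.

```python
from collections import defaultdict


def pareto_table(records, measure):
    """Rank categories by the chosen measure, largest first.

    records: iterable of dicts with a "category" key plus numeric measure keys.
    measure: name of the numeric key to total, e.g. "count" or "cost".
    Returns a list of (category, total, cumulative_percent) tuples.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["category"]] += rec[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    table, running = [], 0.0
    for category, total in ranked:
        running += total
        table.append((category, total, round(100 * running / grand_total, 1)))
    return table


# Hypothetical complaint data, invented for illustration.
complaints = [
    {"category": "late delivery", "count": 40, "cost": 2000},
    {"category": "wrong item", "count": 10, "cost": 9000},
    {"category": "damaged box", "count": 25, "cost": 1500},
]

by_count = pareto_table(complaints, "count")
by_cost = pareto_table(complaints, "cost")
for row in by_count:
    print("by count:", row)
for row in by_cost:
    print("by cost: ", row)
```

With these invented numbers, "late delivery" leads when ranked by count while "wrong item" leads when ranked by cost, which is the situation the note above warns about.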

In deciding what projects / problems to work on, we should consider A, B and C:

A. The critical nature of the problem, such as:

• Impact on customers

• Impact on safety / environmental performance

• Cost

B. Alignment with strategy and objectives. Even though a product / process is performing poorly, if improvement of it does not support where the company is going strategically, it may not be a good use of resources.

C. Availability of adequate resources. It makes no sense to begin a project if the people needed to make it succeed will not be available or resources will be pulled from other projects that are more beneficial to the organization. How does this project benefit the organization compared to others?

Instructor Note: We must also be careful to avoid trying to “solve world hunger.” Scoping a problem too large will almost always result in failure. We can perhaps solve the hunger problem within a particular community, then go on to the next one. So make sure the breadth/scope of the project is reasonable.

A fundamental tool for both selecting projects and for creating the problem statement is a run chart. The run chart (which could be applied to any organizational or process performance measure) lets us know when process performance has changed. We can then determine whether the change warrants investigation. The example below is a control chart, but a run chart can be any time-oriented chart that shows how something performs over time.

The run chart also helps us understand how the process performs over time. For example: Does the problem come & go? Is it a slow trend? Is it a significant shift? As will be pointed out later, this analysis can help us narrow down the range of causes much more quickly.
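A significant shift can be checked for with a simple run rule. The following sketch assumes one common rule of thumb, a run of six or more consecutive points on one side of the baseline center line. The threshold, the function name and the downtime figures are all assumptions for illustration, and your organization's charting rules may differ.

```python
from statistics import median


def longest_run_on_one_side(values, center):
    """Length of the longest streak of consecutive points on one side of center.

    Points exactly equal to center are skipped: they neither extend nor break a run.
    """
    longest = current = 0
    last_side = 0
    for v in values:
        if v == center:
            continue
        side = 1 if v > center else -1
        current = current + 1 if side == last_side else 1
        last_side = side
        longest = max(longest, current)
    return longest


# Hypothetical monthly downtime hours; the first six months serve as the baseline.
downtime = [5, 6, 4, 5, 7, 4, 5, 19, 21, 20, 22, 18, 20]
baseline_center = median(downtime[:6])
run = longest_run_on_one_side(downtime, baseline_center)
shift_suspected = run >= 6  # assumed rule-of-thumb threshold

print(baseline_center, run, shift_suspected)
```

A problem that comes and goes would produce short runs on both sides of the center line, while a sustained shift produces one long run like the one in this invented series.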

The primary output from Step 1 is the Problem Statement

Major components of the statement include:

• What happened (e.g., description of the defect)

• Where it happened (or at least where it was found / observed)

• When it happened (e.g., how long ago, and/or during what time period)

• How often / How much it has happened during that period

Remember: What, Where, When, and How Often or How Much
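The four components can be treated as a simple checklist. The sketch below is a hypothetical illustration (the class and function names are not part of the course), using the downtime example that is evaluated later in this lesson. It only checks that each component is present; it cannot judge whether the wording is clear or measurable.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemStatement:
    what: str = ""       # description of the defect or situation
    where: str = ""      # where it happened or was observed
    when: str = ""       # how long ago and/or during what time period
    how_much: str = ""   # how often or how much during that period


def missing_components(statement: ProblemStatement) -> list[str]:
    """List the components that have been left blank."""
    return [f.name for f in fields(statement) if not getattr(statement, f.name).strip()]


vague = ProblemStatement(what="Downtime is too high", where="Department A")
better = ProblemStatement(
    what="Equipment downtime related to unexpected equipment failures",
    where="Department A",
    when="in the past 3 months",
    how_much="increased from an average of 5 to 20 hours per month",
)

print(missing_components(vague))
print(missing_components(better))
```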

Use clear operational definitions (terms / language) that all involved will understand in the same way. For example, if the problem statement involved “cycle time” would everyone involved know when the clock is started and stopped when measuring it?

Instructor Note: The problem statement becomes a baseline against which success can later be measured. If the statement is no longer true after action has been taken to address the causes, then the project has probably been a success.

Let’s evaluate some problem statements. On a separate piece of paper write down any deficiencies you find in the following problem statements.

For example, the deficiencies for number 1 may be: What is too high? What is the actual level of downtime?

Now as you continue on, think about what, where, when, how often / how much.

Problem Statements are:

1. Downtime in department A is too high

2. The number of errors have increased from 70 to 400

3. Amount of time to respond to inquiries has increased in the past 3 months

4. Two associates who are using SPC have not had the required training

5. 3% of orders given to drive-thru customers at a fast-food restaurant are incorrect

Let’s evaluate some problem statements to see some deficiencies and improved problem statements.

1. Downtime in department A is too high

What does “too high” mean? Is there a specific target, and if so, what is it? What is the actual level of downtime (e.g., # of hours per month) now, and what was it before the issue was identified as a problem? When did the downtime become unacceptable (e.g., how long ago), and was the increase sudden or did it occur slowly over some time period? What type(s) of downtime? All, or only certain types (e.g., breakdown vs. planned vs. waiting for approval, etc.)?

A better version: Equipment downtime in Department A related to unexpected equipment failures has increased from an average of 5 hours per month to 20 hours per month in the past 3 months.

2. The number of errors have increased from 70 to 400

What type of errors? Where are the errors being found? Over what time period (e.g., # of days, weeks, or months) did the increase take place (e.g., when was the last time the error rate was 70, and the first time it was 400, and what was the time period between)? Is the error rate based on the same number of opportunities (e.g., has total output of the process been stable or has it increased)? If the output has changed, providing percentages in addition to absolute numbers would be useful.

A better version: The number of errors for bill of lading content increased from 70 (1% of labels used) to 400 (3%) since January. These errors are found when the customer checks the shipment against the bill at their location.

3. Amount of time to respond to inquiries has increased in the past 3 months.

What type of inquiries, and where? How much has the time increased (e.g., what was the average time before and what is it now)?

A better version: The amount of time it takes to respond to inquiries for billing errors has increased in the past 3 months from 1 business day to 3 days.

4. Two associates who are using SPC have not had the required training.

Which department(s) or processes are involved? How do we know they have not been trained?

A better version: There are no records indicating that two associates (one in department A and one in department M) who are using SPC have been trained in SPC.

5. 3% of orders given to drive-thru customers at a fast-food restaurant are incorrect.

Is it the entire order or only some part of it that is wrong? How long has the problem existed, and is it better or worse than before?

A better version: 3% of orders handed to drive-thru customers at the XYZ fast-food restaurant on Smith Avenue do not contain what the customer ordered. This is an increase from a 1% error rate six months ago.
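The point about percentages in statement 2 can be worked through with simple arithmetic. The sketch below takes the figures in the improved statement at face value (70 errors at a 1% rate, then 400 errors at a 3% rate); the function and variable names are illustrative only.

```python
def implied_opportunities(errors: int, error_rate: float) -> float:
    """Back out the number of opportunities from an error count and its rate."""
    return errors / error_rate


before_errors, before_rate = 70, 0.01   # 70 errors at 1%
after_errors, after_rate = 400, 0.03    # 400 errors at 3%

before_volume = implied_opportunities(before_errors, before_rate)
after_volume = implied_opportunities(after_errors, after_rate)

count_ratio = after_errors / before_errors    # growth in raw error count
rate_ratio = after_rate / before_rate         # growth in error rate
volume_ratio = after_volume / before_volume   # growth in process output

print(round(before_volume), round(after_volume))
print(round(count_ratio, 2), round(rate_ratio, 2), round(volume_ratio, 2))
```

The raw count grew almost sixfold while the error rate tripled, because the implied volume of labels roughly doubled (from about 7,000 to about 13,300). Reporting only the counts would overstate how much the process itself had deteriorated.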

What type of situation is it? In some cases it may be useful to clarify the statement. You may want to think about the problem from a situation perspective such as the following.

• Is something happening that isn’t wanted?

• Is something not happening that is desired?

• Is something happening that we’d like to repeat?

• Is it simply that a process value is varying from the target value?

Instructor Note: Step 1 is a linkage to customer needs and organization objectives which will keep management interested.

End of Lesson.

Lesson 03: Understand the Process

Find IT* Diagnostic Phase cont'd

Step 2: Understand the Process

In this short lesson we’re:

• setting boundaries for the project

• flowcharting the major steps between the boundaries

The key idea is that nearly all problems are due to a failure of a process. We want to therefore understand the process where we’re finding the problem.

Setting boundaries is an important part of any RCA effort. Although the cause(s) might be anywhere upstream of where the problem was found, we should limit our initial investigation to where the problem was found.

• The ending boundary will almost always be where the problem is actually being found. The error could be the evaluation point itself or any step that leads up to the evaluation.

• e.g., A customer complaint may be due to an incorrect measurement by the customer, or it could be that what we provided to the customer was produced incorrectly.

• The beginning boundary is a bit more difficult to decide, but here are some general guidelines:

• Make it time-logical (e.g., If variation in the problem occurs at a frequency that is aligned with the organization’s operating frequency, then first just include operations. If frequency of the problem is more attuned to the planning cycle, then also include the planning process.)

• Keep it inside your organization, initially. There is a natural human tendency to start pointing fingers, and pointing to suppliers is often a first response. Make sure you’ve first ensured that it isn’t something internal.

Instructor Note: You can always shift the boundaries later if data points in a particular direction. As a matter of fact, good RCA involves continual adjustment of boundaries until they include only the root cause!

Once the boundaries are set the next step is to create a high level flowchart of the process. What is meant by “high level”? Typically there are no fewer than 3 and no more than 8 steps (there will, of course, be exceptions).

Suppose we had a problem whereby customers who showed up at a particular hotel often found that the room they thought they had reserved was not available. We might then focus on the phone reservation process and flowchart it at a high level. See the flowchart on the next page for the major steps to the reservation process.

We might instead choose to represent the step with pictures or symbols that reflect what is going on at that step (although labels should be given to each step in order to facilitate discussion).

A deployment or swim-lane flowchart is often used when the process crosses multiple boundaries (e.g., departments or organizations). It allows easy clarification of responsibility for each step. The following is an example of an engineering change request initiated by a customer:

The power of using flowcharts in this way for RCA is that:

• By only flowcharting at a high level we avoid spending a lot of time on details that may be unnecessary. It’s unlikely that all steps of the process are failing, and detailing them would therefore add no value (but would consume time and money).

• The flowchart can be used to help identify possible causes of the problem

• The flowchart helps identify points in the process where data could be collected to determine where the process is failing

Suppose, for example, that we collected and analyzed data (steps 4 & 5 of the RCA model) on the Reservation process, and it indicated that the problem was primarily due to the Data Entry step of the process. We’ve avoided flowcharting the logic used by the computer to search out the right room & dates. We will now narrow our boundaries to look at only the data entry process, and flowchart it in more depth.

[handout]

Flowcharts are important RCA tools for several reasons, including:

- All problems are process-related. Problems may include: lack of a process; defective process; process not followed

- The process is part of a larger system, interacting with other processes

- Flowcharts help keep the focus on activities and process steps, not people, so we won’t be trying to blame someone for the problem

Practice Flow Chart Step 2:

Let’s practice flowcharting a process. Recall the earlier problem, where we created a problem statement for errors at a fast-food drive-thru restaurant. We had defined the beginning boundary as when the customer pulled up to the order board and the ending boundary as when the customer picked up the order.

1. Flowchart the process that occurs between these two boundaries using no fewer than 3 and no more than 7 steps.

2. After doing the first flowchart, imagine that we had some data that indicated most of the order errors were in the “order picking” portion of the process (e.g., where someone collects and bags the food). Flowchart this one step of the process in more detail.

After you have completed both of these charts, download an example done by someone else, and see how your flowcharts compare. [handout]

Instructor Note: In order to flowchart a process, one must know the process. You can gain knowledge by reviewing the procedure, visiting and observing the process, or asking people who design/operate the process.

The next lesson is about identifying all the possible causes. It is time to fill up your shopping cart with all possible causes.

End of Lesson.

Lesson 04: Identify Possible Causes Find IT* Diagnostic Phase cont'd

Step 3: Identify Possible Causes

In this long lesson we will:

• use the flowchart, logic tree and detailed approaches to find causes
• use brainstorming methods to generate possible causes
• apply techniques to assess information collected

This is a step most people jump into too rapidly. In the RCA model presented in this course, steps 1 & 2 are first done to help us get oriented to the problem and to help prevent us from jumping to conclusions. We have a choice of approaches for coming up with possible causes; they are presented here in order of preference.

Here are the three approaches:

1. flowchart approach: Use the steps on the process flowchart as the list of possible causes.

2. logic tree approach: Build a logic tree that breaks the system down in a cause-effect (why-why) way.

3. detailed approach: Use brainstorming, scanning the flowchart, and/or a cause & effect diagram to come up with a list of possible causes.

Instructor Note: Though the preferred approaches are tried and true, there are exceptions, either because of the nature of the problem or because the analysis becomes too complex.

Flowchart Approach: This approach is listed first for three reasons: (1) it maximizes the use of the flowchart previously created during Step 2 of the RCA model, (2) it limits the investigation to only those process steps that are necessary, and (3) it focuses on the fact that most problems are related to a failure of a process.

Look at the following stepped items of a flowchart. If the problem description was: “Errors in auto repair invoices have increased from nearly zero to 3% in the past 2 months,” which steps of the process could (or could not) have caused the problem?

After you have made your assessment, the process steps will have comments.

1. Record service request--> 2. Schedule work--> 3. Perform and record work--> 4. Prepare invoice--> 5. Pay for service



1. Record service request - will probably not be a cause unless invoiced for wrong service.
2. Schedule work - will probably not be a cause.
3. Perform and record work - yes, this step could have caused the problem.
4. Prepare invoice - yes, this step could have caused the problem.
5. Pay for service - will probably not be a cause since this is where the problem was found.

Errors in auto repair invoices have increased from nearly zero to 3% in the past 2 months.

1. Record service request--> 2. Schedule work--> 3. Perform and record work--> 4. Prepare invoice--> 5. Pay for service

Logic would say that only the 3rd or 4th process steps are likely to contain the cause of the problem. With the process flow approach you can limit your investigation to the most likely process steps. Next we can immediately collect some data on the types of errors, in order to figure out which of the two most likely steps (3 and 4) is failing most often.

Drive-thru Flowchart: 1. Place order--> 2. Pay--> 3. Make order--> 4. Order picked--> 5. Hand to customer

Recall the fast-food drive-thru problem in our practice exercise. If we look at the first level process flowchart we can see that:

• Placing Order could be a cause (e.g., the order was mis-keyed into the computer)
• Pay could not be a cause of wrong food, so could be eliminated
• Making Order could be a cause, if the wrong food was made but wrapped in the “right” package
• Order Picked could be a cause, since the person might pick up the wrong thing and place it in the bag, or something might be left out
• Hand to Customer could be a cause, since the wrong bag might be given to the customer

So just through logic we have eliminated 1 of the 5 major process steps. We could then collect data to find out which of the 4 other steps are actually causing the problem.

Logic Tree Approach:



Sometimes there isn’t a process that one can flowchart for the problem. Imagine you were responsible for traffic safety on a military base or other similar enclave with a large amount of traffic. Now imagine that the problem you want to solve is to reduce the number of accidents. There is no flowchart you could draw (OK, perhaps “1. Get into Car, 2. Start it, 3. Drive it, 4. Crash it,” but that won’t be much help). In this case we would instead use a logic tree that takes the system we’re working on and breaks it down into its functional / logical components. For example, the major components of the system might be:

• driver error
• vehicle failure
• road conditions

As you may notice, these components are similar to the major causal groupings / categories used in a cause & effect diagram (manpower / people, machines / equipment, materials, and methods). However, with a logic tree we create them based on the specific system we are working on.

We begin the logic tree by listing the problem at the top, then take the major components and show them as possible causes. These are the first level of “why’s” for the top level.

Now, as with the flowchart, we won’t go any farther until we collect and analyze some data. The data will help us know which of these three causes occurs most frequently, and we will then break down that particular branch of the tree to define the causes at the next level. For example, suppose we find that Driver Error causes the highest proportion of accidents. We might then look at different categories of driver error, such as:

• being inattentive or distracted
• being tired
• miscalculating



Next, we would find out which of these is contributing most often. This same cycle is repeated until we get to the level of cause where action will be taken.
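The drill-down cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-dictionary layout, the `drill_down` function name, and every accident count are hypothetical and do not come from the course data.

```python
# Hypothetical logic tree for the traffic-accident example.
# Each node holds an observed count (made-up numbers) and its child causes.
logic_tree = {
    "driver error": {
        "count": 60,
        "children": {
            "inattentive or distracted": {"count": 35, "children": {}},
            "tired": {"count": 15, "children": {}},
            "miscalculating": {"count": 10, "children": {}},
        },
    },
    "vehicle failure": {"count": 15, "children": {}},
    "road conditions": {"count": 25, "children": {}},
}


def drill_down(tree):
    """Follow the most frequent cause at each level until no children remain."""
    path = []
    level = tree
    while level:
        # Pick the branch with the highest observed count at this level.
        cause = max(level, key=lambda name: level[name]["count"])
        path.append(cause)
        level = level[cause]["children"]
    return path


print(drill_down(logic_tree))
```

In a real investigation the counts for a lower level would only be collected after the level above has been analyzed; the sketch holds them all in one structure purely to keep the example short.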

Let’s look at a logic tree for a simpler problem such as an audit nonconformity. During a surveillance audit (conducted to ISO 9001), a firm was written up for having two manufacturing associates using statistical process control (SPC), even though they were not trained in SPC. (The company’s qualification process for personnel who used SPC was that they have completed an internal SPC training course. There are training records for all who attend SPC training.)

How can you imagine that this might have happened? Logically, what are the ways this could happen? Think of the ways.

1. The SPC training class has not been conducted since the associates moved into their positions.
2. The class was provided but the two associates were not scheduled to attend.
3. The class was held and the associates were scheduled, but for some reason did not attend.

In this case we would determine which of the three possibilities had actually occurred (it is of course possible that it was different for each of the two associates), then expand the causes for that branch. In this case it turns out that the training had not been held because the company required a minimum of eight (8) people to hold a class … a policy issue.

Instructor Note: For this example the problem is that the process for scheduling training does not ensure consistent compliance with company requirements. The causes are clear and the process is straightforward. However, at times it may be beneficial to use flowcharting or process mapping in combination with the logic tree approach.

Sometimes when doing a logic tree we will use the process steps (e.g., from the appropriate level of the flowchart) as the major components of the system. The example below shows that for medication errors occurring in a hospital, it was decided to first break the system down by process. The process steps for a patient to receive medicine are:

1) a physician must prescribe it

2) the internal pharmacy must then dispense it

3) the nurse must administer it to the patient

An error could occur at any of these three major steps.

Let’s return to the fast-food drive-thru problem. We already have a flowchart for it, so let’s try a logic tree. [Practice with handout]

Detailed Approach: Sometimes the problem is simple, or perhaps a structured, level-by-level approach that drills down through the flowchart or logic tree does not appear to be working. In such cases, an alternative approach can be taken where, rather than working through the model on an iterative basis, a list of all possible causes is generated. This can be done in one, or a combination, of three ways:

• scanning the flowchart
• cause & effect diagrams
• brainstorming

Scanning the flowchart means simply stepping through the high-level steps of the process, if one can be defined, and listing possible causes at each step. For example, imagine that customers frequently find that when they arrive at their hotels, there is not a room available. We can diagnose the problem by looking at the phone reservation process and creating a flowchart.



Note: Along with each of the process steps there is a list of reasons that may cause an error in the reservation process.

A cause & effect (C&E) diagram is based somewhat on the same principles as a logic tree, with the following exceptions:

• The categories used are often the same regardless of application
• There is a limitation as to the number of levels that can be analyzed
• The C&E diagram could be repeated at lower levels, adding more details of components identified in another C&E diagram

Here's a C&E diagram for the hotel example:

Rather than using the 6 M’s (manpower, methods, materials, machinery, measurements & mother earth) often used in a manufacturing environment, we’ve used the 4 Ps often used in an office or service environment. Note that it did result in some additional causes not identified through scanning the flowchart.

Brainstorming: Most people know about brainstorming as a means to come up with solutions to problems, but it can also be used to create a list of possible causes. There are several different ways of conducting a brainstorming session. The following are three of them:

• Structured
• Unstructured
• Crawford slip

Unstructured – This is the traditional “list at the top of a flipchart your brainstorming topic, then ask people to shout out their ideas.” Write down all ideas as they are randomly offered without evaluation (which will be done later). While this approach can be useful, it has limitations when there are people present who may have important input, but who don’t traditionally try to out-shout others.

Structured – This is similar to the unstructured approach, but instead of random order, each person in order is asked for one idea. If they have none they say “pass” (although they may still contribute an idea next time around if they have one). This rotation process is continued until everyone passes on the final round. [Note: It can be useful to use the unstructured approach first in order to get more ideas, then follow with the structured approach of checking with each individual in the room until they all pass.]

Crawford slip (also called brainwriting) – This approach is useful when there are reasons why people may not be willing to openly state what they think. So instead, they are asked to write their brainstorming ideas on a sheet of paper, and all participants give their sheets to the facilitator, who transfers all the information to the flip charts (including any redundancies). This anonymity can help a group to overcome problems, when they find that many of them had the same ideas. [Practice with handout]

List More Causes: Hopefully you have a long list of possible causes. However, there are two more techniques that can be used to surface more possible causes. These techniques help ensure that ALL possible causes are gleaned. It is better to have a list that includes an extra cause that is not really possible than to have a list that doesn’t include the actual cause! The techniques are:

• system checks
• people checks

System checks: Ask yourself or the investigatory team the following list of questions:

- What could be causing it? This is the obvious question; it is what we're looking for.
- What could not be causing it? This helps to clarify what items are intentionally being excluded, and why. This allows checking of logic by others. Sometimes when you look at a process from a different perspective, you get a different view of things.
- What combinations of things could cause it? Remember the fire triangle: in order to have a fire, you must have an ignition source, something combustible to burn, and oxygen. For an engine to start, there must be spark, fuel, and oxygen. Many problems do not have a single cause.
- What changes may have been made? If a process was originally working fine, then suddenly isn't, we might ask what might have changed. Note that the changes might have been intentional or unintentional, and made by us or someone else.
- What barriers might have failed? Many processes have safeguards (e.g., checklists, inspections, signoffs) in place to prevent or detect problems. If a problem now exists that did not before, might one of the safeguards (barriers) have failed, creating the problem (or allowing it to get further downstream)?

People Checks: Another thing to consider is sources of possible cause information. Some key reminders are:

- Ask personnel who are involved with the process, such as folks who designed it or those who carry it out. The former will have theoretical understanding of how they expected the process to work, and the latter will have real-world experience as to how the process actually works.
- Ask suppliers and/or customers of the process. They have information about the interfaces of processes, and may have actually seen signals (e.g., early warning signs) in their own processes that help point out a particular direction.
- Ask personnel who have scientific understanding of the technology involved, such as engineers, scientists, researchers, etc. They should have a good understanding of cause & effect relationships.
- Ask experts to evaluate, simulate, or use other diagnostic tools (e.g., failure mode & effects analyses {FMEA}) to predict failures.

Problem Causes - Find IT* Diagnostic Phase

Now, think how much time it would take to go through and look at each of these causes. We’ll instead show you a logical way to narrow the list down much more quickly. In Step 4 of the RCA model we will be gathering data in order to help determine which of the causes we’ve listed is or is not creating the current condition. First, however, we may need to narrow the list of causes to a smaller number. The following are some ways of doing this.

• logic
• data
• probability
• voting

[RCA model handout]



Logic: Think back to the flowchart for the car repair invoicing problem. We were able to eliminate some of the steps of the process that we were certain could not have caused the problem. The same is true when using a logic tree. Although there may be several possible causes, we may be able to eliminate some of the causes based on logic, supported by data. You can view the handout if you need a refresher of the car repair invoice flowchart.

[handout about flowchart]

Data: Imagine you were trying to start your car, but it wouldn’t start. When you turn the key nothing happens at all. Now if we listed things that could cause a car to not start, being out of gas would be one of the possible causes, but in this case we have some data (e.g., the engine isn’t even turning over) that indicates that fuel level isn’t pertinent at this time, so we can eliminate gas from our list. Information already collected may eliminate some of the potential causes.

Probability: Although there may be multiple possible causes, often either science or experience tells us that the probability associated with each cause is different. We can then determine which have the greatest probability, delaying investigation of others until they become necessary to probe. This might be done by dividing / allocating 100 points among all causes, with the number of points indicating the relative probability.

Voting: If all else fails, we can take a vote to see which causes people believe should be investigated first. Multi-voting is one technique often used when there is a large list to be processed. Multi-voting is used to identify the highest priority possible causes to investigate.

Here’s how the Multi-voting process works:

• Divide the # of items on the list by 2, and allow each individual to have that many votes. If there were 8 possible causes, each team member would get 4 votes.

• Ask members to vote. This can be done as a team or by written ballot.

• Count the # of votes for each item, and eliminate those with the lowest votes.

• Repeat the process until the desired (manageable) number of items is left.
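One round of the multi-voting steps above can be sketched as follows. This is a hedged illustration, not a prescribed procedure: the function name `multivote_round`, the cause labels, and the ballots are hypothetical, and the elimination rule (drop only the items tied for the single lowest vote count) is an assumption, since the text does not say how many low scorers to remove per round.

```python
from collections import Counter


def multivote_round(items, ballots):
    """Run one multi-voting round and return the surviving items.

    items   -- list of possible causes still under consideration
    ballots -- one list of chosen items per team member
    """
    votes_allowed = len(items) // 2  # half the list size, as in the text
    tally = Counter()
    for ballot in ballots:
        # Ignore duplicate picks and picks that are not on the list,
        # then keep only as many picks as the member is allowed.
        valid = [item for item in dict.fromkeys(ballot) if item in items]
        tally.update(valid[:votes_allowed])
    lowest = min(tally[item] for item in items)
    survivors = [item for item in items if tally[item] > lowest]
    # If every item tied, nothing can be eliminated; keep the list unchanged.
    return survivors or list(items)


# Hypothetical example: 8 possible causes, so each member gets 4 votes.
causes = ["A", "B", "C", "D", "E", "F", "G", "H"]
ballots = [
    ["A", "B", "C", "D"],
    ["A", "B", "E", "F"],
    ["A", "C", "E", "G"],
]
print(multivote_round(causes, ballots))
```

Calling the function again on the returned list, with fresh ballots, repeats the process until a manageable number of causes is left.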

We would then go into Step 4 of the RCA model to investigate the possible causes that remain.

The next lesson is about data collection. What are the most likely causes? What are the facts? What samples do we need to take?

End of Lesson.



Lesson 05: Collect Data Find IT* - Diagnostic Phase

Step 4 – Collect Data

In this long lesson we will:

• identify the most likely causes
• practice using analysis tools
• discuss technical versus organizational problems

By this point you have recognized that what we are doing is equivalent to carrying out a research project … an investigation. We have defined a problem (Step 1) and taken time to identify what is most likely to be contributing to it (Steps 2 & 3). We now need to collect data (Step 4) and analyze (Step 5) it to help us to eliminate and/or point to particular possible causes.

Before collecting data we should first determine what information / data is already available. Most organizations have lots available, and it may be possible that it will be sufficient. However, in many cases, the data is not of sufficient depth, sample size, or precision to allow determination of the cause, so additional data may be needed. If the latter is true, then we need to identify what this data is, and develop a process for gathering it. In some cases we will need to run some experiments, tests, or pilot studies in order to gather the needed data.

Potential Problems with Data

• When using existing data, we cannot be sure how it was recorded: people may have written down what they wanted the result to be rather than the real results, and the calibration status of measurement devices (at the time they were used) may not be known.

• When collecting new data, people may change their behavior since they know the process is being studied, so “normal operation” isn’t what we see.

Since these issues could affect what we observe, we need to think about the degree to which the process and its data may be impacted. Therefore, we should consider comparing “old” outcomes data (which provides a baseline) with “new” outcomes data (e.g., collected after letting people know we’re doing an analysis of a problem) before spending a lot of time getting more in-depth data.

Instructor Note: Before we collect data we should be very clear about how we plan to analyze it. Otherwise, it's possible that we'll end up collecting something we don't need (which adds time and cost to the investigation), or we won't get data in the format we need for the analysis (which means having to go back and collect it again, also adding time and cost).

When looking at cause & effect relationships we can differentiate between two key sources of data, depending on whether it is cause (X-variable) data or effect (Y-variable) data. The following diagram indicates the relationship between the two.

Process Variables (Xs) ----> Outcome Measures (Ys)

Y data is often available, since it is how we know there is a problem (e.g., a Y variable is not acting as we want it to). And since the data may already be available, we should first think about analyzing it for patterns before investing in collecting a lot of X data.

Data Collection: Existing Data Sources

There are usually large sources of existing data in most organizations. These are the records that are being used to control organizational processes while they are being operated, and/or are the outputs of those processes. The following are just a few examples:

• process control records: logs, distributed control system records, chart records

• validation records: testing, calibration, maintenance, and training records

• other process output documents: invoices, purchase orders

[Interactive Car Wash exercise]

So we don’t need to spend time and money checking the soap and the equipment (X data), if we can simply look for patterns in the outcomes (Y data) and allow that to point us to the X variable most likely creating the problem.

Review Data Collection: Use existing outcome measures (Y data) when possible

• Investigate existing data sources
• Learn from the data already collected to identify patterns

Sampling

Another key issue to think about before collecting X data is the sampling method. As you know, it is not always possible to collect data on an entire population. Let’s assume we will not be measuring every possible unit of a large number of units, but instead we plan to measure some to collect desired data. For example, suppose we wanted to go back and check for patterns in invoicing errors. If we have created 2000 invoices in the past year we probably don’t want to look at all of them. We might instead decide to look at 100 of them (5%), and assume the sample would be representative of what we’d see if we looked at all. The accuracy of this assumption could be impacted by the technique we used to decide which 100 invoices to view.

Next we will review three choices for sampling.

Three Types of Sampling

1. Random Sample: Take a random sample, which means that each of the 2000 invoices has an equal chance of being selected. Imagine that we took the numbers 1 through 2000, put them in a hat, took one out, shook the hat, and repeated until we had taken out 100. In reality we would probably use a “random number generator” such as the one in Microsoft Excel, but the concept is the same … a lack of bias.

2. Structured Sample: We could instead take a structured sample, which means the 100 invoices are spaced apart by a fixed time interval or number of items (sometimes called interval sampling). For example, we might take two invoices from each week of the year, or select sequential invoice numbers that are 20 apart.

3. Stratified Sample: We might also consider a stratified sample, which provides us with a balanced representation of different groups that exist in the 2000 invoices. Suppose that of the 2000, 1500 (75%) are for cars and 500 (25%) for trucks. We might then decide that of the 100 we’re going to pull, 75 will be invoices for repairs of cars, and 25 will be of trucks. Of course we will still need to figure out how these groups of 75 and 25 will be pulled, which means considering random or structured sampling within the stratified sample.
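The three sampling choices can be compared side by side with a short Python sketch using the standard `random` module. The invoice numbering, the fixed seed, and especially the assumption that invoices 1 to 1500 are cars and 1501 to 2000 are trucks are made up for illustration; real invoices would need a vehicle-type field to stratify on.

```python
import random

invoice_numbers = list(range(1, 2001))  # 2000 invoices, numbered 1..2000
rng = random.Random(42)  # arbitrary fixed seed so the sketch is repeatable

# 1. Random sample: every invoice has an equal chance of being selected.
random_sample = rng.sample(invoice_numbers, 100)

# 2. Structured (interval) sample: every 20th invoice from a random start.
start = rng.randrange(20)
structured_sample = invoice_numbers[start::20]

# 3. Stratified sample: 75 car invoices and 25 truck invoices.
# Assumption for illustration only: 1..1500 are cars, 1501..2000 are trucks.
cars = invoice_numbers[:1500]
trucks = invoice_numbers[1500:]
stratified_sample = rng.sample(cars, 75) + rng.sample(trucks, 25)

print(len(random_sample), len(structured_sample), len(stratified_sample))
```

Here the stratified sample uses random selection within each group; interval selection within each group would be an equally valid choice.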

[Interactive exercise]

Review Data Collection: Collect new data

- Determine sampling methodology
  • Random
  • Interval (Structured)
  • Stratified

Sample Size: Larger sample sizes allow us to have greater confidence in our decisions. However, a larger sample size also increases the cost of data collection. There are formulas for calculating sample size based on the desired level of statistical confidence (probability), but the details are beyond the scope of this course. Suffice it to say that we need to be sure that our sample size is sufficient … so involve someone with statistical expertise if you are concerned about this issue. For investigations such as auditing, sample size is a case-by-case judgment call. Statistically based sampling is expensive and normally reserved for high-risk situations or when required by management.

Sample Timeframe: Sample timeframe is the period of time to be represented by the sample. For example, will one month be sufficient or are several months’ worth of data needed? Some of the issues that need to be considered when determining the timeframe for collected data are the following:

• Cyclicality of the process
• Special events that only occur at particular times of the day/week/month/year, etc.
• How often changes have been made to the process and how far back these may be of concern

Review Data Collection:
- Determine sample size
- Determine timeframe
- Consider types of data: Attribute, Variable, Contextual
- Consider data characteristics - differentiations

Sampling: type of data

The specific tool(s) used for collecting the data will depend on two major issues: What type of data is it? and What type of patterns do we want to detect in it at Step 5 of the RCA model? As a review, the types of data are:

• Attribute data: (e.g., # or % of times something occurs), also called count or discrete data. That is, the data will consist of whole numbers (integers).
• Variable or Measured data: (e.g., units of measure such as weight, length, pH), often called interval, continuous or variable data. That is, the data can be anywhere along a scale, with decimal points, etc.
• Contextual data: although sometimes not thought of as data, words (e.g., verbal responses to interviews, words recorded in logs) are also forms of data that can be analyzed.

When collecting the data we typically will be looking at how it relates to other factors (e.g., what patterns exist), for such differentiations as location, time, or classification. We then need to design our data collection forms / process so that these data characteristics are recorded along with the data.



In collecting attribute / count data, we will often use a check sheet (sometimes called a tally sheet). The sheet lists the categories that we want to look for, and on it we record the frequency at which each occurs. Using the hotel reservation example, we added another category, time (by week in handout), which also subdivides the data, to the four causes already identified. [handout example]

A Pictogram is another way of collecting attribute data and focuses on the physical location of the problem. This pictogram was used to identify defects on the faces of wheels and shows patterns of where the defects occurred most frequently.

Instructor Note: This is the same type of tool used to record falls or accidents (e.g., a layout of the facility showing where each occurred), used for problems with completing forms / records (e.g., a blank form on which a mark has been placed to indicate where errors in completed forms were found), or even used to look for time patterns (e.g., the face of a clock with marks indicating what time each problem occurred).

Collect Data: Spreadsheet

Sometimes we may want to collect multiple types of data and multiple factors at the same time. We then create a table (a generic data-collection sheet) with columns representing each variable and a row for each entry / occurrence. The following example was used to try to find patterns for why insurance claims were being rejected. Each row would indicate one rejected claim, with the first column being the claim number and the second column being the amount of the claim. The remainder of the columns were factors that might impact whether or not a claim was likely to be rejected.
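The check sheet and the row-per-occurrence table described above are closely related: a table of occurrences can be tallied into a check sheet. The sketch below shows one way to do that in Python. The observations and the category names ("cause A" and so on) are placeholders, not the causes from the course handout.

```python
from collections import Counter

# Hypothetical observations: one (week, category) pair per occurrence.
# The category names are placeholders, not the causes from the handout.
observations = [
    (1, "cause A"), (1, "cause B"), (1, "cause A"),
    (2, "cause A"), (2, "cause C"),
    (3, "cause B"), (3, "cause A"), (3, "cause D"),
]

# Tally by category and week, as tick marks would be made on a paper sheet.
check_sheet = Counter((category, week) for week, category in observations)

categories = sorted({category for _, category in observations})
weeks = sorted({week for week, _ in observations})

# Print one row per category, one column per week, plus a row total.
print("category   " + "".join(f"wk{w:<3}" for w in weeks) + "total")
for category in categories:
    counts = [check_sheet[(category, w)] for w in weeks]
    print(f"{category:<11}" + "".join(f"{c:<5}" for c in counts) + str(sum(counts)))
```

Recording each occurrence as its own row keeps the raw data available, so the same records can later be re-tallied by a different factor.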

Data Collection: Interviews

Sometimes there is no hard data available to collect and/or we want to collect information based on people’s memories or opinions. This is contextual data, which while likely to be less reliable, can still be a useful form of information. Interviews are a key tool for this purpose. When using interviews we must be careful to avoid questions that people can answer with a simple “yes” or “no” unless that is the only option. Such answers do not allow us to understand the “why” behind their answers. We should instead try to phrase questions in a way that allows the interviewee to provide the maximum information.

Data Collection: Interview Responses

Instead of who, what, when, where and why questions, we could make statements for the interviewee to respond to. Making statements instead of asking questions is especially useful if we need to ask many people the same questions. Interviewees can respond to statements using a Likert scale. [handout example] By using such a scale, we can assign numbers to each level of the answer (e.g., Strongly Agree is a 5; Strongly Disagree is a 1), which allows statistical comparisons of answers by question and/or by respondent groups. We must be cautious about using interview data, however, since it can be highly biased by factors such as fear (e.g., respondents say what they think the interviewer wants to hear) or faulty memories. It is most effective when the answers are used to see whether they support or discount other forms of data.

Review Data Collection:
- Consider data recording methods that are efficient and improve the effectiveness of the investigation.

Data Collection: Special Circumstances

Sometimes we need to be able to see things at a more detailed level than is normal. For these situations there are special techniques that can be applied. Detailed failure analysis of components (similar to forensic analysis) and time shifting are two classic examples.

• If a part fails it can often be analyzed through chemical means, microscopic means, etc. to see things that can't be seen through just our normal senses. Think of the analyses that were probably performed on failed Firestone tires off Ford Explorers (e.g., strength & chemical composition of the fibers, rubber, etc.).

• Special equipment may be needed to view processes that operate at speeds too high for the human eye to see. For example, high speed photography can be used, then the video slowed down for analysis.

• Computer simulation or accelerated life testing may also be done in order to try to recreate a problem.

Data Collection Plan:



The final step before actually collecting data is to develop and communicate a data collection plan. The idea is to ensure that all who are to be involved in collecting the data are properly aware and prepared to carry it out appropriately. The data collection plan should include:

• Who will collect what data and when will they do so

• Which variables are to be collected, what sample size, and in what time frame

• What precision / resolution is needed for each variable, and how should the data be recorded

• How will each piece of data be analyzed

Review Data Collection: - Consider any special circumstances related to collecting the data, and issue a Data Collection Plan.

Instructor Note: Make sure that both the equipment and people are calibrated! If you are using past data, verify the calibration status of any measurement devices that were used. When collecting new data, ensure measurement devices are calibrated. Calibration also applies to people ... e.g., if more than one person is going to be conducting interviews, make sure all are in agreement on what questions to ask, in what order, and how to respond (or not respond) to interviewee questions or responses.

[Practice: step 4, fast food drive-thru problem, plus handout]

The next lesson is about analyzing the data you collected. The 7 basic QC tools will be useful as well as understanding relationships.

End of Lesson.



Lesson 06: Step 5: Analyze Data Find IT* - Diagnostic Phase

Step 5 – Analyze Data In this medium length lesson we will:

• learn about how to analyze discrete data and soft data
• address how to handle technical versus organizational problems
• discuss statistical tools for data analysis

Step 5: Analyze the Data

This step is all about sorting through the data that was collected and looking for patterns. Patterns might be by time of the problem, by location of the problem, by type of problem, or other variables (e.g., by individual / machine/ supplier).

First perform the analyses planned during Step 4, but then also look to see if there are other ways to “slice and dice” the data.

What we are trying to do is find patterns that either point to, or rule out, certain causes.

We will now review some of the tools available to analyze our data.

[graphic] If we’re looking for patterns of count data, a Pareto Diagram will often suffice. Notice that this is the same data from the check sheet shown in Step 4, but is re-organized into a Pareto chart. The Pareto chart not only shows which causes are most prevalent, but it does it much more powerfully (graphically, visually) than the check sheet.

If we’re looking for patterns in location data, we might simply use the pictogram. However, the pictogram is just another form of a Pareto chart and can actually be used at multiple levels (e.g., drilling down closer to the cause).
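The re-organization of check sheet tallies into Pareto order can be sketched in a few lines of Python. This is only an illustration: the categories and counts below are invented (they are not the course check sheet), and `pareto_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical check-sheet tallies (invented for illustration only).
observations = (
    ["wrong item"] * 12
    + ["missing item"] * 30
    + ["cold food"] * 5
    + ["wrong drink"] * 3
)


def pareto_table(items):
    """Return rows of (category, count, cumulative percent), largest count first."""
    counts = Counter(items)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


table = pareto_table(observations)
for category, count, cum_pct in table:
    print(f"{category:<14}{count:>4}{cum_pct:>8}%")
```

The cumulative percent column is what lets us see how few categories account for most of the occurrences.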

For example, suppose you were looking at post-operative infections in a hospital. We might create a picture of the human body, front and back, and then mark on the picture where the infections have occurred for several patients over a month or so. It quickly becomes clear that it is the back of the legs. However, we can then look closer at where on the back of the leg the infections are occurring. In this case, it is more prevalent near the top of the incision than at the middle or bottom. We could also, instead of just using dots, use different symbols to indicate the type and/or severity of each infection.

A run chart is most frequently used to look for patterns over time. The X axis represents time (e.g., months, hours, sequence), and the Y axis represents the variable of interest. We then look for when the Y variable changed, and what X variables could create that particular pattern of change. We are looking for spikes, trends, shifts, and runs. Important note: also look at a histogram of the data.

Another way to look for patterns in variable data is to use a histogram, which has the variable of interest on the X axis, and the frequency with which it falls into a particular value cell on the Y axis. Here we are looking for normality, skewness, outliers, and multimodality. Important note: also analyze using a run chart, if the data is time oriented.

Sometimes our data is paired (that is, we have both a Y value and an X value for each observation), and we want to determine if there is a relationship between the two variables. In this case we use a scatter diagram. If there is wide scatter with no apparent trend we say there is no “correlation” between the variables. If, however, it appears that as one variable increases the other also increases (or decreases), then we say there is correlation between the two variables. Caution: This does not necessarily prove that one variable “causes” the other, since there could be a third variable that causes both of them.

Sometimes we’re looking for correlations or differences between a categorical variable and a numeric variable. In this case we’ve used a modified scatter diagram. It’s modified in the sense that the numbers on the X axis are categories (e.g., each stands for a particular technician), but we’ve given each a number so the software (MSExcel) would treat it as variable data and put all the ones for the same technician together.

Note how this allows us to see very quickly that # 3 appears to be distinctly different from the others. Careful … we don’t know whether or not that difference is good or bad … it all depends on why it is different. Keep in mind that data analysis simply points out factors that we need to investigate further.
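The same comparison can be approximated numerically by grouping the readings by category and seeing which group sits farthest from the overall average. The sketch below assumes invented readings for four technicians; the numbers and the `group_means` helper are hypothetical, and the result only tells us where to investigate, not why the group differs.

```python
from statistics import mean

# Hypothetical (technician number, measured value) pairs, invented for illustration.
readings = [
    (1, 10.1), (1, 9.9),
    (2, 10.0), (2, 10.2),
    (3, 12.1), (3, 11.9),
    (4, 9.8), (4, 10.0),
]


def group_means(pairs):
    """Average the numeric values within each category."""
    groups = {}
    for category, value in pairs:
        groups.setdefault(category, []).append(value)
    return {category: mean(values) for category, values in groups.items()}


means = group_means(readings)
overall = mean(value for _, value in readings)

# The category whose average lies farthest from the overall average.
farthest = max(means, key=lambda c: abs(means[c] - overall))
print(f"Technician {farthest} differs most from the overall average")
```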

Remember the multi-factor, generic data collection form. We’ve used Pivot Tables, another feature of MSExcel, to look at some of that data for patterns. The stacked bar charts allow us to look for similarities and/or differences that we might want to explore. For example, in the first bar graph we can see that insurance companies CL and KG have not rejected any pediatric claims. Is this because they don’t deal with these types of claims, or because they handle them differently? In the second graph we can see that Coder Z has never had a geriatric claim rejected. Does that mean that Z knows better how to code this type of claim? Again, the data simply points out questions that the investigators need to answer.
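For readers who do not use MSExcel, the cross-tabulation behind a pivot table can be sketched as follows. The records are invented: only the insurer codes CL and KG appear in the text above, while the code "AB", the claim types other than pediatric and geriatric, and the `pivot_counts` helper are assumptions made for the example.

```python
from collections import Counter

# Hypothetical rejected-claim records as (insurer, claim type) pairs.
rejected = [
    ("AB", "pediatric"), ("AB", "geriatric"), ("AB", "pediatric"),
    ("CL", "geriatric"), ("CL", "adult"),
    ("KG", "adult"),
]


def pivot_counts(records):
    """Count records for every (row, column) combination, including zeros."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


pivot = pivot_counts(rejected)
for insurer, by_type in pivot.items():
    print(insurer, by_type)
```

Filling in the zero cells explicitly matters here, because it is the empty combinations (such as an insurer with no rejected pediatric claims) that raise the questions worth investigating.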

Review Analyze the Data

• Pareto diagram: organize by frequency
• Pictograms: look for patterns by location
• Run chart: patterns over time
• Histograms: patterns in variation
• Scatter diagram: identify relationship between variables
• Modified scatter diagram: differences of a categorical variable
• Stacked bar charts: identify similarities and/or differences

There is a common tool often used to look for patterns and it is called an "Is-Is Not" table. What it asks us to do is identify places / times/ types etc. where the problem is different from other places / times / types. Once we identify these differences, we can then determine what we believe that might mean about the possibility of various causes actually being a contributing factor. [Practice, Handout]

If the data gathering process includes interviews or other processes that generate textual information, an affinity diagram is one way to analyze the data. This page shows the creation of an affinity diagram where a group of people had first: 1) brainstormed what they believed to be the key issues for maintaining a safe organization, then 2) created a diagram that organized the list into groups of similar items. Each group is then given a category / label that provides a higher-level description of what it contains.

Text data may also be analyzed using a time journal (e.g., a list of activities / events listed in time order, along with the exact or estimated date / time it occurred), or using content analysis (counting how many times a particular word, phrase, or concept arose during the interview or in the documents that were reviewed).

Creating a safe working environment:

Requirements:
- need to understand risks
- need to know regulations

Safety Management:
- safety as a priority
- monitor performance
- adapt to changes in processes

Work Design:
- use safeguards
- processes don't allow unsafe actions

Exception to the rule: Following is textual information gained from the hotel reservation agents. Data indicates they had not been confirming reservations with customers before ending the conversation.

Do you confirm every reservation? (Yes=3, No=7) If not, what are some reasons for not doing so?

• I trust the computer system to be accurate
• I had written down what the customer wanted
• It seems like a waste of time
• I didn’t want to bother the customer by asking them to go over it again
• The phone is very busy and didn’t have time
• No particular reason
• It takes extra time and doesn’t add value

In this case we probably wouldn’t actually create an actual affinity diagram, since the data all point to the same issue … the agents do not see value to performing that step of the process. No need to over analyze the obvious.

Questionable Data Analysis: In this course we are focused on graphs and other visual means of identifying patterns. One must be cautious, however, since what may appear to be meaningful may not really be. There are a few simple statistical tests that can be used, most of which MSExcel can do, that allow us to establish statistical probabilities for decisions. Although they are not included in this course, a list is provided below of some of the ones the student may want to explore.

• t test: Tests for differences between two averages
• F test: Tests for differences between two variances
• ANOVA: Tests for differences between more than two averages
• Chi-square: Tests for differences between counts
• Correlation: Provides a statistical measure of the degree of co-relationship between two variables
• Regression: Provides the formula for the line that represents the relationship between two variables
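To make the last two items in the list concrete, here is a minimal sketch of how a correlation coefficient and a least-squares regression line are computed from paired data. The x and y values are invented, and `pearson_and_line` is a hypothetical helper; a spreadsheet or statistics package would normally do this for you.

```python
from math import sqrt


def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired data using least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)            # correlation coefficient, between -1 and +1
    slope = sxy / sxx                    # change in y per unit change in x
    intercept = mean_y - slope * mean_x  # value of y where the line crosses x = 0
    return r, slope, intercept


# Hypothetical paired observations (invented for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r, slope, intercept = pearson_and_line(xs, ys)
print(f"r = {r:.3f}, line: y = {slope:.2f}x + {intercept:.2f}")
```

A value of r near +1 or -1 indicates a strong linear relationship, but as cautioned earlier in this lesson, it does not by itself show that X causes Y.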

Instructor Note: If the process has a high number of variables and there is high probability of statistical interactions, a Six-Sigma approach (e.g., use of multivariate statistics and design of experiments) may be needed!

Some things to watch out for when doing data analysis:

• As we said earlier, correlation does not necessarily mean cause & effect. For example, if we plotted the “number of gallons of ice cream sold in the last month” on the X axis, and the “number of people who drowned in the last month” on the Y axis, and did this for several months, we’d likely find a correlation. That is, when stores sell more ice cream more people drown. However, the cause of the drowning isn’t the ice cream, but instead the warmer weather that both causes people to buy more ice cream and to swim more (which raises the probability of someone drowning).

• Data often need to be presented in a normalized manner to take into account inherent bias such as different volumes of output. That is, just because one batch of raw material produced more defects than another doesn’t mean the problem is due to the batch. It might be that one batch was larger than the other and produced more output, but with the same percentage of defects. So we may need to use percents, ratios, or a standardized normal distribution (where the average is shifted to zero, and values are expressed as a +/- number of standard deviations from the average, rather than on the original scale).

• Just because something jumps out and appears to be the cause, don’t stop. Look for more information that confirms or conflicts with it. Don’t stop at the first cause.
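The point above about normalizing data can be illustrated with a small sketch. The batch names, defect counts, and output volumes are invented, and the helper names are hypothetical. The population standard deviation (`pstdev`) is used here as an assumption; the sample standard deviation would be an equally reasonable choice.

```python
from statistics import mean, pstdev

# Hypothetical batches as name -> (defects found, units produced).
batches = {"A": (40, 2000), "B": (20, 1000), "C": (45, 900)}


def defect_rates(data):
    """Convert raw defect counts into a fraction of units produced."""
    return {name: defects / units for name, (defects, units) in data.items()}


def z_scores(values):
    """Express each value as a number of standard deviations from the average."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


rates = defect_rates(batches)
standardized = dict(zip(rates, z_scores(list(rates.values()))))

for name in batches:
    print(f"Batch {name}: {batches[name][0]} defects, "
          f"rate {rates[name]:.1%}, z = {standardized[name]:+.2f}")
```

In this invented data, batch A has twice as many defects as batch B, yet both have the same defect rate; only batch C stands out once volume is taken into account.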

As part of sorting through the data and trying to identify which causes did and/or did not create a problem it is useful to have a few reminder questions to trigger our thought process. Following are some examples: Checklist questions

• When did the problem start?
• Is the problem continuous or intermittent?
• Where does the problem exist and not exist?
• What in the process has (been) changed?
• Is there a time delay or lag to consider (e.g., might something have changed a considerable time ago, but the effect just now showed up due to how the system is designed)?

• What assumptions are being made that if they were incorrect, would alter our conclusions?

[practice L6, drive-thru and handouts]

At this point in the course we’re going to move forward. However, it is important to understand that at Step 5, what we are often doing is simply establishing the cause at a high level of the process or system the first time through. The next step then is to revise the problem statement (returning to Step 1), then come through the remainder of the steps to get us closer to the specific, detailed cause. The number of times we go through this loop will differ from problem to problem (e.g., depending on complexity), and also depending on whether we are willing to stop upon finding the physical cause, or we also want to find the system root cause.

The next lesson provides example projects. It is time to think about what you have learned and how it can be applied. Root cause analysis is not easy, but its benefits support organizational growth and innovation instead of working on the same problems over and over again.

End of Lesson.

Lesson 07: Example Projects The following screens demonstrate the use of the RCA process on two relatively simple projects. They are provided to allow you to see examples of different ways to use the concepts & tools covered in the course. Before moving forward to look at the examples, please download and review the RCA Guide Sheet that has been provided to help guide you through the use of the RCA model. [handout] Note that the guide includes three major columns. The first is the 10-step RCA model, the second is a list of questions that should be asked / answered at each step, and the third is the outputs that will typically be generated by that step.

Lesson Instructions Use the RCA Model Guide as you review the lesson examples and identify: 1) Whether or not the steps were followed 2) Whether the questions were asked 3) Whether the outputs were generated

In effect, see how well you believe the project was carried out. Note: For now, pay attention to only the first 5 steps on the guide.

Problem #1: Line Downtime Management had requested that a team reduce downtime on a line, which was a continuous process (that is, if one piece of equipment on the line went down, the entire line had to stop). The team had been brainstorming causes and solutions for two meetings. What’s wrong with this? At a minimum, no structured approach for problem solving. An RCA facilitator who met with the team requested that they develop a process flowchart (they probably thought he was crazy … it was so simple!) He then asked, which machine is down most often? Why ask this? The world hunger problem … you can only reduce line downtime by reducing machine downtime, and focusing on the one that is down the most will have the biggest impact.

So the team gathered data (real-time data collected at the line, not maintenance records) and organized it into a Pareto diagram. The diagram indicated that Machine B had the most downtime (which was a surprise to the team, since Machine A was much more complex and visible when it went down). The team now had data indicating which machine contributed the most to downtime, and how much it was down within a certain period of time. They could now write a problem statement.

The next step was to ask why the machine was down. So the team returned to data collection, this time recording not just how much time the machine was down but also the cause of each downtime event. The next Pareto showed that “board changes” was the largest cause. Note: Had the team relied on maintenance database records this cause would not even have shown up, since it was not something that maintenance was involved with. It was an operator-responsible task, similar to replacing a worn tool.

An investigation into the board change process found that when the machine was new, one person could do the task in about 5 minutes. In the current state it required two people and about 20 minutes. This was due to wear of some of the machine component parts (the product being produced was abrasive). The team quantified the cost of the downtime and approached management with a recommendation that the machine be rebuilt. Downtime of the line was reduced by 6% as a result of this repair … saving the company hundreds of thousands of dollars of downtime. [questions]

Problem #2: Noisy Gears

Management stated that too many parts were being returned from the assembly plant back to the machining facility. A Pareto was used to look at the reasons for rejection. Since the largest problem was pinion noise, the original Problem Statement (Step 1) was: 1.5% of pinions are being returned from assembly due to noise. In order to get oriented to the process, the team did a high-level flowchart (often called a SIPOC diagram) that showed the relationship of the machining plant to the assembly plant and the primary external supplier. They also redefined the problem not as noisy gears, but as gears with too much run-out (e.g., the tooth pitch diameter was not true to the OD (outside diameter) of the pinion). The first breakthrough occurred when the team visited the assembly plant and talked with the individual on the assembly line who had to disassemble the units if the pinion was bad. This got them linked to and aware of customer needs/concerns.

The team then did a high-level flowchart of the process used to manufacture a pinion. The team then looked at which steps of the process could cause run-out, and eliminated Finish Grind and Inspect. They brainstormed how each remaining step could create the problem:

• Bar machine not putting center in correctly
• Hob off center
• Grinding off center
• Warping at Heat Treat
• Damage during Transport

The team then decided that on the next run of the process they would measure the run-out of every pinion (typical batch was 30 parts) as it came off of each process. Pinions were handled in a way that allowed knowing which one was which, and an example of the data collection form, with the data for two pinions, is shown below. The data for the 30 pinions was then put into a histogram for each step of the process. Below are curves representing the histogram for each step, turned on its side. Note that the data shows that the distribution for pinion run-out got much worse (more run-out on average) at the Rough Grind process, indicating that it was causing a major change in part quality. Based on what was found at Step 5, we now have a more detailed description of the problem, so the team returned to Step 1 and went through the process again.
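The comparison of run-out distributions from step to step can also be made numerically. The sketch below is an assumption-laden illustration: the readings are invented (three per step rather than the 30 pinions the team measured), the order of the process steps is assumed, and `largest_increase` is a hypothetical helper. It simply finds the step after which the average run-out rose the most.

```python
from statistics import mean

# Hypothetical run-out readings per process step, in an assumed process order.
steps = [
    ("Bar Machine", [0.8, 1.0, 0.9]),
    ("Hob", [1.0, 1.1, 0.9]),
    ("Heat Treat", [1.1, 1.2, 1.0]),
    ("Rough Grind", [2.9, 3.2, 3.0]),
]


def largest_increase(step_data):
    """Return (step name, increase) for the biggest rise in average run-out."""
    averages = [(name, mean(values)) for name, values in step_data]
    best_step, best_jump = None, float("-inf")
    # Compare each step's average with the average of the step before it.
    for (_, previous_avg), (name, avg) in zip(averages, averages[1:]):
        jump = avg - previous_avg
        if jump > best_jump:
            best_step, best_jump = name, jump
    return best_step, best_jump


step, jump = largest_increase(steps)
print(f"Largest increase in average run-out occurs at {step} (+{jump:.2f})")
```

Comparing averages alone ignores the spread of each distribution, which is why the team's histograms remain the better first look at the data.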

New problem statement: Excessive pinion run-out is being created at the rough grind operation.

For Step 2 the team could have flow-charted the rough grind process (Step 1: Setup, Step 2: 1st Piece Approval, Step 3: Run). However, this did not appear to be adding value. So they decided to go directly to Step 3 and brainstorm possible causes, and then created the following list:

• Grinding spindle problems (e.g., worn bearings)
• Collet slipping
• Collet not centered
• Collet not being used

Step 4. Since the collet was listed three times they decided to look at it first, and found it was in the drawer near the machine, no longer being used (the machine could be run by locating the pinion on the center stub instead).

Step 5. They interpreted the lack of use of the collet as the problem, and went to Steps 6, 7 & 8. They decided to install the collet, ran some parts, and the run-out problem went away (Step 9).

[questions]

The next lesson is about testing cause and effect relationships. If we identify the right cause(s) the solutions will be effective. End of Lesson.

Lesson 08: Testing Cause and Effect Relationships In this short lesson we are going to cover supplemental information regarding Step 3: Identify Possible Causes of the model and review of the RCA process. There is a test at the end that you must pass to continue to the implementation steps.

• At the core of all RCA is identifying and testing cause & effect relationships … trying to find which one(s) have created the current problem / condition

• In most system/processes the cause & effect relationships are a network consisting of many effects that can be created by many causes, and each of these causes is actually an effect of a deeper cause

• The symptoms (effects) we see are usually at the top end, while the causes we need to get to are far down within the system

Another view is to take the same concept, (a single level only) turn it sideways, and simplify and re-label it.

This says that effects (Y’s or outcomes) are the result of multiple causes (X’s). When Y is not performing as we want it to, we need to find which of the X’s is causing the problem.

Review RCA Process

This is what we are doing with RCA. We get to the actual cause by stating theories of cause & effect, then testing to see which are true (which leaves it as a possible cause) and which are not true (which eliminates it as a cause).

• At Step 3: we list possible causes
• At Step 4: we decide what data we need in order to test each cause
• At Step 5: we analyze the data to see which causes should remain, and which should be eliminated

We state this by saying that X (a possible cause) is causing Y (the effect). We actually state this for several Xs (i.e., steps in the flowchart or components on the logic tree at the level at which we currently are), then work to determine which are true and which are not true. Think of the fast-food drive-thru problem. We first had three theories (these could be from the flowchart or the logic tree) of what might be causing the problem:

• Theory 1: The order entered into the computer was wrong
• Theory 2: The process of getting the meal into the bag wasn’t working
• Theory 3: The wrong bag was being handed to the customer

These theories were then tested by collecting data. The data indicated that orders are almost never entered incorrectly, and the wrong bag is seldom given to the customer. What was in the bag was the real problem, so we have narrowed the problem scope to a smaller portion of the process, or only one branch of the logic tree. [Drive-thru handout]

Here’s where the power of pattern analysis of data becomes useful. Let’s look at this process:

• X1 = Granular material being fed automatically from a bulk silo
• X2 = Granular material being fed manually from 50# bags
• X3 = Mixer with time control before dumping batch to conveyor
• Y = Weight of each batch

Now imagine that during a shift’s operation you saw each of the following patterns in Y.

Which X (X1, X2, X3) variable would you associate with each pattern?

Example A goes with X2 - If the person forgets to put in one bag, or puts in two bags instead, or a bag put in has significant problems (e.g., quality or quantity of the contents), its effect will be relatively short lived...the residence time of one bag.

Example B goes with X1 - Material in a silo feeds out on a First In, First Out basis. However, there will be a transition time when the previous batch and new batch will be intermixed / blended, until all the previous batch is gone. This will create a gradual change.

Example C goes with X3 - If someone goes up to the equipment and turns the knob, the process will shift in response. It will stay there until someone turns the knob back the other way.

[handout discussion]

Why is that? In effect, each root cause theory is a “why” for the effect. This is why the term “5-whys” is often used when talking about finding root causes. Five is not a magic number, but if we ask “Why?” enough times, we are slowly working through levels of theories, eliminating those that are not causing the problem. If we get down deep enough in the system we will find the causes that, if fixed, will provide a permanent solution. Next, the 5-whys:

Example: Two flat tires in the past week.

Why were the tires flat? Possible causes include a leaking valve, a cracked wheel, a bad seal between wheel and tire, or a punctured tire. We could check for each of these by immersing the wheel and tire in some water. Let's say we found that there was a nail in the tire each time, and it was a roofing nail.

Next, Why do we have roofing nails in our tires? We might then remember that last week we visited a relative who recently had his roof replaced. Since there were several other people there we parked in the grass, which means we probably ran over the nails. Note there are other possible causes, such as someone intentionally putting the nails in our tires, the nails having been there when we bought the tires, etc.

Next, Why were there nails in the yard? It could have been that there was an extra bag that was left behind and got knocked over, or that the roofers did not clean up, or that they cleaned up but not adequately. We could ask the owners if they saw the roofers clean up, etc. If they say yes, and that there was no extra container of nails, then we would assume an inadequate cleanup.

Then, Why was the cleanup not adequate? It could be that they were in a hurry, the roller magnet they were using had lost its magnetism, or the magnet cannot work well in tall, thick grass such as the relative has around his home.

Each level is first a Problem Statement (effect), and through data collection we find which cause created the problem at that level. We then call that cause an effect, and develop theories (causes) for that, then collect data again. The flat tire example demonstrates two major issues:

• 5-whys is the same concept as the logic tree, except with 5-whys we don’t show the ones that were not valid at each level. However, the logic tree helps us better see the multiple causes at each level, and we can then mark off the ones that were found to not have been the cause (see next screen).

• We need to be aware of the causes we can address and those we cannot. In the case of the flat tire, the magnet problem is not one over which we have control, and neither, probably, is how the roofers clean up. What we do have control over is whether we drive in areas where debris is more likely to exist. We might then decide to take action at the higher level, and not take our investigation (or at least not try to implement a solution) to the next level, unless we are planning to have our own roof replaced!
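The relationship between the logic tree and the 5-whys described in the first point can be sketched as a small data structure. This is an illustration only: the `Cause` class, its `status` labels, and `confirmed_chain` are hypothetical names, and the statuses assigned to the flat tire causes are a simplified reading of the example (the text only assumes the inadequate cleanup rather than proving it).

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One node of a logic tree: a possible cause and the causes beneath it."""
    description: str
    status: str = "untested"  # "untested", "confirmed", or "ruled out"
    children: list = field(default_factory=list)


def confirmed_chain(node):
    """Follow confirmed causes downward; the result is the 5-whys path."""
    chain = [node.description]
    for child in node.children:
        if child.status == "confirmed":
            chain.extend(confirmed_chain(child))
            break
    return chain


# Logic tree for the flat tire example; statuses are illustrative assumptions.
tree = Cause("Tire is flat", "confirmed", [
    Cause("Leaking valve", "ruled out"),
    Cause("Cracked wheel", "ruled out"),
    Cause("Bad seal between wheel and tire", "ruled out"),
    Cause("Tire punctured by roofing nail", "confirmed", [
        Cause("Nails put in tire intentionally"),
        Cause("Ran over nails in relative's yard", "confirmed", [
            Cause("Extra bag of nails left behind", "ruled out"),
            Cause("Roofers did not clean up", "ruled out"),
            Cause("Cleanup was inadequate", "confirmed"),
        ]),
    ]),
])

for level, description in enumerate(confirmed_chain(tree)):
    print("  " * level + description)
```

The full tree keeps every theory and its status, while the printed chain shows only the confirmed path, which is the difference between the two tools noted above.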

[handout]

Logic Tree with whys: This is an example (flowchart on next screen) of a logic tree for a situation where the investigation needed to go from the most basic level to the very deep. This includes the Why's at each level. A heat treat oven was not impacting all parts equally, which was due to the fact that the thermocouples (a temperature measuring device) being used were different from the original ones that came with the oven. Why? Because the company had a policy of purchasing MRO (maintenance, repair & operations) items at the lowest possible cost. So Production was doing what they were supposed to (operate the oven), Maintenance was doing what they were supposed to (repair the oven) and Purchasing was doing what they were supposed to (buy replacement parts). However, if the analysis had stopped before it got to the purchasing process, the real root cause (a bad policy) would not have been found.

[heat tree logic diagram]

Instructor Note: The search for root cause could be infinite! I like to say that I can prove that either Adam & Eve, or the Big Bang, depending on your belief, is the root cause of everything. That is, if you ask "Why?" enough times, you'll get that far back in time. Obviously, this would not be a good use of your time. We instead want to take our analysis to the level at which we can and should take action.

Physical versus Root Cause Be clear about whether or not you’re satisfied with solving the physical cause only, or whether you also want to eliminate the root (system) cause. By definition, if you don’t change the system or process (e.g., revise a policy or procedure) then the analysis did not get to the root cause. If the process or system is not changed, the problem will recur. Many organizations consciously decide to work at two different levels … a short term physical fix followed by a longer-term system fix … not necessarily both done at the same time or by the same people. It is not unusual for the system solution to require different people than the physical solution.

[handout of example]

The next lesson is about coming up with the ideal solution and its implementation. You can start the next lesson now by clicking the blue arrow in the upper left of the screen and then clicking on the next lesson. End of Lesson.

Lesson 09: Coming Up with the Solutions The DO IT2 Problem Solving Model Now that we’ve learned how to find the causes & root causes of problems let’s move on to step 6 (see below). In this lesson we will explore how to implement effective, permanent solutions. For this we’ll move to the 2nd phase (the Solution Phase) of the RCA model. Keep in mind that we’ll do this for both physical causes and for root (system) causes if the project is to go to this depth. In effect we’ll implement solutions for each separately, using parallel paths through the solution phase.

FIX IT* Solution Phase

6. Identify Possible Solutions
7. Select Solution(s) to be Implemented
8. Implement the Solution(s)
9. Evaluate the Effect(s)
10. Standardize the Process

* = IT means root cause of the problem!

Solutions Big and Small: A key issue is to push for preventative / breakthrough solutions... not just incremental or temporary ones. It is because of the desire for breakthrough solutions that at this step of the model we want to be at our most creative. Creativity means looking at things from different angles, and getting outside our conventional and restricted thinking boundaries.

Creative Thinking Techniques We will discuss 5 techniques that will help trigger creative ideas:

• Scale up or scale down: Make the problem bigger or smaller
• Reverse: Think how you could make the problem much worse
• Morph: Start moving the process
• WWXD: What would X do
• No Limits: If no constraints, what would you do

Five Creative Thinking Techniques

• Scale up or scale down (#1): Suppose the problem were either much more prevalent, or the frequency was much less? Or suppose it (e.g., the machine, the process, the part, the defect) was physically much larger or much smaller? Think in orders of magnitude of 1000 times.

Some issues are so complex that they are difficult to grasp unless they are scaled down (software could be an example). Conversely, some issues seem so simple, that they must be scaled up to understand the magnitude (wrong order times the number of franchises in the state, times the number in the country, times the number in the world, and so on.).

• Reverse (#2): Think of what you would do if you wanted to make the problem much worse … now reverse that idea. Example: if an organization is having problems getting forms filled out correctly, how would they make it worse? By making the forms more confusing! So, try making them less confusing (e.g., look at order, layout, font, etc.).

• Morph (#3): Have you ever seen a picture of a young child on TV, then suddenly the image started changing and a few seconds later it was a much older person? What could you do with the process you’re working on to make it start to move in the right direction, in incremental steps?

If you want to improve customer relations of an auto repair center you could do it all in one fell swoop, or slowly change things over time (e.g., cleanliness, conveniences, language, level of service …). The latter will be easier and less disruptive.

• WWXD (#4): “What Would X Do?” says think about what someone else (an individual or organization) would do if they had this problem. It doesn’t really matter who X is … the idea is to get you to think outside the box. What would Donald Trump, BMW, Ross Perot, a 3-year old child … do?

This can be very powerful when we ask what our competitor would (or could) do in this situation. However, we want to think larger than our competitors / their industry.

• No Limits (#5): Suppose there were absolutely no limitations (e.g., cost, laws of physics / chemistry / psychology) on the situation? What would you do?

This is a very good technique. Most everyone on the team in an organization incorporates organizational constraints into their thinking. In any given situation, the limitation may or may not apply. Management may have stated that capital spending will be limited this year. However, if spending had a 6 month payback they would want to invest the money now. Saying that there are no limits, or asking what is the right thing to do, stimulates more creative ideas.

[Practice: creative thinking with "no limits" handout]

Creative Thinking Process

Creative thinking is a process. It can be described by the steps: Saturation, Incubation, Illumination, Validation.

• Saturation is the period when you’re brainstorming or using other techniques in order to come up with as many ideas as possible.

• Incubation says that after saturation we should allow some time (e.g., perhaps a couple of days or so) to allow our subconscious to work on the idea list. We’re likely to have some additional insights. So rather than jumping from brainstorming to deciding which idea is best, give your brain a rest.

• Illumination is taking time to look at each of the ideas (e.g., put them “under a spotlight”) and see which have the most potential.

• Validation is testing the best ideas to see how effective they are.

Creative Thinking Environment: Some times, locations, and situations are better for creativity than others. For example:

• Time: Think about how much more or less creativity you’re likely to get from a group on a Friday afternoon versus a Wednesday morning. So when we can be most creative (e.g., time of day, day of week, etc.) should be considered.

• Location: Do you believe people would be more creative in a small, dimly lit room, or a large, open one, with plain or fancy decor? So where we will be most creative is another consideration.

• Relationships: Who is on the team, and how are the members related within the company / organization? We should consider how these sorts of things may impact how creative people will be.

Mistake-Proofing:

Mistake-proofing, also called poka-yoke or error proofing, is a vital problem-solving perspective that should be incorporated into the final solution. It is especially appropriate for infrequent problems caused by human error. Mistake-proofing comes in different forms:

• Prevention (i.e., it is impossible to do it wrong) is the best form. You cannot put a round peg in a slot for a triangle. You cannot plug an Ethernet cable into a telephone jack.

• Detection (i.e., if the problem occurs, it is highly unlikely to get through the system) is an alternative. If material with a blue dot goes in the blue bin and you have material with an orange dot, through observation (detection) you are unlikely to put it in the blue bin.

Discussion: Think about your automobile. On many cars, it is now impossible to put the car in gear unless you have your foot on the brake. This makes it a bit less likely that you’ll back over your neighbor’s cat by accident. The fact that a microwave oven turns off when you push the button to open the door prevents you from microwaving your hand while you’re removing something from the oven.

Mistake-proofing is sometimes accomplished through control aids / tools (e.g., jigs or software), through barriers (e.g., light curtains or timing), or through alarms. It is designed to be a low-cost solution to problems. In cognitive situations, barriers such as checklists are often used (e.g., think of the checklist used by airline pilots every time they take off).

Learn from Others: Benchmarking is another useful way of coming up with possible solutions. No matter what type of problem we have, chances are that someone else in our company, in our industry, or even in a different industry has dealt with a similar situation. We can save ourselves a lot of grief if we find what others have found to be successful, and see if it might also work for us. Typical benchmarking sources are trade journals, industry groups, conferences, professional societies, on-line databases, benchmarking groups, etc. With Google and other search engines, it is absolutely amazing what one can learn about what others have done.

When benchmarking there are two major questions we have: What process did they use and how effective was it?

Then, if we find a solution we like, we should be careful to not just copy it, but instead see how we might need to adapt it to our environment (e.g., culture, degree of product risk, etc.).

Review: Identify Possible Solutions, Step 6

• Stimulate creative thinking

• Creative thinking is a process with steps titled: Saturation, Incubation, Illumination, Validation

• Thinking environment considerations: time, location, relationships

• Mistake-proofing can be prevention or detection

• Benchmarking: how did others solve the problem?

Step 7: Select Solution(s) to Implement

At this step we put the solutions through a screening process to find out which ones are best. We might consider things such as:

1. Which would be fastest or easiest to do?

2. What is the benefit / cost ratio and/or payback period?

3. What is the probability of success?

4. How will people be impacted (e.g., workload, relationships with work partners, skills required)?

5. What other problems might be created by the solution?

When risks are moderate to high or the outcomes are less certain, we should consider testing out solutions using a pilot run.

Decision Making Options: You should also consider what level of involvement you want others to have in the decision-making process. Here are three ways you can go about making a decision:

• Autonomous: We (whether an individual or a group) are going to make the decision and announce it to those responsible for carrying it out.

• Consultative: We (again, an individual or a group) are going to make the decision, but before we do so, we are going to ask those who will be impacted by it about ideas they have and advantages/disadvantages of some of the ideas we’re considering.

• Consensus: We are not going to make the decision alone, but only with full involvement of all who will be impacted by it.

Of course, each of these three options has advantages and disadvantages, such as the amount of time required, the level of commitment to the decision people will have, etc. When deciding how to go about it, we would consider not only these factors, but also things such as who has the most appropriate knowledge to bring to bear on the problem, as well as who will have ultimate responsibility for the outcomes.

Matrix: Effort versus Benefit

A student in an RCA class (unfortunately I don't remember his name) provided a simple 2x2 matrix that can sometimes be used for screening possible solutions. On one axis is how much work (resources) will be required, and on the other is the payoff we’ll get (benefit). The upper left quadrant is the low-effort, high-payoff point: great ideas. The lower right quadrant is lots of work with little payoff, so we’re not likely to try those ideas.
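The effort-versus-benefit screen described above can be sketched in a few lines of Python. This is only an illustration: the function name, the idea names, and the label for the two quadrants the text does not describe ("needs discussion") are assumptions, not part of the class material.

```python
# Hypothetical sketch of the 2x2 effort-versus-benefit screen.
def quadrant(effort: str, payoff: str) -> str:
    """Return a screening label for an idea; each input is "low" or "high"."""
    if effort not in ("low", "high") or payoff not in ("low", "high"):
        raise ValueError("effort and payoff must each be 'low' or 'high'")
    if effort == "low" and payoff == "high":
        return "great idea"        # upper left quadrant in the lesson's matrix
    if effort == "high" and payoff == "low":
        return "unlikely to try"   # lower right quadrant
    return "needs discussion"      # assumed label; the text does not name these quadrants


# Invented example ideas, each tagged with (effort, payoff).
ideas = {
    "add a checklist": ("low", "high"),
    "replace the machine": ("high", "high"),
    "repaint the work area": ("high", "low"),
}
screened = {name: quadrant(effort, payoff) for name, (effort, payoff) in ideas.items()}
```

Ideas that land in "needs discussion" are the ones where the other screening methods in this step are most likely to help.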

[Matrix: potential payoff vs. effort required. Quadrants: Low vs. High, Low vs. Low, High vs. High, High vs. Low]

Individual Ranking

Ranking is another way to screen ideas. We simply list the solutions we want to consider down the left side, then list each person with input across the columns. This example isn’t really a problem-solving one, but a simple illustration of how the Flintstones and Rubbles might decide what they’re going to have for dinner. As you can see, Fred would first prefer steak, his second choice is pizza, etc. You can see the rankings for each person, as well as the totals. The total column indicates that subs or pizza are probably the best compromise solution.
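The ranking procedure can be sketched as follows. The rankings below are invented for illustration and are not the values from the lesson's chart; they were chosen only so that Fred ranks steak first and pizza second, as in the text. Each person ranks every option (1 = most preferred), the ranks are summed per option, and the lowest total is treated as the best compromise.

```python
# Hypothetical rankings (1 = most preferred); NOT the values from the lesson's chart.
rankings = {
    "Fred":   {"steak": 1, "pizza": 2, "subs": 3, "salad": 4},
    "Wilma":  {"pizza": 1, "subs": 2, "salad": 3, "steak": 4},
    "Barney": {"subs": 1, "steak": 2, "pizza": 3, "salad": 4},
    "Betty":  {"salad": 1, "subs": 2, "pizza": 3, "steak": 4},
}


def rank_totals(rankings):
    """Sum each option's ranks across people; return totals sorted best (lowest) first."""
    totals = {}
    for person_ranks in rankings.values():
        for option, rank in person_ranks.items():
            totals[option] = totals.get(option, 0) + rank
    return dict(sorted(totals.items(), key=lambda item: item[1]))


totals = rank_totals(rankings)
best_option = next(iter(totals))  # first key is the option with the lowest total
```

With these made-up numbers, subs and pizza come out with the two lowest totals, even though only one person ranked each of them first.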

[Example Flintstone chart]

Instructor Note: The lowest number in the total box is the best overall choice. The best overall choice could be one that was not ranked best by anyone.

Scientific: Nonlinear

A more scientific approach is to use a decision matrix as below. Suppose there were three possible solutions, and each was evaluated based on four criteria. If it was a really good option it got a 9, if it was medium it got a 3, and if it was not deemed very good it got a 1. Note that personnel resistance requires reverse coding (i.e., high resistance gets the low score).

This nonlinear scale helps the better ideas stand out more quickly than a linear scale does, where averaging sometimes causes the scores to nearly cancel each other out.
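A minimal sketch of such a decision matrix follows. The three solutions, the criterion names other than personnel resistance, the ratings, and the choice to combine scores by simple summation are all assumptions made for illustration; the lesson's own matrix is not reproduced here. The same ratings are scored on the 9 / 3 / 1 scale and on a linear 3 / 2 / 1 scale for comparison.

```python
# Scales: the 9/3/1 values come from the text; the linear scale is for comparison only.
NONLINEAR = {"high": 9, "medium": 3, "low": 1}
LINEAR = {"high": 3, "medium": 2, "low": 1}

# Reverse coding: a "high" rating on these criteria is bad, so it is flipped before scoring.
REVERSED = {"high": "low", "medium": "medium", "low": "high"}
REVERSE_CODED = {"personnel resistance"}

# Hypothetical ratings for three invented solutions on four criteria.
ratings = {
    "A": {"speed": "high", "benefit": "medium",
          "chance of success": "high", "personnel resistance": "high"},
    "B": {"speed": "medium", "benefit": "medium",
          "chance of success": "medium", "personnel resistance": "low"},
    "C": {"speed": "low", "benefit": "high",
          "chance of success": "medium", "personnel resistance": "medium"},
}


def score(ratings, scale):
    """Total each solution's scores (summation is an assumed combination rule)."""
    totals = {}
    for solution, by_criterion in ratings.items():
        total = 0
        for criterion, level in by_criterion.items():
            if criterion in REVERSE_CODED:
                level = REVERSED[level]
            total += scale[level]
        totals[solution] = total
    return totals


nonlinear_totals = score(ratings, NONLINEAR)  # {"A": 22, "B": 18, "C": 16}
linear_totals = score(ratings, LINEAR)        # {"A": 9, "B": 9, "C": 8}
```

With these made-up ratings, the linear scale leaves solutions A and B tied, while the 9 / 3 / 1 scale separates them.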

Review: Select Solution to Implement, Step 7

• Stimulate creative thinking

• Creative thinking is a process with steps titled: Saturation, Incubation, Illumination, Validation

• Thinking environment considerations: time, location, relationships

• Mistake-proofing can be prevention or detection

• Decision-making options: Autonomous, Consultative, or Consensus

• Screen by effort versus payoff

• Screen by individual ranking

• Screen by decision matrix, nonlinear

[Practice: Decision making and two handouts]

The next lesson is about implementing the solution, managing the project and being successful.

End of Lesson.

Lesson 10: Implementation and Follow-Up

Implementation and Follow-Up

• Step 8: Implement the Solution(s)

• Step 9: Evaluate the Effects

• Step 10: Standardize the Process

Step 8: Implement the Solution(s)

This step of the process is all about putting the solution(s) into place. Success requires integrating two major topics:

• Project management: For the changes that will be made to the organization’s systems (e.g., who is going to do what and when).

• Change management: How will it impact the organization’s social system, and what can be done to make it go well?

We’re going to cover the last one first, because very few organizations do it, and yet it can have the largest impact on the success of the initiative. Not addressing social issues could be the largest hidden barrier to implementing change.

Change Management

Organizational change management is the science of understanding that:

• Organizations consist of many individuals

• Each individual has his/her own values, beliefs, concerns and priorities

• If we don’t consider these issues, we’re likely to have a difficult time because people resist change!

Change is not wanted. Now, before you start thinking about those “irrational people who resist change,” keep in mind that resistance:

• is a normal, rational human response

• is driven by fear of the unknown, as well as previous experiences

• can be predicted and dealt with in ways that minimize disruption

Note: If you think “resisters” are irrational, you’ll create more resistance due to the way you approach and deal with them. For more information about resistance to change, search in Google for the term “pygmalion effect.”

There are a few tools that can be useful in determining potential reasons for resistance and ways to counteract them.

Force Field Analysis

An example of incentives to change is that many organizations, when implementing lean production, promise that there will be no layoffs for X years due to lean. This is because lean can very rapidly reduce the need for headcount, and if people know they may lose their jobs, they’re likely to make sure that lean isn’t successful!

Forces for change (motivation) include:
- job made easier
- possible pay raise
- learn new skills
- job more secure
- possible promotion
- added benefits
- paid vacation

Forces against change (threats) include:
- lose job, right-sized
- take a pay cut or lose benefits
- transferred to another department or shift
- need to learn new skills
- move away from friends

[Practice: Force field analysis]

Reasons People Resist Change

There are several reasons people resist change. They can be categorized by: 1) how things are now, 2) how they’ll be in the future, and 3) how we’ll get from here to there. Most are probably self-evident, but see if you can relate to any of them from personal experience or if you can add some. The first set of reasons relates to the current state.

Current State: They like things the way they are now (e.g., it pays the bills), the change implies that what we're doing now is wrong, and/or the individual (or organization) has a poor success rate at change.

Change Process: They're just people who don't like change, the reason for the change and how it will be done have not been well communicated, and/or untrusted people are leading the change, or it's being done at the wrong time or even in a poor way.

Future State: They're not sure if they can do it, they believe they're losing ground, and/or it's just more work.

Distribution of Adopters

Several years ago a writer by the name of Everett Rogers wrote a book titled Diffusion of Innovations. He proposed a normal distribution representing the response of society to new ideas / technologies. Most people accept or adopt change, but not at the same time.

The graphic in the lesson is a modified version of Rogers’ description. Basically there are some folks (on the right side) who will jump at the opportunity to try something new. In the middle are most of us who will try it once we’ve heard enough good stuff about it, and on the left are folks who, if they adopt it, will do so long after it is standard usage by the general public.

The same distribution is true within any organization, and we need to consider the mix of early adopters, late adopters and the majority when putting together a group that is going to work on a problem. If we have all folks from the right side, they’re going to run off and leave everyone behind. If we have too many from the left side, we’ll make zero progress. So we need to be sure that we get appropriate representation of different styles towards change. A preference would be to have the most representation from the middle, perhaps two or three from the right side, and one from the left (which allows that “group” to be informed).

The circle diagram in the lesson comes from Stephen Covey’s “spheres of influence” concept, which states that there are things in life we can neither control nor influence, there are others we can’t control but can influence, and still others that we have total control over. Often people get hung up because they believe they have neither control nor influence; helping them take the big picture and break it down into parts will help them see where they can have an impact, even though in some areas they may not be able to.

Organizational Change Mechanisms

The next diagram in the lesson indicates that there are several mechanisms that can be used to facilitate organizational change. Note that this model was developed by Duke Okes.

• Change the espoused values or organizational policies to provide guidance as to the desired behaviors

• Change the goals and reward system to provide people with direction and feedback

• Ensure the right individuals are in the right jobs

• Create connections between individuals / groups that will better facilitate action

• Change people’s mental models through education, training, job transfers, etc.

• Change the environment in which people work (e.g., layout, colors, artifacts)

• Change the words used to describe business entities and processes

Project Management Perspectives

Now that we’ve discussed the change issues, we can talk about project management. Since there are many courses just on this topic, we will not spend much time on it, other than to say that we need to have the following four perspectives in planning the implementation of a process change:

1. What actions are necessary? Who has responsibility for each? And when should each be done? Note that timing can be done in a forward planning or a backward planning method.

Forward planning says define the amount of time required for each task, and let that drive when the overall project will be done, while backward planning says to define when you want it all done, then work backwards to develop the timing for each component.

2. Ensure that a systems view is used, considering what impact the changes will have on others who are not involved, or who are not even in the same area. The idea is that changing one part of an organization will almost certainly have an impact on other parts, and these impacts could be either positive or negative.

3. What monitoring will be done to determine how well things are going (both implementation steps and outcomes)? What contingency plans should be developed to deal with any adverse events / outcomes?

4. How will ultimate outcomes be evaluated?
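The forward and backward planning methods mentioned under the first perspective can be sketched as below. The task names and durations are invented, tasks are assumed to run one after another, and durations are counted in calendar days; a real project would usually need dependencies and working-day calendars.

```python
from datetime import date, timedelta

# Hypothetical sequential tasks: (name, duration in calendar days).
tasks = [("Revise procedure", 5), ("Train operators", 3), ("Pilot run", 10)]


def forward_plan(start, tasks):
    """Start from a known start date; task durations drive the finish date."""
    schedule, current = [], start
    for name, days in tasks:
        end = current + timedelta(days=days)
        schedule.append((name, current, end))
        current = end
    return schedule


def backward_plan(deadline, tasks):
    """Start from the desired finish date; work backwards to each task's dates."""
    schedule, current = [], deadline
    for name, days in reversed(tasks):
        begin = current - timedelta(days=days)
        schedule.append((name, begin, current))
        current = begin
    schedule.reverse()  # return tasks in execution order
    return schedule


forward = forward_plan(date(2024, 1, 1), tasks)
backward = backward_plan(date(2024, 1, 19), tasks)
```

Both calls describe the same 18-day schedule here; the difference is which date is treated as fixed and which is derived.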

Action Item Worksheet: The following is typical of an action item worksheet used for tracking implementation progress. If the project is more complex, a PERT chart or project management software may be required. One of the keys for using this simple worksheet is to never, ever remove a date from the When column. If the project slips, new dates can be added; however, if old dates are removed, it will be difficult to see which parts of the project are continually slipping.
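The rule about never removing a date from the When column maps naturally onto an append-only list. The sketch below is one possible way to keep such a worksheet in code; the class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ActionItem:
    what: str
    who: str
    when: List[date] = field(default_factory=list)  # every due date ever set, oldest first

    def reschedule(self, new_date: date) -> None:
        """Record a slipped date by appending; earlier dates are never removed."""
        self.when.append(new_date)

    @property
    def current_due(self) -> date:
        return self.when[-1]

    @property
    def slips(self) -> int:
        """Number of times the item has been rescheduled."""
        return len(self.when) - 1


# Invented example: an item that has slipped twice.
item = ActionItem("Revise work instruction", "J. Smith", [date(2024, 3, 1)])
item.reschedule(date(2024, 3, 15))
item.reschedule(date(2024, 4, 1))
```

Because the history is kept, sorting items by `slips` shows which parts of the project are continually slipping.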

Configuration Control

The other “change management” issue we need to consider is configuration management (also known as change control). That is, as the system is changed, documentation (e.g., product drawings, procedures, etc.) must be revised to reflect these changes.

This component of the change process should: 1) evaluate risks, 2) plan and document changes to the system, 3) record dates when the change was actually implemented, and 4) contain or disposition any items that become obsolete due to the change.

In most organizations these configuration control issues will be addressed through procedures titled something along the lines of:

• Engineering change control

• Process change control

• Document control

Review: Implement Solution, Step 8

• Manage as a project

• Consider the people side of management of change control

• Force Field Analysis

• Reasons people resist

    • How things are now

    • Change process

    • How things will be in the future

• Distribution of early adopters, majority, and late adopters

• Sphere of influence

• System Actuators: How things will be in the future

• Reasons people resist

• Perspectives

• Worksheets

• Configuration management

Evaluate the Effects: Step 9

Now that we’ve implemented a change in the process, we want to know whether or not it worked. Did it have the effect we expected? This is also a good time to think back to the original problem statement. If the solution was effective, the problem statement should no longer be true.

If the problem does appear to have been solved, there is always a possibility that it had absolutely nothing to do with our action, but instead may be a random change or due to the Hawthorne effect (i.e., if we pay attention to something it sometimes will get better). A way to verify that the improvement is actually due to our action is to see if we can turn the problem on and off by removing and then restoring our solution. Of course, in some cases this may be too costly or risky to try, but where confirmation is desired and it is possible to do this, it is a great confidence builder.

Just because it’s better now doesn’t mean it will stay that way. We need to do long-term follow-up. Some organizations will do a review after 30 days, another after 60 days, and a final one after 90 days. If all three reviews show sustained improvement, then the project is closed out.
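The 30 / 60 / 90-day follow-up can be sketched as a small scheduling helper. The function names and the implementation date are assumptions; the close-out rule (all three reviews must show sustained improvement) follows the paragraph above.

```python
from datetime import date, timedelta


def review_dates(implemented, offsets=(30, 60, 90)):
    """Return the follow-up review dates, counted in calendar days from implementation."""
    return [implemented + timedelta(days=days) for days in offsets]


def can_close(review_results, required_reviews=3):
    """Close the project only if every required review showed sustained improvement."""
    return len(review_results) == required_reviews and all(review_results)


# Hypothetical implementation date.
reviews = review_dates(date(2024, 1, 1))
```

For example, `can_close([True, True, False])` returns `False`, which keeps the project open after a failed 90-day review.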

What do we do if we find at Step 9 that the problem still exists? Work backwards up the model.

In the steps below, work backwards up the model starting from step #8, one step at a time, to see where we may have made an error. For example, did we forget anything during implementation (Step 8)? Did we pick the wrong solution (Step 7) and so on?

1. Define the Problem...was the problem statement properly focused?

2. Understand the Process...is the process understood? Missing steps?

3. Identify Possible Causes...were there causes not fully explored?

4. Collect Data...was the right (relevant) data collected?

5. Analyze the Data...should alternate analysis techniques be used?

6. Identify Possible Solutions...would alternate solutions address the cause?

7. Select Solution(s) to be Implemented...was our evaluation & selection process valid?

8. Implement the Solution(s)...did we forget anything during implementation?

9. Evaluate the Effect(s)...improvement not sustained?

Standardize the Process: Step 10

If the solution did work, what do we need to do in order to maintain the gain?

• Determine whether there are any related documents or files (e.g., procedures, instructions, samples, training plans, job descriptions) that need to be revised (Note: in regulated industries where process changes must be validated before implementation, this would have been done as part of Step 8)

• Communicate, communicate, and communicate more, so that people know the results and what their roles are. Make sure that related performance metrics are kept visible

• Audit to confirm that the implementation plan (e.g., changed process activity) is maintained

We should also leverage what we learned from the project, by:

• Adapting the solution to similar processes where it might be useful

• Adding project information to the “lessons learned” records so that others, who were not involved, can gain from the new knowledge

• Identifying what RCA aspects were done well and which were not, and asking how it could have been done better

The final lesson is a summary of what you learned. End of Lesson.

Lesson 11: Summary of RCA

Does RCA Work?

Here are results of just a few success stories:

• A team reduced downtime on a continuous process, saving the company $360k/year

• A group reduced rework and scrap off two machines, spending $71k to do so but saving $500k/year

• Downtime on a bottleneck process was eliminated, which meant fewer 7-day work weeks (and related employee burnout and overtime costs)

• A manager reduced product rejects in a finishing department by 50% within 2 weeks

• Elimination of the pinion noise problem saved $100k/year

• A company reduced the number of paperwork errors by 91% within one month
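The second story above gives enough numbers to work through the benefit / cost ratio and payback period questions from Step 7. The sketch below is a simplification: it assumes the $71k is a one-time cost and the $500k/year saving is constant, and it ignores any ongoing costs, which the text does not mention. The function names are hypothetical.

```python
def payback_months(one_time_cost, annual_savings):
    """Months of savings needed to recover a one-time cost."""
    return 12 * one_time_cost / annual_savings


def first_year_benefit_cost_ratio(one_time_cost, annual_savings):
    """First-year savings divided by the one-time cost."""
    return annual_savings / one_time_cost


# Figures from the rework-and-scrap story: $71k spent, $500k/year saved.
months = payback_months(71_000, 500_000)                # about 1.7 months
ratio = first_year_benefit_cost_ratio(71_000, 500_000)  # about 7 to 1
```

Under these assumptions the spending is recovered in under two months, well inside the 6-month payback mentioned in the earlier lesson on generating solutions.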

RCA results in processes being more efficient, effective, competitive, and lean.

At the fundamental level, RCA involves breaking down the process / system (as our image below breaks into parts) in order to understand and test which cause & effect relationships or events have created the particular outcome. Finding the cause, however, is not the same as changing the right things in order to prevent recurrence. While diagnosis is often scientific, solution implementation must deal with issues of organizational dynamics and human behavior.

Individual versus Team Effort

Not every RCA effort needs a dedicated team, although a team can add resources and ideas during the find-it and fix-it phases. Some problems can best be solved by a single individual who only involves others as needed. This is more likely the case when the problem is narrow in scope, does not have a large impact, and the individual has very specialized knowledge of the product / process technology that others do not have.

Teams:

Other times a small ad hoc group (e.g., perhaps 3 people) can solve the problem within a few hours or days. This is again more likely appropriate where the problem does not have a significant impact on the organization, and when it is not highly complex.

There are times when RCA efforts should involve a dedicated team (e.g., perhaps 5-8 people) who have different perspectives on the problem. This is especially true when the process crosses several boundaries (e.g., departments, organizations), when there is greater risk, when the problem is complex, and when there have been previous unsuccessful attempts to solve the problem.

Dedicated teams should usually have a management champion / sponsor (ideally the process owner) who has authority to both make sure resources will be available, and that recommendations will be carried out. If RCA facilitators are used, they not only need to have a high level of RCA ability, but also good interpersonal skills.

Causes: We should also keep in mind that often there is not a single cause, but several causes that interact, as these examples show:

• In order to have a fire, both a combustible material and an ignition source must be present at the same time

• A machine breakdown may be due to both a component that is predisposed to failure and the lack of preventive / predictive maintenance for that component

• Damage to product may indicate that the process is not sufficiently mistake-proofed and personnel don’t pay sufficient attention to every part

RCA Applied: RCA can be applied to product failures, process failures, audit nonconformities, customer complaints, supplier problems, performance problems and so on. It can be applied at any management level of an organization, dealing with both technical issues as well as management issues. It can also be applied at the meta level. Think of it this way:

• Each RCA situation may itself be an indication of a much larger problem / cause that is difficult to see within narrow boundaries

• To see larger, more systemic problems, look for patterns (e.g., use Pareto charts or trends) of problem types and/or root causes. For example, if several problems have had RCA applied to them and you find that several of them were caused by inadequate process change management, looking at the process change process itself would likely be worthwhile

• It is possible that the physical cause was corrected for each one, but that each time the decision was made to not take it to the root cause (e.g., system) level
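Looking for patterns across completed RCAs, as suggested above, can start with a simple Pareto-style tally of root-cause categories. The list of causes below is invented for illustration; only the "inadequate process change management" category comes from the text.

```python
from collections import Counter

# Hypothetical root-cause categories recorded for eight closed RCA projects.
closed_rca_causes = [
    "inadequate process change management",
    "training gap",
    "inadequate process change management",
    "worn tooling",
    "inadequate process change management",
    "training gap",
    "supplier material",
    "inadequate process change management",
]


def pareto(causes):
    """Return (cause, count, cumulative percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total)))
    return rows


rows = pareto(closed_rca_causes)
```

In this made-up data, half of the projects trace back to the same category, which is the kind of pattern that suggests examining the process change process itself.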

General Rules for RCA: The following are pointers to guide you in your problem-solving efforts.

• Keep an open mind to all possibilities

• Be willing to make mistakes and learn from them

• Don’t jump to conclusions; use all the information available to you

• Remove constraints to thinking

• Look at the whole picture

• Use models (e.g., diagrams or pictures) that help you to visualize how the process works, and how process (X) variables affect the outcome variables (Y)

• Work backwards from a solution to test it. What is the ideal and how can it be reached?

• Don't work in a vacuum; talk to those involved in designing, operating, maintaining, and servicing the process, as each has a different perspective

End of Class.

Best of luck to you and your future Root Cause Analysis!

Instructor Contact Information

Duke Okes 423-323-7576

www.aplomet.com

Other QualityWBT classes that may be of interest to you are:

• Improvement Tools and Techniques: (Charts/Improvement Methods/Sampling Techniques/Controls/Diagrams/Problem Solving), topics taken from ASQ body of knowledge certification programs, 8 hours

• Measuring and Managing Customer Satisfaction (This course is designed to help you set up a successful customer satisfaction measurement system and one that is compliant with ISO 9001:2000), 180 day access, 9 hours

See catalog for all classes offered by QualityWBT Center for Education
