Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010
DESCRIPTION
EuroSTAR Software Testing Conference 2010 presentation on Big Bugs That Got Away by Ken Johnston. See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
TRANSCRIPT
![Page 1: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/1.jpg)
What We Can Learn from Big Bugs that Got Away
Ken Johnston, Group Manager
Office, Internet Platforms & Operation
EuroSTAR 2010
![Page 2: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/2.jpg)
I Want to know more about YOU
• Who wandered in here by accident?
• Who is at EuroSTAR for the first time?
• How long have you been in Software Testing?
• Have you ever missed a bug?
• Have you ever heard…
![Page 3: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/3.jpg)
“HOW COULD YOU MISS THAT BUG!!!”
![Page 4: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/4.jpg)
Def. – Rolling around in something disgusting
![Page 5: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/5.jpg)
Ken’s Big Bug Story
![Page 6: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/6.jpg)
It all began one dark and stormy night!
![Page 7: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/7.jpg)
![Page 8: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/8.jpg)
Session Overview
• About you, me and setting the tone
• Bug Wallowing 1 – A self-reflective journey
• Bug Wallowing 2 – Group Therapy
• Root Cause Analysis 101
▫ Sentinel Events
▫ Pattern Analysis
▫ Formal RCA program overview
• Bug Wallowing 3
• Five Whys
• Bug Wallowing 4
• Fishbone
• Bug Wallowing 5
• Crafting a good bug story
![Page 9: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/9.jpg)
Learning Objectives
1. Be armed to deal with the question, “How did test miss this bug?”
2. Learn a little about formal RCA and the use of the 5 Whys and Fishbone tools
3. Have a number of highly instructive bug stories from within your organization that you can take home
![Page 10: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/10.jpg)
Def. – Roll in something: to lie down and roll around in something
![Page 11: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/11.jpg)
“HOW COULD YOU MISS THAT BUG!!!”
![Page 12: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/12.jpg)
Time for some “Group Bug” Therapy
![Page 13: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/13.jpg)
Repeat After Me
• I did not design the bug.
• I did not code the bug.
• I found crashing bugs, data corruption bugs, fit and finish bugs.
• I found hundreds of bugs.
![Page 14: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/14.jpg)
Repeat After Me
• So what if I missed a bug.
• I didn’t write the bug in the first place.
![Page 15: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/15.jpg)
Activity: Share Your Bug Story
• Take the next 10 minutes
• Groups of 2 or 3
• Think of a bug that got away
• Minimum One Bug story each
• Questions to ask:
▫ How long after ship did you see this?
▫ How big was the impact?
▫ How did it get missed?
▫ What did you change because of this bug?
![Page 16: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/16.jpg)
That’s Time
![Page 17: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/17.jpg)
Time to Share
• Next 5 minutes or so
• Did you have any Ah Ha moments?
![Page 18: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/18.jpg)
Why do we Wallow in Bugs that got away?
• Take 3-5 minutes to discuss in your groups
![Page 19: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/19.jpg)
That’s time
![Page 20: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/20.jpg)
Time to Share
• What did you come up with?
• Why do we wallow?
• Why do we RCA bugs?
• My List
▫ To learn from mistakes
▫ To systematically identify areas for improvement
▫ To prevent repetition of mistakes
▫ Bugs are stories and organizations are driven by the stories they tell
![Page 21: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/21.jpg)
First, we need a common baseline to work from
![Page 22: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/22.jpg)
Root Cause Analysis 300 Level
• Two approaches to RCA
▫ Sentinel Event
▫ Pattern Analysis
• Formal RCA Program
▫ Data Collection
▫ Data Analysis and Assessment
▫ Corrective Actions
• The Pit and the Pendulum
▫ Risks of RCA
▫ Benefits of RCA
Based upon Ch. 11 (PDF available to EuroSTAR attendees): http://defectprevention.org
![Page 23: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/23.jpg)
RCA – Sentinel Event Bugs
• How do you know it’s a Sentinel Event Bug?
• If you make the front page of http://wsj.com
• Production Outage
▫ I have a lot of these stories
• Security vulnerabilities
• The last bug taken before ship
▫ “How could we have missed this!”
• Any big bug that got away
• Nothing to do with the X-Men
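The criteria on this slide work as a simple triage checklist: any one of them is enough to treat a bug as a sentinel event. As a minimal sketch, assuming a hypothetical `BugRecord` whose boolean fields stand in for whatever your defect tracker actually records (none of these names come from the talk or a real tool), a triage script might flag sentinel events like this:

```python
from dataclasses import dataclass


@dataclass
class BugRecord:
    # Hypothetical fields; map them onto your own defect tracker's data.
    bug_id: int
    title: str
    made_headlines: bool = False            # e.g. front page of wsj.com
    caused_production_outage: bool = False
    is_security_vulnerability: bool = False
    last_bug_taken_before_ship: bool = False
    big_bug_found_after_ship: bool = False  # "a big bug that got away"


def sentinel_reasons(bug: BugRecord) -> list[str]:
    """Return every slide criterion this bug meets (empty if none)."""
    checks = {
        "headline news": bug.made_headlines,
        "production outage": bug.caused_production_outage,
        "security vulnerability": bug.is_security_vulnerability,
        "last bug taken before ship": bug.last_bug_taken_before_ship,
        "big bug that got away": bug.big_bug_found_after_ship,
    }
    return [reason for reason, hit in checks.items() if hit]


def is_sentinel_event(bug: BugRecord) -> bool:
    # A single criterion is enough to trigger a sentinel-event RCA.
    return bool(sentinel_reasons(bug))


outage = BugRecord(101, "Site brownout", caused_production_outage=True)
print(sentinel_reasons(outage))  # ['production outage']
```

Returning the reasons rather than a bare yes/no keeps the "why is this a sentinel event?" answer available for the RCA write-up.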
![Page 24: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/24.jpg)
RCA Pattern Analysis
• Pattern Analysis requires a lot of bugs
• Pattern Analysis can be done over time
• Pattern Analysis is best served within a formal RCA Program.
▫ Some slides on this topic were cut from this presentation
▫ The full set of slides can be found in the appendix on the EuroSTAR conference website
![Page 25: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/25.jpg)
Phases of an RCA Program
1. Event Identification
2. Data Collection
3. Data Analysis and Assessment
4. Corrective Action
5. Inform and Apply
6. Follow-up, measurement and reporting
![Page 26: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/26.jpg)
Phase II: Data Collection Exercise
• Data Channels: 5-minute discussion in groups
▫ What are the sources of data in my organization?
▫ Which are practical?
▫ Which are the most costly to implement?
▫ Which are most likely to yield results?
▫ Do you have time to implement these?
![Page 27: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/27.jpg)
That’s time
![Page 28: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/28.jpg)
Phase II: Data Collection – Time to Share
• What sources did you come up with?
![Page 29: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/29.jpg)
Phase II: Data Collection (Sources of Data)
• Defect and Test Case Management tracking system
• Source code repository and test code coverage data
• Voice of the Customer
▫ Product support and customer or marketing data
▫ Individual surveys and interviews
• Findings from previous RCA studies
• Crash data through Windows Error Reporting
• Services have tickets and data center telemetry
▫ Heuristic data of live site now vs. historic
More about WER @ https://winqual.microsoft.com/
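For services, "heuristic data of live site now vs. historic" can start as simply as comparing a current metric against its own history. A minimal sketch, assuming hourly samples of one metric (such as an error rate) are already available; the z-score test and the three-standard-deviation threshold are illustrative choices, not something the talk prescribes:

```python
from statistics import mean, stdev


def deviates_from_history(history: list[float], current: float,
                          threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score heuristic)."""
    if len(history) < 2:
        raise ValueError("need at least two historic samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Made-up error rates (% of failed requests) for the previous six hours.
error_rates = [0.8, 1.1, 0.9, 1.0, 1.2, 0.9]
print(deviates_from_history(error_rates, 1.05))  # False
print(deviates_from_history(error_rates, 4.5))   # True
```

A flagged deviation is a starting point for an RCA timeline, not a root cause in itself.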
![Page 30: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/30.jpg)
Phase II: Data Collection (Tracking System)
• Prepare a list of Sentinel Events
• Gather and prepare the preliminary data
• Route a single event through the process
• Create an RCA tracking database
Data Elements of RCA Tracking System
• Event or Study ID, Title & Dates
• Related Defect links
• Failure areas and Source Code
• Timeline of events before and after (vital for services)
• Team Contacts and Owners
• RCA Analysts and Contacts
• Expert Groups and Contacts
• Cause of defect and corrective action
• Survey Data and Results on effectiveness of corrective action
• Log events in the RCA system
• Analyze events
• NOTE: Metadata is better suited for lists, documents and shares
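As a sketch of the data elements listed above, one RCA tracking record can be modeled as a Python dataclass. The schema, field names and the `log_event` helper are hypothetical illustrations, not a tool from the talk. In line with the slide's note, bulky material such as documents would live in lists or shares, so the record only holds contacts, links and short text.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional


@dataclass
class TimelineEntry:
    at: datetime
    event: str


@dataclass
class RcaRecord:
    # Hypothetical schema mirroring the slide's data elements.
    study_id: str
    title: str
    opened: date
    closed: Optional[date] = None
    related_defects: list[int] = field(default_factory=list)
    failure_areas: list[str] = field(default_factory=list)       # areas and source paths
    timeline: list[TimelineEntry] = field(default_factory=list)  # vital for services
    owners: list[str] = field(default_factory=list)              # team contacts and owners
    analysts: list[str] = field(default_factory=list)
    experts: list[str] = field(default_factory=list)
    cause: str = ""
    corrective_action: str = ""
    # Survey question -> score, used later to judge the corrective action.
    effectiveness_survey: dict[str, int] = field(default_factory=dict)

    def log_event(self, at: datetime, event: str) -> None:
        """Add a timeline entry and keep the timeline in chronological order."""
        self.timeline.append(TimelineEntry(at, event))
        self.timeline.sort(key=lambda entry: entry.at)


record = RcaRecord("RCA-001", "Example outage", date(2010, 6, 1))
record.log_event(datetime(2010, 6, 1, 12, 0), "Alerts fire")
record.log_event(datetime(2010, 6, 1, 11, 30), "Deployment starts")
print([entry.event for entry in record.timeline])  # ['Deployment starts', 'Alerts fire']
```

Keeping the timeline sorted on insert matters because, as the slide notes, events before and after the failure are often collected out of order from several sources.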
![Page 31: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/31.jpg)
Phase III: Data Analysis and Assessment (the Five Whys and the Fishbone)
Good article from ASQ: http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
![Page 32: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/32.jpg)
Phase III: Data Analysis and Assessment (the Five Whys)
• Brief History - http://en.wikipedia.org/wiki/5_Whys
▫ Developed by Sakichi Toyoda
▫ First used in Toyota Motor Corporation
▫ Common tool within Kaizen, Lean Manufacturing & Six Sigma
• What is it?
▫ Simply put, ask why 5 times to get to the root cause of a problem
• Fun example from http://startuplessonslearned.blogspot.com/2008/11/five-whys.html
▫ Why was the website down? The CPU utilization on all our front-end servers went to 100%.
▫ Why did the CPU usage spike? A new bit of code contained an infinite loop!
▫ Why did that code get written? So-and-so made a mistake.
▫ Why did his mistake get checked in? He didn't write a unit test for the feature.
▫ Why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD.
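The website example above is a linear chain: each answer becomes the subject of the next "why". A minimal sketch of recording such a chain, for example in an RCA tracking tool, follows. The `Why` type and `walk_whys` function are hypothetical names. Stopping at five is a convention, so the depth is a parameter, and the last answer is only a candidate root cause that the team still has to judge.

```python
from dataclasses import dataclass


@dataclass
class Why:
    question: str
    answer: str


def walk_whys(chain: list[Why], depth: int = 5) -> str:
    """Print up to `depth` levels of a Five Whys chain and return the last
    answer reached, i.e. the candidate root cause."""
    if not chain:
        raise ValueError("a Five Whys chain needs at least one 'why'")
    asked = chain[:depth]
    for level, why in enumerate(asked, start=1):
        print(f"{level}. {why.question} {why.answer}")
    return asked[-1].answer


# The slide's website-outage example, encoded as a chain.
outage = [
    Why("Why was the website down?", "CPU on all front-end servers hit 100%."),
    Why("Why did the CPU usage spike?", "New code contained an infinite loop."),
    Why("Why did that code get written?", "So-and-so made a mistake."),
    Why("Why did his mistake get checked in?", "He didn't write a unit test."),
    Why("Why didn't he write a unit test?", "He was new and not trained in TDD."),
]

root = walk_whys(outage)
print(root)  # He was new and not trained in TDD.
```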
![Page 33: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/33.jpg)
Def. – indulge in something excessively: to take pleasure or be immersed in something in a self-indulgent way
![Page 34: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/34.jpg)
Insert Bug Story Videos
![Page 35: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/35.jpg)
Five Whys Exercise
• Take 5-10 minutes
• Use one of these bugs or one of your own
• Try the five whys and see if you can find a root cause
![Page 36: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/36.jpg)
That’s time
One does not worry about grace or dignity
![Page 37: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/37.jpg)
Time to Share
• Time for about 2 examples
• What about the 5 Whys worked for you?
• Where did it fall short?
![Page 38: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/38.jpg)
Phase III: Data Analysis and Assessment (the Five Whys)
• Criticism of the Five Whys
▫ Not reproducible across individuals
▫ Investigators have been shown to tend to stop at symptoms rather than the root cause
▫ Relies upon the investigator's knowledge
![Page 39: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/39.jpg)
Phase III: Data Analysis and Assessment (Fishbone Diagram)
• Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram
▫ Developed by Kaoru Ishikawa in the 1960s
▫ One of the 7 basic quality management tools
• Can use with 5 whys
▫ Put each why off the first tree point
▫ Ask why for each one of these issues
▫ Keep going until you find one or more root causes
• Some industries have common causes mapped to the fishbone
▫ Original 4 Ms – Machine, Method, Material, Manpower
▫ The 8 Ps (used in the service industry) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product
▫ Ken’s List – People & Training, Tools, Inspection and Supervision, Pressure or Stress, Process & Accountability, Recognition & Awareness
![Page 40: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/40.jpg)
Fishbone example (diagram): Brownout across 3 largest datacenters
Bones: People & Training, Tools, Inspection & Supervision, Pressure or Stress, Process & Accountability, Recognition & Awareness
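Combining the fishbone with the five whys turns the diagram into a tree: each category is a bone, each "why" hangs off the cause before it, and the leaves are candidate root causes. A minimal sketch as a depth-first walk follows. `Cause` and `fishbone` are hypothetical names, and the example causes are invented for illustration; they are not the findings from the brownout analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    text: str
    whys: list["Cause"] = field(default_factory=list)  # answers to "why?"


def root_cause_chains(cause: Cause, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Depth-first walk: a cause with no further 'why' is a leaf, i.e. a
    candidate root cause. Returns the full chain leading to each leaf."""
    path = path + (cause.text,)
    if not cause.whys:
        return [path]
    chains: list[tuple[str, ...]] = []
    for child in cause.whys:
        chains.extend(root_cause_chains(child, path))
    return chains


def fishbone(effect: str, bones: dict[str, list[Cause]]) -> list[tuple[str, ...]]:
    """Collect every effect -> category -> ... -> root cause chain."""
    chains: list[tuple[str, ...]] = []
    for category, causes in bones.items():
        for cause in causes:
            for chain in root_cause_chains(cause):
                chains.append((effect, category) + chain)
    return chains


# Hypothetical example using two of Ken's categories.
bones = {
    "Tools": [Cause("No automated check for this scenario",
                    [Cause("Scenario missing from the regression suite")])],
    "Pressure or Stress": [Cause("Fix checked in late in the milestone",
                                 [Cause("Feature was 'hot' at milestone close"),
                                  Cause("Review skipped to meet the date")])],
}
for chain in fishbone("Hypothetical escaped bug", bones):
    print(" -> ".join(chain))
```

Keeping the whole chain, not just the leaf, preserves the reasoning the team will need when presenting corrective actions.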
![Page 41: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/41.jpg)
• Deployment tool changes
▫ Warn but do not prevent multi-DC deployments
▫ Automatically generate rollback script
▫ Cross-service monitors will cancel and roll back a bad deployment automatically
• Process changes
▫ Deployment code review
▫ Deployment checklist
▫ Audits and Fire drills
Audited all alerts, escalation aliases and contact #s
Fire drill email and phone
• New Tools
▫ Per-Alert fault injection
• Recognition
▫ SWAT DRI team for most senior DRIs
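The deployment tool changes above (warn on multi-datacenter deployments, have a rollback ready, let monitors cancel and roll back a bad deployment) can be sketched as a staged rollout loop. Everything here is hypothetical: `deploy`, `rollback` and `healthy` are caller-supplied callbacks standing in for whatever your deployment system and cross-service monitors provide, and the rollback is a callback rather than the generated script the slide mentions.

```python
import logging
from typing import Callable

log = logging.getLogger("deploy")


def staged_deploy(datacenters: list[str],
                  deploy: Callable[[str], None],
                  rollback: Callable[[str], None],
                  healthy: Callable[[str], bool]) -> bool:
    """Deploy one datacenter at a time. If the monitors flag a problem,
    roll back every datacenter touched so far, newest first, and stop."""
    if len(datacenters) > 1:
        # Per the slide: warn about, but do not block, multi-DC deployments.
        log.warning("Deployment targets %d datacenters: %s",
                    len(datacenters), ", ".join(datacenters))
    done: list[str] = []
    for dc in datacenters:
        deploy(dc)
        done.append(dc)  # record first so the failing DC is rolled back too
        if not healthy(dc):
            log.error("Monitors flagged %s; rolling back", dc)
            for touched in reversed(done):
                rollback(touched)
            return False
    return True


# Usage with stub callbacks: the monitors flag the second datacenter.
events: list[tuple[str, str]] = []
ok = staged_deploy(["dc1", "dc2", "dc3"],
                   deploy=lambda dc: events.append(("deploy", dc)),
                   rollback=lambda dc: events.append(("rollback", dc)),
                   healthy=lambda dc: dc != "dc2")
print(ok)      # False
print(events)  # dc3 is never touched; dc2 and dc1 are rolled back
```

Deploying one datacenter at a time is what keeps a single bad change from becoming a brownout across all of them at once.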
![Page 42: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/42.jpg)
Fishbone Exercise
• Take 5-10 minutes
• We have a handout for you
• Use the same bug from the five whys exercise
![Page 43: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/43.jpg)
That’s time
![Page 44: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/44.jpg)
Time to Share
• Time to share
▫ Who did the same bug as the five whys?
▫ Who did a different bug?
• What about the fishbone worked for you?
• Where did it fall short?
![Page 45: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/45.jpg)
Phase III: Data Analysis and Assessment (the Fishbone)
• Criticism of the Fishbone
▫ Requires a lot of experts for each branch
▫ Cumbersome
![Page 46: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/46.jpg)
Phase V: Inform and Apply
• Host a Management Review
▫ Managers will like RCA more than bugs
▫ You are eliminating a problem not just finding it
• Implementation is a project; treat it that way
▫ Assign Owners
▫ Build and Maintain Schedule
▫ Create a Feedback Loop
▫ Establish a Monthly Status Report
▫ Track and correct the corrective action
![Page 47: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/47.jpg)
Phase VI: Follow-up, Measurement, and Reporting
• More than just Six Sigma-type approaches
• Longitudinal Analysis
▫ Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/
▫ Study over time
• Develop failure types and risk areas/components
• Inspect similar products/areas for baseline
• Gather and inspect process data
• Examine data for trends
• Report out
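"Examine data for trends" can start with something much simpler than full longitudinal data analysis: a least-squares slope over per-period defect counts for each failure type. A minimal sketch, assuming you already have counts per failure type per release or month; the failure types and numbers below are invented for illustration.

```python
def trend_slope(counts: list[float]) -> float:
    """Ordinary least-squares slope of counts against period index 0..n-1.
    Negative means the failure type is becoming less frequent."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(counts))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def trends_by_failure_type(data: dict[str, list[float]]) -> dict[str, float]:
    """Slope per failure type, to check whether corrective actions are working."""
    return {failure_type: trend_slope(c) for failure_type, c in data.items()}


# Invented counts of escaped bugs per release, by failure type.
by_type = {"localization": [9, 7, 6, 4], "deployment": [2, 2, 3, 5]}
print(trends_by_failure_type(by_type))  # {'localization': -1.6, 'deployment': 1.0}
```

A falling slope after a corrective action is encouraging but not proof; comparing against similar products or areas (the baseline bullet above) helps rule out other explanations.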
![Page 48: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/48.jpg)
Def. – have huge amount of something: to have an ample or excessive supply of something
![Page 49: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/49.jpg)
RCA Pit and Pendulum
![Page 50: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/50.jpg)
Risks of Root Cause Analysis
• Begins with inadequate data
• Goes after too much data too early
• Draws incorrect conclusions or makes invalid recommendations
▫ Anyone experienced this before?
• Focuses on the wrong set of defects
• Ends at the wrong level – too early or too late
• Investment is not always predictable
▫ Can be high cost with low ROI
• Overfocus on data can detract from the story
![Page 51: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/51.jpg)
Benefits of Structured RCA Study
• Can start as small pilots
• Uses an identical process regardless of type, age or scope of defect
• Avoids repeat failures
• Can be the shortest path to determining and correcting causes of failure
• Lowers Maintenance Costs
• Builds a culture of
▫ Accountability
▫ Continuous Improvement
![Page 52: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/52.jpg)
Achieve Balance
• Full-blown RCA with large pattern analysis rarely meets ROI goals.
• Limit the scope
▫ Few data sources
▫ Beware of the RCA Tax
• Focus on Sentinel Events
▫ Provides opportunity for clear, visible wins
▫ If it’s a bug that got away you’ll be doing a Post Mortem anyway
▫ Sentinel events provide an opportunity to change the dialogue
![Page 53: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/53.jpg)
I’ve had enough
![Page 54: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/54.jpg)
Telling a Tall Tale
![Page 55: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/55.jpg)
So why a focus on Bugs that got away?
• Bugs that got away are Sentinel Events
• They are great stories
▫ There is never an end to bugs
• Bug Stories are Organizational Knowledge
• Tribal Knowledge drives organizations
• Stories are powerful change enablers
![Page 56: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/56.jpg)
Stories Work!
Biographies
Allegories
![Page 57: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/57.jpg)
Gloves on the Boardroom Table
• The Heart of Change
▫ Requires an emotional component
▫ What is more emotional than “How could test miss this bug!”
• Not all change stories involve yelling
• Visual and tactile help too
▫ Handout of “Gloves on the boardroom table”
▫ “I love your idea. And you have my permission.”
![Page 58: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/58.jpg)
Organizational Development
• I worked in Engineering Excellence
▫ We were a Performance Improvement organization
▫ Enterprise Change Management
• Let me bring in some OD concepts
![Page 59: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/59.jpg)
Knowledge Management (KM) comprises a range of practices used in an organization to identify, create, represent, distribute and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice.
http://en.wikipedia.org/wiki/Knowledge_management
![Page 60: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/60.jpg)
What are Organizations Made of?
PEOPLE
![Page 61: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/61.jpg)
What do people do?
Talk about stuff
![Page 62: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/62.jpg)
Tribal Knowledge
Institutional memory is a collective set of facts, concepts, experiences and know-how held by a group of people.
http://en.wikipedia.org/wiki/Institutional_memory
![Page 63: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/63.jpg)
Organizational Storytelling
The study of organizational storytelling, sometimes called “Narrative Knowledge,” attempts to recount events in the form of a story within the context of an organization.
http://en.wikipedia.org/wiki/Organizational_Storytelling
![Page 64: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/64.jpg)
So, what is a bug story?
A story that should be part of the Organizational Narrative Knowledge
![Page 65: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/65.jpg)
Springboard Story
• Very simple, very quick, very brief
▫ Think elevator ride
• Non-threatening
• Enables listener to visualize
• Catalyzes understanding
• Sparks new stories in the mind
• Does not transfer large amounts of information
![Page 66: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/66.jpg)
Storytelling Tips
• Brains are not computers
▫ Brain Movies – “The brain assembles perceptions by the simultaneous interaction of whole concepts, whole images.”
• The Central Movie – a country or organization
▫ Universal Principles – freedom, democracy, constitutional government
▫ Long-term goals – education, “life, liberty, pursuit of happiness”
▫ Operating methods – free markets, due process, federal and state governments
• Capture the Audience
▫ “One time there was this bug we missed…”
• 3D Storytelling (pp. 85-87)
▫ Details (facts, information)
▫ Dialogue (characters)
▫ Drama (a bug that got away?)
Brain Movies, The Central Movie, and 3D Storytelling from “The Leader’s Voice”
![Page 67: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/67.jpg)
Our Last Exercise!
• Your own bug story in 10 minutes
▫ Take 10 minutes outlining your story
▫ Goal is a 1-2 minute story; think short and tight
• Remember to
▫ Hook the audience
▫ 3D Storytelling – Details, Dialogue, Drama
▫ RCA – what change do you want to convey?
![Page 68: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/68.jpg)
My Bug Story - Template
• Title
• The Hook
• Details – Who, what, when, product/project
• Dialogue – Yelling, Crying, Funny?
• Drama – What is the tension? Anyone Fired?
• What were the Root Causes
• What did you change and why?
![Page 69: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/69.jpg)
That’s lunch time
![Page 70: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/70.jpg)
Time to Share
• 3 volunteers to come up and tell their bug story
![Page 71: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/71.jpg)
Resources
• “The Leader’s Guide to Storytelling” by Steve Denning
▫ Resources – http://www.stevedenning.com/launchgifts.html
▫ Audio Interview - The knowledge-based organization: Using stories to embody and transfer knowledge http://www.storytellingwithchildren.com/2008/01/12/steve-denning-the-knowledge-based-organization/
• “The Leader’s Voice” by Crossland & Clarke
▫ http://roncrossland.com/
• Defect Prevention Chapter 11 RCA
▫ http://defectprevention.org
• “The Heart of Change” by John P. Kotter
▫ Gloves story can be found on pages 11-12
http://www.linkageinc.com/pdfs/disl/KotterPG.pdf
![Page 72: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/72.jpg)
http://www.hwtsam.com
http://blogs.msdn.com/kenj
http://twitter.com/rkjohnston
Chapter 14 (Software + Services Testing) from “How We Test Software at Microsoft” provided on conference CD courtesy of Microsoft Press
Ken Johnston – Microsoft STARWest 2009 Tutorial TJ
What We Can Learn from Big Bugs that Got Away
![Page 73: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/73.jpg)
Appendix
• What follows is a series of slides to teach RCA.
• Some of the slides are integrated in this tutorial on Bugs that Got Away but not all.
![Page 74: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/74.jpg)
First, we need a common baseline to work from
![Page 75: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/75.jpg)
Root Cause Analysis 300 Level
• Two approaches to RCA
▫ Sentinel Event
▫ Pattern Analysis
• Formal RCA Program
▫ When to do an RCA Study
▫ Staffing for Success
▫ Phases of an RCA Study
• The Pit and the Pendulum
▫ Risks of RCA
▫ Benefits of RCA
Based upon Ch. 11: http://defectprevention.org
![Page 76: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/76.jpg)
RCA Sentinel Event
A sentinel event is defined by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) as any unanticipated event in a healthcare setting resulting in death or serious physical or psychological injury to a person or persons.
http://en.wikipedia.org/wiki/Sentinel_event
![Page 77: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/77.jpg)
RCA – The Sentinel Event of Bugs
• Home Page of http://wsj.com
• Production Outage
▫ I have a lot of these stories
• Security vulnerabilities
• The last bug taken before ship
• “How could we have missed this!”
• Big Bugs that Got Away
![Page 78: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/78.jpg)
RCA – Office 14 Sentinel Bug Process
• Why SharePoint as the repository?
▫ Attachments
▫ Collaborating
▫ Workflow
▫ Reporting Dash
▫ Wiki
▫ Exchange contacts
▫ Offline
• Simple, lightweight approach
• Focus on recall-class bugs from O14 Beta 1
▫ Will need the answers anyway to get through triage
▫ Usually logged in the bug but not easy to find or learn from
▫ No consistent process across teams
• Develop a common template in Word
• Track on a SharePoint site with some metadata
![Page 79: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/79.jpg)
Office 14 Root Cause “Template”
• Tenets/Best Practices
• History/Summary
• Bugs
▫ Bug number(s)
▫ Bug description
• Root Cause Questions
▫ Would this get found in our Test Focus/Pass for this area?
▫ When did it get broken?
▫ Was ownership confused?
▫ Would we have assumed that another team would have also seen it?
▫ Would it have been reasonable to assume that the fix that caused the regression would have broken this?
▫ Would a code review have likely identified the issue?
▫ Was there a partner team(s) involved?
▫ Were there multiple PRs involved?
▫ Was the feature "Hot" coming into the close of the milestone?
• Engineering Recommendations:
▫ Recommendation(s)/Owners
▫ 1.
▫ 2.
▫ 3.
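Because every sentinel bug goes through the same template questions, the answers can be tallied across bugs to surface patterns rather than one-off findings. A minimal sketch, assuming each completed template is reduced to a yes/no answer per question; the short question keys and the 50% threshold are hypothetical choices, not part of the Office 14 process.

```python
from collections import Counter

# Hypothetical short keys for some of the template's root cause questions.
QUESTIONS = [
    "test_pass_would_find",       # Would this get found in our Test Focus/Pass?
    "ownership_confused",         # Was ownership confused?
    "assumed_other_team_saw_it",  # Would we have assumed another team saw it?
    "code_review_would_find",     # Would a code review have likely found it?
    "partner_team_involved",      # Was there a partner team(s) involved?
    "multiple_prs_involved",      # Were there multiple PRs involved?
    "feature_hot_at_milestone",   # Was the feature "hot" at milestone close?
]


def tally(reports: list[dict[str, bool]]) -> Counter:
    """Count how many completed templates answered 'yes' to each question."""
    counts: Counter = Counter()
    for report in reports:
        for q in QUESTIONS:
            if report.get(q, False):
                counts[q] += 1
    return counts


def recurring_causes(reports: list[dict[str, bool]],
                     min_share: float = 0.5) -> list[str]:
    """Questions answered 'yes' in at least `min_share` of the reports."""
    if not reports:
        return []
    counts = tally(reports)
    return [q for q in QUESTIONS if counts[q] / len(reports) >= min_share]


# Three made-up completed templates.
reports = [
    {"ownership_confused": True, "partner_team_involved": True},
    {"ownership_confused": True},
    {"code_review_would_find": True},
]
print(recurring_causes(reports))  # ['ownership_confused']
```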
![Page 80: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/80.jpg)
O14 Example Beta1 End Game
• Word: Japanese indented bullets lose their indents when saved
▫ Repro:
Set Japanese to be your primary editing language
Create a bulleted list with indents
Save/Close/Re-open
Result: indents are gone
Expect: no loss of indents
▫ Happens with all docs created with that setting in 12 and 14
![Page 81: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/81.jpg)
O14 Example RCA Recommendations
• Engineering Recommendation:
▫ Automate this case and use the code change to inform other automation needed for this area (lists, styles, paragraph props)
▫ Ensure that ICTs dogfood the product
▫ Make a new push for testers to use international settings more frequently, with an eye on Beta2 languages and the risks associated with each language equivalence class – we’ll most likely drive a mini-pass on all our features with this setting for Beta2
▫ Add this area to testing executed during regression checks on all style-related fixes.
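The first recommendation, "automate this case", boils down to a save/close/re-open round-trip check run under several editing languages. Below is a sketch of that idea. It does not drive Word: `Document`, `save`, `load` and `buggy_load` are hypothetical stand-ins built on a JSON round trip, and `buggy_load` simulates the reported indent loss so the check has something to catch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class ListItem:
    text: str
    indent_level: int


@dataclass
class Document:
    editing_language: str
    items: List[ListItem]


def save(doc: Document) -> str:
    """Hypothetical stand-in for the real file format."""
    return json.dumps({"lang": doc.editing_language,
                       "items": [asdict(item) for item in doc.items]})


def load(data: str) -> Document:
    raw = json.loads(data)
    return Document(raw["lang"], [ListItem(**item) for item in raw["items"]])


def buggy_load(data: str) -> Document:
    """Simulates the bug: indents dropped when Japanese is the editing language."""
    doc = load(data)
    if doc.editing_language.startswith("ja"):
        for item in doc.items:
            item.indent_level = 0
    return doc


def indents_survive(doc: Document,
                    save_fn: Callable[[Document], str],
                    load_fn: Callable[[str], Document]) -> bool:
    """The repro as a check: save, close, re-open, then compare indents."""
    reopened = load_fn(save_fn(doc))
    return [i.indent_level for i in reopened.items] == [i.indent_level for i in doc.items]


def indented_bullets(lang: str) -> Document:
    return Document(lang, [ListItem("first", 0), ListItem("nested", 1), ListItem("deeper", 2)])


# One language per equivalence class of interest (hypothetical selection).
LANGUAGES = ["en-US", "ja-JP", "de-DE", "ar-SA"]
```

Parameterizing over `LANGUAGES` is what turns a single bug fix into the broader "international settings" coverage the third recommendation asks for.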
![Page 82: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/82.jpg)
RCA Sentinel Bug Approach
• Big Bugs that got away are Sentinel Events
• One bug is indicative of other risk
• The more big bugs, the more patterns
• Nothing to do with X-Men
![Page 83: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/83.jpg)
Formal RCA Program (Sentinel Events and Pattern Analysis)
• Can be started at any time during the SDLC
• Often launched after a single expensive bug
▫ Security vulnerabilities
▫ Production outage: I have a lot of these stories
• Can be resource intensive, so be deliberate
![Page 84: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/84.jpg)
Staffing for Success – RCA Study Analyst
• A Single Analyst or a Team
▫ Could be you after today
• Senior, with a wide range of development process knowledge
• Component-level and system-level analysis
• Works with all types – Development, Testing, Program Management, Operations, Support
▫ May include marketing and field personnel
• Skills
▫ Defect and low-level code analysis
▫ Efficiency diagnosis
▫ RCA analysis and even understanding
▫ Algorithm and metric development
▫ Data analysis and presentation
![Page 85: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/85.jpg)
Phases of an RCA Program
1. Event Identification
2. Data Collection
3. Data Analysis and Assessment
4. Corrective Action
5. Inform and Apply
6. Follow-up, measurement and reporting
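Once a program runs several studies at once, it helps to know which phase each one is in. Here is a minimal sketch that treats the six phases as a state machine. The names `RcaPhase` and `RcaStudy` are hypothetical, and the wrap-around from Phase VI back to Phase I (follow-up findings seeding the next round of event identification) is an assumption, not something the slide states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RcaPhase(Enum):
    EVENT_IDENTIFICATION = 1
    DATA_COLLECTION = 2
    DATA_ANALYSIS_AND_ASSESSMENT = 3
    CORRECTIVE_ACTION = 4
    INFORM_AND_APPLY = 5
    FOLLOW_UP_MEASUREMENT_REPORTING = 6

    def next(self) -> "RcaPhase":
        # Assumption: Phase VI feeds back into Phase I for the next round.
        members = list(RcaPhase)
        return members[(members.index(self) + 1) % len(members)]


@dataclass
class RcaStudy:
    study_id: str
    phase: RcaPhase = RcaPhase.EVENT_IDENTIFICATION
    completed: List[RcaPhase] = field(default_factory=list)

    def advance(self) -> RcaPhase:
        """Mark the current phase complete and move to the next one."""
        self.completed.append(self.phase)
        self.phase = self.phase.next()
        return self.phase
```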
![Page 86: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/86.jpg)
Phase I: Event Identification
• The Sentinel Event
▫ A bug that got away and was found by a customer
▫ Does not need to be a defect
▫ One or multiple
• Often too many bugs to pick from
▫ For an RCA program, first establish criteria for a sentinel event
![Page 87: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/87.jpg)
Phase I: Event Identification (Sentinel Event Criteria)
• Not all bugs will yield a true “root” cause
• Focus on the most severe/undesirable event
▫ “I remember this one bug…”
• Risk-based assessment criteria
▫ Severity
▫ Risk of recurrence
▫ Cost – actual and opportunity
[Flowchart: Sentinel Event Data Channel Loop. Steps: Identify Sentinel Event Criteria; Identify Data Channels; Route Single Event through Process; Prepare Data & Map Fields (defect tracking system query); Log Event in RCA Tracking Database; Event to Analyze]
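The risk-based criteria above (severity, risk of recurrence, actual and opportunity cost) can be turned into a first-pass filter when there are too many bugs to pick from. A minimal sketch follows. The severity scale, the multiplicative score and the `min_severity`/`top_n` cut-offs are all hypothetical choices, not a formula from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateEvent:
    bug_id: str
    severity: int            # hypothetical scale: 1 (cosmetic) .. 4 (outage/security)
    recurrence_risk: float   # estimated likelihood of recurring, 0.0 .. 1.0
    actual_cost: float       # e.g. support and fix cost, in any consistent unit
    opportunity_cost: float  # e.g. lost usage or delayed work, same unit


def risk_score(e: CandidateEvent) -> float:
    # Hypothetical weighting: severity times the expected cost of a recurrence.
    return e.severity * e.recurrence_risk * (e.actual_cost + e.opportunity_cost)


def select_sentinel_events(candidates: List[CandidateEvent],
                           min_severity: int = 3,
                           top_n: int = 5) -> List[CandidateEvent]:
    """Drop low-severity bugs, then keep the highest-risk few for RCA."""
    eligible = [c for c in candidates if c.severity >= min_severity]
    return sorted(eligible, key=risk_score, reverse=True)[:top_n]
```

The score is only a way to order the discussion; the “I remember this one bug…” gut check should still be allowed to override it.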
![Page 88: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/88.jpg)
Phase I: Event Identification (Data Channel – Sources of Data)
• Defect and test case management tracking system
• Source code repository and test code coverage data
• Voice of the Customer
▫ Product support and customer or marketing data
▫ Individual surveys and interviews
• Findings from previous RCA studies
• Crash data through Windows Error Reporting
• Services have tickets and data center telemetry
▫ Client and Cloud testing session tomorrow
More about WER @ https://winqual.microsoft.com/
![Page 89: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/89.jpg)
Phase I: Event Identification (Tracking System)
• Prepare a list of Sentinel Events
• Gather and Prepare the Preliminary Data
• Route Single Event through Process
• Create an RCA Tracking Database
Data Elements of RCA Tracking System
• Event or Study ID, Title & Dates
• Related Defect links
• Failure areas and Source Code
• Timeline of events before and after (vital for services)
• Team Contacts and Owners
• RCA Analysts and Contacts
• Expert Groups and Contacts
• Cause of defect and corrective action
• Survey Data and Results on effectiveness of corrective action
• Log Events in RCA system
• Analyze events
• NOTE: Metadata is better suited to lists, documents and shares
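The data elements listed above map naturally onto one record per study. Here is a minimal sketch of such a record. The field names are hypothetical, and the "ready to close" rule (cause, corrective action and effectiveness survey all filled in) is an assumption drawn from the last two elements on the list.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, List, Optional


@dataclass
class TimelineEntry:
    when: datetime
    what: str


@dataclass
class RcaRecord:
    # Event or study ID, title and dates
    study_id: str
    title: str
    opened: date
    closed: Optional[date] = None
    related_defects: List[str] = field(default_factory=list)     # bug IDs or links
    failure_areas: List[str] = field(default_factory=list)
    source_files: List[str] = field(default_factory=list)
    timeline: List[TimelineEntry] = field(default_factory=list)  # before and after the event
    team_owners: List[str] = field(default_factory=list)
    rca_analysts: List[str] = field(default_factory=list)
    expert_contacts: List[str] = field(default_factory=list)
    cause: str = ""
    corrective_action: str = ""
    effectiveness_survey: Dict[str, str] = field(default_factory=dict)

    def add_event(self, when: datetime, what: str) -> None:
        """Record a timeline entry, keeping the timeline in chronological order."""
        self.timeline.append(TimelineEntry(when, what))
        self.timeline.sort(key=lambda e: e.when)

    def missing_for_closure(self) -> List[str]:
        """Fields that still need data before the study can be closed."""
        checks = {
            "cause": self.cause,
            "corrective_action": self.corrective_action,
            "effectiveness_survey": self.effectiveness_survey,
        }
        return [name for name, value in checks.items() if not value]
```

Keeping the timeline sorted on insert matters most for services, where entries often arrive out of order from tickets and telemetry.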
![Page 90: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/90.jpg)
Phase II: Data Collection
• Use Common Sense and Trust Gut Feel
▫ “Hey did you hear about the bug…”
▫ “I heard BillG was doing a demo when…”
• Use a survey to gather additional data
▫ Was this noticed and ignored?
▫ Is this a common error type?
▫ Could this have been prevented?
• Gather common data on several sentinel events
![Page 91: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/91.jpg)
Phase II: Data Collection
• Windows Customized (Visual Studio Team System)
▫ Part of Defect Tracking System
▫ Connect to source code
▫ Attachments
▫ Collaborating
▫ Workflow
Windows ezRCA Program
• The Goal: Reduce defects throughout the product cycle
• The Questions
▫ What type of defect?
▫ In what phase was the defect introduced?
▫ What was the extent of the fix?
▫ How long did it take to fix the defect?
• The Source: Product Studio extension (per bug report)
• Leverage Points
▫ Distributed workflow
▫ Quick and easy data collection
▫ Aggregate analysis and trend charts
▫ Subcomponent-level data also available
▫ Focus on individual improvement
• Windows Vista ran a full RCA program
• Windows 7 moved to ezRCA
▫ Cut many of the other data sources
▫ Focus on metadata around bugs
![Page 92: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/92.jpg)
Windows “ezRCA” Approach
![Page 93: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/93.jpg)
Windows ezRCA Diagnosis (As Is vs. New)
• Diagnosis is currently required for all bugs and defaults to NA
• This field should only be activated if the bug is resolved “Fixed” or “Won’t Fix”
• There should be no default value
• Change/combine Hardware & No HW to Hardware Issue
NOTE: Items in RED are new or changed
Diagnosis values:
▫ Assignment Error
▫ Build Error
▫ Concurrency Error
▫ Data Checking Error
▫ Data Corruption
▫ Doc Error
▫ Environment Error
▫ Error Handling Problem
▫ Hardware Issue
▫ Ignored Failure
▫ Incorrect Program State
▫ Interface Error
▫ Missing Method/Function
▫ Logic Error
▫ Not Applicable
▫ Other
▫ Resource Issue
▫ Simple Coding Error
▫ System Error
▫ User Misunderstanding
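The "New" rules for the Diagnosis field (active only when a bug is resolved Fixed or Won't Fix, no default value, single-select from a fixed list) are easy to enforce at resolve time. A minimal validation sketch follows. The function name and the resolution strings other than "Fixed" and "Won't Fix" are hypothetical; it is not Product Studio's actual API.

```python
from typing import List, Optional

# Diagnosis values from the slide (single-select dropdown).
DIAGNOSIS_VALUES = (
    "Assignment Error", "Build Error", "Concurrency Error", "Data Checking Error",
    "Data Corruption", "Doc Error", "Environment Error", "Error Handling Problem",
    "Hardware Issue", "Ignored Failure", "Incorrect Program State", "Interface Error",
    "Missing Method/Function", "Logic Error", "Not Applicable", "Other",
    "Resource Issue", "Simple Coding Error", "System Error", "User Misunderstanding",
)

RESOLUTIONS_REQUIRING_DIAGNOSIS = {"Fixed", "Won't Fix"}


def diagnosis_errors(resolution: str, diagnosis: Optional[str]) -> List[str]:
    """Return validation errors for the Diagnosis field (empty list means valid)."""
    if resolution not in RESOLUTIONS_REQUIRING_DIAGNOSIS:
        # The field is only activated for Fixed / Won't Fix resolutions.
        if diagnosis is not None:
            return [f"Diagnosis is not used for resolution {resolution!r}"]
        return []
    if diagnosis is None:
        # No default value: the developer must choose one explicitly.
        return ["Diagnosis is required when resolving as Fixed or Won't Fix"]
    if diagnosis not in DIAGNOSIS_VALUES:
        return [f"Unknown diagnosis {diagnosis!r}"]
    return []
```

Leaving the field unset (`None`) rather than defaulting to "Not Applicable" is what keeps the aggregate data honest; a default would quietly become the most common answer.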
![Page 94: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/94.jpg)
Windows ezRCA Values
• Initial classification of root causes
• Root cause helps us identify the nature of the kinds of mistakes we are making
• This will be a required field for Developers when resolving a bug that is ‘Fixed’ or ‘Won’t Fix’
• This will be a single-select dropdown list and developers will be expected to select the item that is most applicable
• This field is not intended to replace deep RCA studies and more information will likely be required based on analysis of this data
• For gathering further information, use the Prevention Tab, Test Follow-up Tab, and Bug Analysis Tabs in Product Studio or Soapbox (NOTE: Much of this will be consolidated in the future)
![Page 95: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/95.jpg)
Windows Additional RCA data
• Symptom and Prevention categorization
• Link to more info
• Anonymous submission
![Page 96: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/96.jpg)
ezRCA Pivot Points
ezRCA
• Data on Lots of Bugs
• Few Questions & Answers
• Quick, Easy
• Fully Distributed
Traditional RCA
• Data on Select Fixed Bugs
• Detailed Analysis of Defect
• Multiple-Data Sources
• Significant Investment
• Can be Resource-Limited
![Page 97: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/97.jpg)
Phase II: Data Collection Keys to Success
• For sentinel events, an open template is fine
• For ezRCA, extend the bug tracking system with ezData collection
▫ Keep the system lightweight
▫ Limit required fields
▫ Provide an opportunity to expand within the bug
• Formal RCA will need multiple data sources and an extensible schema
• Recommend you start with Sentinel Events and progress to a formal program
![Page 98: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/98.jpg)
Keep going with formal RCA
• Some tools you can use with Sentinel Events and ezRCA
• What good tester doesn’t make you wallow in the details.
![Page 99: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/99.jpg)
Phase III: Data Analysis and Assessment
• Analysis Performed by
▫ RCA Team
▫ Research Team
▫ Related experts
• Log all outputs in RCA System
• Be judicious with experts' time
![Page 100: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/100.jpg)
Phase III: Data Analysis and Assessment (the Five Whys and the Fishbone)
Good article from ASQ – http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
![Page 101: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/101.jpg)
Phase III: Data Analysis and Assessment (the Five Whys)
• Brief History - http://en.wikipedia.org/wiki/5_Whys
▫ Developed by Sakichi Toyoda
▫ First used at Toyota Motor Corporation
▫ Common tool within Kaizen, Lean Manufacturing and Six Sigma
• What is it
▫ Simply put: ask "why" five times to get to the root cause of a problem
• Fun example from http://startuplessonslearned.blogspot.com/2008/11/five-whys.html
▫ Why was the website down? The CPU utilization on all our front-end servers went to 100%.
▫ Why did the CPU usage spike? A new bit of code contained an infinite loop!
▫ Why did that code get written? So-and-so made a mistake.
▫ Why did his mistake get checked in? He didn't write a unit test for the feature.
▫ Why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD.
• Criticism of the Five Whys
▫ Not reproducible across individuals
▫ Investigators have been shown to stop at symptoms rather than root causes
▫ Relies upon the investigator's knowledge
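Mechanically, the Five Whys is a loop: ask why about the latest answer until you reach five levels or run out of answers. The sketch below encodes the website example from the slide as a lookup table. `five_whys` and `Why` are hypothetical names, and the `next_cause` callback stands in for the investigator, which is exactly where the criticisms above (reproducibility, reliance on one person's knowledge) come from.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Why:
    question: str
    answer: str


def five_whys(problem: str,
              next_cause: Callable[[str], Optional[str]],
              max_depth: int = 5) -> List[Why]:
    """Repeatedly ask 'why' about the latest answer.

    next_cause(statement) returns the investigator's answer, or None when no
    deeper cause is known. Stops after max_depth questions.
    """
    chain: List[Why] = []
    statement = problem
    for _ in range(max_depth):
        cause = next_cause(statement)
        if cause is None:
            break  # ended early: the last answer may be a symptom, not a root cause
        chain.append(Why(question=f"Why? ({statement})", answer=cause))
        statement = cause
    return chain


# The example from the slide, encoded as statement -> cause.
EXAMPLE = {
    "The website was down":
        "CPU utilization on all front-end servers went to 100%",
    "CPU utilization on all front-end servers went to 100%":
        "A new bit of code contained an infinite loop",
    "A new bit of code contained an infinite loop":
        "So-and-so made a mistake",
    "So-and-so made a mistake":
        "He didn't write a unit test for the feature",
    "He didn't write a unit test for the feature":
        "He's a new employee and was not properly trained in TDD",
}

if __name__ == "__main__":
    for step in five_whys("The website was down", EXAMPLE.get):
        print(step.question, "->", step.answer)
```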
![Page 102: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/102.jpg)
![Page 103: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/103.jpg)
Phase III: Data Analysis and Assessment (Fishbone Diagram)
• Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram
▫ Developed by Kaoru Ishikawa in the 1960s
▫ One of the 7 basic quality management tools
• Can use with the Five Whys
▫ Put each why off the first tree point
▫ Ask why for each one of these issues
▫ Keep going until you find one or more root causes
• Some industries have common causes mapped to the fishbone
▫ Original 4 Ms – Machine, Method, Material, Manpower
▫ The 8 Ps (used in service industries) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product
▫ Ken’s List – People, Process, Tools, Accountability, Training, Recognition and awareness, Inspection and supervision, Pressure or Stress
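Combining the two tools gives a tree: each fishbone category holds causes, each cause holds the answers to "why?", and the leaves are the candidate root causes. A minimal sketch using Ken's list as the categories follows. `Cause` and `candidate_root_causes` are hypothetical names, and the example causes in the usage note are illustrative, loosely modeled on the O14 indent bug rather than taken from its actual RCA.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Ken's list of fishbone categories, from the slide.
KENS_CATEGORIES = [
    "People", "Process", "Tools", "Accountability", "Training",
    "Recognition and awareness", "Inspection and supervision", "Pressure or Stress",
]


@dataclass
class Cause:
    text: str
    whys: List["Cause"] = field(default_factory=list)  # answers to "why did this happen?"


def leaves(cause: Cause) -> List[str]:
    """Causes with no further 'why' are the candidate root causes."""
    if not cause.whys:
        return [cause.text]
    found: List[str] = []
    for child in cause.whys:
        found.extend(leaves(child))
    return found


def candidate_root_causes(bones: Dict[str, List[Cause]]) -> Dict[str, List[str]]:
    """Map each non-empty fishbone category to the root causes under it."""
    unknown = set(bones) - set(KENS_CATEGORIES)
    if unknown:
        raise ValueError(f"Not a fishbone category: {sorted(unknown)}")
    return {cat: [leaf for c in causes for leaf in leaves(c)]
            for cat, causes in bones.items() if causes}
```

For example, `{"Process": [Cause("Intl settings rarely used in test passes", [Cause("No pass planned for Beta languages")])]}` yields `{"Process": ["No pass planned for Beta languages"]}`. Grouping the leaves by category is also a convenient starting point for the corrective themes in Phase IV.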
![Page 104: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/104.jpg)
Trending Per-Subcomponent
• Trends Matter
▫ Uptick Warrants More Investigation?
▫ Perform a Traditional RCA for That Set of Events
• Profile
▫ The State of the Code
▫ Personal Improvements
▫ Identify Key Events
[Chart: per-subcomponent trends over the last 5 weeks]
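The "uptick warrants more investigation" idea can be automated over the same five-week window. Here is a minimal sketch: flag a subcomponent when its latest weekly bug count exceeds some multiple of the average of the preceding weeks. The 1.5× threshold, the floor of one bug per week for the baseline, and the subcomponent names in the usage note are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def flag_upticks(weekly_counts: Dict[str, List[int]],
                 window: int = 5,
                 ratio: float = 1.5) -> List[str]:
    """Return subcomponents whose latest week exceeds `ratio` times the mean
    of the preceding weeks within the last `window` weeks."""
    flagged: List[str] = []
    for subcomponent, counts in weekly_counts.items():
        recent = counts[-window:]
        if len(recent) < 2:
            continue  # not enough history to compare against
        # Floor the baseline at 1 so a single bug in a quiet area is not an "uptick".
        baseline = max(mean(recent[:-1]), 1)
        if recent[-1] > ratio * baseline:
            flagged.append(subcomponent)
    return flagged
```

With `{"Lists": [2, 3, 2, 3, 9], "Styles": [4, 5, 4, 5, 5], "Tables": [0, 0, 0, 0, 1]}` only `"Lists"` is flagged, making it the candidate for a traditional RCA on that set of events.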
![Page 105: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/105.jpg)
Analysis is not yet at solutions
• Five Whys and Fishbone Diagram help get to root causes
• Data and trending can provide timely alerts and catch regressions
• Root causes are then analyzed for corrective actions
![Page 106: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/106.jpg)
![Page 107: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/107.jpg)
Phase IV: Corrective Actions
• Identify Trends and Group Them into Corrective Themes
▫ May be solutions related to Fishbone Diagram mapping buckets
• Meet with the experts again
▫ Remember my warning not to burn out your experts
• Determine Prioritization Factors and Costing for Corrective Actions
▫ Consider return on investment (ROI); you should have captured direct cost and opportunity cost during Data Collection
▫ Speed to implement
▫ Likelihood of solution being highly effective
▫ Simplicity of solution
▫ Is the solution automatable or process driven
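The prioritization factors above can be combined into a single sortable score. A minimal sketch follows, in which ROI is discounted by the likelihood of the fix being effective, and speed, simplicity and automatability act as smaller bonuses. The `WEIGHTS` values, the `1 / (1 + weeks)` speed term and all field names are hypothetical; tune or replace them with whatever your management review agrees on.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights; adjust to your program's priorities.
WEIGHTS = {"expected_roi": 1.0, "speed": 0.5, "simplicity": 0.5, "automatable": 0.25}


@dataclass
class CorrectiveAction:
    name: str
    cost_avoided: float          # direct + opportunity cost captured during Data Collection
    cost_to_implement: float     # must be > 0
    weeks_to_implement: float
    likelihood_effective: float  # 0..1 estimate
    simplicity: float            # 0..1 subjective rating
    automatable: bool            # True if automatable, False if process-driven


def priority(a: CorrectiveAction) -> float:
    if a.cost_to_implement <= 0:
        raise ValueError("cost_to_implement must be positive")
    roi = (a.cost_avoided - a.cost_to_implement) / a.cost_to_implement
    speed = 1.0 / (1.0 + a.weeks_to_implement)  # faster actions score closer to 1
    return (WEIGHTS["expected_roi"] * roi * a.likelihood_effective
            + WEIGHTS["speed"] * speed
            + WEIGHTS["simplicity"] * a.simplicity
            + WEIGHTS["automatable"] * (1.0 if a.automatable else 0.0))


def rank(actions: List[CorrectiveAction]) -> List[CorrectiveAction]:
    """Highest-priority corrective action first."""
    return sorted(actions, key=priority, reverse=True)
```

Because ROI dominates the score, the cost figures gathered in Data Collection matter most; the other factors mainly break ties between similarly valuable actions.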
![Page 108: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/108.jpg)
Bug Wallow #3: Our Corrective Actions
• Email and Provisioning used Production Data
• Both sanitized the data
• Both impacted production
• What did we change?
▫ Stress Tests have no Internet Access
▫ Sanitized Date Diff feature
![Page 109: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/109.jpg)
Phase V: Inform and Apply
• Host a Management Review
▫ Managers will like RCA more than bugs
▫ You are eliminating a problem not just finding it
• Implementation is a project; treat it that way
▫ Assign Owners
▫ Build and Maintain Schedule
▫ Create a Feedback Loop
▫ Establish a Monthly Status Report
▫ Track and correct the corrective action
![Page 110: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/110.jpg)
Phase VI: Follow-up, Measurement, and Reporting
• More than just Six Sigma-type approaches
• Longitudinal Analysis
▫ Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/
▫ Study over time
• Develop failure types and risk areas/components
• Inspect similar products/areas for a baseline
• Gather and inspect process data
• Examine data for trends
• Report out
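One simple longitudinal measurement is whether the targeted root cause became rarer after the corrective action shipped. Raw counts can mislead because total bug volume changes across milestones, so the sketch below compares the targeted category's share of all bugs per period, before versus after the change. The function name and the data layout (`period -> (targeted bugs, all bugs)`) are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def share_before_after(counts: Dict[int, Tuple[int, int]],
                       change_period: int) -> Tuple[float, float]:
    """counts maps period -> (bugs with the targeted root cause, all bugs).

    Returns the mean share of targeted bugs in periods before change_period
    and in periods from change_period onward.
    """
    shares = {p: targeted / total for p, (targeted, total) in counts.items() if total > 0}
    before = [s for p, s in shares.items() if p < change_period]
    after = [s for p, s in shares.items() if p >= change_period]
    if not before or not after:
        raise ValueError("need at least one period on each side of the change")
    return mean(before), mean(after)
```

For example, if the targeted category was about 10% of bugs for three periods and about 3% for the three periods after the fix, the corrective action looks effective; a proper longitudinal study would also check that the drop persists and is not explained by other changes in the same window.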
![Page 111: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/111.jpg)
Flatonium 2007
• 20 new machines added to the data center
• 5 machines put into production early
• Machines needed to be Nuked-N-Paved (NNP)
• Oops
![Page 112: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/112.jpg)
RCA Pit and Pendulum
![Page 113: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/113.jpg)
Risks of Root Cause Analysis
• Begins with inadequate data
• Go after too much data too early
• Draws incorrect conclusions or makes invalid recommendations
▫ Anyone experienced this before?
• Focuses on the wrong set of defects
• Ends at the wrong level – too early or too late
• Investment is not always predictable
▫ Can be high cost with low ROI
• Overfocus on data can detract from the story
![Page 114: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/114.jpg)
Benefits of Structured RCA Study
• Can start as small pilots
• Uses an identical process regardless of type, age or scope of defect
• Avoids repeat failures
• Can be the shortest path to determining and correcting causes of failure
• Lowers Maintenance Costs
• Builds a culture of
▫ Accountability
▫ Continuous Improvement
![Page 115: Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010](https://reader034.vdocuments.site/reader034/viewer/2022042715/5593168a1a28abe27b8b45c7/html5/thumbnails/115.jpg)
I’ve had enough