Benefiting from a Quality Problem Management Program
DESCRIPTION
A quality Problem Management program can have immediate benefits throughout the service and support organization, yet many organizations still struggle with creating and maintaining an effective Problem Management process and group. Pete’s presentation covers:
1. The discipline of creating relevant categorization / taxonomy (incident typing / grouping).
2. Formal and preferred Root Cause Analysis methodologies.
3. Integrating Problem / Change Management to execute the recommendation / long-term solution.
4. Continuous measuring / reporting / marketing of the ‘actual’ eliminated calls.
5. Best practices of ‘getting into the detail’ and what to ‘get out of the detail’.
TRANSCRIPT
“Benefiting from a Quality Problem Management Program”
Preventing recurring problems from impacting the organization, reducing employee productivity and increasing the total cost of support.
Peter McGarahan
President / Founder
McGarahan & Associates
About The Speaker
• 12 years with PepsiCo/Taco Bell IT and Business Planning
• Managed the Service Desk and all of the IT Infrastructure for 4500 restaurants, 8 zone offices, field managers and Corporate office
• 2 years as a Product Manager for Vantive
• Executive Director for HDI
• 6 years with STI Knowledge/Help Desk 2000
• Founder, McGarahan & Associates (7 years)
• McGarahan & Associates provides service and support best-practice consulting, delivered through assessment / findings / recommendations / continuous improvement roadmap.
• Retired Chairman, IT Infrastructure Management
Pay It Forward
“PLAN THE WORK & WORK THE PLAN”
A quality Problem Management program can reap many benefits throughout the service and support organization as well as the business.
Many organizations still struggle with creating and maintaining an effective Problem Management program and process.
Problem Management depends greatly on corporate culture and on the integration of tools, process and people to generate measurable and consistent results. It takes patience, discipline, collaboration and analytical expertise.
Data Analysis and Trend Reporting are critical to the quality of the Problem Management program, but it is the resulting action, measurable benefits and communication to all stakeholders that define the value of the PM program.
Problem Management – Break the Cycle!
App. Dev.
DBAs
Network Techs.
ITAP Desktop Support
Service Desk
Know Call Types.
Right Work, Right People & Right Reasons.
Provide Resolutions Close to Customer.
Reduce Customer Contacts.
Make Knowledge Work.
Speed to Resolution.
Design customer-centric services that improve service delivery (service level management), enhancing the customer experience through organizational flexibility, tool integration, process efficiency & people effectiveness.
Service Strategy & Design - A Fresh Start
Deliver service excellence
Develop business aligned goals & objectives
Create a service strategy
[Figure: support cost per contact by resolution level. Axes: Cost, Mean time to resolution.]
Categorize Call Types in the Level they are Resolved in
• Level-3 (Technologist / Developers): $100+
• Level-2 (Escalated call): $50-$75
• Level-1 (First contact resolution): $18-$23
• Self-service (Automated self-service): $5-$10
• Call Elimination
Cost savings, SLA & Cust. Sat improvements
Bring visibility to repetitive, costly issues, questions and requests
Characteristics: Business-focused, Virtual, On-demand, Cost-effective, Responsive, Predictable, Consistent and Adaptive
[Figure: support structure diagram]
Customers: Existing Customers, Employees, Partners, New Customers
Contact channels: 1-800.HELPME, www.HELPME.COM, HELP@YOURCOMPANY.COM
Customer Support Center
• Level-2 Specialist: IS OPS Services
• Level-2 Specialist: Network & Technical Services
• Level-2 Specialist: QA / Security
• Level-2 Specialist: Web/DB/
• Level-3 Vendor Support: Applications
• Level-2 Specialist: CRM / Architecture
• Level-2 Specialist: Business System Services
• Level-2 Specialist: Bus. Analysis & Process Services
• Level-2 Specialist: Business Portal Services
• Level-3 Vendor Support: Infrastructure
• Level-3 Vendor Support: Programs
SLA / OLA / UC
• Reduce / Eliminate Dispatch
• Reduce Escalations
• Increase First Contact Resolution (FCR)
• Call Elimination / Self-service
The “Shift Left” Structure
• Require Analysts to search the KB before escalating to L2.
• Target call types for FCR that are currently being escalated.
• Target FCR for Self-service Portal.
• Analyze incidents with no Knowledge for PM / RCA.
• Work with L2 managers to provide training, access, and knowledge.
Source: HDI Practice & Salary Guide / Gartner Research
Know Your Cost
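The cost figures by support level lend themselves to a quick "shift left" estimate. The sketch below is illustrative only: it assumes the per-contact cost ranges map to levels in ascending order (self-service cheapest, Level-3 most expensive), treats the "$100+" Level-3 figure as a flat $100, and uses hypothetical monthly volumes and function names.

```python
# Cost-per-contact ranges in USD (low, high), taken from the slide figures.
# Level-3 is quoted as "$100+", so using 100 for both bounds is an assumption.
COST_PER_CONTACT = {
    "self-service": (5, 10),
    "level-1": (18, 23),
    "level-2": (50, 75),
    "level-3": (100, 100),
}


def support_cost(volumes):
    """Return the (low, high) total cost for a dict of contacts per level."""
    low = sum(COST_PER_CONTACT[level][0] * n for level, n in volumes.items())
    high = sum(COST_PER_CONTACT[level][1] * n for level, n in volumes.items())
    return low, high


def shift_left(volumes, source, target, count):
    """Return a copy of volumes with up to `count` contacts moved to `target`."""
    moved = min(count, volumes.get(source, 0))
    shifted = dict(volumes)
    shifted[source] = shifted.get(source, 0) - moved
    shifted[target] = shifted.get(target, 0) + moved
    return shifted


# Hypothetical monthly contact volumes.
volumes = {"self-service": 100, "level-1": 600, "level-2": 250, "level-3": 50}
before = support_cost(volumes)
after = support_cost(shift_left(volumes, "level-2", "level-1", 100))
print("before:", before, "after:", after)
```

With these made-up volumes, resolving 100 escalated contacts at Level-1 instead of Level-2 lowers the estimated monthly cost from $28,800-$38,550 to $25,600-$33,350.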
Either the time will be consumed by the “groundhog day” approach to fighting fires,
or it will be spent in a structured and organized manner, identifying the true and underlying cause(s) of problems that generate calls and impact the business, and eliminating those causes to prevent future recurrences.
Step 1: Finding the root cause of issues, then documenting and communicating the “work-arounds” to Level-1 & Level-2.
Step 2: Identifying / justifying a permanent solution or “fix” targeted at the problem’s root cause, approved by Change and released into the production environment, eliminating the incidents / calls caused by the problem.
How the Service Desk Adds Value:
– Focus on accurate and complete incident record logging.
– Categorize / Prioritize accurately for easy reporting.
– Flag the escalated incidents with no workaround / knowledge.
– Analyze the high priority (business impact) incidents, the most frequently escalated and the longest Mean Time to Resolve (MTTR) as PM targets.
What To Do, Simply!
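The service desk signals listed above (high priority, most frequently escalated, longest MTTR, escalated with no workaround) can be pulled from incident records with very little code. This is a rough sketch, not a prescribed method: the `Incident` fields, the priority scale (1 = highest business impact) and the sample data are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Incident:
    category: str            # call type assigned at logging
    priority: int            # 1 = highest business impact (assumed scale)
    escalated: bool          # True if Level-1 could not resolve it
    has_workaround: bool     # True if a workaround / knowledge entry exists
    hours_to_resolve: float  # elapsed time from open to resolve


def pm_candidates(incidents, top_n=3):
    """Rank call types by each Problem Management targeting signal."""
    stats = defaultdict(lambda: {"count": 0, "hours": 0.0, "high_priority": 0,
                                 "escalated": 0, "flagged": 0})
    for inc in incidents:
        s = stats[inc.category]
        s["count"] += 1
        s["hours"] += inc.hours_to_resolve
        s["high_priority"] += inc.priority == 1
        s["escalated"] += inc.escalated
        s["flagged"] += inc.escalated and not inc.has_workaround

    def top(key):
        ranked = sorted(stats, key=lambda c: key(stats[c]), reverse=True)
        return ranked[:top_n]

    return {
        "high_priority": top(lambda s: s["high_priority"]),
        "most_escalated": top(lambda s: s["escalated"]),
        "longest_mttr": top(lambda s: s["hours"] / s["count"]),
        "escalated_without_workaround": top(lambda s: s["flagged"]),
    }


# Hypothetical incident records.
incidents = [
    Incident("vpn", 1, True, False, 6.0),
    Incident("vpn", 1, True, False, 10.0),
    Incident("password reset", 3, False, True, 0.2),
    Incident("password reset", 3, False, True, 0.3),
    Incident("erp batch job", 1, True, True, 20.0),
]
print(pm_candidates(incidents, top_n=1))
```

The rankings are only as good as the logging and categorization behind them, which is why accurate incident records come first in the list.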
[Figure: RCFA process flowchart. Lanes: INC LOG, IT Techs, Prob. Owner, RCFA Admin., IT Mgmt., Domain Mgr., Incident Mgmt., Change Mgmt.]
Steps shown:
• Start
• New incident
• Match to open RCFA
• Link incident to RCFA
• Enact work-around (if posted)
• Close incident or route
• Agent may submit intuitive RCFA
• Submit intuitive RCFAs
• Incident records
• Create parent record incident history
• Create template RCFAs for candidates with high incident rates
• Consolidate RCFAs, prepare metrics
• Assign reactive RCFAs at daily ops. meeting
• Domain mgr. rates candidate RCFA bus. impact and cost to fix as "large, med. and small"
• Problem Elimination Board (PEB):
1. Review metrics
2. Select candidate RCFAs and assign to prob. owner
3. Investigate errant RCFAs or domains
4. Ensure resource bandwidth is utilized
• Approved (and funded): Y / N
• NWF (Not Worth Fixing)
• Assembles team (weekly RCFA update req'd)
• Establish bus. impact
• Post work-around (if applicable)
• Root cause determination
• Define fix
• Submit change
• Execute change
• Verify change successful
• End
Root Cause Analysis
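The incident-side steps of the flow (match to open RCFA, link incident to RCFA, enact work-around if posted, otherwise close or route) can be sketched as below. This is a simplified illustration: matching on category alone, the `Rcfa` record shape and all identifiers are assumptions, not the process owner's actual rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rcfa:
    """An open root cause failure analysis record (hypothetical shape)."""
    rcfa_id: str
    category: str
    workaround: Optional[str] = None  # set once a work-around is posted
    incident_ids: list = field(default_factory=list)


def handle_new_incident(incident_id, category, open_rcfas):
    """Match a new incident to an open RCFA, link it, and pick the next step."""
    for rcfa in open_rcfas:
        if rcfa.category == category:  # assumed matching rule
            rcfa.incident_ids.append(incident_id)  # link incident to RCFA
            if rcfa.workaround:
                return f"enact work-around: {rcfa.workaround}"
            return "route: no work-around posted yet"
    return "no open RCFA: close or route as a normal incident"


# Hypothetical open RCFAs.
open_rcfas = [
    Rcfa("RCFA-1", "vpn", workaround="restart client"),
    Rcfa("RCFA-2", "email"),
]
print(handle_new_incident("INC-1", "vpn", open_rcfas))
```

Linking every matching incident to its RCFA is what builds the parent record's incident history and gives the Problem Elimination Board its metrics.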
Reduce Downtime
• Escalation
• Customer communication
• Documentation of workaround
Call Avoidance
• Root cause analysis
• Trend analysis
• Problem review board
Who in the IT organization is responsible for critical problem processes and resolution?
[Chart: 2008 vs. 2009, scale 0% to 60%. Groups: IT Service Desk, IT Operations, Process Team, Other, Responsibility not formalized.]
Purposeful Problem Management
1. Develop guidelines for escalating a problem.
2. Ensure someone is in charge.
3. Create a problem management team with stakeholders from the service desk, operations, app. dev. and business.
4. Schedule regular meetings with team to review outstanding problems.
5. Develop a plan for calling emergency meetings.
6. Designate a "war room" or audio bridge.
7. Coordinate who will contact the customer with status, and how often.
8. Develop a severity coding system.
9. Keep an updated "on-call" list.
10. Hold a postmortem.
Develop A Response!
Do you know…
The desired end-result?
What works well?
What doesn’t, and why not?
If you know, then what are you doing about it?
Find out…
Who is calling?
Why are they calling?
Who resolves their issues?
How long (AHT / Effort / Resolve) does it take?
Take Action!
• Categories grow organically
– It’s easy to add a new one - even with change management.
– Categories are not organized and consistent.
– Old or irrelevant categories are not removed.
– What ends-up in a categorization schema “decision-tree” may not reflect the true nature of the incident / problem.
• Categories should drive reporting
– It is challenging to query and design reports for root cause analysis when you are sure of what to ask but not confident in what you get back.
• Categories try to be everything at once / end-up being nothing at all
– Intentions are good, but relevance / simplicity to the support professionals who are tagging the incident is absent.
• Category levels, when linked, should tell the story
– Not knowing root cause, how resolved and what the symptoms are as expressed by the customer leaves the problem manager literally blind when querying data for analysis / action.
Categorization / Taxonomy
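One way to make linked category levels "tell the story" is to record symptom, root cause and resolution as separate fields, each drawn from a short controlled list. The sketch below is only an illustration of that idea; the three vocabularies and the class name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies. A real schema would be larger but
# should stay short enough for analysts to apply consistently.
SYMPTOMS = {"cannot log in", "slow response", "cannot print"}
ROOT_CAUSES = {"expired password", "failed change", "hardware fault", "unknown"}
RESOLUTIONS = {"password reset", "change rolled back", "part replaced", "escalated"}


@dataclass(frozen=True)
class Categorization:
    symptom: str     # what the customer reported
    root_cause: str  # why it happened ("unknown" until RCA completes)
    resolution: str  # how it was resolved

    def __post_init__(self):
        # Reject free-text values so categories cannot grow organically.
        for value, allowed, name in (
            (self.symptom, SYMPTOMS, "symptom"),
            (self.root_cause, ROOT_CAUSES, "root cause"),
            (self.resolution, RESOLUTIONS, "resolution"),
        ):
            if value not in allowed:
                raise ValueError(f"unknown {name}: {value!r}")

    def story(self):
        """Read the linked levels back as one sentence."""
        return (f"Customer reported '{self.symptom}', caused by "
                f"'{self.root_cause}', resolved by '{self.resolution}'.")


print(Categorization("cannot log in", "expired password", "password reset").story())
```

Rejecting values outside the lists is a deliberate choice here: adding a category then requires a conscious change to the schema rather than an ad hoc entry.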
• Histograms that summarize recurring problems.
• Focus efforts on the most frequent issues.
– Problems coded incorrectly appear as individual and infrequent events, not registering on the radar screen and not given further attention (RCA).
– These seemingly unrelated issues could be traced to a small number of common root causes and, if grouped together, would have shown up as a high-frequency problem and therefore been given higher priority.
80% of Call Volume / 20% Call Types
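The 80/20 view can be produced directly from categorized incident data. A minimal sketch, assuming each incident carries a single call-type label; the function name and the sample volumes are made up for illustration.

```python
from collections import Counter


def pareto_call_types(categories, threshold=0.80):
    """Return the most frequent call types that together reach the threshold."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return []
    selected, running = [], 0
    for call_type, n in counts.most_common():
        if running / total >= threshold:
            break
        selected.append((call_type, n))
        running += n
    return selected


# Hypothetical month of categorized calls.
calls = (["password reset"] * 50 + ["vpn"] * 30 + ["printer"] * 10
         + ["email"] * 6 + ["other"] * 4)
print(pareto_call_types(calls))
```

In this made-up data set, two of five call types account for 80% of volume. If incidents are miscoded, the counts fragment and the real high-frequency problem never reaches the top of the list.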
• The creation of the categorization schema takes into account the following perspectives:
– The customer (REPORTER).
– The Level-1 Analyst (RECORDER / RESOLVER).
– The Level-2 / Level-3 technician (RESOLVER).
• Good Categorization leads to effective Root Cause Analysis which will provide insights / action / results in the following areas:
– Trend Analysis.
– Knowledge Management.
– Problem Prevention.
– High Impact Training.
– Call Avoidance / Elimination Methodologies.
– Feedback loops to Help Desk training and Client Educational Programs.
A Matter of Perspective!
• Know the Impact
– Tech vs. Non-Tech
– New vs. repetitive
– High call volume / High talk time
• Have a plan
– Direct to Self-service
– Publish Knowledge Articles
– Improve Training
– Route to Problem Mgmt
– Improve diagnostic / trouble-shooting skills / tools
• Measure Impact
– Take baseline measurements
– Measure actual
– Report progress / impact / reduction
Targeting Call Types
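The "Measure Impact" steps (baseline, actual, reported reduction) come down to simple arithmetic per targeted call type. A small sketch, with hypothetical call types and volumes:

```python
def call_reduction(baseline, actual):
    """Return (calls eliminated, percent reduction) against the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    eliminated = baseline - actual
    return eliminated, round(100 * eliminated / baseline, 1)


# Hypothetical monthly volumes per targeted call type: (baseline, actual).
targets = {"password reset": (400, 280), "vpn": (150, 135)}
for call_type, (baseline, actual) in targets.items():
    eliminated, pct = call_reduction(baseline, actual)
    print(f"{call_type}: {eliminated} calls eliminated ({pct}% reduction)")
```

Taking the baseline before any action is what makes the reported reduction credible to stakeholders.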
• Review historical data for persistent and common problems (categorization / RCA).
• Observe patterns of temperamental infrastructure equipment (Configuration).
• Ask 1st, 2nd, 3rd level analysts and business where they see patterns (focus / debriefs).
• Communicate to employees well-known issues and fixes (self-service).
• Develop strong processes (change and release management).
Call Avoidance (Deflection / Elimination)
• Develop measurements for every component of SLA.
• You can't improve what you can't measure.
• Use historical measurements to observe trends.
• Communicate SLA metrics and measurements to business.
SLA = MTTR in 8 hours or less for 90% of requests
[Chart: resolutions that met SLA vs. resolutions that didn’t meet SLA, Weeks 1 to 3. Values shown: 95%, 98.5%, 70%.]
Focus on What Matters!
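The example SLA (MTTR of 8 hours or less for 90% of requests) can be checked week by week from resolution times. A minimal sketch; the function name and the sample resolution times are assumptions.

```python
def sla_attainment(hours_to_resolve, limit_hours=8.0, target=0.90):
    """Return (share of resolutions within the limit, whether the target was met)."""
    if not hours_to_resolve:
        raise ValueError("no resolutions to measure")
    met = sum(1 for h in hours_to_resolve if h <= limit_hours)
    rate = met / len(hours_to_resolve)
    return rate, rate >= target


# Hypothetical resolution times (hours) for two weeks.
good_week = [1.0] * 19 + [10.0]
bad_week = [1, 2, 7.5, 8, 9, 3, 4, 5, 6, 12]
print(sla_attainment(good_week))
print(sla_attainment(bad_week))
```

Keeping the weekly results lets the trend be reported to the business alongside the single-week figure.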
[Chart: # of Problems vs. Time to Repair. Focus on Outliers.]
WHY?
Fact with… Analysis, Action
Actionable Reporting
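The slide does not say how outliers in time to repair should be identified. One common option, used here purely as an assumption, is the interquartile-range rule: flag any repair time above Q3 + 1.5 × IQR and ask "why?" of each one. The data is hypothetical.

```python
import statistics


def repair_time_outliers(hours, k=1.5):
    """Return repair times above the upper IQR fence (Q3 + k * IQR)."""
    q1, _, q3 = statistics.quantiles(hours, n=4)  # needs at least 2 values
    upper_fence = q3 + k * (q3 - q1)
    return [h for h in hours if h > upper_fence]


# Hypothetical time-to-repair values in hours.
repair_hours = [2, 3, 3, 4, 4, 5, 5, 6, 48]
print(repair_time_outliers(repair_hours))
```

Each flagged value is a candidate for root cause analysis rather than a conclusion in itself.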
Work to establish measurable business value credibility.
1. Lower the total cost of ownership of all services
• Build serviceability, usability and maintainability into the design of all new applications, systems and services.
2. Increase business value
• Achieve business benefits (lower operational costs, increased revenues, improved customer experience).
3. Minimize business impact
• Reduce change-related outages / incidents.
• Reduce number of problems / incidents / calls.
• Reduce the number of requests / training-related calls / inquiries.
• Speed to resolution based on business prioritization model.
4. Improved and frequent Communication (Launch / On-going)
Create shared ‘devops’ goals / objectives.
In the end, these should help us identify, link and realize how to translate IT objectives / metrics into tangible business benefits / value.
Detect problems and trends – Detecting recurring problems, analyzing trends, and identifying areas in need of improvement.
Prevent problems (elimination) – Performing Root Cause Analysis to determine the source of the problem will provide long-term prevention. Preventing 10% of problems is the same as solving 80% of all problems immediately.
Re-Solve issues – When issues are solved quickly and efficiently, productivity increases.
Position for Self-Service (deflection)
– Even though FCR is a great metric, it still says we are solving a high percentage of repetitive calls over and over again!
– Position for self-service based on issue, question and audience.
In Summary!
“Being a Service Leader is about positively impacting the world around you! It’s not about you, it’s about all that you can do to make other people successful.”
Thank You!
Pete McGarahan
McGarahan & Associates
[email protected]
714.694.1158