Lessons Learned in Defect Triage
By Michael Kelly
Date: Feb 23, 2009
Michael Kelly shares experiences from a project team whose development process implemented some sweeping changes, with significant
improvements (and a few missteps) along the way.
On a recent project, I joined a team that was struggling to deliver reliable software on time. The team consisted of around 20 developers
and testers, mostly collocated, and was going through a lot of changes at the time I came on board. The biggest change was a switch to
Scrum as the team's development process. In the new process, the programmers and testers would work from two sources: New
features would be developed from the product backlog, and defects and issues would be worked from the defect triage process.
Changing the development process forced us to change many related processes. We knew immediately that we needed to tackle the way in which we managed defects. It was a mix of different project databases, different ticket workflows, and different release schedules, with no formal way to track any of it. In short, we had a mess.
Major, sweeping process changes, while ideal in a situation like this, can be painful and time-consuming. We didn't have the time to stop, restructure everything, and train everyone on a new process. We had a large number of commitments, some of which were already behind schedule, and had to keep everyone focused on delivery. Therefore, we decided to start planning small incremental changes that we could roll out over time. While that was going on, we would implement defect triage meetings on a project-by-project basis. This article describes how we made those defect triage meetings effective.
Getting the Right People in the Room
Two primary factors drove our team's triage process:
Ticket priority (How important is it?)
Ticket severity (How much pain does it cause?)
From time to time, estimates for how long a fix might take could play a role. Other factors also came into play on occasion: a deadline
commitment, dependency on another issue, availability of a resource with specialized knowledge, work already taking place on similar or
related issues, etc. To define priorities and severities efficiently, we needed to get the right people in the room. For this team, those people
consisted of the following:
The various project managers for each of our clients
The technical leadership for the development team
Select representatives from the quality assurance team
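The two driving factors above amount to a simple sort key. A minimal sketch in Python, assuming a hypothetical Ticket record and numeric scales (neither comes from the article; lower numbers mean more urgent):

```python
from dataclasses import dataclass

# Illustrative scale, not the team's actual schema: lower = more urgent.
RANK = {"blocker": 0, "critical": 1, "major": 2, "minor": 3, "trivial": 4}

@dataclass
class Ticket:
    key: str
    priority: str  # how important is it?
    severity: str  # how much pain does it cause?

def triage_order(tickets):
    """Order tickets by priority first, then severity."""
    return sorted(tickets, key=lambda t: (RANK[t.priority], RANK[t.severity]))

tickets = [
    Ticket("APP-7", "major", "critical"),
    Ticket("APP-3", "critical", "minor"),
    Ticket("APP-9", "major", "minor"),
]
print([t.key for t in triage_order(tickets)])  # APP-3 first
```

In practice the other factors (deadlines, dependencies, specialized knowledge) override a mechanical ordering, which is exactly why the meeting needs people who can spot those cases.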
One area of struggle was in finding representation for some of our "faceless" stakeholders: internal end-users and the technical
operations department. Both of those teams had a hard time getting regular attendees at the meetings. From time to time, we would ask
quality assurance team members to act as representatives for those missing stakeholders.
The goal in selecting the audience is to keep it small (no more than 5-10 people), but with enough people to make the right decisions. We
also wanted to build a team of people who understood the deadlines and commitments, knew how to assess severity and impact, and
could estimate at a high level the work that was needed to handle an issue. The team we assembled would determine which issues were
fixed first, and had to have enough credibility and authority to ensure that decisions made at the meetings were carried out.
Once we knew who the right people were, we set a meeting schedule that worked for everyone. If you hold meetings too often, people
won't attend. If meetings are too infrequent, the team won't be effective. We decided to meet four days a week, alternating our meetings between client-facing issues and platform-facing issues. Because each of these sets of issues pulled a slightly different audience, some
people needed to meet four times a week, and others needed to attend only two meetings a week. We also found that scheduling the
meeting at the same time each day reduced confusion.
Managing the Process
Once we had the right people in the room, the next step was to figure out how we were all going to work together, finding a way to balance the client-facing issues and the core platform issues. The first thing we did was work to get consistency across project databases.
We needed all the projects to use the same workflows, track releases in the same way, and use automated "top ten" lists populated from the different projects. Those top-ten lists became the primary view of relative priority across projects. For example, we created a client top-ten and a platform top-ten. The client top-ten looked across all the client-facing projects, and the platform top-ten looked across all the technical-facing projects that might affect internal operations or multiple clients.
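Populating such a list can be sketched as a filter-and-sort across every feeding project. The record shape and the "facing" tag below are assumptions for illustration, not the team's actual tooling:

```python
# Illustrative scale: lower = more urgent.
RANK = {"blocker": 0, "critical": 1, "major": 2, "minor": 3}

def top_ten(tickets, facing):
    """Return the ten highest-priority open tickets for one audience
    ('client' or 'platform'), looking across all feeding projects."""
    pool = [t for t in tickets if t["facing"] == facing and t["status"] == "open"]
    pool.sort(key=lambda t: RANK[t["priority"]])
    return pool[:10]

all_tickets = [
    {"key": "CLI-1", "facing": "client", "status": "open", "priority": "critical"},
    {"key": "PLT-4", "facing": "platform", "status": "open", "priority": "blocker"},
    {"key": "CLI-2", "facing": "client", "status": "closed", "priority": "blocker"},
]
client_list = top_ten(all_tickets, "client")
platform_list = top_ten(all_tickets, "platform")
```

The important property is that the list crosses project boundaries: relative priority is decided across all feeding projects, not within each one.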
With the lists set up, the next step was to clear out the work in the developers' personal queues. Because a developer might have a
number of tickets currently assigned without regard to relative priority, we asked everyone to pool any tickets not actively being worked.
Then, when each developer was ready for the next ticket, he or she would simply pull from a top-ten list, regardless of which individual
project would normally get his or her focus.
Once a ticket is in a top-ten list, you have to think about where it goes next. The same team that populated the top-ten lists also planned for how those tickets got out to production. One technique we found helpful was to tie the priority of a ticket to the release schedule. We
started with some simple heuristics: A blocker issue equated to a hotfix, a critical issue would go in the next scheduled release, and
everything else (major, minor, trivial, etc.) would be worked and slated for release as resources allowed.
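Those heuristics are simple enough to write down directly. The severity names come from the article; the function itself is a hypothetical sketch:

```python
def release_for(priority):
    """Tie a ticket's priority to the release schedule, per our heuristics:
    blocker -> hotfix, critical -> next scheduled release, everything else
    worked and slated as resources allow."""
    if priority == "blocker":
        return "hotfix"
    if priority == "critical":
        return "next scheduled release"
    return "as resources allow"  # major, minor, trivial, ...
```

Starting with a rule this blunt gave the triage meeting a default answer; the team only had to discuss the exceptions.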
The team that reviewed the tickets on a regular basis was also the team that managed any issues resulting from investigation. They
coordinated getting work done across teams. They also determined which tickets qualified for rejection or were returned for more
information. By escalating issues as needed, this group helped resolve blocking issues for the people who were working the tickets.
Getting the Message Out
Regardless of how you decide to manage your issues (top-ten lists, assigning tickets out to individual developers, brute-force spreadsheets, and so on), everyone needs to know how you're going to communicate what the issues are and what progress is being
made. You need to deliver information that's simple to understand and difficult to ignore. I got in the habit of sending daily email
messages to the product development team, summarizing the top-ten lists and ticket progress.
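A summary like that can be generated mechanically from the lists. A sketch, assuming a simple dict-of-lists structure (illustrative only, not the actual tooling):

```python
def daily_summary(lists):
    """Render a plain-text daily email body summarizing each top-ten list."""
    lines = []
    for name, tickets in lists.items():
        lines.append(f"{name} top ten ({len(tickets)} open):")
        for t in tickets:
            lines.append(f"  {t['key']} [{t['priority']}]: {t['status']}")
    return "\n".join(lines)

body = daily_summary({
    "Client": [{"key": "CLI-1", "priority": "critical", "status": "in progress"}],
    "Platform": [],
})
print(body)
```

Automating the summary keeps the daily message cheap to send, which matters because the habit only works if it is actually daily.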
Another important aspect to communicate is the upcoming releases. We had a number of small production releases each week, and a
large release every couple of months. The development staff found it helpful to know the upcoming key dates. On a whiteboard central to
all the product development team locations, we kept a calendar showing the next two months. On the calendar, we posted key dates such
as initial release candidates, code freeze, customer user-acceptance testing (UAT) windows, and release dates. At a glance, a developer
would know not only what the issues were, but when they needed to be completed.
Using Metrics to Improve the Process
Once we began the triage process, we were able to collect data on how the process was working. We started by tracking where
development time was spent. Each developer would log his or her time by ticket, which made it easy for us to see time spent by project,
by priority, by ticket type, and by release. This feedback mechanism let us know which projects were getting the most attention and which
were being slighted.
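That time-logging data lends itself to simple roll-ups. A sketch of grouping logged hours by any ticket field (the record shape is an assumption):

```python
from collections import defaultdict

def time_by(field, entries):
    """Sum logged hours grouped by any ticket field (project, priority, ...)."""
    totals = defaultdict(float)
    for e in entries:
        totals[e[field]] += e["hours"]
    return dict(totals)

log = [
    {"project": "billing", "priority": "critical", "hours": 3.0},
    {"project": "billing", "priority": "minor", "hours": 1.5},
    {"project": "reports", "priority": "critical", "hours": 2.0},
]
print(time_by("project", log))
print(time_by("priority", log))
```

The same log answers all four questions (by project, by priority, by ticket type, by release) just by changing the grouping field.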
We looked at the number of tickets created or resolved in a project, how many commits were done for a project, and the number of
developers working tickets in a project. By examining where the activity was taking place, we were able to better understand where we
might need to focus refactoring efforts, spend more attention on testing, or be more careful when pulling together final release information.
We also examined what we were not doing. We looked at the data to understand where the least amount of churn was. We looked at ticket aging reports. We looked at projects that didn't get many hours logged. In the short term, when focusing on triage, such decisions may make sense, but we didn't want to get trapped into short-term thinking. We also wanted to be sure that ignoring certain areas of the code or certain projects was a conscious decision, not an accident.
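An aging report of the kind described can be a few lines. A sketch, assuming each ticket carries a created date and an open/closed status (field names are illustrative):

```python
from datetime import date

def stale_tickets(tickets, today, max_age_days=30):
    """Return keys of open tickets older than the threshold -- a simple
    aging report to confirm that quiet tickets are quiet on purpose."""
    return [t["key"] for t in tickets
            if t["status"] == "open"
            and (today - t["created"]).days > max_age_days]

tickets = [
    {"key": "APP-1", "status": "open", "created": date(2009, 1, 2)},
    {"key": "APP-2", "status": "open", "created": date(2009, 2, 20)},
    {"key": "APP-3", "status": "closed", "created": date(2009, 1, 2)},
]
print(stale_tickets(tickets, today=date(2009, 2, 23)))  # ['APP-1']
```

Reviewing that list in the triage meeting turns "this ticket went quiet" from an accident into a decision.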
Since our development team was using Scrum, at the end of each sprint (well, most sprints), we also tried to provide details about which issues were resolved in that sprint. At the same time we showed off features, we tried to show how much work was done to resolve issues from production, or issues that got pulled into the sprint via the triage process. We provided process metrics to make sure that we were focused on the right problems.
Where to Go from Here
Our team and our process still aren't perfect, but we've learned a lot, and we'll continue to improve as we learn more. Sometimes we
don't get the meeting attendance we want, but we've found that problem to be cyclical. People attend for a while and then stop coming.
When their tickets are no longer getting attention (surprise, surprise), they attend again.
As we work this plan, we continue to make small changes in the overall processes of software development and defect management. We
try to use the information we gather to inform our decisions. As you think about your own defect-management process, I hope that some of these ideas help.
Daily Defect Meetings
Our daily defect meetings had the following format:
1. Review the current items in the top-ten list(s):
   a. Are any issues not making progress? Are the appropriate people involved in working the issue?
   b. Are there any blockers around working the issues that need to be escalated?
   c. Have circumstances changed in any way that would cause us to drop any of these issues from this list?
2. Review new items created in any of the projects that feed the list(s):
   a. Review the tickets created since the last meeting from each project database feeding the lists you're reviewing.
   b. If appropriate, update priority and severity as a team. Be sure to capture key comments about priority and severity in the ticket.
   c. Pull any new tickets into the list as appropriate.
3. Review the most critical outstanding existing items from the projects that feed the list(s):
   a. When there are no new issues to pull in, and the list has room for new items, look at the highest-priority tickets from each feeding project database.
   b. Ask attendees if any tickets within their projects need focus, but the ticket's priority may not appropriately represent the urgency of the issue.
   c. Pull tickets into the list as appropriate, based on feedback and discussion by the team.
4. Discuss any other issues related to the projects that the team may need to know.
5. If a top-ten list has more than 10 tickets, ask the team whether that's okay. Sometimes it is (related tickets, requirements for an upcoming release, and so on), but make sure that the team knows why the buffer was extended and is clear on what that means. Try not to let it happen too often.
© 2009 Pearson Education, Inc. All rights reserved.
800 East 96th Street Indianapolis, Indiana 46240