Lessons Learned in Defect Triage
By Michael Kelly
Date: Feb 23, 2009
Michael Kelly shares experiences from a project team whose development process implemented some sweeping changes, with significant
improvements (and a few missteps) along the way.
On a recent project, I joined a team that was struggling to deliver reliable software on time. The team consisted of around 20 developers
and testers, mostly collocated, and was going through a lot of changes at the time I came on board. The biggest change was a switch to
Scrum as the team's development process. In the new process, the programmers and testers would work from two sources: New
features would be developed from the product backlog, and defects and issues would be worked from the defect triage process.
Changing the development process forced us to change many related processes. We knew immediately that we needed to tackle the way in which we managed defects. It was a mix of different project databases, different ticket workflows, and different release schedules, with no formal way to track any of it. In short, we had a mess.
Major, sweeping process changes, while ideal in a situation like this, can be painful and time-consuming. We didn't have the time to stop, restructure everything, and train everyone on a new process. We had a large number of commitments, some of which were already behind schedule, and had to keep everyone focused on delivery. Therefore, we decided to start planning small incremental changes that we could roll out over time. While that was going on, we would implement defect triage meetings on a project-by-project basis. This article describes how we made those defect triage meetings effective.
Getting the Right People in the Room
Two primary factors drove our team's triage process:
Ticket priority (How important is it?)
Ticket severity (How much pain does it cause?)
From time to time, estimates for how long a fix might take could play a role. Other factors also came into play on occasion: a deadline
commitment, dependency on another issue, availability of a resource with specialized knowledge, work already taking place on similar or
related issues, etc. To define priorities and severities efficiently, we needed to get the right people in the room. For this team, those people
consisted of the following:
The various project managers for each of our clients
The technical leadership for the development team
Select representatives from the quality assurance team
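The two driving factors above amount to a simple sort key. A minimal sketch in Python, assuming a hypothetical Ticket record and numeric scales (neither comes from the article; lower numbers mean more urgent):

```python
from dataclasses import dataclass

# Illustrative scale, not the team's actual schema: lower = more urgent.
RANK = {"blocker": 0, "critical": 1, "major": 2, "minor": 3, "trivial": 4}

@dataclass
class Ticket:
    key: str
    priority: str  # how important is it?
    severity: str  # how much pain does it cause?

def triage_order(tickets):
    """Order tickets by priority first, then severity."""
    return sorted(tickets, key=lambda t: (RANK[t.priority], RANK[t.severity]))

tickets = [
    Ticket("APP-7", "major", "critical"),
    Ticket("APP-3", "critical", "minor"),
    Ticket("APP-9", "major", "minor"),
]
print([t.key for t in triage_order(tickets)])  # APP-3 first
```

In practice the other factors (deadlines, dependencies, specialized knowledge) override a mechanical ordering, which is exactly why the meeting needs people who can spot those cases.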
One area of struggle was in finding representation for some of our "faceless" stakeholders: internal end-users and the technical
operations department. Both of those teams had a hard time getting regular attendees at the meetings. From time to time, we would ask
quality assurance team members to act as representatives for those missing stakeholders.
The goal in selecting the audience is to keep it small (no more than 5-10 people), but with enough people to make the right decisions. We
also wanted to build a team of people who understood the deadlines and commitments, knew how to assess severity and impact, and
could estimate at a high level the work that was needed to handle an issue. The team we assembled would determine which issues were
fixed first, and had to have enough credibility and authority to ensure that decisions made at the meetings were carried out.
Once we knew who the right people were, we set a meeting schedule that worked for everyone. If you hold meetings too often, people
won't attend. If meetings are too infrequent, the team won't be effective. We decided to meet four days a week, alternating our meetings between client-facing issues and platform-facing issues. Because each of these sets of issues pulled a slightly different audience, some
people needed to meet four times a week, and others needed to attend only two meetings a week. We also found that scheduling the
meeting at the same time each day reduced confusion.
Managing the Process
Once we had the right people in the room, the next step was to figure out how we were all going to work together, finding a way to balance the client-facing issues and the core platform issues. The first thing we did was work to get consistency across project databases.
We needed all the projects to use the same workflows, track releases in the same way, and use automated "top ten" lists populated from the different projects. Those top-ten lists became the primary view of relative priority across projects. For example, we created a client top-ten and a platform top-ten. The client top-ten looked across all the client-facing projects, and the platform top-ten looked across all the technical-facing projects that might affect internal operations or multiple clients.
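Populating such a list can be sketched as a filter-and-sort across every feeding project. The record shape and the "facing" tag below are assumptions for illustration, not the team's actual tooling:

```python
# Illustrative scale: lower = more urgent.
RANK = {"blocker": 0, "critical": 1, "major": 2, "minor": 3}

def top_ten(tickets, facing):
    """Return the ten highest-priority open tickets for one audience
    ('client' or 'platform'), looking across all feeding projects."""
    pool = [t for t in tickets if t["facing"] == facing and t["status"] == "open"]
    pool.sort(key=lambda t: RANK[t["priority"]])
    return pool[:10]

all_tickets = [
    {"key": "CLI-1", "facing": "client", "status": "open", "priority": "critical"},
    {"key": "PLT-4", "facing": "platform", "status": "open", "priority": "blocker"},
    {"key": "CLI-2", "facing": "client", "status": "closed", "priority": "blocker"},
]
client_list = top_ten(all_tickets, "client")
platform_list = top_ten(all_tickets, "platform")
```

The important property is that the list crosses project boundaries: relative priority is decided across all feeding projects, not within each one.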
With the lists set up, the next step was to clear out the work in the developers' personal queues. Because a developer might have a
number of tickets currently assigned without regard to relative priority, we asked everyone to pool any tickets not actively being worked.
Then, when each developer was ready for the next ticket, he or she would simply pull from a top-ten list, regardless of which individual
project would normally get his or her focus.
Once a ticket is in a top-ten list, you have to think about where it goes next. The same team that populated the top-ten lists also planned for how those tickets got out to production. One technique we found helpful was to tie the priority of a ticket to the release schedule. We
started with some simple heuristics: A blocker issue equated to a hotfix, a critical issue would go in the next scheduled release, and
everything else (major, minor, trivial, etc.) would be worked and slated for release as resources allowed.
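Those heuristics are simple enough to write down directly. The severity names come from the article; the function itself is a hypothetical sketch:

```python
def release_for(priority):
    """Tie a ticket's priority to the release schedule, per our heuristics:
    blocker -> hotfix, critical -> next scheduled release, everything else
    worked and slated as resources allow."""
    if priority == "blocker":
        return "hotfix"
    if priority == "critical":
        return "next scheduled release"
    return "as resources allow"  # major, minor, trivial, ...
```

Starting with a rule this blunt gave the triage meeting a default answer; the team only had to discuss the exceptions.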
The team that reviewed the tickets on a regular basis was also the team that managed any issues resulting from investigation. They
coordinated getting work done across teams. They also determined which tickets qualified for rejection or were returned for more
information. By escalating issues as needed, this group helped resolve blocking issues for the people who were working the tickets.
Getting the Message Out
Regardless of how you decide to manage your issues (top-ten lists, assigning tickets out to individual developers, brute-force spreadsheets, and so on), everyone needs to know how you're going to communicate what the issues are and what progress is being
made. You need to deliver information that's simple to understand and difficult to ignore. I got in the habit of sending daily email
messages to the product development team, summarizing the top-ten lists and ticket progress.
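A summary like that can be generated mechanically from the lists. A sketch, assuming a simple dict-of-lists structure (illustrative only, not the actual tooling):

```python
def daily_summary(lists):
    """Render a plain-text daily email body summarizing each top-ten list."""
    lines = []
    for name, tickets in lists.items():
        lines.append(f"{name} top ten ({len(tickets)} open):")
        for t in tickets:
            lines.append(f"  {t['key']} [{t['priority']}]: {t['status']}")
    return "\n".join(lines)

body = daily_summary({
    "Client": [{"key": "CLI-1", "priority": "critical", "status": "in progress"}],
    "Platform": [],
})
print(body)
```

Automating the summary keeps the daily message cheap to send, which matters because the habit only works if it is actually daily.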
Another important aspect to communicate is the upcoming releases. We had a number of small production releases each week, and a
large release every couple of months. The development staff found it helpful to know the upcoming key dates. On a whiteboard central to
all the product development team locations, we kept a calendar showing the next two months. On the calendar, we posted key dates such
as initial release candidates, code freeze, customer user-acceptance testing (UAT) windows, and release dates. At a glance, a developer
would know not only what the issues were, but when they needed to be completed.
Using Metrics to Improve the Process
Once we began the triage process, we were able to collect data on how the process was working. We started by tracking where
development time was spent. Each developer would log his or her time by ticket, which made it easy for us to see time spent by project,
by priority, by ticket type, and by release. This feedback mechanism let us know which projects were getting the most attention and which
were being slighted.
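That time-logging data lends itself to simple roll-ups. A sketch of grouping logged hours by any ticket field (the record shape is an assumption):

```python
from collections import defaultdict

def time_by(field, entries):
    """Sum logged hours grouped by any ticket field (project, priority, ...)."""
    totals = defaultdict(float)
    for e in entries:
        totals[e[field]] += e["hours"]
    return dict(totals)

log = [
    {"project": "billing", "priority": "critical", "hours": 3.0},
    {"project": "billing", "priority": "minor", "hours": 1.5},
    {"project": "reports", "priority": "critical", "hours": 2.0},
]
print(time_by("project", log))
print(time_by("priority", log))
```

The same log answers all four questions (by project, by priority, by ticket type, by release) just by changing the grouping field.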
We looked at the number of tickets created or resolved in a project, how many commits were done for a project, and the number of
developers working tickets in a project. By examining where the activity was taking place, we were able to better understand where we
might need to focus refactoring efforts, spend more attention on testing, or be more careful when pulling together final release information.
We also examined what we were not doing. We looked at the data to understand where the least amount of churn was. We looked at ticket aging reports. We looked at projects that didn't get many hours logged. In the short term, when focusing on triage, such decisions may make sense, but we didn't want to get trapped into short-term thinking. We also wanted to be sure that ignoring certain areas of the code or certain projects was a conscious decision, not an accident.
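An aging report of the kind described can be a few lines. A sketch, assuming each ticket carries a created date and an open/closed status (field names are illustrative):

```python
from datetime import date

def stale_tickets(tickets, today, max_age_days=30):
    """Return keys of open tickets older than the threshold -- a simple
    aging report to confirm that quiet tickets are quiet on purpose."""
    return [t["key"] for t in tickets
            if t["status"] == "open"
            and (today - t["created"]).days > max_age_days]

tickets = [
    {"key": "APP-1", "status": "open", "created": date(2009, 1, 2)},
    {"key": "APP-2", "status": "open", "created": date(2009, 2, 20)},
    {"key": "APP-3", "status": "closed", "created": date(2009, 1, 2)},
]
print(stale_tickets(tickets, today=date(2009, 2, 23)))  # ['APP-1']
```

Reviewing that list in the triage meeting turns "this ticket went quiet" from an accident into a decision.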
Since our development team was using Scrum, at the end of each sprint (well, most sprints), we also tried to provide details about which issues were resolved in that sprint. At the same time we showed off features, we tried to show how much work was done to resolve issues from production, or issues that got pulled into the sprint via the triage process. We provided process metrics to make sure that we were focused on the right problems.
Where to Go from Here
Our team and our process still aren't perfect, but we've learned a lot, and we'll continue to improve as we learn more. Sometimes we
don't get the meeting attendance we want, but we've found that problem to be cyclical. People attend for a while and then stop coming.
When their tickets are no longer getting attention (surprise, surprise), they attend again.
As we work this plan, we continue to make small changes in the overall processes of software development and defect management. We
try to use the information we gather to inform our decisions. As you think about your own defect-management process, I hope that some of these ideas help.
Daily Defect Meetings
Our daily defect meetings had the following format:
1. Review the current items in the top-ten list(s):
   a. Are any issues not making progress? Are the appropriate people involved in working the issue?
   b. Are there any blockers around working the issues that need to be escalated?
   c. Have circumstances changed in any way that would cause us to drop any of these issues from this list?
2. Review new items created in any of the projects that feed the list(s):
   a. Review the tickets created since the last meeting from each project database feeding the lists you're reviewing.
   b. If appropriate, update priority and severity as a team. Be sure to capture key comments about priority and severity in the ticket.
   c. Pull any new tickets into the list as appropriate.
3. Review the most critical outstanding existing items from the projects that feed the list(s):
   a. When there are no new issues to pull in, and the list has room for new items, look at the highest-priority tickets from each feeding project database.
   b. Ask attendees if any tickets within their projects need focus, but the ticket's priority may not appropriately represent the urgency of the issue.
   c. Pull tickets into the list as appropriate, based on feedback and discussion by the team.
4. Discuss any other issues related to the projects that the team may need to know.
5. If a top-ten list has more than 10 tickets, ask the team whether that's okay. Sometimes it is (related tickets, requirements for an upcoming release, and so on), but make sure that the team knows why the buffer was extended and is clear on what that means. Try not to let it happen too often.
© 2009 Pearson Education, Inc. All rights reserved.
800 East 96th Street Indianapolis, Indiana 46240