week 3 presentation istehad chowdhury cisc 864 mining software engineering data
TRANSCRIPT
Research Paper
Who Should Fix This Bug?
John Anvik, Lyndon Hiew and Gail C. Murphy
Department of Computer Science
University of British Columbia
{janvik, lyndonh, murphy}@cs.ubc.ca
Problem with Open Bug Repository
Overall, large open source projects have to cope with a surge of bug reports.
“Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle.”
Many bug reports are invalid or duplicates of another bug report
Eclipse, 36%
Every bug report should be triaged
To check validity and duplication
To assign the bug to an appropriate developer
Problem cont..
The triager may not be sure whom to assign the bug to.
A lot of time is wasted in reassigning and regaining
24% of reports in Eclipse are re-assigned
The research work
Goal: suggest whom to assign this bug to
Technique: Using data mining and machine learning
Result: 60% precision and 10% recall
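To make the precision and recall figures concrete, here is a minimal sketch of how the two measures can be computed for a single bug report. It assumes the standard set-based definitions (the slides do not spell out the exact formulas used), and the developer names are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Set-based precision and recall for one bug report.

    recommended: developers suggested by the recommender
    relevant:    developers who could appropriately fix the bug
    (Standard definitions, assumed here; not taken from the slides.)
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = recommended & relevant
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: one suggestion, five developers who could fix the bug.
suggested = ["alice"]
could_fix = ["alice", "bob", "carol", "dave", "erin"]
p, r = precision_recall(suggested, could_fix)
print(f"precision={p:.2f} recall={r:.2f}")
```

With only a few suggestions per report and many developers who could plausibly fix it, precision can be high while recall stays low, which is the pattern the result above describes.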
Approach to the problem
Semi automated
1. Characterizing bug reports
2. Assigning a label to each report
3. Choosing reports to train the supervised machine learning algorithm
4. Applying the algorithm to create the classifier for recommending assignments.
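The four steps above can be sketched roughly as follows. This is an illustration only: it assumes a bag-of-words characterization of the report text and uses a small multinomial Naive Bayes classifier as a stand-in for "the supervised machine learning algorithm", which the slides do not name. The reports and developer names are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Step 1 (assumed): characterize a report by the words in its text.
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_reports):
    # Steps 2-3: labeled_reports is a list of (text, developer) pairs
    # that were already labeled and selected for training.
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, developer in labeled_reports:
        tokens = tokenize(text)
        word_counts[developer].update(tokens)
        doc_counts[developer] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def recommend(model, text, top_n=1):
    # Step 4: score each developer and return the top_n as recommendations.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {}
    for developer in doc_counts:
        total_words = sum(word_counts[developer].values())
        score = math.log(doc_counts[developer] / total_docs)
        for tok in tokens:
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[developer][tok] + 1) / (total_words + len(vocab))
            )
        scores[developer] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical training data.
training = [
    ("Crash when opening bookmarks toolbar", "alice"),
    ("Bookmarks menu does not refresh", "alice"),
    ("Compiler error in build script", "bob"),
    ("Build fails on linking step", "bob"),
]
model = train(training)
print(recommend(model, "Bookmarks toolbar crash on startup"))
```

The approach is semi automated in the sense that the classifier only produces a short list of candidates; the triager still makes the final assignment.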
Heuristics on labeling bug reports
FIXED (whoever provided the last approved patch), Firefox
FIXED (whoever marked the report as resolved), Eclipse
DUPLICATE: whoever resolved the report of which this report is a duplicate. Eclipse and Firefox
WORKSFORME (Firefox) -- unclassifiable.
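The heuristics above could be encoded along these lines. This is a sketch under assumptions: the dictionary field names (`resolution`, `last_approved_patch_author`, `resolved_by`, `original_resolved_by`) are hypothetical, and the DUPLICATE rule is read as "the developer who resolved the report that this one duplicates".

```python
def label_report(report, project):
    """Return the developer used as the training label, or None if the
    report is unclassifiable. All field names are hypothetical."""
    resolution = report.get("resolution")
    if resolution == "FIXED":
        if project == "firefox":
            # Firefox: whoever provided the last approved patch.
            return report.get("last_approved_patch_author")
        if project == "eclipse":
            # Eclipse: whoever marked the report as resolved.
            return report.get("resolved_by")
    if resolution == "DUPLICATE" and project in ("eclipse", "firefox"):
        # Assumed reading: whoever resolved the report this one duplicates.
        return report.get("original_resolved_by")
    # WORKSFORME (Firefox) and anything not covered above: unclassifiable.
    return None


# Hypothetical reports.
print(label_report({"resolution": "FIXED", "resolved_by": "carol"}, "eclipse"))
print(label_report({"resolution": "WORKSFORME"}, "firefox"))
```

Reports that come back as `None` would be left out of the training set, so any weakness in these rules directly shapes what the classifier learns.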
Validating Results with GCC
Why are the results so poor?
Why is recall low in all cases, especially for gcc?
Shows the need for similarity in project natures.
Trying Alternatives cont..
Unsupervised Machine learning
Incremental Machine learning
Incorporating Additional sources of Data
Component based classifier
Points to Ponder cont..
Are new developers assigned any bug?
“Needs further study to context of which it can be applied”-empirical research
Points to Ponder cont..
Were there enough instances to evaluate using cross validation?
For Firefox 75% and for gcc 86% of developers have fewer than 100 reports
Why was the labeling mechanism more successful in the case of gcc and Eclipse than Firefox?
1% for Eclipse, 47% for Firefox
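To see why the number of reports per developer matters for cross validation, here is a minimal sketch of a k-fold split. The value of k and the report counts are assumptions for illustration; the slides do not say how cross validation was or would be set up.

```python
def k_fold_splits(items, k):
    """Yield (train, test) pairs for k-fold cross validation.

    Items are dealt round-robin into k folds; each fold serves once
    as the test set while the remaining folds form the training set.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical: a developer with only 20 labeled reports, 10 folds.
reports = list(range(20))
for train, test in k_fold_splits(reports, 10):
    print(len(train), len(test))
```

With so few reports for a developer, each test fold holds only a couple of that developer's reports, so per-developer estimates become noisy.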
Points in favor
The research work was very intense
Thoroughly studied
Honest in identifying the limitations and smart in pointing out future work
It opens up interesting doors for future research
Points Against
The study may not be suitable for an environment where there is a frequent change in the active set of developers
The findings are too project specific and work well on “actual bugs” reports
Points Against cont..
If there is any naivety in the heuristics, it also propagates to the filtering process that uses those heuristics to select reports for training the classifier.
I liked the way the authors included the lessons learned section. However, they should have explained in more detail how the mappings were done.
Concluding Remarks
It shows promise for improving bug assignment for OSS
“Coordination bug reports and CVS is challenging”
The effort is worth praising
Identifies need for further research