drone: predicting priority of reported bugs by multi-factor analysis
This is my presentation at ICSM 2013.
![Page 1: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/1.jpg)
DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis
Yuan Tian¹, David Lo¹, Chengnian Sun²
¹Singapore Management University, ²National University of Singapore
![Page 2: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/2.jpg)
What is priority and when is it assigned?
• Bug tracking systems allow developers to prioritize which bugs are to be fixed first
  ◦ Manual process
  ◦ Depends on other bugs
  ◦ Time consuming
• Bug triager: 300 reports to triage daily!
[Figure: a report moves from New to Assigned after the bug triager's Validity Check, Duplicate Check, Priority Assignment, and Developer Assignment.]
![Page 3: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/3.jpg)
Priority vs. Severity
“Severity is assigned by customers [users] while priority is provided by developers . . . customer [user] reported severity does impact the developer when they assign a priority level to a bug report, but it’s not the only consideration. For example, it may be a critical issue for a particular reporter that a bug is fixed but it still may not be the right thing for the eclipse team to fix.” (Eclipse PMC Member)
Importance: P5 (lowest priority level), major (high severity)
![Page 4: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/4.jpg)
Outline
• Background
• Approach
  ◦ Overall Framework
  ◦ Features
  ◦ Classification Module
• Experiment
  ◦ Dataset
  ◦ Research Questions
  ◦ Results
• Conclusion
![Page 5: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/5.jpg)
![Page 6: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/6.jpg)
Bug Report
[Figure: annotated bug report with the fields marked: (1) summary, (2) description, (3) product, (4) component, (5) author, (6) severity, (7) priority, plus time-related info.]
![Page 7: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/7.jpg)
![Page 8: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/8.jpg)
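Overall Framework
[Figure: DRONE framework with a Feature Extraction Module (Temporal, Textual, Author, Related Reports, Severity, and Product factors) and a Classifier Module. Training Phase: Training Reports -> Feature Extraction Module -> Model Builder -> Model. Testing Phase: Testing Reports -> Feature Extraction Module -> Model Application -> Predicted Priority.]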
![Page 9: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/9.jpg)
Temporal Factor
• TMP1: Number of bugs reported within 7 days before the reporting of BR.
• TMP2: Number of bugs reported with the same severity within 7 days before the reporting of BR.
• TMP3: Number of bugs reported with the same or higher severity within 7 days before the reporting of BR.
• TMP4-6: The same as TMP1-3 except the time duration is 1 day.
• TMP7-9: The same as TMP1-3 except the time duration is 3 days.
• TMP10-12: The same as TMP1-3 except the time duration is 30 days.
Textual Factor
• TXT1-n: Stemmed words from the description field of BR, excluding stop words.
Severity Factor
• SEV: BR’s severity field.
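The temporal features can be sketched as simple window counts. This is a minimal illustration, not the authors' implementation: the dictionary layout of a report, the `temporal_features` helper, and the `SEVERITY_ORDER` ranking are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed severity ranking from lowest to highest (hypothetical, for illustration).
SEVERITY_ORDER = ["trivial", "minor", "normal", "major", "critical", "blocker"]


def temporal_features(br, earlier_reports, days=7):
    """Return (TMP1, TMP2, TMP3) for report `br` over a window of `days` days."""
    window_start = br["reported"] - timedelta(days=days)
    rank = SEVERITY_ORDER.index
    # Reports filed inside the window, strictly before BR itself.
    in_window = [
        r for r in earlier_reports
        if window_start <= r["reported"] < br["reported"]
    ]
    tmp1 = len(in_window)
    tmp2 = sum(1 for r in in_window if r["severity"] == br["severity"])
    tmp3 = sum(1 for r in in_window if rank(r["severity"]) >= rank(br["severity"]))
    return tmp1, tmp2, tmp3


reports = [
    {"reported": datetime(2007, 1, 1), "severity": "major"},
    {"reported": datetime(2007, 1, 5), "severity": "normal"},
    {"reported": datetime(2007, 1, 7), "severity": "critical"},
]
br = {"reported": datetime(2007, 1, 8), "severity": "major"}

print(temporal_features(br, reports))          # 7-day window (TMP1-3)
print(temporal_features(br, reports, days=1))  # 1-day window (TMP4-6)
```

Calling the same helper with `days=3` and `days=30` would give the remaining windows (TMP7-9 and TMP10-12).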
![Page 10: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/10.jpg)
Author Factor
• AUT1: Mean priority of all bug reports made by the author of BR prior to the reporting of BR.
• AUT2: Median priority of all bug reports made by the author of BR prior to the reporting of BR.
• AUT3: The number of bug reports made by the author of BR prior to the reporting of BR.
Related Reports Factor [REP-, Sun et al.]
• REP1: Mean priority of the top-20 most similar bug reports to BR, as measured using REP-, prior to the reporting of BR.
• REP2: Median priority of the top-20 most similar bug reports to BR, as measured using REP-, prior to the reporting of BR.
• REP3-4: The same as REP1-2 except only the top 10 bug reports are considered.
• REP5-6: The same as REP1-2 except only the top 5 bug reports are considered.
• REP7-8: The same as REP1-2 except only the top 3 bug reports are considered.
• REP9-10: The same as REP1-2 except only the top 1 bug report is considered.
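A rough sketch of how the related-report features could be derived once similarity scores are available. The REP- similarity itself is not implemented here: the `(similarity, priority)` pairs, the `rep_features` helper, and the encoding of P1-P5 as the integers 1-5 are assumptions for illustration, and at least one earlier report is assumed to exist.

```python
from statistics import mean, median


def rep_features(scored_reports, ks=(20, 10, 5, 3, 1)):
    """Compute REP1-REP10 from (similarity, priority) pairs of earlier reports.

    Priorities are assumed to be integers 1..5 standing for P1..P5.
    """
    # Most similar reports first.
    ranked = sorted(scored_reports, key=lambda pair: pair[0], reverse=True)
    features = {}
    for i, k in enumerate(ks):
        top_priorities = [priority for _, priority in ranked[:k]]
        features[f"REP{2 * i + 1}"] = mean(top_priorities)    # REP1, REP3, ...
        features[f"REP{2 * i + 2}"] = median(top_priorities)  # REP2, REP4, ...
    return features


# Hypothetical similarity scores for five earlier reports.
scored = [(0.9, 3), (0.8, 1), (0.5, 3), (0.4, 5), (0.1, 2)]
features = rep_features(scored)
print(features["REP1"], features["REP2"])   # mean and median over the top 20
print(features["REP9"], features["REP10"])  # mean and median over the top 1
```

The author features AUT1-AUT3 follow the same pattern, with the author's earlier reports in place of the top-k similar reports.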
![Page 11: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/11.jpg)
Product Factor
• PRO1: BR’s product field. Note: categorical feature.
• PRO2: Number of bug reports made for the same product as that of BR prior to the reporting of BR.
• PRO3: Number of bug reports made for the same product and of the same severity as BR prior to the reporting of BR.
• PRO4: Number of bug reports made for the same product and of the same or higher severity as BR prior to the reporting of BR.
• PRO5: Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
• PRO6-9: The same as PRO5 except they are for priority P2-P5 respectively.
• PRO10: Mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
• PRO11: Median priority of bug reports made for the same product as that of BR prior to the reporting of BR.
• PRO12-22: The same as PRO1-11 except they are for the component field of BR.
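As an illustration of the product-factor definitions, the sketch below computes the count, per-priority proportions, mean, and median for reports sharing a field value with BR. The report layout, the `product_features` helper, the integer priority encoding, and the choice of 0.0 when there is no history are assumptions, not details from the talk.

```python
from statistics import mean, median


def product_features(br, earlier_reports, field="product"):
    """Count, proportions, mean and median priority for reports sharing `field` with BR.

    With field="product" this mirrors PRO2 and PRO5-11; with field="component"
    it mirrors the corresponding component features.
    """
    same = [r for r in earlier_reports if r[field] == br[field]]
    priorities = [r["priority"] for r in same]
    features = {"count": len(same)}
    for level in range(1, 6):
        matches = sum(1 for p in priorities if p == level)
        # Assumption: a product with no history gets proportion 0.0.
        features[f"prop_P{level}"] = matches / len(same) if same else 0.0
    features["mean_priority"] = mean(priorities) if priorities else 0.0
    features["median_priority"] = median(priorities) if priorities else 0.0
    return features


earlier = [
    {"product": "JDT", "component": "UI", "priority": 3},
    {"product": "JDT", "component": "Core", "priority": 1},
    {"product": "Platform", "component": "UI", "priority": 3},
    {"product": "JDT", "component": "UI", "priority": 3},
]
new_report = {"product": "JDT", "component": "UI"}

by_product = product_features(new_report, earlier)
by_component = product_features(new_report, earlier, field="component")
print(by_product)
print(by_component)
```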
![Page 12: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/12.jpg)
![Page 13: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/13.jpg)
GRAY: Thresholding and Linear Regression to Classify Imbalanced Data
[Figure: Training Phase. Model Building Data -> Training Features -> Linear Regression -> Model.]
• Map feature values to real numbers
![Page 14: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/14.jpg)
GRAY: Thresholding and Linear Regression to Classify Imbalanced Data
[Figure: Training Phase. Model Building Data -> Training Features -> Linear Regression -> Model. Validation Data -> Model Application -> Thresholding -> Thresholds.]
![Page 15: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/15.jpg)
• Thresholding process maps real numbers to priority levels.
![Page 16: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/16.jpg)
Thresholding Process
Regression outputs: BR1 1.2, BR2 1.4, BR3 3.1, BR4 3.5, BR5 2.1, BR6 3.2, BR7 3.4, BR8 3.7, BR9 1.3, BR10 4.5
Sort: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5
Thresholds separating P1, P2, P3, P4, P5: 1.2, 1.4, 3.4, 3.7
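To make the mapping concrete, here is a small sketch that turns a regression output into a priority level using the four thresholds from the slide. Treating each threshold as an inclusive upper bound is an assumption (the slide does not say which side a boundary value falls on), and `to_priority` is a hypothetical helper name.

```python
from bisect import bisect_left


def to_priority(score, thresholds):
    """Map a regression output to P1..P5 given four sorted thresholds.

    Assumption: a score equal to a threshold belongs to the lower-numbered level.
    """
    return f"P{bisect_left(thresholds, score) + 1}"


thresholds = [1.2, 1.4, 3.4, 3.7]
sorted_scores = [1.2, 1.3, 1.4, 2.1, 3.1, 3.2, 3.4, 3.5, 3.7, 4.5]
levels = [to_priority(s, thresholds) for s in sorted_scores]
print(levels)
```

Under this convention the ten example reports fall into groups of 1, 2, 4, 2, and 1 for P1 through P5.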
![Page 17: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/17.jpg)
GRAY: Thresholding and Linear Regression to Classify Imbalanced Data
[Figure: Training Phase. Model Building Data -> Training Features -> Linear Regression -> Model. Validation Data -> Model Application -> Thresholding -> Thresholds. Testing Phase. Testing Features -> Model Application -> Predicted Priority.]
![Page 18: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/18.jpg)
![Page 19: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/19.jpg)
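Dataset
• Eclipse Project
  ◦ 2001-10-10 to 2007-12-14
  ◦ 178,609 bug reports
[Figure: the dataset is divided into portions for REP-, DRONE Training (Model Building and Validation), and DRONE Testing.]
[Figure: pie chart of priority levels. Values as listed on the slide for P1 to P5: 4.50%, 6.89%, 85.45%, 1.95%, 1.21%.]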
![Page 20: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/20.jpg)
Research Questions & Measurements
• Accuracy (Precision, Recall, F-measure)
  ◦ Compare with SEVERISprio [Menzies & Marcus], SEVERISprio+
• Efficiency (Run time)
• Top features (Fisher score)
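For reference, per-level precision, recall, and F-measure can be computed as below. This is a generic sketch using the usual definitions; the `per_class_scores` helper, the sample labels, and the convention of returning 0.0 when a denominator is zero are assumptions for the example.

```python
def per_class_scores(actual, predicted, levels=("P1", "P2", "P3", "P4", "P5")):
    """Return {level: (precision, recall, f_measure)} for each priority level."""
    scores = {}
    for level in levels:
        true_pos = sum(1 for a, p in zip(actual, predicted) if a == p == level)
        predicted_n = sum(1 for p in predicted if p == level)
        actual_n = sum(1 for a in actual if a == level)
        # Assumption: undefined ratios are reported as 0.0.
        precision = true_pos / predicted_n if predicted_n else 0.0
        recall = true_pos / actual_n if actual_n else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[level] = (precision, recall, f_measure)
    return scores


# Hypothetical sample: a classifier that labels every report P3.
actual = ["P3"] * 8 + ["P1", "P2"]
predicted = ["P3"] * 10
scores = per_class_scores(actual, predicted)
average_f = sum(f for _, _, f in scores.values()) / len(scores)
print(scores["P3"], average_f)
```

The sample shows why a predictor that always answers P3 scores well on P3 yet poorly on the average F-measure: every other level contributes zero.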
![Page 21: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/21.jpg)
RQ1: How accurate?
[Figure: bar chart of F-measure (0% to 100%) for each priority level P1 to P5, comparing DRONE, SEVERISprio, and SEVERISprio+. Average F-measure: 29.47% vs. 18.75%.]
1. Baselines predict everything as P3!
2. Average F-measure improves from 18.75% to 29.47%.
3. A relative improvement of 57.17%.
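The relative improvement follows directly from the two averages; the short check below reproduces the arithmetic (`relative_improvement` is just an illustrative helper name).

```python
def relative_improvement(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


# (29.47 - 18.75) / 18.75 = 0.5717..., i.e. about 57.17%
print(f"{relative_improvement(18.75, 29.47):.2f}%")
```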
![Page 22: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/22.jpg)
RQ2: How efficient?
Run time (in seconds):
| Approach | Feature Extraction (train) | Model Building | Feature Extraction (test) | Model Application |
|---|---|---|---|---|
| SEVERISprio | <0.01 | 812.18 | <0.01 | <0.01 |
| SEVERISprio+ | <0.01 | 773.62 | <0.01 | <0.01 |
| DRONE | 0.01 | 69.25 | <0.01 | <0.01 |
Our approach is much faster in Model Building!
![Page 23: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/23.jpg)
RQ3: What are the top-features?
Top-10 features: PRO5, PRO16, REP1, REP3, PRO18, PRO10, PRO21, PRO7, REP5, Text “1663”
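The talk ranks features by Fisher score without spelling out the formula. The sketch below uses one common definition (between-class scatter of a feature divided by its within-class scatter); whether the authors used exactly this variant is an assumption, and `fisher_score` and the sample values are illustrative.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter."""
    overall_mean = mean(values)
    between = 0.0
    within = 0.0
    for label in set(labels):
        group = [v for v, l in zip(values, labels) if l == label]
        between += len(group) * (mean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    # A feature that is constant within every class separates them perfectly.
    return between / within if within else float("inf")


labels = ["P1", "P1", "P3", "P3"]
discriminative = fisher_score([0.9, 0.8, 0.1, 0.2], labels)
uninformative = fisher_score([0.5, 0.1, 0.5, 0.1], labels)
print(discriminative, uninformative)
```

A feature whose values differ between priority levels gets a high score, while one with the same distribution in every level scores near zero.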
![Page 24: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/24.jpg)
6 out of the top-10 features belong to the product factor family.
![Page 25: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/25.jpg)
3 out of the top-10 features come from the related-report factor family.
![Page 26: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/26.jpg)
The text feature “1663”:
1) org.eclipse.ui.internal.Workbench.run(Workbench.java:1663)
2) Appears in 15% of P5 reports.
![Page 27: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/27.jpg)
Conclusion
• Priority prediction is an ordinal and imbalanced classification problem -> linear regression + thresholding is one option.
• DRONE improves on the baselines' average F-measure, from 18.75% to 29.47%, a relative improvement of 57.17%.
• Product factor features are the most discriminative features, followed by related-reports factor features.
![Page 28: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/28.jpg)
![Page 29: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/29.jpg)
![Page 30: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/30.jpg)
I acknowledge the support of Google and the ICSM organizers in the form of a Female Student Travel Grant, which enabled me to attend this conference.
Thank you!
![Page 31: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/31.jpg)
![Page 32: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/32.jpg)
APPENDIX
![Page 33: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/33.jpg)
![Page 34: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/34.jpg)
Proportions of each priority level in Validation Data: P1: 10%, P2: 20%, P3: 40%, P4: 20%, P5: 10%
After applying Linear Regression Model on Validation Data: BR1 1.2, BR2 1.4, BR3 3.1, BR4 3.5, BR5 2.1, BR6 3.2, BR7 3.4, BR8 3.7, BR9 1.3, BR10 4.5
Sort: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5
Initial thresholds separating the predicted priority levels P1, P2, P3, P4, P5: 1.2, 1.4, 3.4, 3.7
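The initial thresholds on this slide are consistent with cutting the sorted scores according to the validation proportions. The sketch below reproduces them under that reading; `initial_thresholds` is a hypothetical helper, and taking the largest score in each level's share as the threshold is an assumption inferred from the example numbers.

```python
def initial_thresholds(scores, proportions):
    """Pick one threshold per boundary between consecutive priority levels.

    `proportions` lists the share of P1..P5 in the validation data. Each
    threshold is the largest score that falls inside the cumulative share.
    """
    ordered = sorted(scores)
    thresholds = []
    cumulative = 0.0
    for share in proportions[:-1]:  # the last level needs no upper threshold
        cumulative += share
        index = max(round(cumulative * len(ordered)) - 1, 0)
        thresholds.append(ordered[index])
    return thresholds


validation_scores = [1.2, 1.4, 3.1, 3.5, 2.1, 3.2, 3.4, 3.7, 1.3, 4.5]
level_shares = [0.10, 0.20, 0.40, 0.20, 0.10]
start = initial_thresholds(validation_scores, level_shares)
print(start)
```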
![Page 35: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/35.jpg)
Sorted validation scores: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5
Initialized Thresholds: 1.2, 1.4, 3.4, 3.7 (separating P1, P2, P3, P4, P5)
Tune one threshold: try candidate values 1.1 and 1.3 for the first threshold (initially 1.2).
Compute F-measures for each candidate.
![Page 36: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/36.jpg)
Sorted validation scores: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5
Tune one threshold: candidate values 1.1 and 1.3 for the first threshold (initially 1.2).
Compute F-measures for each candidate.
Update threshold value: the candidate with the higher F-measure is kept, so the thresholds become 1.3, 1.4, 3.4, 3.7 (separating P1, P2, P3, P4, P5).
![Page 37: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/37.jpg)
Sorted validation scores: BR1 1.2, BR9 1.3, BR2 1.4, BR5 2.1, BR3 3.1, BR6 3.2, BR7 3.4, BR4 3.5, BR8 3.7, BR10 4.5
Threshold 1 is fixed at 1.3.
Tune for next threshold: current thresholds 1.3, 1.4, 3.4, 3.7 (separating P1, P2, P3, P4, P5).
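The tuning walk-through can be sketched as a greedy, one-threshold-at-a-time search. This is an illustration under several assumptions, not the GRAY algorithm as published: the candidate set (current value plus or minus a fixed `step`), the use of the average F-measure as the objective, the inclusive-upper-bound mapping, and the validation labels in `actual` are all made up for the example.

```python
from bisect import bisect_left


def predict(scores, thresholds):
    """Map each score to a level 1..5 (threshold = inclusive upper bound)."""
    return [bisect_left(thresholds, s) + 1 for s in scores]


def average_f_measure(actual, predicted, levels=(1, 2, 3, 4, 5)):
    """Unweighted mean of the per-level F-measures (0.0 when undefined)."""
    total = 0.0
    for level in levels:
        tp = sum(1 for a, p in zip(actual, predicted) if a == p == level)
        fp = sum(1 for a, p in zip(actual, predicted) if a != level and p == level)
        fn = sum(1 for a, p in zip(actual, predicted) if a == level and p != level)
        if tp:
            precision, recall = tp / (tp + fp), tp / (tp + fn)
            total += 2 * precision * recall / (precision + recall)
    return total / len(levels)


def tune_thresholds(scores, actual, thresholds, step=0.1):
    """Tune thresholds in order; each one is fixed before moving to the next."""
    tuned = list(thresholds)
    for i in range(len(tuned)):
        best = average_f_measure(actual, predict(scores, tuned))
        original = tuned[i]
        for candidate in (round(original - step, 10), round(original + step, 10)):
            trial = tuned[:i] + [candidate] + tuned[i + 1:]
            if trial != sorted(trial):
                continue  # thresholds must stay in non-decreasing order
            value = average_f_measure(actual, predict(scores, trial))
            if value > best:  # update only when the F-measure is higher
                best, tuned[i] = value, candidate
    return tuned


scores = [1.2, 1.3, 1.4, 2.1, 3.1, 3.2, 3.4, 3.5, 3.7, 4.5]
actual = [1, 1, 2, 3, 3, 3, 3, 4, 4, 5]  # hypothetical true levels
tuned = tune_thresholds(scores, actual, [1.2, 1.4, 3.4, 3.7])
print(tuned)
```

With these made-up labels, raising the first threshold from 1.2 to 1.3 moves BR9 into P1 and improves the average F-measure, while the remaining thresholds stay where they were.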
![Page 38: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/38.jpg)
Previous Research Work: Severity Prediction
• Menzies and Marcus (ICSM 2008)
  ◦ Analyze reports in NASA
  ◦ Textual features + feature selection + RIPPER
• Lamkanfi et al. (MSR 2010, CSMR 2011)
  ◦ Predict coarse-grained severity labels
  ◦ Severe vs. non-severe
  ◦ Analyze reports in open-source systems
  ◦ Compare and contrast various algorithms
• Tian et al. (WCRE 2012)
  ◦ Information retrieval + k nearest neighbour
![Page 39: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/39.jpg)
Text Pre-processing
• Tokenization: splitting a document into tokens according to delimiters.
• Stop-word Removal, e.g., are, is, I, he
• Stemming, e.g., working, works, worked -> work
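A toy version of the three pre-processing steps is sketched below. The stop-word list contains only the words named on the slide, and the suffix-stripping `stem` function is a deliberately crude stand-in for a real stemmer, so both are assumptions for illustration.

```python
import re

STOP_WORDS = {"are", "is", "i", "he"}  # only the examples named on the slide
SUFFIXES = ("ing", "ed", "s")          # toy stemmer rules, not a real stemming algorithm


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("He is working. Editor works. Workbench worked.")
print(tokens)
```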
![Page 40: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/40.jpg)
Appendix: Similarity Between Bug Reports (REP-)
• Textual Features
  ◦ Compute BM25Fext scores
  ◦ Feature1: Extract unigrams
  ◦ Feature2: Extract bigrams
• Non-Textual Features
  ◦ Feature3: Product field
  ◦ Feature4: Component field
Note: Weights are learned from duplicate bug reports.
![Page 41: DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis](https://reader036.vdocuments.site/reader036/viewer/2022081515/558cc896d8b42afe7b8b468e/html5/thumbnails/41.jpg)
Top-10 features:
• PRO5: Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
• PRO16: Proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P1.
• REP1: Mean priority of the top-20 most similar bug reports to BR, as measured using REP-, prior to the reporting of BR.
• REP3: Mean priority of the top-10 most similar bug reports to BR, as measured using REP-, prior to the reporting of BR.
• PRO18: Proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P3.
• PRO10: Mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
• PRO21: Mean priority of bug reports made for the same component as that of BR prior to the reporting of BR.
• PRO7: Proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P3.
• REP5: Mean priority of the top-5 most similar bug reports to BR, as measured using REP-, prior to the reporting of BR.
• Text “1663”