Recommendation System for Code Bugs
DESCRIPTION
A recommendation system that recommends similar bugs and estimates the effort required to fix them. Each defect is broken into a set of keywords, and ML algorithms are applied to calculate a similarity coefficient.
TRANSCRIPT
Predicting Bug Fixing Efforts for Open-source Software Systems
Prashant Raghav, Jenny Wang
CS 846
OUTLINE
1. Problem and Our Solution
2. Initial Setup
3. CoS
4. Effort Estimation
5. References
Traditional Approach
Total LOC / average LOC per hour by a developer = total number of developer hours
● Doesn't account for
○ Project complexity
○ Developer proficiency
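As a rough illustration of the traditional formula, here is a minimal Python sketch. The function name and the sample numbers are assumptions made for this example, not values from the project.

```python
def traditional_effort_hours(total_loc: int, avg_loc_per_hour: float) -> float:
    """Traditional estimate: total LOC divided by average LOC per developer hour."""
    if avg_loc_per_hour <= 0:
        raise ValueError("avg_loc_per_hour must be positive")
    return total_loc / avg_loc_per_hour


# Illustrative numbers only: 1200 LOC at 15 LOC per hour.
print(traditional_effort_hours(1200, 15))  # 80.0
```

The estimate is identical for every developer and every project, which is the limitation listed above.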
Our Tool ...
Tell me who the bug is assigned to, and I will tell you how much time it is going to take.
1. Selecting Dataset
Choices: Bugzilla, JBoss project, Linux
Apache Hadoop Common Issue Tracking System.
Hadoop Common is the common library for Apache Hadoop.
Issues 30 Day Summary: 114 created, 66 resolved
2. Data Extraction
Download data, then extract each developer's bug-fix activity
ID, Title, Description, Status, Detail, Developer
3. Database
Store each Defect in DB with defect information.
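One possible way to store defects is sketched below with Python's built-in sqlite3 module. The table layout mirrors the extracted fields (ID, Title, Description, Status, Detail, Developer); the table name, the helper name, and the status, detail and developer values are placeholders assumed for this example.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE defects (
        id          TEXT PRIMARY KEY,
        title       TEXT NOT NULL,
        description TEXT,
        status      TEXT,
        detail      TEXT,
        developer   TEXT
    )
    """
)


def store_defect(conn: sqlite3.Connection, defect: dict) -> None:
    """Insert one defect record; keys must match the column names."""
    conn.execute(
        "INSERT INTO defects (id, title, description, status, detail, developer) "
        "VALUES (:id, :title, :description, :status, :detail, :developer)",
        defect,
    )
    conn.commit()


store_defect(conn, {
    "id": "Bug-12435",
    "title": "Unable to run Hadoop (2.2.0) commands on Cygwin (2.831) on Windows XP 3",
    "description": "",          # placeholder
    "status": "Open",           # placeholder
    "detail": "",               # placeholder
    "developer": "developer_a",  # hypothetical developer name
})
```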
3. New Defect
Compare with previous defects.
● Duplicate defect
● New feature
Bug: Incomplete Closing of Firefox
Hadoop: Bug-12435
Unable to run Hadoop (2.2.0) commands on Cygwin (2.831) on Windows XP 3
Bug-239223 (Ghostproc) – Hadoop version 2.2.0 command while running on Windows XP3 using Cygwin (2.831)
Bug: Incomplete Closing of Firefox
Hadoop: Bug-12435
Unable to execute Hadoop (2.2.0) commands on Cygwin (2.831) running on Windows XP SP3
Bug-239223 (Ghostproc) – Hadoop version 2.2.0 command could not run on Windows XP (service pack 3) using Cygwin (2.831)
More Bugs
Bug-244372: "Document contains no data" message on continuation page of NY Times article
Bug-219232: random "The Document contains no data." alerts
4. Coefficient of Similarity
CoS: depends on various factors.
a) Are the code files similar?
b) Are the input files similar?
c) What fraction of keywords is common?
d) Which component?
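Factor (c), the fraction of common keywords, could be computed in several ways. The sketch below uses Jaccard overlap between keyword sets, which is one possible reading and not necessarily the measure used in the tool. The stop-word list and function names are assumptions for this example.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "on", "of", "to", "in", "using", "while", "not"}


def keywords(text: str) -> set:
    """Lower-case the text and keep word/version tokens that are not stop words."""
    tokens = (t.strip(".") for t in re.findall(r"[a-z0-9.]+", text.lower()))
    return {t for t in tokens if t and t not in STOPWORDS}


def keyword_similarity(report_a: str, report_b: str) -> float:
    """Jaccard overlap: shared keywords divided by all distinct keywords."""
    ka, kb = keywords(report_a), keywords(report_b)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Keywords: {hadoop, fails, cygwin} vs {hadoop, crashes, cygwin} -> 2 shared of 4.
print(keyword_similarity("Hadoop fails on Cygwin", "Hadoop crashes on Cygwin"))  # 0.5
```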
5. CoS
The greater the similarity, the higher the CoS. For an exact duplicate defect, CoS = 1.
CoS = w1*TS + w2*FS + w3*CS + w4*IFS
TS: Bug report similarity
FS: Source files similarity
CS: Component similarity
IFS: Input files similarity
where the wi are weights to be determined by experiments.
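The weighted sum can be sketched as follows. The weight values are placeholders, since the actual weights are to be determined by experiments. The sketch also assumes each similarity lies in [0, 1] and the weights sum to 1, so that an exact duplicate yields CoS = 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Similarities:
    ts: float   # bug report similarity
    fs: float   # source files similarity
    cs: float   # component similarity
    ifs: float  # input files similarity


# Placeholder weights (w1..w4); not values from the project.
DEFAULT_WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def coefficient_of_similarity(sim: Similarities, weights=DEFAULT_WEIGHTS) -> float:
    """CoS = w1*TS + w2*FS + w3*CS + w4*IFS."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    values = (sim.ts, sim.fs, sim.cs, sim.ifs)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each similarity is assumed to lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values))
```

With these assumptions, a defect that matches on every factor scores 1 (up to floating-point rounding) and is treated as a duplicate.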
6. Programmer Proficiency
4 buckets:
● Beginner
● Intermediate
● Seasoned
● Expert
7. Bracket Determination
Bracket Adjustment Factor (BAF)
● Commits to software (features)
● 6-month time frame
○ No. of defects solved
○ No. of defects reopened
8. Priority of Bug
Priority Adjustment Factor
● High
● Medium
● Low
9. Comparing CoS for All Defects
Each new defect is compared against all defects stored in the database. Those with the highest CoS are extracted.
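A minimal sketch of this comparison step is shown below. The `most_similar` helper, its `top_k` parameter, and the toy word-overlap function standing in for the real CoS are all assumptions for this example.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def most_similar(
    new_defect: str,
    stored_defects: Iterable[str],
    cos_fn: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Score the new defect against every stored defect; keep the top_k by CoS."""
    scored = ((cos_fn(new_defect, old), old) for old in stored_defects)
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


def toy_cos(a: str, b: str) -> float:
    """Stand-in for the real CoS: word overlap between two titles."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


stored = ["hadoop fails on cygwin", "hadoop fails", "firefox does not close"]
for score, defect in most_similar("hadoop fails on cygwin", stored, toy_cos, top_k=2):
    print(round(score, 2), defect)
```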
10. Effort Estimate
CoS > threshold
ES = ∑ (CoSi * DTi) / n
where n: number of similar defects
DTi: developer time for similar defect i
● Programmer proficiency
● Priority of defect
If CoS = 1: duplicate defect, discard
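The estimate can be sketched in Python as below. The threshold value, the function name, and the exception used to signal a duplicate are assumptions for this example. The proficiency and priority adjustments are left out because their exact form is not specified here.

```python
from typing import List, Optional, Tuple


class DuplicateDefectError(Exception):
    """Raised when a stored defect has CoS = 1, i.e. the new defect is a duplicate."""


def estimate_effort(
    matches: List[Tuple[float, float]], threshold: float = 0.4
) -> Optional[float]:
    """ES = sum(CoS_i * DT_i) / n over defects whose CoS exceeds the threshold.

    matches holds (CoS, developer_time_hours) pairs for previously fixed defects.
    Returns None when no stored defect is similar enough.
    """
    if any(cos >= 1.0 for cos, _ in matches):
        raise DuplicateDefectError("duplicate defect, discard")
    similar = [(cos, dt) for cos, dt in matches if cos > threshold]
    if not similar:
        return None
    return sum(cos * dt for cos, dt in similar) / len(similar)


# Illustrative data: the third defect falls below the threshold and is ignored.
print(estimate_effort([(0.75, 8.0), (0.5, 4.0), (0.25, 40.0)]))  # 4.0
```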
References
● https://issues.apache.org
● http://menzies.us/pdf/11ase.pdf
● Local vs. Global Models for Effort Estimation and Defect Prediction, Tim Menzies, Andrian Marcus
● Towards Improving Bug Tracking Systems with Game Mechanisms, Leonardo Passos, University of Waterloo