automated bug localization


Upload: xin-ye

Post on 13-Apr-2017


Page 1: Automated bug localization

Automated Bug localization

Xin Ye /sin/ /jeə/ (seen yeah)
Ph.D. Candidate
Electrical Engineering and Computer Science
Ohio University
Reach me at [email protected]
http://xinye-ohio.github.io/

Page 2: Automated bug localization

Software Bugs
Introduction to software bugs and bug triage

Page 3: Automated bug localization

What is a software bug?

A software bug, or defect, is a coding mistake that may cause unintended or unexpected behavior of a software component [3].

Upon discovering an abnormal behavior of the software project, the user or developer will report it in a document, called a bug report.

[3] Bernd Bruegge and Allen H. Dutoit. 2009. Object-Oriented Software Engineering Using UML, Patterns, and Java (3rd ed.). Prentice Hall Press, NJ, USA.

Page 4: Automated bug localization

A bug report example

https://bugs.eclipse.org/bugs/show_bug.cgi?id=339286

A screenshot of Eclipse bug report #339286, taken from the Bugzilla website.

[Screenshot annotations: summary, metadata, description]

Page 5: Automated bug localization

The bug-fix process

1. A new bug report is received.

2. Is it a bug or not? If yes, then go to 3. If not, then go to 6.

3. Assign the bug report to a developer.

4. The developer assigned to the bug report tries to find the cause and fix it.

5. Is the bug fixed? If yes, then go to 6. If not, then go to 4.

6. Close the bug report.

Page 6: Automated bug localization

Software Engineering Tools

1. Integrated development environment (IDE)
• An IDE usually contains a text editor, compilers, and a debugger.
• Eclipse is an IDE developed by the Eclipse Foundation.
• https://eclipse.org/

2. Version control system
• A version control tool manages changes to a set of files in the repository over time.
• Git is a free and open source version control system that manages and records changes to the repository. It allows groups of people to work on the same documents at the same time.
• https://git-scm.com/

3. Issue tracking system
• An issue tracking system is used to create, update, and resolve issues reported by customers and developers.
• Bugzilla is a free issue tracking system that allows individual developers or groups of developers to keep track of outstanding bugs in their product effectively.
• https://www.bugzilla.org/

Page 7: Automated bug localization

A scenario of using these tools

[Diagram: a User reports a bug to Bugzilla, where the report is assigned and later closed; Developer A and Developer B each work in a local repository and exchange changes with the remote repository via git pull and git push.]

Page 8: Automated bug localization

A Git commit that contains the fix

1. A Git commit is a snapshot of the repository at a given moment. It records the files changed in this snapshot, compared with the previous snapshot.

2. From the log message we know that this commit fixes bug #339286.

3. In total, 5 files were fixed.

[Screenshot annotations: commit ID, log message, files changed in this commit]

https://git.eclipse.org/c/platform/eclipse.platform.ui.git/commit/?id=7cb5c12e774aa1bd97c383baab6baabf35d6374d
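Linking a fixing commit to its bug report is what tells us which files were buggy. A minimal sketch of that link, assuming the log message cites the report as "Bug <number>"; the message text used below is hypothetical, not the actual log of this commit, and `bug_ids` is an illustrative helper name:

```python
import re

# Assumed convention: the log message cites the report as "Bug <id>".
BUG_ID = re.compile(r"\bbug\s+#?(\d+)", re.IGNORECASE)

def bug_ids(log_message: str) -> list[int]:
    """Return every bug report ID mentioned in a commit log message."""
    return [int(m) for m in BUG_ID.findall(log_message)]

# Hypothetical log message, for illustration only.
example_ids = bug_ids("Bug 339286 - Toolbars missing icons and show wrong menus")
```

The files changed by a commit that cites a report can then be treated as the relevant files for that report.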

Page 9: Automated bug localization

Statement of the problem

o A large number of bugs can be found during the software development and maintenance process.

o More than 5,041 bug reports were created for the Eclipse project this year (from 01-01-2015 to 11-22-2015), an average of 15 bug reports per day. https://bugs.eclipse.org/bugs/

o A developer who is assigned a bug report usually needs to reproduce the abnormal behavior and perform code reviews in order to find the cause [4]. This manual process is tedious and time-consuming.

[4] A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In Proc. ICSE ’13, pp. 712–721, 2013.
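The daily average quoted above can be checked from the two figures on the slide (5,041 reports between 01-01-2015 and 11-22-2015). The only assumption in this small check is that both end dates are counted:

```python
from datetime import date

reports = 5041
# Count both the first and the last day of the period.
days = (date(2015, 11, 22) - date(2015, 1, 1)).days + 1
reports_per_day = reports / days
print(days, round(reports_per_day, 1))
```

The period spans 326 days, which gives a little over 15 reports per day.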

Page 10: Automated bug localization

Our task

Q: Can we help them find the buggy files faster?
A: Yes.

Q: What can we do?
A: Narrow their search space from thousands of files to just ten files.

Q: How do we do it?
A: Rank all the files automatically based on their relevance to the bug report. A higher position in the ranked list indicates a higher chance that the file contains the bug. Finally, we recommend the top ten files to the developers.

Page 11: Automated bug localization

A use scenario

1. We say the query.
2. The system returns a list of files.
3. We check these files and decide whether they are relevant to the bug or not.
4. This works like a web search engine.

Page 12: Automated bug localization

The ranking model

Given a bug report, how do we rank all the source code files automatically?

Page 13: Automated bug localization

The ranking problem

Source files (documents) are ranked with respect to their relevance to a given bug report (query).

When a bug report $r$ is received, we assign a file score $f(r, s)$ to every source code file $s$. We rank all $s$ based on their $f(r, s)$.

The higher the position of $s$ in the ranked list, the larger the probability that $s$ is relevant to the bug report $r$.

Page 14: Automated bug localization

The ranking function

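A feature $\phi_i(r, s)$ is a type of information that measures the relevance between $r$ and $s$. $f(r, s)$ is a weighted combination of different $\phi_i(r, s)$:

$f(r, s) = \mathbf{w}^{T} \Phi(r, s) = \sum_{i=1}^{k} w_i \cdot \phi_i(r, s)$

• $r$ -- a bug report
• $s$ -- a source code file
• $\phi_i(r, s)$ -- a feature that measures the relevance between $r$ and $s$
• $w_i$ -- the weight parameter of $\phi_i(r, s)$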
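The weighted combination of features can be sketched in a few lines. The function name and the numbers are made up for illustration; only the weighted-sum form comes from the slide:

```python
def file_score(weights: list[float], features: list[float]) -> float:
    """f(r, s): weighted sum of the feature values phi_i(r, s)."""
    if len(weights) != len(features):
        raise ValueError("need one weight per feature")
    return sum(w * phi for w, phi in zip(weights, features))

# Made-up weights and feature values for k = 3 features.
score = file_score([0.5, 0.25, 0.25], [0.8, 0.4, 0.0])
```

A feature with a larger weight moves the file score more for the same change in feature value.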

Page 15: Automated bug localization

The ranking function

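Given $r$ at test time, we calculate $f(r, s)$ for every $s$ in the project.
Given $s_1$ and $s_2$, if $f(r, s_1) > f(r, s_2)$, then $s_1$ is ranked higher than $s_2$.

$f(r, s) = \mathbf{w}^{T} \Phi(r, s) = \sum_{i=1}^{k} w_i \cdot \phi_i(r, s)$

• $r$ -- a bug report
• $s$ -- a source code file
• $\phi_i(r, s)$ -- a feature that measures the relevance between $r$ and $s$
• $w_i$ -- the weight parameter of $\phi_i(r, s)$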
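Once every file has a score for the bug report, ranking is a sort in descending order of score. A minimal sketch with hypothetical file names and scores (`rank_files` is an illustrative name):

```python
def rank_files(scores: dict[str, float], top_k: int = 10) -> list[str]:
    """Order files by descending score f(r, s) and keep the first top_k."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# Hypothetical scores for one bug report.
scores = {"A.java": 0.2, "B.java": 0.9, "C.java": 0.5}
top_two = rank_files(scores, top_k=2)
```

With the default `top_k=10`, the result is the list of ten files recommended to the developer.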

Page 16: Automated bug localization

The ranking function

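$w_i$ is learned automatically based on previously fixed bug reports.

$f(r, s) = \mathbf{w}^{T} \Phi(r, s) = \sum_{i=1}^{k} w_i \cdot \phi_i(r, s)$

• $r$ -- a bug report
• $s$ -- a source code file
• $\phi_i(r, s)$ -- a feature that measures the relevance between $r$ and $s$
• $w_i$ -- the weight parameter of $\phi_i(r, s)$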

Page 17: Automated bug localization

System architecture

System architecture for training and testing

The ranking model parameters $w_i$ are trained on previously fixed bug reports using a learning-to-rank technique named SVMrank [5, 6].

For a fixed bug report $r$, its relevant (buggy) files are known.
Let $s^{+}$ denote a relevant file.
Let $s^{-}$ denote an irrelevant file.

The learning-to-rank algorithm tries to optimize $w_i$ so that $f(r, s^{+}) > f(r, s^{-})$ for all fixed bug reports used in training.

[5] T. Joachims. Optimizing search engines using clickthrough data. In Proc. KDD '02, pages 133-142, 2002.
[6] T. Joachims. Training linear SVMs in linear time. In Proc. KDD '06, pages 217-226, 2006.
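The pairwise training goal (a relevant file should score higher than an irrelevant one for the same bug report) can be illustrated with a toy learner. This is not the ranking SVM of [5, 6]: it is a plain subgradient method on the pairwise hinge loss without regularization, and the function name, hyperparameters, and data are all made up for illustration:

```python
def train_pairwise(pairs, k, epochs=50, lr=0.1, margin=1.0):
    """Learn w so that w . phi_rel > w . phi_irr for each training pair.

    pairs: list of (phi_rel, phi_irr) feature vectors of length k, where
    both vectors come from the same bug report.
    """
    w = [0.0] * k
    for _ in range(epochs):
        for phi_rel, phi_irr in pairs:
            diff = [a - b for a, b in zip(phi_rel, phi_irr)]
            # Update only when the pair is not yet separated by the margin.
            if sum(wi * di for wi, di in zip(w, diff)) < margin:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Toy data: the relevant file scores higher on feature 0 in both pairs.
pairs = [([0.9, 0.1], [0.2, 0.3]), ([0.7, 0.5], [0.1, 0.6])]
w = train_pairwise(pairs, k=2)
```

On this toy data the learned weights favor feature 0, so both relevant files end up ranked above their irrelevant counterparts.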

Page 18: Automated bug localization

Feature Engineering
Introduction of

Page 19: Automated bug localization

Text similarity

Eclipse bug report 339286

Bug ID: 339286
Summary: Toolbars missing icons and show wrong menus.
Description: The toolbars for my stacked views were: missing icons, showing the wrong drop-down menus (from others in the stack), showing multiple drop-down menus, missing the min/max buttons ...

public void handleEvent(Event event) {
    …
    ToolBarManager parent = getManager((MToolBar) obj);
    ToolBar tb = parent.getControl();
    …
}
…
IContributionItem[] items = menuManager.getItems();

ToolBarManagerRenderer.java

The bug report on top and the source code file below share some common key words.

Intuition: The greater the text similarity between two documents, the greater the chance that they are relevant to each other.

Page 20: Automated bug localization

Vector representation of a document

Given a document $d$, we use a vector of real numbers to represent it.

Let $\vec{d}$ denote the Vector Space Model (VSM) vector representation of $d$.
$tf_{t,d}$ denotes the term frequency of term $t$ in $d$ (how many times $t$ appears in $d$).
$df_t$ denotes the document frequency of $t$ (how many documents contain $t$).
$idf_t$ is the inverse document frequency of $t$.
$w_{t,d}$ is called the term weight of $t$ in $d$.
$\vec{d} = [w_{1,d}, w_{2,d}, \ldots, w_{t,d}, \ldots, w_{N,d}]$ for all $t$ in the vocabulary with a size of $N$.
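A minimal sketch of building such vectors follows. The exact weighting formula did not survive in this transcript, so the common textbook form, term frequency times log(number of documents / document frequency), is assumed here; the function name and the toy documents are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary and one tf-idf vector per tokenized document.

    Assumed weighting: w(t, d) = tf(t, d) * log(n_docs / df(t)).
    """
    vocab = sorted({t for doc in docs for t in doc})
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in docs for t in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

docs = [["toolbar", "icon", "menu", "toolbar"], ["menu", "window"]]
vocab, vectors = tfidf_vectors(docs)
```

A term that occurs in every document ("menu" here) gets weight 0 under this weighting, so it does not contribute to the similarity between documents.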

Page 21: Automated bug localization

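Text similarity

$sim(r, s) = \cos(\vec{r}, \vec{s}) = \frac{\vec{r}^{T} \vec{s}}{\lVert \vec{r} \rVert \, \lVert \vec{s} \rVert}$

• $r$ -- a bug report
• $s$ -- a source code file
• $\vec{r}$ -- the vector representation of $r$
• $\vec{s}$ -- the vector representation of $s$
• $\cos(\vec{r}, \vec{s})$ -- the cosine similarity between $\vec{r}$ and $\vec{s}$
• $sim(r, s)$ -- the text similarity between $r$ and $s$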
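Cosine similarity between two document vectors can be computed directly from its definition. In this sketch, returning 0.0 for an all-zero vector is an added convention, not something stated on the slide:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (an added convention).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Because term weights are non-negative, the similarity of two documents falls between 0 (no shared terms) and 1 (same direction).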

Page 22: Automated bug localization

feature 1 - Surface lexical similarity

$\phi_1(r, s) = \max(\{sim(r, s)\} \cup \{sim(r, m) \mid m \in s\})$

• $r$ -- a bug report
• $s$ -- a source code file
• $m$ -- a method in $s$
• $sim(r, s)$ -- the text similarity between $r$ and $s$
• $sim(r, m)$ -- the text similarity between $r$ and $m$

Intuition: If a source code file $s$ shares many keywords with the bug report $r$, it is very likely that $s$ is relevant to $r$.
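Taking the maximum over the whole file and each of its methods keeps a long file from diluting a strong match in one method. The sketch below takes the similarity function as a parameter; `word_overlap` is a simple stand-in used only to make the example runnable, not the cosine similarity from the slides, and all texts are made up:

```python
from typing import Callable

def surface_lexical_similarity(
    report: str,
    file_text: str,
    method_texts: list[str],
    sim: Callable[[str, str], float],
) -> float:
    """phi_1: the best match among the whole file and each of its methods."""
    return max([sim(report, file_text)] + [sim(report, m) for m in method_texts])

def word_overlap(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

phi1 = surface_lexical_similarity(
    "toolbar icons missing",
    "toolbar manager renders menu items and icons for every part",
    ["toolbar icons", "menu items"],
    word_overlap,
)
```

Here the whole file matches the report only weakly, while the first method matches it well, so the method-level score determines the feature value.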

Page 23: Automated bug localization

feature 2 – API-enriched lexical similarity

Eclipse bug report 339286

Bug ID: 339286
Summary: Toolbars missing icons and show wrong menus.
Description: The toolbars for my stacked views were: missing icons, showing the wrong drop-down menus (from others in the stack), showing multiple drop-down menus, missing the min/max buttons ...

PartRenderingEngine.java

public class PartRenderingEngine implements IPresentationEngine {
    private EventHandler trimHandler = new EventHandler() {
        public void handleEvent(Event event) {
            ...
            MTrimmedWindow window = (MTrimmedWindow) changedObj;
            ...
        }
        ...
    }
    ...
}

Page 24: Automated bug localization

feature 2 – API-enriched lexical similarity

API description of the MUILabel interface

Interface MUILabel
All Known Subinterfaces: MTrimmedWindow, ...
Description: A representation of the model object 'UI Label'. This is a mix in that will be used for UI Elements that are capable of showing label information in the GUI (e.g. Parts, Menus / Toolbars, Perspectives, ...). The following features are supported: Label, Icon URI, Tooltip ...

Eclipse bug report 339286

Bug ID: 339286
Summary: Toolbars missing icons and show wrong menus.
Description: The toolbars for my stacked views were: missing icons, showing the wrong drop-down menus (from others in the stack), showing multiple drop-down menus, missing the min/max buttons ...

Page 25: Automated bug localization

feature 2 – API-enriched lexical similarity

$\phi_2(r, s) = \max(\{sim(r, s.api)\} \cup \{sim(r, m.api) \mid m \in s\})$

• $r$ -- a bug report
• $s$ -- a source code file
• $m$ -- a method in $s$
• $m.api$ -- a document that concatenates the corresponding API descriptions for all API entries used in $m$
• $s.api$ -- a document that contains all $m.api$ for $m \in s$

Intuition: If a source code file $s$ uses many API entities, and if these API entities are relevant to the bug report $r$, it is very likely that $s$ is relevant to $r$.
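The API documents for a file can be assembled by concatenation before any similarity is computed. The sketch below assumes two lookup tables are already available (which API types each method uses, and the description text of each type); how they are extracted is outside its scope, and the names and the shortened description are hypothetical:

```python
def build_api_documents(
    method_apis: dict[str, list[str]],
    api_descriptions: dict[str, str],
) -> tuple[dict[str, str], str]:
    """Return ({method: m.api}, s.api) for one source file.

    method_apis maps each method name to the API types it uses;
    api_descriptions maps an API type to its description text.
    Types without a known description are skipped.
    """
    m_api = {
        method: " ".join(api_descriptions[a] for a in apis if a in api_descriptions)
        for method, apis in method_apis.items()
    }
    # s.api gathers the API documents of all methods in the file.
    s_api = " ".join(text for text in m_api.values() if text)
    return m_api, s_api

# Hypothetical, shortened inputs for illustration.
m_api, s_api = build_api_documents(
    {"handleEvent": ["MTrimmedWindow"], "other": []},
    {"MTrimmedWindow": "label icon tooltip"},
)
```

The resulting documents are then compared with the bug report in the same way as the source text itself.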

Page 26: Automated bug localization

feature 3 – Collaborative filtering score

Eclipse bug report 378535 ($r$)

Bug ID: 378535
Summary: "Close All" and "Close Others" menu options available when right clicking on tab in PartStack when no part is closeable.
Description: If I create a PartStack that contains multiple parts but none of the parts are closeable, when I right click on any of the tabs I get menu options for "Close All" and "Close Others". Selection of either of the menu options doesn't cause any tabs to be closed since none of the tabs can be closed. I don't think the menu options should be available if none of the tabs can be closed ...

Bug reports ($R(r, s)$) for which StackRenderer.java ($s$) was fixed

Bug ID: 329950
Summary: "Close All" and "Close Others" may cause bundle activation.

Bug ID: 325722
Summary: "Close"-related context menu actions should show up for all stacks and apply to all items.

Bug ID: 313328
Summary: Close parts under stacks with middle mouse click.

Page 27: Automated bug localization

feature 3 - Collaborative filtering score

$\phi_3(r, s) = sim(r, R(r, s))$

• $r$ -- a bug report
• $s$ -- a source code file
• $R(r, s)$ -- a set of previous bug reports for which $s$ was fixed
• $sim(r, R(r, s))$ -- the lexical similarity between $r$ and $R(r, s)$

Intuition: If a source code file $s$ has been fixed many times before for other similar bug reports, it is very likely that $s$ is responsible for this new bug report.
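One way to compare a new report with a set of earlier reports is to concatenate the earlier reports into a single document first. That concatenation, the zero score for a file with no fix history, and the stand-in similarity `shared_word_ratio` are all assumptions of this sketch:

```python
def collaborative_filtering_score(report, previous_reports, sim):
    """phi_3: similarity between a new report and earlier reports fixed in the file.

    previous_reports holds the texts of the earlier bug reports for which
    the file was fixed. Returns 0.0 when the file has no fix history.
    """
    if not previous_reports:
        return 0.0
    return sim(report, " ".join(previous_reports))

def shared_word_ratio(a: str, b: str) -> float:
    """Stand-in similarity: fraction of a's distinct words that also occur in b."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

score = collaborative_filtering_score(
    "close all menu options",
    ["close all may cause bundle activation", "close parts under stacks"],
    shared_word_ratio,
)
```

In practice the same text similarity used for the other lexical features would be passed in as `sim`.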

Page 28: Automated bug localization

feature 4 - Class Name Similarity

$\phi_4(r, s) = \begin{cases} |s.class| & \text{if } s.class \in r \\ 0 & \text{otherwise} \end{cases}$

• $r$ -- a bug report
• $s$ -- a source code file
• $s.class$ -- the top-level public class name of $s$
• $|s.class|$ -- the length of the class name

Intuition: If the bug report $r$ directly mentions a source code file $s$, it is very likely that $s$ is relevant to $r$.
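The class name feature reduces to a string lookup. Matching on whole words only is an assumption of this sketch, chosen so that a short class name is not counted when it merely appears inside a longer identifier; the report text is made up:

```python
import re

def class_name_similarity(report_text: str, class_name: str) -> int:
    """phi_4: length of the class name if the report mentions it, else 0."""
    # Whole-word match, so "Renderer" does not match inside "StackRenderer".
    pattern = r"\b" + re.escape(class_name) + r"\b"
    return len(class_name) if re.search(pattern, report_text) else 0

phi4 = class_name_similarity("NPE in StackRenderer when closing a tab", "StackRenderer")
```

Using the name length as the value means that a longer, more specific class name counts for more than a short, generic one.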

Page 29: Automated bug localization

feature 5 - Bug-fixing Recency

$\phi_5(r, s) = \frac{1}{r.month - last(r, s).month + 1}$

• $r$ -- a bug report
• $s$ -- a source code file
• $r.month$ -- the month when $r$ is created
• $last(r, s)$ -- the most recent bug report for which $s$ was fixed
• $last(r, s).month$ -- the month when $last(r, s)$ was solved

Intuition: If $s$ was recently fixed, it may still contain bugs.
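A small sketch of the recency feature follows. Counting months across year boundaries and returning 0 for a file that was never fixed are assumptions; the slide does not cover either case:

```python
from datetime import date
from typing import Optional

def bug_fixing_recency(report_created: date, last_fixed: Optional[date]) -> float:
    """phi_5 = 1 / (r.month - last(r, s).month + 1), months counted across years.

    Returns 0.0 when the file was never fixed before (an assumption).
    """
    if last_fixed is None:
        return 0.0
    months_apart = (report_created.year - last_fixed.year) * 12 + (
        report_created.month - last_fixed.month
    )
    if months_apart < 0:
        raise ValueError("the last fix must not be later than the new report")
    return 1.0 / (months_apart + 1)
```

A file fixed in the same month as the new report scores 1.0, and the score decays toward 0 as the last fix recedes into the past.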

Page 30: Automated bug localization

feature 6 - Bug-fixing Frequency

$\phi_6(r, s) = |R(r, s)|$

• $r$ -- a bug report
• $s$ -- a source code file
• $R(r, s)$ -- a set of previous bug reports for which $s$ was fixed
• $|R(r, s)|$ -- the number of bug reports in $R(r, s)$

Intuition: If $s$ was frequently fixed, it may still contain bugs.
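The frequency feature is a count over the fix history of the file. In this sketch each earlier fix is represented only by its date, which is an illustrative simplification:

```python
from datetime import date

def bug_fixing_frequency(report_created: date, fix_dates: list[date]) -> int:
    """phi_6: number of fixes to the file made before the new report was created."""
    return sum(1 for fixed in fix_dates if fixed < report_created)

# Hypothetical fix history of one file.
history = [date(2011, 1, 1), date(2012, 5, 1), date(2013, 1, 1)]
phi6 = bug_fixing_frequency(date(2012, 6, 1), history)
```

Only fixes made before the new report are counted, so the feature never uses information from the future.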

Page 31: Automated bug localization

Feature Scaling

Why: Feature scaling helps bring all features to the same scale so that they become comparable with each other.
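The transcript does not preserve which scaling method was used. One common choice, shown here purely as an assumption, is min-max scaling with bounds taken from the training data:

```python
def min_max_scale(value: float, lo: float, hi: float) -> float:
    """Map value into [0, 1] using bounds lo and hi observed on training data.

    Values outside the bounds are clipped; a constant feature scales to 0.
    """
    if hi == lo:
        return 0.0
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)
```

After scaling, a raw count such as the bug-fixing frequency and a similarity in [0, 1] contribute on the same scale to the weighted sum.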

Page 32: Automated bug localization

Combine different features

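[8]

$f(r, s) = \mathbf{w}^{T} \Phi(r, s) = \sum_{i=1}^{k} w_i \cdot \phi_i(r, s)$

• $r$ -- a bug report
• $s$ -- a source code file
• $\phi_i(r, s)$ -- a feature
• $w_i$ -- the weight parameter of $\phi_i(r, s)$

In [7], $k$ = 6 (we introduce six features in a conference paper).
In [8], $k$ = 19 (we introduce more features in a journal paper).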

Page 33: Automated bug localization

Evaluation

Page 34: Automated bug localization

BENCHMARK DATASETS

• AspectJ: an aspect-oriented programming extension for Java.
  • http://eclipse.org/aspectj/

• Birt: an Eclipse-based business intelligence and reporting tool.
  • https://www.eclipse.org/birt/

• Eclipse Platform UI: the user interface of an integrated development platform.
  • http://projects.eclipse.org/projects/eclipse.platform.ui

• JDT: a suite of Java development tools for Eclipse.
  • http://www.eclipse.org/jdt/

• SWT: a widget toolkit for Java.
  • http://www.eclipse.org/swt/

• Tomcat: a web application server and servlet container.
  • http://tomcat.apache.org

Page 35: Automated bug localization

Combine different features

Accuracy@k -- the percentage of bug reports for which at least one buggy file appears among the top k recommended files.

If we recommend the top 10 files for every bug report received, we can help with over 70% of Eclipse bug reports.
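Accuracy@k can be computed from the ranked lists and the known buggy files. It is read here as the share of bug reports with at least one buggy file among the top k, which is an interpretation of the slide; the rankings below are hypothetical:

```python
def accuracy_at_k(rankings: list[list[str]], buggy: list[set[str]], k: int) -> float:
    """Fraction of bug reports with at least one buggy file in the top k."""
    hits = sum(
        1 for ranked, truth in zip(rankings, buggy) if truth & set(ranked[:k])
    )
    return hits / len(rankings) if rankings else 0.0

# Hypothetical rankings for two bug reports.
rankings = [["A.java", "B.java", "C.java"], ["D.java", "E.java", "F.java"]]
buggy = [{"B.java"}, {"F.java"}]
acc_at_2 = accuracy_at_k(rankings, buggy, k=2)
```

In this toy case only the first report has a buggy file in its top two, so half of the reports count as hits at k = 2 and all of them at k = 3.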

Page 36: Automated bug localization

Conclusion

o Software repositories, issue tracking systems, and software documents contain useful information that can help software development.

o Applications of machine learning and information retrieval techniques in automated software engineering are promising.

o A good coding style helps make the code readable by both humans and information retrieval systems.

Page 37: Automated bug localization

Future work

o Use runtime execution information in additional features.

o Use word embeddings to measure word similarities.

o Measure not only the lexical similarity but also the semantic similarity between documents.

Page 38: Automated bug localization

Software visualization

A large cylinder is a file.

A small cylinder within a large cylinder is a version of the file.

Different colors refer to different developers.

Blue bases are folders.

The smoke indicates a conflict.

TeamWATCH at https://www.youtube.com/watch?v=xPDilTwfySU

Chang Liu, Xin Ye, and En Ye, "Source Code Revision History Visualization Tools: Do They Work and What Would it Take to Put Them to Work?," IEEE Access, vol. 2, pp. 404-426, 2014.

Page 39: Automated bug localization

3-D mobile educational games

iPad Game available on Apple App Store

Teach science
For in-class use
For high school students

A student driving the boat on the Ohio River finds a fish kill.

A high school student is using VBEE during a class.

Download the Virtual Boat for Environmental Education (VBEE) game from the App Store by searching “VBEE”

Youtube introduction at https://www.youtube.com/watch?v=TUmlz_OFOIA

Page 40: Automated bug localization

THANKS!
Any questions?
You can find me at [email protected]