reda: a web-based visualization tool for analyzing modern code review dataset
DESCRIPTION
ReDA(http://reda.naist.jp/) is a web-based visualization tool for analyzing Modern Code Review (MCR) datasets for large Open Source Software (OSS) projects. MCR is a commonly practiced and lightweight inspection of source code using a support tool such as Gerrit system. Recently, mining code review history of such systems has received attention as a potentially effective method of ensuring software quality. However, due to increasing size and complexity of softwares being developed, these datasets are becoming unmanageable. ReDA aims to assist researchers of mining code review data by enabling better understand of dataset context and identifying abnormalities. Through real-time data interaction, users can quickly gain insight into the data and hone in on interesting areas to investigate. A video highlighting the main features can be found at: http://youtu.be/ fEoTRRas0UTRANSCRIPT
A Web-based Visualization Tool for Analyzing Modern Code Review Dataset
Patanamon ThongtanunamXin Yang, Norihiro Yoshida, Raula Gaikovina Kula,
Ana Erika Camargo Cruz, Kenji Fujiwara, Hajimu Iida
1
Modern Code Review is a tool-based, and occurs regularly in practice nowadays at companies and OSS
projects [Bacchelli et. al.]
Gerrit Code Review
Rietveld
2
Modern Code Review
Developer
3
Modern Code Review
Developer
Gerrit Code ReviewCode change
3
Modern Code Review
Developer
Gerrit Code ReviewCode change
3
Modern Code Review
Reviewers
Developer
Gerrit Code ReviewCode change
3
Modern Code Review
Determine Quality
Find DefectsComment
Reviewers
Developer
Gerrit Code ReviewCode change
3
Modern Code Review
Determine Quality
Find DefectsComment
Reviewers
Developer
Gerrit Code ReviewCode change
Patch Sets (Commits)
3
Modern Code Review
Determine Quality
Find DefectsComment
Reviewers
Developer
Gerrit Code ReviewCode change
Patch Sets (Commits)
Modified Files
3
Modern Code Review
Determine Quality
Find DefectsComment
Reviewers
Developer
Gerrit Code ReviewCode change
Review Scores
Patch Sets (Commits)
Modified Files
3
Modern Code Review
Determine Quality
Find DefectsComment
Reviewers
Developer
Gerrit Code ReviewCode change
Comments
Review Scores
Patch Sets (Commits)
Modified Files
3
Modern Code Review
4
Modern Code Review
New &
Rich
source
4
Modern Code Review
New &
Rich
sourceCode Quality
4
Modern Code Review
New &
Rich
sourceCode Quality Popular
4
Modern Code Review
?
New &
Rich
sourceCode Quality Popular
Reliable?
4
Modern Code Review
?
New &
Rich
sourceCode Quality Popular
Reliable?
“Publicly available data from support tools is a rich source on mining, various potential perils
should also be taken into consideration.” [Kalliamvakou et. al.]
4
Modern Code Review
?
New &
Rich
sourceCode Quality Popular
Reliable?
“Publicly available data from support tools is a rich source on mining, various potential perils
should also be taken into consideration.” [Kalliamvakou et. al.]Can we easily find a characteristic in the dataset?
4
is for facilitating researchers
5
is for facilitating researchers
Extracting a dataset
5
is for facilitating researchers
Extracting a dataset
Showing basic statistical summary
5
is for facilitating researchers
Extracting a dataset
Showing basic statistical summary
Observing interesting patterns & Identify problems
5
provides three visualizations
6
Review Statistic
provides three visualizations
6
Review Statistic Activity Statistic
provides three visualizations
6
Review Statistic Activity Statistic
Contributor Activities
provides three visualizations
6
Demo http://reda.naist.jp
7
Android Open Source Software Project (AOSP) dataset* Data was captured from October 2008 to January 2012
Example Findings8
Android Open Source Software Project (AOSP) dataset
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peaks in Graphs
Data Observation using #1
9
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peaks in Graphs
Data Observation using #1
9
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peak in Graphs
Data Observation by #1
Submit wrong code version
Accidentally submit changes
Gerrit Code Review
Developers mistakes
10
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peak in Graphs
Data Observation by #1
Submit wrong code version
Accidentally submit changes
Gerrit Code Review
Developers mistakes
11
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peak in Graphs
Data Observation by #1
Submit wrong code version
Accidentally submit changes
Gerrit Code Review
Developers mistakes Reviews related to VCS transactions
Gerrit Code Review
E.g., Merging branch transactions
11
Review Statistic
Activity Statistic
Review Statistic
Reviews without code changes
Unusual Peak in Graphs
Data Observation by #1
Submit wrong code version
Accidentally submit changes
Gerrit Code Review
Developers mistakes Reviews related to VCS transactions
Gerrit Code Review
E.g., Merging branch transactions
F1: 10% of all reviews were created but not for code review
11
Review Statistic
Data Observation using #2
12
Review Statistic
Missing data
Data Observation using #2
12
Review Statistic
Missing dataGit and Gerrit servers
were down.Reported by Google Developers
Incomplete dataset can bias the results.
Data Observation using #2
12
Review Statistic
Missing dataGit and Gerrit servers
were down.Reported by Google Developers
Incomplete dataset can bias the results.
Data Observation using #2F2: Review history in AOSP is incomplete.
12
Activity Statistic
Data Observation using #3
Contributor
Contributor Activities13
Activity Statistic
Data Observation using #3
Low activity number
Contributor
Contributor Activities13
Activity Statistic
Data Observation using #3
Weekend
Contributor
Contributor Activities13
Activity Statistic
Data Observation using #3
F3: Developers usually do code reviews on weekdays.
Weekend
Contributor
Contributor Activities13
Activity Statistic
Data Observation using #3
F3: Developers usually do code reviews on weekdays.
Weekend
Contributor
Contributor Activities13
Activity Statistic
Data Observation using #3
F3: Developers usually do code reviews on weekdays.
Weekend
Contributor
Contributor Activities
Main Contributor
13
Activity Statistic
Data Observation using #3
F3: Developers usually do code reviews on weekdays.
Weekend
Contributor
Contributor Activities
Main Contributor
13
Activity Statistic
Data Observation using #3
F3: Developers usually do code reviews on weekdays.
Weekend
Contributor
Contributor Activities
Main Contributor
F4: Main contributors are from Google and Android teams.
13
14
Modern Code Review is a tool-based, and occurs regularly in practice nowadays at companies and OSS
projects [Bacchelli et. al.]
Gerrit Code Review
Rietveld
14
Modern Code Review is a tool-based, and occurs regularly in practice nowadays at companies and OSS
projects [Bacchelli et. al.]
Gerrit Code Review
Rietveld
14
Modern Code Review
?
New &
Rich
source
Code QualityPopular
Reliable?
“Publicly available data from support tools is a rich source on mining, various potential perils
should also be taken into consideration.” [Kalliamvakou et. al.]
Can we easily find a characteristic in the dataset?
4
Review Statistic Activity Statistic
Contributors Activities
provides three visualizations
Modern Code Review is a tool-based, and occurs regularly in practice nowadays at companies and OSS
projects [Bacchelli et. al.]
Gerrit Code Review
Rietveld
14
Modern Code Review
?
New &
Rich
source
Code QualityPopular
Reliable?
“Publicly available data from support tools is a rich source on mining, various potential perils
should also be taken into consideration.” [Kalliamvakou et. al.]
Can we easily find a characteristic in the dataset?
4
F2: Review history in AOSP is incomplete.
Review Statistic Activity Statistic
Contributors Activities
provides three visualizations
Modern Code Review is a tool-based, and occurs regularly in practice nowadays at companies and OSS
projects [Bacchelli et. al.]
Gerrit Code Review
Rietveld
F1: 10% of all reviews were created but not for code review
Example Findingsfrom Android Open Source Software Project (AOSP) dataset* Data was captured from October 2008 to January 2012
Gerrit
F3: Developers usually do code reviews on weekdays.
F4: Main contributors are from Google and Android teams.
14
Modern Code Review
?
New &
Rich
source
Code QualityPopular
Reliable?
“Publicly available data from support tools is a rich source on mining, various potential perils
should also be taken into consideration.” [Kalliamvakou et. al.]
Can we easily find a characteristic in the dataset?
4