User-Perceived Source Code Quality Estimation based on Static Analysis Metrics
Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
IEEE International Conference on Software Quality, Reliability & Security – QRS 2016

Upload: issel

Post on 13-Apr-2017


Page 1: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

1

User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki
Intelligent Systems & Software Engineering Labgroup, Information Processing Laboratory
Thessaloniki, Greece

Email: {mpapamic, thdiaman}@issel.ee.auth.gr, [email protected]

IEEE International Conference on Software Quality, Reliability & Security – QRS 2016

Page 2: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

2 Outline

- The concept of user-perceived quality.
- Research objectives.
- Key implementation points.
- The designed system.
- Evaluation.
- Conclusion and future work.

Page 3: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

3 Why evaluate code quality?

Various open source software projects. Numerous online software repositories.

Source Code Quality Evaluation

Code Reuse

Is a software component suitable for reuse?

Page 4: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

4 User-Perceived source code quality

Idea: Use of software component popularity as a quality indicator.

But: Popularity cannot be used as the sole quality criterion.
- It is based on current trends.
- It depends on the programming language.

Popularity + Static Analysis Metrics + Recommended Coding Practices = Measure of quality

Page 5: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

5 The idea: User-Perceived Quality Estimation

Idea: Use of software component popularity (GitHub number of stars) as ground truth.

Tools used: Static analysis metrics and violations of "good" coding practices.

Proposed system: Apply machine learning techniques for estimating user-perceived source code quality.

Static Analysis -> Quality Evaluation Models -> Quality Score

Page 6: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

6 Key implementation points

- Qualitative evaluation of the selected repositories.
- Training set formation.
- Target set formation.
- Quality estimation models.

Page 7: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

7 Training dataset

[Diagram: the top 100 GitHub repositories (24930 files) are analyzed into a Metrics Report and a Violations Report, which feed the Training Dataset and the Qualitative Evaluation.]

Page 8: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

8 Qualitative evaluation of the selected repositories

Percentage (%) of files containing severe violations, per PMD ruleset:

PMD Ruleset         Priority 1    Priority 2
Unused Code         0.0%          0.0%
Basic               0.015%        0.337%
Braces              0.0%          0.0%
Comments            0.0%          0.0%
Naming              14.11%        0.45%
Clone               0.0%          0.0%
CodeSize            0.0%          0.0%
Controversial       1.75%         1.58%
Coupling            0.0%          0.0%
Design              3.37%         3.9%
Empty               0.0%          0.0%
Finalizers          0.0%          0.0%
Optimizations       0.0%          0.0%
Strict Exception    4.99%         0.0%
Strings             0.0%          0.06%
Unnecessary         0.0%          0.0%

A very small percentage of files contain severe violations.
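Percentages like those in the table above could be computed from a violations report roughly as follows. This is a minimal sketch: the report layout (one `(file, ruleset, priority)` tuple per violation) and the name `severe_violation_rates` are assumptions made for illustration, not PMD's actual output format.

```python
from collections import defaultdict

def severe_violation_rates(violations, total_files, priorities=(1, 2)):
    """Return {ruleset: {priority: percent of files with >= 1 such violation}}.

    `violations` is an iterable of (file_path, ruleset, priority) tuples;
    this layout is a hypothetical stand-in for a parsed violations report.
    """
    # Collect the distinct files affected per (ruleset, priority) pair.
    affected = defaultdict(set)
    for file_path, ruleset, priority in violations:
        if priority in priorities:
            affected[(ruleset, priority)].add(file_path)

    rates = defaultdict(dict)
    for (ruleset, priority), files in affected.items():
        rates[ruleset][priority] = 100.0 * len(files) / total_files
    return {ruleset: dict(by_priority) for ruleset, by_priority in rates.items()}

# Toy data: 4 files in total, two of them with a Priority 1 Naming violation.
report = [
    ("A.java", "Naming", 1),
    ("A.java", "Naming", 1),   # same file, counted once
    ("B.java", "Naming", 1),
    ("C.java", "Design", 2),
    ("C.java", "Design", 3),   # not a severe priority, ignored
]
rates = severe_violation_rates(report, total_files=4)
print(rates)  # {'Naming': {1: 50.0}, 'Design': {2: 25.0}}
```

Counting distinct files rather than individual violations matches the table's unit, which is the share of files with at least one severe violation.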

Page 9: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

9 Target set formation

Use of GitHub stars as ground truth.

But:
- GitHub stars are given per repository (NOT per file).
- Every source code file is of different importance.
- There are big differences in the number of files between repositories.

[Diagram labels: 10000; x stars; y stars; z stars; Dependency Analysis]

Page 10: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

10 Target set formation

For the i-th file of the j-th repository, the target is formulated from the following components:

- A smoothing factor.
- A base score given to all files in the same repository.
- An added value according to the significance of the source code file.
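The structure described above (a base score shared by every file of a repository, plus an added value that grows with the file's significance) can be illustrated with a small sketch. The functional form below is an assumption made purely for illustration and is not the formula from the slides. The names `file_target`, `significance` and `smoothing` are hypothetical, and significance is assumed to be a value in [0, 1] obtained from the dependency analysis.

```python
import math

def file_target(stars, n_files, significance, smoothing=1.0):
    """Illustrative per-file target (NOT the formula from the slides).

    stars        -- GitHub stars of the repository
    n_files      -- number of source code files in the repository
    significance -- assumed in [0, 1], e.g. from dependency analysis
    smoothing    -- assumed factor damping the effect of repository size
    """
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    if not 0.0 <= significance <= 1.0:
        raise ValueError("significance must be in [0, 1]")
    # Base score: identical for all files of the same repository.
    base = math.log1p(stars) / (1.0 + smoothing * math.log(n_files))
    # Added value: proportional to how significant the file is.
    return base * (1.0 + significance)

# Two files of the same hypothetical repository (1000 stars, 50 files).
peripheral = file_target(1000, 50, significance=0.0)
central = file_target(1000, 50, significance=1.0)
```

Under these assumptions, a central file of a repository always receives a higher target than a peripheral file of the same repository, while both share the same base score.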

Page 11: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

11 Quality Evaluation Models

ANNs Model
- Input: The values of 73 static analysis metrics.
- Output: User-perceived source code quality estimation.
- Applicable only for source code files that exceed a minimum quality threshold.

SVMs One-Class Classifier
- Used to rule out low-quality code.

Static Analysis -> One-Class Classifier -> (Accepted) -> ANNs Model -> Quality Estimation
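The two-stage flow above, in which the one-class classifier decides which files reach the ANNs model, can be sketched as follows. The function and parameter names are hypothetical, and both models are passed in as plain callables rather than tied to a specific library.

```python
from typing import Callable, Optional, Sequence

Metrics = Sequence[float]

def estimate_quality(
    metrics: Metrics,
    is_accepted: Callable[[Metrics], bool],
    score: Callable[[Metrics], float],
) -> Optional[float]:
    """Return a quality score, or None if the file is ruled out."""
    # Stage 1: the one-class classifier rules out low-quality code.
    if not is_accepted(metrics):
        return None
    # Stage 2: the regression model scores only the accepted files.
    return score(metrics)

# Stand-in models for demonstration only (not trained classifiers).
accept_short = lambda m: m[0] < 500   # e.g. reject very long files
mean_score = lambda m: sum(m) / len(m)

print(estimate_quality([120.0, 4.0], accept_short, mean_score))  # 62.0
print(estimate_quality([900.0, 4.0], accept_short, mean_score))  # None
```

Returning `None` for rejected files keeps "no estimate" distinct from a low score, which mirrors the statement that the ANNs model applies only above the minimum quality threshold.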

Page 12: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

12 ANNs Model

- Two-layer feedforward network.
- Levenberg-Marquardt algorithm (LMA) for adjusting the weights and the biases.
- (Training, Validation, Test) = (70%, 15%, 15%).
- Applicable only for source code files that exceed a minimum quality threshold.
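The 70%/15%/15% partition can be sketched in a few lines. The random shuffling, the fixed seed and the name `split_dataset` are assumptions for illustration; the slides do not say how files were assigned to each subset, and 24930 is used here only as an example size.

```python
import random

def split_dataset(items, seed=0):
    """Shuffle and split into 70% training, 15% validation, rest test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 70 // 100
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(24930))
print(len(train), len(val), len(test))  # 17451 3739 3740
```

Giving the remainder to the test subset guarantees that every item lands in exactly one subset even when the size is not divisible by 100.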

Page 13: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

13 SVMs One Class Classifier

- Used to rule out low-quality code.
- Gaussian radial basis kernel function.
- Training involved the use of 7 metrics: Average Block Depth, Average Cyclomatic Complexity, Average Depth of Inheritance Hierarchy, Average Lines of Code Per Method, Comments Ratio, Distance, Lines Of Code.
- (nu, gamma, tolerance) = (0.1, 0.01, 0.01).

1124 false-positives
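To make the kernel setting concrete, here is a minimal sketch of the Gaussian radial basis function with gamma = 0.01, the value listed for this classifier. The function name and the two metric vectors are invented for illustration. In a one-class SVM, nu is commonly understood as an upper bound on the fraction of training samples treated as outliers.

```python
import math

def rbf_kernel(x, y, gamma=0.01):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two hypothetical 7-metric vectors (values invented for illustration).
file_a = [2.1, 3.0, 1.5, 12.0, 0.20, 0.3, 150.0]
file_b = [2.4, 3.5, 1.0, 15.0, 0.10, 0.4, 180.0]

print(rbf_kernel(file_a, file_a))  # 1.0
print(rbf_kernel(file_a, file_b))  # between 0 and 1, smaller for distant files
```

Because the kernel depends on raw distances, metrics with large ranges (such as Lines Of Code) dominate the similarity unless the inputs are scaled first; whether and how the original system scaled its metrics is not stated in the slides.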

Page 14: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

14 System Evaluation

Results validation:
- Quantitative: Using PMD.
- Qualitative: Examination of a representative sample of files and their metrics.

Evaluation on three main axes:
1. The system's ability to distinguish high-quality source code files.
2. The effectiveness of the model for estimating the quality of files exceeding a quality threshold.
3. The accuracy of predicting the popularity of Java repositories given their source code files.

Page 15: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

15 System Evaluation

Repositories selected:
- 8 random typical GitHub projects, chosen independently.
- Lines-of-code-per-file ratio around 100, also including several extreme cases.
- Both human-written and auto-generated code.
- The auto-generated projects are expected to be of high quality:
  - They follow all coding conventions.
  - They are architecturally and functionally complete.

Page 16: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

16 Evaluation – One Class Classifier

The percentage of the rejected files that contained severe violations is very high.

Page 17: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

17 Evaluation – ANNs Model

The quality score reflects the characteristics of the repositories.

Page 18: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

18 Evaluation – Popularity Prediction

Page 19: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

19 Conclusions and future work

Conclusions:
- Reliable determination of the area of high-quality source code based on static analysis metrics.
- Effective user-perceived source code quality estimation.

Future Work:
- Further investigation of the response of our model in different scenarios.
- Expansion of the ground truth coverage by using more metrics.
- Application of feature selection techniques in order to drop overlapping metrics.

Page 20: User-Perceived Source Code Quality Estimation based on Static Analysis Metrics

20

Thank you!
