SAL: AN EFFECTIVE METHOD FOR SOFTWARE DEFECT PREDICTION
May 1, 2023
Paper ID: 116
Sadia Sharmin, Md Rifat Arefin, M. Abdullah-Al Wadud, Naushin Nower, Mohammad Shoyaib
Institute of Information Technology (IIT), University of Dhaka, Bangladesh
CONTENTS
Background
Motivation
Problem Specification
Literature Review
Methodology
Result Analysis and Discussion
Future Work
BACKGROUND
Software Defect
Any flaw or imperfection in a software work product or software process
Software Defect Prediction
An approach to identifying the defective parts of the software before testing or releasing the product
AN OVERVIEW OF SOFTWARE DEFECT PREDICTION PROCESS
[Flow diagram with the stages: Data Set, Pre-processing, Attribute Selection, Training Data, Training, Prediction Model, Testing Data, Prediction Result]
MOTIVATION
Identifying the software bugs in an early stage
Allocating the test resources efficiently
Minimizing the cost of software development
Improving the quality and productivity of software
WHY PRE-PROCESSING IS NEEDED
Noisy data
Outliers
Missing or conflicting values
Inconsistency
WHY ATTRIBUTE SELECTION IS NEEDED
Attributes are not equally important
Some are highly responsible for defects
Some may decrease the prediction performance
LITERATURE REVIEW
Pre-processing
Log filtering (Song et al. [2])
Min-max normalization (Nam et al. [1])
Z-score normalization (Nam et al. [1])
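For reference, the two normalizations listed above can be sketched from their standard textbook definitions. The function names and the sample values are illustrative assumptions, not taken from the cited papers.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: no spread to rescale (assumed convention: all zeros).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Made-up values of a single software metric across three modules.
sample = [10.0, 20.0, 40.0]
min_max_result = min_max_normalize(sample)
z_score_result = z_score_normalize(sample)
print(min_max_result)
print(z_score_result)
```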
LITERATURE REVIEW
Attribute Selection Method
A threshold-based feature selection (Wang et al. [3])
A hybrid attribute selection method (Gao et al. [4])
A general defect prediction framework (Song et al. [2])
METHODOLOGY
SAL: Selection of Attribute with Log filtering
Pre-processing
Each attribute value $n$ is replaced by $\ln(n + \varepsilon)$, where $\varepsilon = 0.01$
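The log filter above can be sketched in a few lines. This is a minimal illustration that assumes non-negative metric values; the names `log_filter` and `EPSILON` are hypothetical, and only the offset 0.01 comes from the slide.

```python
import math

EPSILON = 0.01  # offset stated on the slide; keeps ln() defined when a value is 0

def log_filter(values, epsilon=EPSILON):
    """Replace every attribute value n with ln(n + epsilon)."""
    return [math.log(n + epsilon) for n in values]

# Made-up values of one metric; note that 0 no longer causes a math domain error.
raw = [0.0, 0.99, 120.0]
filtered = log_filter(raw)
print(filtered)
```

The offset matters mainly for zero-valued metrics, which are common in defect data sets and would otherwise make the logarithm undefined.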
ATTRIBUTE SELECTION PROCESS
Attribute Set
Attribute Ranking
Best Set Selection
ATTRIBUTE RANKING

Attribute set
A1, A2, A3, A4, A5, ..., An

Individual Balance value
A1 0.034
A2 0.034
A3 0.456
A4 0.348
A5 0.784
...
An 0.789

Pairwise combination
A1A2, A1A3, ..., A3A1, A3A2, ..., AmAn

Pairwise Balance value
A1A2 0.896
A1A3 0.734
...
A3A1 0.587
A3A2 0.669
...
AmAn 0.897

Average Balance value for each attribute
Average Balance Value = (Individual value + Average value of n pairs) / 2
A1 0.765
A2 0.534
A3 0.679
A4 0.869
A5 0.887
...
An 0.897

Sorted Balance value in decreasing order
A5 0.887
A4 0.869
A10 0.765
A8 0.750
A9 0.696
...
An 0.523
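The ranking procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `evaluate` is a hypothetical callback that returns the Balance value of a classifier trained on the given attribute subset, pairs are treated as unordered (A1A3 and A3A1 get the same value), and the toy numbers are made up.

```python
def rank_attributes(attributes, evaluate):
    """Return (attribute, score) pairs sorted by decreasing score.

    score = (individual Balance + mean Balance of all pairs containing the attribute) / 2
    """
    scores = {}
    for a in attributes:
        individual = evaluate((a,))
        partners = [b for b in attributes if b != a]
        if partners:
            pair_mean = sum(evaluate((a, b)) for b in partners) / len(partners)
            scores[a] = (individual + pair_mean) / 2
        else:
            # Only one attribute exists, so there are no pairs to average.
            scores[a] = individual
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy Balance values standing in for real classifier evaluations.
toy_balance = {
    ("A1",): 0.50, ("A2",): 0.60, ("A3",): 0.70,
    ("A1", "A2"): 0.70, ("A1", "A3"): 0.60, ("A2", "A3"): 0.80,
}

def toy_evaluate(subset):
    # Sort the names so that (A3, A1) and (A1, A3) hit the same table entry.
    return toy_balance[tuple(sorted(subset))]

ranking = rank_attributes(["A1", "A2", "A3"], toy_evaluate)
for name, score in ranking:
    print(name, round(score, 3))
```

Each attribute requires one individual evaluation plus one evaluation per partner, so the number of classifier runs grows quadratically with the number of attributes.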
SELECT BEST SET OF ATTRIBUTES

Ranking of Attributes
A5, A4, A10, A8, A9, ..., An

A5 (1st ranked): Balance value 0.887. A5 is placed in the best set.
A4 (2nd ranked): combined Balance value of A5A4 is 0.891 (new) against 0.887 (previous). new value > previous value, so A4 is added.
Best Set of Attributes: A5, A4 (Balance value 0.891)
A10 (3rd ranked): combined Balance value of A5A4A10 is 0.856 (new) against 0.891 (previous). new value < previous value, so A10 is discarded.
Best Set of Attributes: A5, A4 (Balance value 0.891)
Continue this process for the remaining ranked attributes.

Best Set of Attributes
A5, A4, A9, A12, A7
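The selection walk-through above amounts to a greedy forward pass over the ranked attributes. The sketch below is one way to write it, assuming a hypothetical `evaluate` callback that returns the Balance value for an attribute subset. The three Balance values are the ones shown on the slides; rejecting ties (strict `>`) is an assumption, since the slides only show the greater-than and less-than cases.

```python
def select_best_set(ranked_attributes, evaluate):
    """Greedily grow the best set, keeping a candidate only if Balance improves."""
    if not ranked_attributes:
        raise ValueError("ranked_attributes must not be empty")
    best_set = [ranked_attributes[0]]          # the top-ranked attribute always enters
    best_score = evaluate(tuple(best_set))
    for candidate in ranked_attributes[1:]:
        trial = best_set + [candidate]
        score = evaluate(tuple(trial))
        if score > best_score:                 # new value > previous value: keep it
            best_set, best_score = trial, score
        # otherwise the candidate is discarded and the previous set is kept
    return best_set, best_score

# Balance values taken from the slide example.
slide_balance = {
    ("A5",): 0.887,
    ("A5", "A4"): 0.891,
    ("A5", "A4", "A10"): 0.856,
}

selected, selected_score = select_best_set(["A5", "A4", "A10"], slide_balance.__getitem__)
print(selected, selected_score)
```

A discarded attribute is never reconsidered in this sketch, which keeps the number of classifier evaluations linear in the number of ranked attributes.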
RESULT AND DISCUSSIONS
Data set: NASA MDP repository and PROMISE repository
Classifier: Naïve Bayes
Performance Metrics: Balance, AUC
Programming Language: Java
Machine Learning Tool: WEKA
PERFORMANCE MEASUREMENT SCALES
Confusion Matrix
                        Predicted: Defective   Predicted: Non-defective
Actual: Defective       TP                     FN
Actual: Non-defective   FP                     TN
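The slides do not spell out the Balance formula. The sketch below assumes the definition commonly used in the defect-prediction literature: one minus the normalized Euclidean distance from the ideal point (pd = 1, pf = 0), where pd is the probability of detection and pf the probability of false alarm. Function names and counts are illustrative.

```python
import math

def probability_of_detection(tp, fn):
    """pd (recall): share of defective modules that were predicted defective."""
    return tp / (tp + fn)

def probability_of_false_alarm(fp, tn):
    """pf: share of non-defective modules that were predicted defective."""
    return fp / (fp + tn)

def balance(tp, fn, fp, tn):
    """Assumed definition: 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)."""
    pd = probability_of_detection(tp, fn)
    pf = probability_of_false_alarm(fp, tn)
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)

# Made-up confusion-matrix counts.
perfect = balance(tp=10, fn=0, fp=0, tn=10)   # every module classified correctly
coin_flip = balance(tp=5, fn=5, fp=5, tn=5)   # pd = pf = 0.5
print(perfect, coin_flip)
```

Under this definition a higher Balance is better, which matches the "new value > previous value" acceptance rule used during attribute selection.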
RESULT AND DISCUSSIONS
Comparison of AUC values of different methods
Data set  [5]    [6]    [7] Lowest  [7] Highest  SAL
CM1       0.702  0.723  0.550       0.724        0.7946
KC1       0.79   0.790  0.592       0.800        0.8006
KC2       -      -      0.591       0.796        0.8449
KC3       0.677  -      0.569       0.713        0.8322
KC4       -      -      -           -            0.8059
MC1       -      -      -           -            0.8110
MC2       0.739  -      -           -            0.7340
MW1       0.724  -      0.534       0.725        0.7340
PC1       0.799  -      0.692       0.882        0.8369
PC2       0.805  -      -           -            0.8668
PC3       0.78   0.795  -           -            0.8068
PC4       0.861  -      -           -            0.9049
PC5       -      -      -           -            0.9624
JM1       -      0.717  -           -            0.7167
AR1       -      -      -           -            0.8167
AR3       -      -      0.580       0.699        0.8590
AR4       -      -      0.555       0.671        0.8681
AR5       -      -      0.614       0.722        0.925
AR6       -      -      -           -            0.7566
RESULT AND DISCUSSIONS
Comparison of Balance values of different methods
Data set  [2]    [8]    [9]     SAL
CM1       0.695  0.663  0.5500  0.680
JM1       0.585  0.678  -       0.6152
KC1       0.707  0.718  -       0.7244
KC2       -      0.753  -       0.7835
KC3       0.708  0.693  0.6037  0.7529
KC4       0.691  -      -       0.7036
MC1       0.793  -      -       0.6904
MC2       0.614  0.620  -       0.6847
MW1       0.661  0.636  0.7202  0.6577
PC1       0.668  0.688  0.5719  0.7040
PC2       -      -      0.7046  0.7468
PC3       0.711  0.749  0.7114  0.7232
PC4       0.821  0.854  0.7450  0.8272
PC5       0.904  -      -       0.9046
AR1       0.411  -      -       0.6651
AR3       0.661  -      -       0.8238
AR4       0.683  -      -       0.7051
AR6       0.492  -      -       0.5471
FUTURE WORK
Cross-project defect prediction
Using other publicly available datasets
REFERENCES
[1] Nam, Jaechang, Sinno Jialin Pan, and Sunghun Kim. "Transfer defect learning." In Proceedings of the 2013 International Conference on Software Engineering, pp. 382-391. IEEE Press, 2013.
[2] Song, Qinbao, Zihan Jia, Martin Shepperd, Shi Ying, and Jin Liu. "A general software defect-proneness prediction framework." Software Engineering, IEEE Transactions on 37, no. 3 (2011): 356-370.
[3] Wang, Huanjing, Taghi M. Khoshgoftaar, and Naeem Seliya. "How many software metrics should be selected for defect prediction?" In FLAIRS Conference. 2011.
[4] Gao, Kehan, Taghi M. Khoshgoftaar, and Huanjing Wang. "An empirical investigation of filter attribute selection techniques for software quality classification." In Information Reuse & Integration, 2009. IRI'09. IEEE International Conference on, pp. 272-277. IEEE, 2009.
[5] Wahono, Romi Satria, and Nanna Suryana Herman. "Genetic Feature Selection for Software Defect Prediction." Advanced Science Letters 20, no. 1 (2014): 239-244.
[6] Abaei, Golnoush, and Ali Selamat. "A survey on software fault detection based on different prediction approaches." Vietnam Journal of Computer Science 1, no. 2 (2014): 79-95.
[7] Ren, Jinsheng, Ke Qin, Ying Ma, and Guangchun Luo. "On software defect prediction using machine learning." Journal of Applied Mathematics 2014 (2014).
[8] Wang, Shuo, and Xin Yao. "Using class imbalance learning for software defect prediction." Reliability, IEEE Transactions on 62, no. 2 (2013): 434-443.
[9] Khan, Jobaer, Alim Ul Gias, Md Saeed Siddik, Md Hafizur Rahman, Shah Mostafa Khaled, and Mohammad Shoyaib. "An attribute selection process for software defect prediction." In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, pp. 1-4. IEEE, 2014.