msr2012 - explaining software defects using topic models
TRANSCRIPT
Explaining Software Defects Using Topic Models
Tse-Hsun (Peter) Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan
5
int readFile(String filePath) {
    fp = readFile(filePath)
    if fp == NULL
        return -1
    else
        return fp
}
int manageMemory(int index) {
    if mem[index] is not NULL {
        freeInd = findFreeMemoryLoc()
        goto(freeInd)
    }
}
More Risky Concern
Can we use concerns to study defects?
Capturing Concerns Using Topic Models
6
Terms from readFile: read file file path fp file path fp
Terms from manageMemory: manage memory index mem free ind find free memory loc
Topic Models (LDA)
Topic 1: read, file, path, fp, file, index, ind
Topic 2: manage, memory, mem, free, find, loc
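A minimal sketch of the preprocessing this slide implies, assuming identifiers are split at camelCase boundaries and lowercased before they reach the topic model. The helper names (split_identifier, terms_for) and the regular expression are illustrative assumptions, not the authors' tooling.

```python
import re

# Assumed splitting rule: a run of lowercase letters (optionally led by one
# capital), an all-caps acronym, or a run of digits counts as one term.
_TERM = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def split_identifier(identifier):
    """Split one camelCase identifier into lowercase terms (hypothetical helper)."""
    return [part.lower() for part in _TERM.findall(identifier)]


def terms_for(identifiers):
    """Flatten several identifiers into the bag of words a topic model would see."""
    terms = []
    for identifier in identifiers:
        terms.extend(split_identifier(identifier))
    return terms


# Identifiers taken from the two example functions in the slides.
read_terms = terms_for(["readFile", "filePath", "fp"])
memory_terms = terms_for(
    ["manageMemory", "index", "mem", "freeInd", "findFreeMemoryLoc"]
)

print(" ".join(read_terms))
print(" ".join(memory_terms))
```

The resulting term lists are what a tool such as LDA would consume; the topic assignment itself is not sketched here.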
7
How defect prone are topics?
Can topics explain software defects?
Case Studies
3 versions of each system
0.4 - 8.8 MLOC
2.8K - 17K files
1,300 - 6,500 post-release defects
8
9
How defect prone are topics?
Can topics explain software defects?
If some topics are more defect-prone than others...
We can allocate MORE testing resources on these topics!
10
11
20
Measuring Topic Defect-proneness
[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]
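One plausible way to turn the file-and-topic picture into a number, sketched under a stated assumption: each file's post-release defect density (defects per line of code) is spread over topics in proportion to the file's topic membership. The slides do not spell out the formula, so the weighting, the function name topic_defect_density, and all data values below are hypothetical.

```python
# Hypothetical per-file data: post-release defects and lines of code.
files = {
    "F1": {"defects": 4, "loc": 100},
    "F2": {"defects": 0, "loc": 200},
    "F3": {"defects": 1, "loc": 50},
}

# Hypothetical topic memberships per file (each row sums to 1),
# in the order T1, T2, T3, T4.
membership = {
    "F1": [0.5, 0.5, 0.0, 0.0],
    "F2": [0.0, 0.5, 0.5, 0.0],
    "F3": [0.0, 0.0, 0.5, 0.5],
}

topics = ["T1", "T2", "T3", "T4"]


def topic_defect_density(files, membership, n_topics):
    """Sum each file's defect density, weighted by its membership in each topic."""
    density = [0.0] * n_topics
    for name, info in files.items():
        file_density = info["defects"] / info["loc"]
        for k in range(n_topics):
            density[k] += membership[name][k] * file_density
    return density


densities = topic_defect_density(files, membership, len(topics))
for topic, value in zip(topics, densities):
    print(topic, round(value, 4))
```

Under this assumption, topics that dominate defect-dense files end up with the highest scores, which is what would let a team rank topics for extra testing.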
21
What is the Relationship Between Defects and Topics?
[Chart: topics T3, T2, T1, T4 along the axis]
25
Few Topics are Defect-prone
27
Lower color, Jface, Comparison check
Task, Eclipse, Eclipse Mylyn, Task ui, Core, Repository
28
How defect prone are topics?
Can topics explain software defects?
Few Topics are Defect-prone!
Explaining Defects
35
Static: Lines of Code
Historical: Pre-release Defects, Code Churn
Topics: Topic Metrics
36
Using Topics to Explain Defects
[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]
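A minimal sketch of one per-file topic metric, assuming a file "has" a topic when its membership in that topic reaches a cutoff. The metric name number_of_topics, the 0.01 threshold, and the membership values are assumptions for illustration; the slides only say that thresholds are a parameter choice.

```python
# Hypothetical topic memberships per file, in the order T1, T2, T3, T4.
membership = {
    "F1": [0.70, 0.30, 0.00, 0.00],
    "F2": [0.25, 0.25, 0.25, 0.25],
    "F3": [0.00, 0.005, 0.995, 0.00],
}


def number_of_topics(memberships, threshold=0.01):
    """Count the topics whose membership in a file is at least the threshold."""
    return sum(1 for value in memberships if value >= threshold)


topic_counts = {name: number_of_topics(row) for name, row in membership.items()}
print(topic_counts)
```

A count like this can sit next to lines of code or code churn as one more explanatory variable in a defect model.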
Explainability of Metrics
45
Model with Static metrics: Deviance Explained (D1) and AIC1
Model with Static + Topics metrics: D2 and AIC2
Improvement in Explainability = D2 – D1 and AIC2 – AIC1
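The comparison on this slide can be worked through with the standard definitions: deviance explained is the fraction of the null deviance removed by the model, and AIC is 2k minus twice the log-likelihood, where k is the number of parameters. The sketch below uses those textbook formulas; every number in it is made up for illustration and is not a result from the study.

```python
def deviance_explained(null_deviance, residual_deviance):
    """Fraction of the null deviance that the fitted model accounts for."""
    return 1.0 - residual_deviance / null_deviance


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical baseline model (static metrics only).
d1 = deviance_explained(null_deviance=1000.0, residual_deviance=800.0)
aic1 = aic(log_likelihood=-400.0, n_parameters=3)

# Hypothetical extended model (static metrics plus topic metrics).
d2 = deviance_explained(null_deviance=1000.0, residual_deviance=700.0)
aic2 = aic(log_likelihood=-350.0, n_parameters=5)

improvement_d = d2 - d1        # positive means more deviance explained
improvement_aic = aic2 - aic1  # negative means the extended model has lower AIC

print(round(improvement_d, 3), improvement_aic)
```

Because AIC penalizes extra parameters, a drop in AIC suggests the added topic metrics earn their place rather than merely inflating the model.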
More Topics More Defects in File
46
[Chart: % Avg. Imp. in D2]
47
[Figure: files F1, F2, F3 and topics T1, T2, T3, T4]
Topic Membership Metrics: Few Topics are Defect-prone
Dealing with Large # of Metrics
50
Topic membership metrics may have as many as 500 variables!
Solution: Use PCA to reduce the number of metrics
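To make the PCA step concrete, here is a small standard-library sketch that finds only the first principal component by power iteration on the covariance matrix and projects each file onto it. It illustrates the idea of replacing many membership columns with a few components; the data, the function names, and the choice of power iteration are assumptions, and a real analysis would more likely use a statistics package and keep several components.

```python
import math


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n = len(rows)
    d = len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero variance along v: nothing left to extract
        v = [x / norm for x in w]
    return v


def project(rows, component):
    """Score each row along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in rows]


# Hypothetical membership columns for four files; the second column is
# exactly twice the first, so one component captures all the variance.
memberships = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6], [0.4, 0.8]]

centered = center(memberships)
component = first_component(covariance(centered))
scores = project(centered, component)
print(component, scores)
```

The scores then stand in for the original columns as model inputs, which is how hundreds of correlated membership variables can shrink to a manageable set.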
Topic Membership Metrics Explain Defects Even More
51
[Chart: % Avg. Imp. in AIC]
52
How defect prone are topics? Few Topics are Defect-prone!
Can topics explain software defects? YES!
Limitations
57
1. Parameter Choices
• Number of topics
• Thresholds
2. Used Baseline Metrics
• Static
• Historical
3. Studied Three Subject Systems
Summary