Studying Software Quality Using Topic Models Tse-Hsun (Peter) Chen

Post on 13-Jan-2017

Page 1: Studying Software Quality Using Topic Models

Studying Software Quality Using Topic Models

Tse-Hsun (Peter) Chen

Page 2: Studying Software Quality Using Topic Models

Related Publications

Explaining Software Defects Using Topic Models, Tse-Hsun Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan, 9th Working Conference on Mining Software Repositories (MSR). Zurich, Switzerland. June 2-3, 2012 (acceptance rate: 18/64 (28%))

Studying the Effect of Testing on Code Quality using Topic Models, Tse-Hsun Chen, Stephen W. Thomas, Hadi Hemmati, Meiyappan Nagappan, Ahmed E. Hassan, under review for the Journal of Empirical Software Engineering. Springer Press (Impact Factor 1.854).

An Empirical Study of Concerns and Their Ability to Explain Defects in Large Software Systems, Tse-Hsun Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan, to be submitted for IEEE Transactions on Software Engineering (Impact Factor 1.98).

Page 3: Studying Software Quality Using Topic Models

Thesis Statement

Topics, which are approximations of software concerns, can be used to study software quality by better explaining the quality of code and helping allocate software quality assurance efforts effectively.

Page 4: Studying Software Quality Using Topic Models

int readFile(String filePath) {
    // reading file
    fp = readFile(filePath)
    if fp == NULL
        return -1
    else
        return fp
}

int manageMemory(int index) {
    if mem[index] is not NULL {
        // find free memory
        freeInd = findFreeMemoryLoc()
        goto(freeInd)
    }
}

More Risky Concern

Can we use concerns to study software quality?

Page 5: Studying Software Quality Using Topic Models

Capturing Concerns Using Topic Models

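Tokens extracted from the two functions:

manageMemory: manage memory index mem free ind find free memory loc
readFile: read file file path fp file path fp

Topic Models (LDA)

Topic 1: read, file, path, fp, file
Topic 2: manage, memory, mem, free
Topic 3: index, ind, find, loc

Topic membership (Topic 1 / Topic 2 / Topic 3):
readFile: 60 % / 0 % / 40 %
manageMemory: 0 % / 55 % / 45 %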
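The token lists on this slide look like the result of splitting camelCase identifiers into lowercase words. A minimal sketch of that preprocessing step follows. The function names `split_identifier` and `tokenize` are hypothetical, and the slides do not say which splitting or filtering rules were actually used, so treat this as an illustration only.

```python
import re


def split_identifier(identifier):
    """Split a camelCase identifier into lowercase word tokens."""
    # Matches capitalized or lowercase words, runs of capitals, and digit runs.
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [w.lower() for w in words]


def tokenize(source):
    """Split every identifier-like word in a code snippet into tokens."""
    tokens = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        # Handle snake_case parts before splitting camelCase.
        for part in identifier.split("_"):
            tokens.extend(split_identifier(part))
    return tokens


# Example using a line from the manageMemory function on the earlier slide.
print(tokenize("freeInd = findFreeMemoryLoc()"))
```

This sketch keeps language keywords and type names such as `int`; a real pipeline would probably filter those out before running LDA.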

Page 6: Studying Software Quality Using Topic Models

Studying code quality using topics

Studying code coverage using topics

Code

Things to test

Page 8: Studying Software Quality Using Topic Models

Studying Code Quality Using Topics

How defect-prone are topics?

Can topics help explain software defects?

Page 9: Studying Software Quality Using Topic Models

Are Topics Equally Defect-prone?

If they are, then we CANNOT use topics to study code quality

[MSR 2012]

Page 10: Studying Software Quality Using Topic Models

Measuring Topic Defect-proneness

[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]

[MSR 2012]
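The slide shows only the file-to-topic diagram, not a formula. One plausible way to turn it into a number is to weight each file's defect density by the file's membership in a topic and sum over files. The sketch below assumes exactly that; the function name, the data layout, and all the numbers are hypothetical and are not taken from the slides.

```python
def topic_defect_density(memberships, defects, loc):
    """Membership-weighted sum of file defect densities, one value per topic.

    memberships: file name -> list of topic membership weights
    defects:     file name -> number of defects in the file
    loc:         file name -> lines of code in the file
    """
    n_topics = len(next(iter(memberships.values())))
    density = [0.0] * n_topics
    for name, theta in memberships.items():
        file_density = defects[name] / loc[name]
        for topic, weight in enumerate(theta):
            density[topic] += weight * file_density
    return density


# Hypothetical data for three files and four topics.
memberships = {
    "F1": [0.5, 0.5, 0.0, 0.0],
    "F2": [0.0, 0.3, 0.7, 0.0],
    "F3": [0.0, 0.0, 0.4, 0.6],
}
defects = {"F1": 4, "F2": 0, "F3": 1}
loc = {"F1": 200, "F2": 150, "F3": 100}

for topic, value in enumerate(topic_defect_density(memberships, defects, loc), start=1):
    print(f"T{topic}: {value:.4f}")
```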

Page 12: Studying Software Quality Using Topic Models

Few Topics are Defect-prone

[Figure: chart of Topic Defect Density per topic; labeled topics: "Jface, Comparison check" and "Task, Eclipse, Task ui, Repository"]

[MSR 2012]

Page 13: Studying Software Quality Using Topic Models

Explaining Defects

Metrics used to explain defects:

Static: Lines of Code

Historical: Pre-release Defects, Code Churn

Topics: Topic Metrics

[MSR 2012]

Page 14: Studying Software Quality Using Topic Models

Explainability of Metrics

Static metrics: Deviance Explained (D1)

Static + Topics metrics: Deviance Explained (D2)

Improvement in Explainability = D2 - D1

[MSR 2012]
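To make the comparison concrete, here is a minimal sketch of how deviance explained could be computed for two models and then subtracted. It assumes a binary outcome (defective or not) and the usual definition of one minus residual deviance over null deviance; the slides do not state the model type, and the predicted probabilities below are made up for illustration.

```python
import math


def deviance(y, p):
    """Binomial deviance: -2 times the log-likelihood of 0/1 outcomes y under probabilities p."""
    eps = 1e-12  # guards against log(0)
    return -2.0 * sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )


def deviance_explained(y, p):
    """Fraction of the null deviance removed by the model's predictions p.

    Requires y to contain both classes; otherwise the null deviance is zero.
    """
    base_rate = sum(y) / len(y)
    null_deviance = deviance(y, [base_rate] * len(y))
    return 1.0 - deviance(y, p) / null_deviance


# Hypothetical predictions from a static-metrics model and a static + topics model.
y = [1, 0, 1, 0]
p_static = [0.6, 0.4, 0.6, 0.4]
p_static_topics = [0.9, 0.1, 0.9, 0.1]

d1 = deviance_explained(y, p_static)
d2 = deviance_explained(y, p_static_topics)
print("Improvement in explainability:", d2 - d1)
```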

Page 15: Studying Software Quality Using Topic Models

Using Topics to Explain Defects: Number of Topics

[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]

[MSR 2012]
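A number-of-topics metric can be sketched as a count of the topics whose membership in a file reaches some cutoff. The 1 % cutoff and the function name below are assumptions, not values from the slides; the example memberships are the readFile percentages shown earlier (60 %, 0 %, 40 %).

```python
def number_of_topics(theta, threshold=0.01):
    """Count the topics whose membership in a file is at least `threshold`.

    The default threshold is an assumed value, chosen only for illustration.
    """
    return sum(1 for weight in theta if weight >= threshold)


# Memberships of readFile in Topic 1, Topic 2, Topic 3.
read_file_theta = [0.60, 0.0, 0.40]
print(number_of_topics(read_file_theta))
```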

Page 17: Studying Software Quality Using Topic Models

Using Topics to Explain Defects: Number of Defect-prone Topics

[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]

[MSR 2012]
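The same counting idea can be restricted to topics that have been flagged as defect-prone. In the sketch below, the set of defect-prone topic indices is supplied by the caller; how that set is chosen, the 1 % cutoff, and the function name are all assumptions, since the slides do not spell them out.

```python
def number_of_defect_prone_topics(theta, defect_prone, threshold=0.01):
    """Count defect-prone topics whose membership in a file is at least `threshold`.

    theta:        list of topic membership weights for one file
    defect_prone: set of topic indices considered defect-prone (assumed given)
    """
    return sum(
        1
        for topic, weight in enumerate(theta)
        if topic in defect_prone and weight >= threshold
    )


# Hypothetical file with four topics, where topics 1 and 3 are flagged as defect-prone.
theta = [0.5, 0.3, 0.2, 0.0]
print(number_of_defect_prone_topics(theta, defect_prone={1, 3}))
```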

Page 18: Studying Software Quality Using Topic Models

More Topics, More Defects in File

[Figure: two bar charts of Avg. % Improvement in D2, one for Number of Topics and one for Number of Defect-prone Topics. Data labels in extraction order: 30 %, 48 %; then 21 %, 21 %, 49 %, 0 %, 7 %, 6 %]

[MSR 2012]

Page 19: Studying Software Quality Using Topic Models

Compare with Other Cohesion/Coupling Metrics

Which is better: the number of topics or other topic-based metrics?

# of topics?

[TSE 201X]

Page 20: Studying Software Quality Using Topic Models

# Topics Outperforms Others

[Figure: bar chart; y-axis: Avg. % Improvement in D2 over base; series: # topics (our metric) and state-of-the-art metrics; data labels in extraction order: 39 %, 3 %, 3 %, 20 %]

[TSE 201X]

Page 21: Studying Software Quality Using Topic Models

Studying code quality using topics

Studying code coverage using topics

Code

Things to test

Page 23: Studying Software Quality Using Topic Models

We found that only a few topics are defect-prone…

Can we allocate MORE testing resources to topics that are less tested but defect-prone?

Page 24: Studying Software Quality Using Topic Models

Studying Code Coverage Using Topics

What is the relationship between code coverage and quality?

Can we predict topics that are low unit-tested and highly defect-prone (LTHD)?

Page 25: Studying Software Quality Using Topic Models

Measuring Topic Testedness

Topic Testedness: how much a topic is tested

[Figure: diagram with labels F1, T1, T1, T2]

[EMSE 201X]

Page 26: Studying Software Quality Using Topic Models

More Unit Tested, Less Defect Prone

[EMSE 201X]

Page 27: Studying Software Quality Using Topic Models

Predict LTHD Topics Accurately

[Figure: bar chart; y-axis: Avg. F-Measure (axis range 0.62 to 0.82); data labels: 0.8, 0.76, 0.68]

[EMSE 201X]
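For readers unfamiliar with the metric on the chart's axis, the sketch below computes a balanced F-measure (F1) from a set of predicted topics and a set of actual topics. The slides do not say which F-measure variant was used, so F1 is an assumption, and the topic names are hypothetical.

```python
def f_measure(predicted, actual):
    """Balanced F-measure (harmonic mean of precision and recall) for two sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: topics predicted as LTHD versus topics that really are.
predicted_lthd = {"T1", "T2"}
actual_lthd = {"T2", "T3", "T4"}
print(f_measure(predicted_lthd, actual_lthd))
```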

Page 28: Studying Software Quality Using Topic Models

Can We Improve the Existing Approach?

Testers usually test at the concern level… but existing approaches do not support this

Can we HELP the existing test allocation approach?

[EMSE 201X]

Page 29: Studying Software Quality Using Topic Models

Low Overlap With Existing Approach – Prediction Model

Our Approach: top N buggy files that may need more testing

Prediction-based Approach: top N buggy files found

On average, only 5.3% of the files overlap

[EMSE 201X]
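An overlap figure of this kind can be sketched as the share of one top-N list that also appears in the other. The helper names, the scores, and the choice of denominator below are assumptions for illustration; the slides do not describe how the 5.3% was computed.

```python
def top_n(scores, n):
    """Return the n file names with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def overlap_percentage(ours, theirs):
    """Percentage of files in `ours` that also appear in `theirs`."""
    ours, theirs = set(ours), set(theirs)
    if not ours:
        return 0.0
    return 100.0 * len(ours & theirs) / len(ours)


# Hypothetical scores from a topic-based ranking and a prediction-based ranking.
topic_scores = {"a.java": 0.9, "b.java": 0.7, "c.java": 0.4, "d.java": 0.1}
model_scores = {"a.java": 0.2, "b.java": 0.1, "c.java": 0.8, "d.java": 0.9}

print(overlap_percentage(top_n(topic_scores, 2), top_n(model_scores, 2)))
```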

Page 30: Studying Software Quality Using Topic Models

File Defect Density

File Defect Density = Number of Bugs / Lines of Code

A measure for estimating the effort needed to find bugs
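The ratio on this slide translates directly into code. The sketch below computes it per file and ranks files by the result; the file names and counts are made up for illustration.

```python
def file_defect_density(num_bugs, lines_of_code):
    """Number of bugs divided by lines of code, as defined on the slide."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return num_bugs / lines_of_code


# Hypothetical files: (number of bugs, lines of code).
files = {"Parser.java": (6, 300), "Cache.java": (2, 400), "Util.java": (1, 20)}

# Rank files from highest to lowest defect density.
ranked = sorted(files, key=lambda name: file_defect_density(*files[name]), reverse=True)
for name in ranked:
    print(name, file_defect_density(*files[name]))
```

A small file with a single bug can outrank a large file with several, which is why density reads as an effort estimate: fewer lines to inspect per bug found.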

Page 31: Studying Software Quality Using Topic Models

Files We Found Have Higher Defect Density

[Figure: bar chart; y-axis: Avg. % Defect Density Improvements; data labels: 64 %, 242 %, 30 %]

[EMSE 201X]

Page 32: Studying Software Quality Using Topic Models

Studying code quality using topics

Studying code coverage using topics

Code

Things to test

Page 33: Studying Software Quality Using Topic Models

Thesis Statement

Topics, which are approximations of software concerns, can be used to study software quality by better explaining the quality of code and helping allocate software quality assurance efforts effectively.

Page 34: Studying Software Quality Using Topic Models

Code

Study Code Quality using Topics

Relationship between defects and topics

Use topics to explain defects

Study Code Coverage using Topics

Relationship between topic testedness and defects

Predict low unit-tested and defect-prone topics

Things to test