esem2014 traceability

Tracing Back the History of Commits in Low-tech Reviewing Environments

Yujuan Jiang, Bram Adams, Daniel German and Foutse Khomh

Upload: yujuan-jiang

Post on 16-Jan-2017



Email-based Reviewing Environment

[Figure: contributors, mailing lists (linux-usb, linux-scsi, lkml), subsystem maintainers, and maintainer Linus Torvalds. Reviewing: patch. Integration: commit. Labels: SS, MS, MM.]

Data Collection

Emails from mailing lists

Commits from Git repo

Linking

CCFinder (token-level)

+/- line-based (line-level)

Checksum-based (Chunk-level)
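The slide names the three linking techniques without showing how they work. As a rough illustration only, the sketch below compares a patch and a commit in two of those ways: by the set of added/removed lines (line-level) and by checksums of whole diff chunks (chunk-level). The function names, the whitespace normalization, the Jaccard overlap score, the SHA-256 checksum, and the file name `example.c` are all assumptions made for this example, not the authors' implementation. CCFinder is an external clone detector and is not sketched here.

```python
import hashlib


def _normalize(line):
    # Keep the +/- sign, collapse whitespace in the content (assumption).
    return line[0] + " ".join(line[1:].split())


def changed_lines(diff_text):
    """Return the set of added/removed lines of a unified diff."""
    lines = set()
    for line in diff_text.splitlines():
        # Skip file headers such as '--- a/file.c' and '+++ b/file.c'.
        # Simplification: a changed line whose content starts with '--'
        # or '++' would be skipped as well.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            lines.add(_normalize(line))
    return lines


def line_overlap(diff_a, diff_b):
    """Share of changed lines common to both diffs (Jaccard index)."""
    a, b = changed_lines(diff_a), changed_lines(diff_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def chunk_checksums(diff_text):
    """Return one checksum per diff chunk, ignoring '@@' line positions."""
    sums, current, in_chunk = set(), [], False

    def flush():
        if current:
            data = "\n".join(current).encode("utf-8")
            sums.add(hashlib.sha256(data).hexdigest())
            current.clear()

    for line in diff_text.splitlines():
        if line.startswith("@@"):
            flush()
            in_chunk = True
        elif line.startswith(("+++", "---")):
            continue
        elif in_chunk and line.startswith(("+", "-")):
            current.append(_normalize(line))
    flush()
    return sums


# Hypothetical patch as posted by email and as later committed:
# same change, different line offsets and indentation.
patch_email = """--- a/example.c
+++ b/example.c
@@ -10,2 +10,2 @@
-    int ret = 0;
+    int ret = -EINVAL;
"""
commit_diff = """--- a/example.c
+++ b/example.c
@@ -12,2 +12,2 @@
-\tint ret = 0;
+\tint ret = -EINVAL;
"""

print(line_overlap(patch_email, commit_diff))
print(chunk_checksums(patch_email) == chunk_checksums(commit_diff))
```

A chunk checksum only matches when every changed line of the chunk is identical, whereas the line overlap still gives partial credit after a chunk has been edited during review.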

Research Questions

RQ1: Can commits be linked accurately to emails containing the corresponding patch version?

RQ2: Can emails containing different patch versions be linked accurately to each other?

RQ3: What are the characteristics of the reviewing history in a low-tech reviewing environment?

Evaluation: Precision

384 samples of a technique

Evaluation: Relative Recall

[Figure: overlapping result sets of techniques A and B]

Relative Recall A = 4/6 = 67%

Relative Recall B = 3/6 = 50%
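The arithmetic behind the 4/6 and 3/6 figures can be sketched as follows. This assumes relative recall is the share of the pooled correct links (the union of what all techniques found) that a single technique retrieves, which is consistent with both figures on the slide. The link identifiers are hypothetical and chosen only to reproduce those numbers.

```python
def relative_recall(correct_links, pooled_correct_links):
    """Share of the pooled correct links that one technique found."""
    if not pooled_correct_links:
        return 0.0
    found = correct_links & pooled_correct_links
    return len(found) / len(pooled_correct_links)


# Hypothetical correct links found by techniques A and B.
links_a = {"l1", "l2", "l3", "l4"}
links_b = {"l4", "l5", "l6"}

# Pool: every correct link found by at least one technique (6 links).
pooled = links_a | links_b

print(round(relative_recall(links_a, pooled), 2))  # 0.67
print(relative_recall(links_b, pooled))            # 0.5
```

Because the pool only contains links that some technique found, relative recall is an upper bound on true recall: links missed by every technique are never counted.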

Evaluation: Ground Truth

Fix one security bug

[v2] Fix one security bug

[v3] Fix one security bug
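The example subjects differ only in their version tag. One simple way to group such emails, shown here as an assumption rather than as the authors' procedure for building the ground truth, is to strip bracketed tags such as `[v2]` before comparing subjects. The helper names and the extra subject `Add new driver` are hypothetical.

```python
import re
from collections import defaultdict

# Matches any bracketed tag, e.g. "[v2]" or "[PATCH v3]".
TAG = re.compile(r"\[[^\]]*\]")


def normalize_subject(subject):
    """Drop bracketed tags, collapse whitespace, and lowercase."""
    return " ".join(TAG.sub(" ", subject).split()).lower()


def group_by_subject(subjects):
    """Group email subjects that are equal after normalization."""
    groups = defaultdict(list)
    for subject in subjects:
        groups[normalize_subject(subject)].append(subject)
    return dict(groups)


subjects = [
    "Fix one security bug",
    "[v2] Fix one security bug",
    "[v3] Fix one security bug",
    "Add new driver",
]
for key, versions in group_by_subject(subjects).items():
    print(key, len(versions))
```

Subject matching of this kind breaks down when a patch is renamed between versions, so it is only a starting point for manual validation.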

Case Study Result

RQ1: Can commits be linked accurately to emails containing the corresponding patch version?

Plus/Minus line technique has the highest F-measure for linking email patches to commits

Table 1: Statistics of email-commit links

+/- line result has the highest relative recall

Checksum result has the highest precision

+/- line result has the highest F-measure

take up more than 85%
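The slides rank techniques by F-measure without stating the formula. A minimal sketch, assuming the usual harmonic mean of precision and recall with relative recall standing in for recall, is given below. The input values are invented for illustration and are not results from the study.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and (relative) recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up values: high precision cannot compensate for low recall.
print(f_measure(0.5, 0.5))             # 0.5
print(round(f_measure(0.9, 0.3), 2))   # 0.45
```

This explains how a technique can have the highest precision and still not have the highest F-measure: the harmonic mean is pulled toward the lower of the two values.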

RQ2: Can emails containing different patch versions be linked accurately to each other?

Checksum technique has the highest F-measure for linking email to email

Table 2: Statistics of email-email links

Checksum result has the highest precision

+/- line result has the highest relative recall

+/- line result has the highest F-measure

Checksum & +/- takes up around 95%

RQ3: What are the characteristics of the reviewing history in a low-tech reviewing environment?

25% of the MM patches have a “hidden” reviewing history of more than four weeks.

Larger, and impact more files

A new thread is started if too much time has passed

More bug-prone

Higher acceptance rate