Social Interactions around Cross-System Bug Fixings: The Case of FreeBSD and OpenBSD

Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta

Upload: massimiliano-penta

Post on 25-Dec-2014


Context

Source code is often reused across different systems

Unixes (FreeBSD, OpenBSD, Linux)

Office applications (NeoOffice, OpenOffice)

Desktop environment apps (KDE or GNOME apps)

Maintenance may require propagating bug fixes from one system to the other

We call this “Cross-System Bug Fixing” (CSBF)

Example:

FreeBSD, 1996/01/19, file ip_icmp.h:

– “Added definitions for ICMP router discovery. Reviewed by: wollman”

OpenBSD, 1996/08/02, file ip_icmp.h:

– “ICMP Router Discovery definitions; from FreeBSD”

What we propose

A method to track CSBFs

A study on the social characteristics and development activity of CSBF committers

– Social characteristics: degree, betweenness, brokerage

– Development activity: commits, lines changed

Detecting CSBF - I

Step 1: mine cross-referencing commits

– openbsd, atphy.c, 2008/09/25 20:47:16, brad, “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”

Step 2: mine commits previously performed on files with the same name in the other system

– freebsd, atphy.c, 2008/05/19 01:12:10, yongari, “Add Attansic/Atheros F1 PHY driver.”
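Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' tool: the `Commit` record, the `CROSS_REF` pattern, and the function names are assumptions made for the example, and the slides do not say which phrases were actually matched in the commit notes.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    system: str      # "freebsd" or "openbsd"
    path: str        # file name as recorded in the log
    when: datetime
    author: str
    note: str


# Hypothetical pattern: the note names a system right after "from" or "via".
CROSS_REF = re.compile(r"\b(?:from|via)\s+(freebsd|openbsd)\b", re.IGNORECASE)


def referenced_system(commit):
    """Step 1: return the other system named in the commit note, or None."""
    match = CROSS_REF.search(commit.note)
    if match is None:
        return None
    other = match.group(1).lower()
    return other if other != commit.system else None


def earlier_candidates(commit, log):
    """Step 2: earlier commits on a same-named file in the referenced system."""
    other = referenced_system(commit)
    if other is None:
        return []
    return [c for c in log
            if c.system == other and c.path == commit.path and c.when < commit.when]


# The two log entries shown on the slide.
log = [
    Commit("freebsd", "atphy.c", datetime(2008, 5, 19, 1, 12, 10), "yongari",
           "Add Attansic/Atheros F1 PHY driver."),
    Commit("openbsd", "atphy.c", datetime(2008, 9, 25, 20, 47, 16), "brad",
           "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"),
]
candidates = earlier_candidates(log[1], log)
```

With the slide's two entries, the OpenBSD commit is flagged as referring to FreeBSD, and the earlier FreeBSD commit on atphy.c is its only candidate source.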

Detecting CSBF - II

Step 3: compute file similarity with clone detection (CCFinder)

– Threshold: at least 10% of cloned lines

Step 4: take the previous change with the highest textual similarity in the commit note

– Vector space model with cosine similarity; a threshold (0.20) filters out unrelated commits

Example: “Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@” vs. “Add Attansic/Atheros F1 PHY driver.”: similarity 0.72
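The filtering in Steps 3 and 4 can be sketched as below. The sketch assumes that the fraction of cloned lines (`clone_ratio`) has already been computed by a clone detector, since CCFinder itself is not invoked here, and it uses raw term counts as vector weights. The slides do not state the tokenization or term weighting, so this version is not expected to reproduce the 0.72 reported for the example pair.

```python
import math
import re
from collections import Counter

CLONE_THRESHOLD = 0.10   # Step 3: at least 10% of cloned lines
NOTE_THRESHOLD = 0.20    # Step 4: minimum cosine similarity between commit notes


def term_vector(note):
    """Bag-of-words vector: lowercase alphanumeric tokens with raw counts."""
    return Counter(re.findall(r"[a-z0-9]+", note.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_link(note, clone_ratio, candidate_notes):
    """Return (similarity, candidate note) for the best match, or None if filtered out."""
    if clone_ratio < CLONE_THRESHOLD or not candidate_notes:
        return None
    target = term_vector(note)
    score, best = max((cosine(target, term_vector(c)), c) for c in candidate_notes)
    return (score, best) if score >= NOTE_THRESHOLD else None


note = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
candidates = ["Add Attansic/Atheros F1 PHY driver.", "Fix typo in comment."]
# 0.35 is a made-up clone ratio used only to pass the Step 3 filter.
link = best_link(note, 0.35, candidates)
```

Here `link` pairs the OpenBSD note with the FreeBSD driver commit, while the unrelated note scores zero and would be discarded by the 0.20 threshold.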

Building Committers' Network

We extract communication from mailing lists

– Bug fixing mailing lists

Heuristic similar to the one of Bird et al. [2006] to map inconsistent namings / emails

– Also, to map committer IDs to mailing list names/emails

Nodes of the network labeled as:

– Committer / other mailing list contributors

– CSBF committer
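A toy version of the network construction is sketched below. It is not the heuristic of Bird et al.: identities are unified only by a crude name normalization, and a link is assumed to run from the author of a reply to the author of the message replied to, which the slides do not state. All names, addresses, and function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name):
    """Lowercase, drop punctuation, and sort name parts ("Doe, Jane" -> "doe jane")."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def build_identity_map(senders, committers):
    """Map every email address and committer id to one canonical person key.

    senders: iterable of (display name, email); committers: {committer id: full name}.
    """
    canonical = {email.lower(): normalize(name) for name, email in senders}
    canonical.update({cid: normalize(full) for cid, full in committers.items()})
    return canonical


def build_network(replies, canonical, committer_ids, csbf_committer_ids):
    """Return weighted directed edges and a label for every node."""
    edges = defaultdict(int)
    for sender, recipient in replies:
        src, dst = canonical[sender.lower()], canonical[recipient.lower()]
        if src != dst:                      # ignore self-replies after unification
            edges[(src, dst)] += 1
    csbf = {canonical[c] for c in csbf_committer_ids}
    committers = {canonical[c] for c in committer_ids}
    labels = {}
    for edge in edges:
        for node in edge:
            labels[node] = ("CSBF committer" if node in csbf
                            else "committer" if node in committers
                            else "other contributor")
    return dict(edges), labels


senders = [("Doe, Jane", "jane@example.org"), ("Jane Doe", "jdoe@example.net"),
           ("Sam Roe", "sam@example.org")]
canonical = build_identity_map(senders, {"jdoe": "Jane Doe"})
replies = [("sam@example.org", "jane@example.org"),
           ("sam@example.org", "jdoe@example.net"),
           ("jdoe@example.net", "sam@example.org")]
edges, labels = build_network(replies, canonical, {"jdoe"}, {"jdoe"})
```

The two addresses of the hypothetical "Jane Doe" collapse into one node, so both of Sam's replies strengthen the same edge.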

Empirical Study

Goal: analyze the phenomenon of CSBFs

Purpose: understanding its relevance with respect to the social characteristics of the involved developers

Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD

– Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)

– Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)

Research Questions

RQ1: How do the source code committers and contributors of the two systems overlap?

RQ2: How frequent is the phenomenon of CSBFs?

RQ3: Who are the contributors involved in CSBFs?

RQ4: Are mailing list contributors involved in CSBFs more active than others?

RQ1 – Team overlap

                                            FreeBSD   OpenBSD   Both
Committers                                      383       211     26
Mailing list contributors                      8035      3843    359
Committers and mailing list contributors        213       122     17

The two projects have fewer than 10% of their contributors in common → the development teams of FreeBSD and OpenBSD are largely distinct
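One way to read the "less than 10%" figure is as the share of people active in both projects relative to everyone active in either. The slides do not say which denominator was used, so the sketch below rests on two assumptions: the per-system counts include the shared people, and the percentage is taken over the union of the two teams.

```python
# Counts from the RQ1 table: (FreeBSD, OpenBSD, both).
counts = {
    "committers": (383, 211, 26),
    "mailing list contributors": (8035, 3843, 359),
    "committers and mailing list contributors": (213, 122, 17),
}


def shared_fraction(freebsd, openbsd, both):
    """Share of people active in both projects, relative to the union of the two teams."""
    return both / (freebsd + openbsd - both)


overlap = {group: shared_fraction(*row) for group, row in counts.items()}
for group, fraction in overlap.items():
    print(f"{group}: {fraction:.1%}")
```

Under these assumptions every group stays below 10%. Relative to the OpenBSD counts alone the share would be higher, so the choice of denominator matters.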

RQ2 – Commit filtering

[Figure: bar chart for FreeBSD and OpenBSD, axis from 0 to 1000, with three series: referring commits, cloned files, linked commits. Data labels in extraction order: 439, 933, 133, 296, 59, 120.]

At the end of the filtering, not that many linked commits remain, but...

RQ2 – Cloned lines in CSBF files

The percentage of cloned lines is smaller for .h files

Preprocessor conditionals are used to make header files system-dependent, e.g. #if defined(__FreeBSD__)

[Figure: percentage of cloned lines in CSBF files; panels: C source files, header files]

RQ3 – CSBF Graph (excerpt)

[Figure: excerpt of the CSBF communication graph. Blue/cyan: FreeBSD; red/orange: OpenBSD; yellow: common]

RQ3 – social characteristics

Importance in terms of:

(In/out) degree: number of (incoming/outgoing) communication links

Betweenness: number of shortest communication paths between other nodes that pass through the node

Brokerage metrics: useful to analyze the communication between two clusters

B is a coordinator

B is a gatekeeper

B is a representative
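The diagrams for the three brokerage roles did not survive extraction. The sketch below uses the usual Gould and Fernandez reading, which is an assumption about what the slide showed: for a path A → B → C with no direct A → C link, B is a coordinator when all three belong to the same group, a gatekeeper when A is an outsider passing information into B's group, and a representative when B passes information from its own group to an outsider C. Node names and groups are made up, and betweenness is left out to keep the example short.

```python
from collections import Counter


def degrees(edges):
    """In- and out-degree per node for directed (source, target) links."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg


def brokerage(edges, group):
    """Count, per broker B, the open two-paths A -> B -> C by role."""
    edges = set(edges)
    roles = {}
    for a, b in edges:
        for b2, c in edges:
            # Keep only A -> B -> C with three distinct nodes and no direct A -> C link.
            if b2 != b or len({a, b, c}) < 3 or (a, c) in edges:
                continue
            ga, gb, gc = group[a], group[b], group[c]
            if ga == gb == gc:
                role = "coordinator"
            elif gb == gc != ga:
                role = "gatekeeper"
            elif ga == gb != gc:
                role = "representative"
            else:
                continue  # consultant and liaison roles are not listed on the slide
            roles.setdefault(b, Counter())[role] += 1
    return roles


# Hypothetical nodes: three FreeBSD contributors and one OpenBSD contributor.
group = {"f1": "FreeBSD", "f2": "FreeBSD", "f3": "FreeBSD", "o1": "OpenBSD"}
edges = [("f1", "f2"), ("f2", "f3"), ("o1", "f2"), ("f2", "o1")]
indeg, outdeg = degrees(edges)
roles = brokerage(edges, group)
```

In this toy graph f2 plays each role once: coordinator for f1 → f2 → f3, gatekeeper for o1 → f2 → f3, and representative for f1 → f2 → o1.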

RQ3 – social characteristics

[Figure: bar chart comparing CSBF contributors and others on Degree, In-degree, Out-degree, Betweenness / 1000, Coordinator / 10, Gatekeeper, Representative; axis from 0 to 50]

All differences are statistically significant

High effect size (Cohen's d > 1)

Contributors involved in CSBFs have a higher importance in the communication and in the flow of communication between the two systems
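For reference, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below uses the pooled sample standard deviation, which is one common variant; the slides do not say which one was used, and the numbers are toy values, not data from the study.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (((na - 1) * variance(group_a) + (nb - 1) * variance(group_b))
                  / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


csbf_degree = [12, 15, 9, 14]    # toy numbers, not data from the study
other_degree = [4, 6, 5, 3]      # toy numbers, not data from the study
d = cohens_d(csbf_degree, other_degree)
```

A value of d above 1 means the two group means are more than one pooled standard deviation apart.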

RQ3 – committers with highest social metrics

RQ4 – change activity of CSBF committers and others

[Figure: two bar charts comparing CSBF committers and others in FreeBSD and OpenBSD. Panel 1: LOC added/removed (axis from 0 to 40000). Panel 2: commits (axis from 0 to 1500).]

All differences are statistically significant

High effect size (Cohen's d ≈ 1)

Contributors involved in CSBFs are more active than others

Conclusions and Work-in-Progress

We proposed a method to mine CSBFs

We reported a study on FreeBSD and OpenBSD where:

– The development teams are almost disjoint

– There is a small, though not negligible, portion of CSBFs

– Committers involved in CSBFs have higher social importance, a higher brokerage level, and higher activity in source code commits

Work-in-progress:

– Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems

– More extensive study on less obvious cases