dipenta msr2011-csbf

Posted on 25-Dec-2014

Social Interactions around Cross-System Bug Fixings: the Case of FreeBSD and OpenBSD

Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta

dipenta@unisannio.it

Context

Source code is often reused across different systems

Unixes (FreeBSD, OpenBSD, Linux)

Office applications (NeoOffice, OpenOffice)

Desktop environment apps (KDE or GNOME apps)

Maintenance may require propagating bug fixes across systems

We call this “Cross-System Bug Fixing” (CSBF)

Example:

FreeBSD, 1996/01/19, file ip_icmp.h:

–“Added definitions for ICMP router discovery. Reviewed by: wollman”

OpenBSD, 1996/08/02, file ip_icmp.h:

–“ICMP Router Discovery definitions; from FreeBSD”

What we propose

A method to track CSBFs

A study on the social characteristics and development activity of CSBF committers

– Social characteristics: degree, betweenness, brokerage

– Development activity: commits, lines changed

Detecting CSBF - I

Step 1: mining cross-referencing commits

openbsd, atphy.c,2008/09/25 20:47:16,brad,

Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
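Step 1 can be sketched as a simple filter over commit records. This is a minimal sketch, assuming a hypothetical `Commit` record with the fields of the log lines above, and assuming that a commit is "cross-referencing" when its note names the other system; the slides do not spell out the exact matching rule used in the study.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    """Hypothetical commit record mirroring the CVS log lines above."""
    system: str      # "freebsd" or "openbsd"
    file: str        # bare file name, e.g. "atphy.c"
    date: datetime
    author: str
    note: str        # commit message


OTHER_SYSTEM = {"freebsd": "openbsd", "openbsd": "freebsd"}


def refers_to_other_system(commit: Commit) -> bool:
    """True if the note names the other system as a whole word (assumed heuristic)."""
    other = OTHER_SYSTEM[commit.system]
    return re.search(rf"\b{other}\b", commit.note, re.IGNORECASE) is not None


def cross_referencing(commits: list[Commit]) -> list[Commit]:
    """Step 1: keep only the commits whose note mentions the other system."""
    return [c for c in commits if refers_to_other_system(c)]


example = Commit("openbsd", "atphy.c", datetime(2008, 9, 25, 20, 47, 16), "brad",
                 "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@")
print(refers_to_other_system(example))  # True
```

A plain keyword match like this is likely to over-match (a note may mention the other system without importing anything from it), which is where the later filtering steps come in.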

Step 2: mine commits previously performed on files with the same name in the other system

freebsd,atphy.c,2008/05/19 01:12:10,yongari,

Add Attansic/Atheros F1 PHY driver.

openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
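Step 2 then pairs each cross-referencing commit with earlier commits on the same-named file in the other system. A minimal sketch, again assuming a hypothetical `Commit` record (redefined so the snippet stands alone); matching on the bare file name follows the slide, while returning candidates newest first is an assumption made to feed the later ranking step.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    """Hypothetical commit record (same fields as the log lines above)."""
    system: str
    file: str        # bare file name, without directory
    date: datetime
    author: str
    note: str


def candidate_origins(referring: Commit, history: list[Commit]) -> list[Commit]:
    """Earlier commits, in the other system, on a file with the same name."""
    found = [
        c for c in history
        if c.system != referring.system   # must come from the other system
        and c.file == referring.file      # same file name
        and c.date < referring.date       # must precede the referring commit
    ]
    return sorted(found, key=lambda c: c.date, reverse=True)  # newest first
```

On the atphy.c example, the FreeBSD commit by yongari (2008/05/19) would be returned as a candidate origin of the OpenBSD commit by brad (2008/09/25).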

Detecting CSBF - II

Step 3: compute file similarity with clone detection (CCFinder)

Threshold: at least 10% of cloned lines
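The Step 3 threshold can be expressed as the fraction of a file's lines covered by clone fragments. A minimal sketch, assuming the clone detector's output has already been parsed into (start, end) line ranges for the file being checked; parsing CCFinder's actual output is not shown, and using that file's line count as the denominator is an assumption, since the slide does not say which file it is computed on.

```python
CLONE_THRESHOLD = 0.10  # "at least 10% of cloned lines" (from the slide)


def covered_lines(fragments: list[tuple[int, int]]) -> set[int]:
    """Union of the 1-based, inclusive line ranges reported as cloned."""
    lines: set[int] = set()
    for start, end in fragments:
        lines.update(range(start, end + 1))
    return lines


def cloned_fraction(fragments: list[tuple[int, int]], total_lines: int) -> float:
    """Share of the file's lines that fall inside at least one clone fragment."""
    if total_lines <= 0:
        return 0.0
    return len(covered_lines(fragments)) / total_lines


def files_related(fragments: list[tuple[int, int]], total_lines: int) -> bool:
    """Keep the file pair only if enough of the file is cloned."""
    return cloned_fraction(fragments, total_lines) >= CLONE_THRESHOLD


# Overlapping fragments are counted once: lines 1-20 out of 200.
print(cloned_fraction([(1, 10), (5, 20)], 200))  # 0.1
```

Taking the union of ranges avoids double-counting lines that belong to several overlapping clone pairs.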

Step 4: take the previous change whose commit note has the highest textual similarity

– Use of Vector Space Models

Cosine similarity; threshold (0.20) to filter out unrelated commits

Example:

“Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@”

vs.

“Add Attansic/Atheros F1 PHY driver.”

Cosine similarity: 0.72
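Step 4 can be sketched with a plain vector space model: each commit note becomes a term-frequency vector, candidates are ranked by cosine similarity, and the best one is kept only if it clears the 0.20 threshold. A minimal sketch, assuming lowercased alphanumeric tokens with no stemming, stop-word removal, or tf-idf weighting; the slide does not detail the study's preprocessing, and `best_match` is a hypothetical helper.

```python
import math
import re
from collections import Counter
from typing import Optional

SIM_THRESHOLD = 0.20  # from the slide


def tokens(note: str) -> list[str]:
    """Lowercased alphanumeric tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", note.lower())


def cosine(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(n * n for n in va.values()))
            * math.sqrt(sum(n * n for n in vb.values())))
    return dot / norm if norm else 0.0


def best_match(note: str, candidates: list[str]) -> Optional[tuple[int, float]]:
    """Index and score of the most similar candidate, or None if below threshold."""
    if not candidates:
        return None
    scores = [cosine(note, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best, scores[best]) if scores[best] >= SIM_THRESHOLD else None


referring = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
previous = "Add Attansic/Atheros F1 PHY driver."
print(round(cosine(referring, previous), 2))  # 0.59
```

With raw term frequencies the slide's example scores about 0.59 (5 shared terms, 5/√72) rather than the reported 0.72; weighting and preprocessing choices shift the exact value, and both clear the 0.20 threshold.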

Building Committers' Network

We extract communication from mailing lists

– Bug-fixing mailing lists

We use a heuristic similar to that of Bird et al. [2006] to map inconsistent names/emails

– Also used to map committer IDs to mailing list names/emails

Nodes of the network are labeled as:

– Committers / other mailing list contributors

– CSBF committers
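The network construction can be sketched as follows. This is a simplified stand-in, not the Bird et al. [2006] heuristic itself: it assumes two identities are the same person when they share a normalized full name or an email local part, merges them with union-find, and adds a directed edge from each replier to the author of the message being replied to. The message tuple format, the helper names, and the assumption that a committer ID equals the local part of that committer's email are all hypothetical.

```python
import re


def name_key(name: str) -> str:
    """Normalized full name: lowercase letters only, single-spaced."""
    return "n:" + " ".join(re.findall(r"[a-z]+", name.lower()))


def email_key(email: str) -> str:
    """Lowercased local part of the address, e.g. 'jdoe' for JDoe@example.org."""
    return "e:" + email.lower().split("@", 1)[0]


class AliasMerger:
    """Union-find over alias keys; identities sharing any key are merged."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, key: str) -> str:
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            key = self.parent[key]
        return key

    def add(self, name: str, email: str) -> None:
        # A message signed "name <email>" links its name key and email key.
        e = self.find(email_key(email))
        n = name_key(name)
        if n != "n:":  # skip empty names so they do not merge everyone
            self.parent[self.find(n)] = e


def build_network(messages, merger: AliasMerger) -> set[tuple[str, str]]:
    """messages: (msg_id, name, email, in_reply_to or None) tuples.
    Returns directed edges: replier -> author of the replied-to message."""
    for _, name, email, _ in messages:
        merger.add(name, email)
    author = {mid: merger.find(email_key(email)) for mid, _, email, _ in messages}
    edges = set()
    for mid, _, _, parent in messages:
        if parent in author and author[parent] != author[mid]:
            edges.add((author[mid], author[parent]))
    return edges


def label(person: str, merger: AliasMerger, committer_ids, csbf_ids) -> str:
    """Node label; assumes a committer ID equals the local part of their email."""
    def root(cid: str) -> str:
        return merger.find("e:" + cid.lower())

    if person in {root(c) for c in csbf_ids}:
        return "CSBF committer"
    if person in {root(c) for c in committer_ids}:
        return "committer"
    return "other mailing list contributor"
```

Matching on names alone can wrongly merge two different people who share a name, which is one reason real heuristics add further checks.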

Empirical Study

Goal: analyze the phenomenon of CSBFs

Purpose: understand its relevance with respect to the social characteristics of the involved developers

Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD

– Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)

– Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)

Research Questions

RQ1: How do the source code committers and contributors of the two systems overlap?

RQ2: How frequent is the phenomenon of CSBFs?

RQ3: Who are the contributors involved in CSBFs?

RQ4: Are mailing list contributors involved in CSBFs more active than others?

RQ1 – Team overlap

Contributor group | FreeBSD | OpenBSD | Both
Committers | 383 | 211 | 26
Mailing list contributors | 8035 | 3843 | 359
Committers and mailing list contributors | 213 | 122 | 17

The two projects share less than 10% of their contributors → the development teams of FreeBSD and OpenBSD are largely distinct
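The "less than 10%" figure can be checked against the table. A minimal sketch, assuming the share is measured against the union of the two projects' contributors (the Jaccard index, with the union obtained by inclusion-exclusion); the slide does not state which denominator was used.

```python
# Counts from the RQ1 table: (FreeBSD, OpenBSD, Both)
TEAMS = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}


def shared_fraction(a: int, b: int, both: int) -> float:
    """|A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return both / (a + b - both)


for group, (freebsd, openbsd, both) in TEAMS.items():
    print(f"{group}: {shared_fraction(freebsd, openbsd, both):.1%} shared")
```

This prints 4.6%, 3.1%, and 5.3% for the three rows. Measured against the smaller team instead (e.g. 26 of 211 OpenBSD committers, about 12.3%), the share is higher, so the choice of denominator matters when reading the claim.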

RQ2 – Commit filtering

[Bar chart: Referring commits, Cloned files, and Linked commits for FreeBSD and OpenBSD (y-axis 0-1000); bar values as extracted: 439, 933, 133, 296, 59, 120]

At the end of the filtering, not that many CSBFs remain, but...

RQ2 – Cloned lines in CSBF files

The percentage of cloned lines is smaller for .h files

Preprocessor conditionals are used to make header files system-dependent

#if defined(__FreeBSD__)

[Chart: percentage of cloned lines in CSBF files, C source files vs. header files]

RQ3 – CSBF Graph (excerpt)

Blue/cyan: FreeBSD

Red/orange: OpenBSD

Yellow: common

RQ3: social characteristics

Importance in terms of:

(in/out) degree: number of (incoming/outgoing) communication links

Betweenness: number of shortest communication paths between pairs of other nodes that pass through the node

Brokerage metrics: useful to analyze the communication between two clusters

B is a coordinator

B is a gatekeeper

B is a representative
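The metrics above can be sketched on a directed communication graph given as (sender, receiver) pairs. A minimal sketch: degree counts distinct links, betweenness uses Brandes' algorithm (directed, unnormalized), and the brokerage role follows the standard Gould-Fernandez classification of a broker B on a two-step path A → B → C with no direct A → C link. That these are exactly the variants used in the study is an assumption, and the `group` mapping (node to project) is hypothetical; nodes common to both projects would need their own policy.

```python
from collections import Counter, deque


def degrees(edges):
    """In/out degree: number of distinct incoming/outgoing communication links."""
    indeg, outdeg = Counter(), Counter()
    for u, v in set(edges):
        outdeg[u] += 1
        indeg[v] += 1
    return dict(indeg), dict(outdeg)


def betweenness(edges):
    """Brandes' algorithm on an unweighted directed graph (unnormalized)."""
    adj = {}
    for u, v in set(edges):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def brokerage_role(a, b, c, group):
    """Gould-Fernandez role of broker b on the path a -> b -> c."""
    ga, gb, gc = group[a], group[b], group[c]
    if ga == gb == gc:
        return "coordinator"     # all three in the same group
    if ga == gc:
        return "itinerant"       # a and c share a group, b is an outsider
    if gb == gc:
        return "gatekeeper"      # b lets outside information into its group
    if ga == gb:
        return "representative"  # b passes its group's information outward
    return "liaison"             # all three in different groups


def brokerage_counts(edges, group):
    """Per broker, count the roles it plays over all open two-step paths."""
    links = {(u, v) for u, v in edges if u != v}
    counts = {}
    for a, b in links:
        for b2, c in links:
            if b2 == b and c != a and (a, c) not in links:
                counts.setdefault(b, Counter())[brokerage_role(a, b, c, group)] += 1
    return counts
```

With two groups (FreeBSD and OpenBSD), only coordinator, itinerant, gatekeeper, and representative can occur; the gatekeeper and representative roles are the ones that capture flow between the two systems.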

All differences are statistically significant

Large effect size (Cohen's d > 1)

Contributors involved in CSBFs have higher importance in the communication network and in the flow of communication between the two systems
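Both RQ3 and RQ4 report Cohen's d. A minimal sketch of the common pooled-standard-deviation form; whether the study used exactly this variant (rather than, e.g., a bias-corrected Hedges' g) is not stated on the slide. By Cohen's conventional thresholds, d of about 0.2, 0.5, and 0.8 marks small, medium, and large effects, so d > 1 is a large effect.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


print(round(cohens_d([2, 4, 6], [1, 2, 3]), 2))  # 1.26
```

The sign follows the argument order: passing the CSBF group first yields a positive d when CSBF committers score higher.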

[Bar chart, CSBF committers vs. others: Degree, In-degree, Out-degree, Betweenness / 1000, Coordinator / 10, Gatekeeper, Representative]

RQ3 – social characteristics

RQ3 – committers with highest social metrics

RQ4 – change activity of CSBF committers and others

[Bar charts, FreeBSD and OpenBSD, CSBF committers vs. others: LOC added/removed; Commits]

All differences are statistically significant

Large effect size (Cohen's d ≈ 1)

Contributors involved in CSBFs are more active than others
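Change activity per committer can be aggregated from the commit log before comparing the two groups. A minimal sketch, assuming each commit has already been reduced to a hypothetical (author, lines_added, lines_removed) tuple, for instance from the `lines: +N -M` figures that `cvs log` prints per revision, and assuming "LOC added/removed" means their sum; the slide does not define the aggregation.

```python
from collections import defaultdict
from statistics import mean


def activity(commits):
    """commits: iterable of (author, lines_added, lines_removed) tuples.
    Returns {author: (number_of_commits, lines_added + lines_removed)}."""
    n_commits, churn = defaultdict(int), defaultdict(int)
    for author, added, removed in commits:
        n_commits[author] += 1
        churn[author] += added + removed
    return {a: (n_commits[a], churn[a]) for a in n_commits}


def compare(commits, csbf_authors):
    """Mean commits and mean LOC added/removed, CSBF committers vs. others."""
    act = activity(commits)
    groups = {
        "CSBF": [v for a, v in act.items() if a in csbf_authors],
        "Others": [v for a, v in act.items() if a not in csbf_authors],
    }
    return {g: (mean(c for c, _ in vals), mean(loc for _, loc in vals))
            for g, vals in groups.items() if vals}
```

The per-committer values in each group are what an effect-size computation such as Cohen's d would then compare.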

Conclusions and Work-in-Progress

We proposed a method to mine CSBFs

We reported a study on FreeBSD and OpenBSD showing that:

The development teams are almost disjoint

There is a small, though not negligible, portion of CSBFs

Committers involved in CSBFs have:

– Higher social importance

– Higher brokerage level

– Higher activity in source code commits

Work in progress:

Better approaches to identify implicit CSBFs, tracking and linking changes occurring in both systems

A more extensive study of less obvious cases
