Studying the Impact of Social Structures on Software Quality

Nicolas Bettenburg and Ahmed E. Hassan
Queen’s University, Kingston, Canada

James Larus
Microsoft Research

“If software developers were angels, debugging would be unnecessary!”

(in ‘Why programs fail’, Elsevier 2009)

Software errors cost companies millions of dollars each year!

Where will the errors be?

Measure the source code...

complexity, cohesion, coupling, churn

...bugs

...and create a Prediction Model

Measurements

Risk of Bugs

New Source Code

We can look inside the Box and study the model!

We can also use a prediction model to understand factors!

In this work: SOCIAL STRUCTURES


CASE STUDY

300,000 Bug Reports (BugZilla)

224,643 Transactions (CVS)

Linked through Change-Log Analysis
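One way such a link could be established is by scanning commit messages for bug identifiers. The sketch below only illustrates that idea: the regular expression, the function name `extract_bug_ids`, and the sample message are hypothetical and not taken from the study.

```python
import re

# Hypothetical pattern: the word "bug", optional separators, then a 3-7 digit ID.
BUG_REF = re.compile(r"\bbug[\s:#]*(\d{3,7})\b", re.IGNORECASE)


def extract_bug_ids(commit_message):
    """Return the bug IDs mentioned in a commit message, in order of appearance."""
    return [int(match) for match in BUG_REF.findall(commit_message)]


if __name__ == "__main__":
    # Made-up change-log entry that mentions two report numbers.
    print(extract_bug_ids("Fix for Bug 137808; see also bug #206558"))
```

Each extracted ID could then be joined against the bug database to attach a transaction to its report. A real change log would likely need more patterns than this single one.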

6 months before the Eclipse 3.0 release: Measure Discussion Metrics

6 months after the Eclipse 3.0 release: Measure Post-Release Bugs

4 Dimensions of Measures

Discussion CONTENT

Social STRUCTURES

Communication DYNAMICS

Measures of WORKFLOW

Bug 137808

Summary: Exceptions from createFromString lock-up the editor

Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>

Component: Core Assignee: Marcelo Paternostro <[email protected]>

Status: VERIFIED FIXED QA Contact:

Severity: normal

Priority: P3 CC: [email protected]

Version: 2.2

Target Milestone: ---

Hardware: PC

OS: Windows XP

Whiteboard:

Description: Opened: 2006-04-20 14:25 -0400

As discussed on the newsgroup under the Thread with the same name I am opening

this bug entry. Here is a history of the thread.

-- From Ed Merks

Patrick,

The value is checked before it's applied and can't be applied until it's valid.

But this BigDecimal cases behaves oddly because the exception thrown by

new BigDecimal("badvalue")

has a null message and the property editor relies on returning a non-null

message string to indicate there is an error.

Please open a bugzilla which I'll fix like this:

### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file: /cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006 16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006 11:59:10 -0000
@@ -162,7 +162,8 @@
         }
         catch (Exception exception)
         {
-          return exception.getMessage();
+          String message = exception.getMessage();
+          return message == null ? exception.toString() : message;
         }
       }
       Diagnostic diagnostic =
         Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);

Patrick Sodre wrote:

Hi,

It seems that if the user inputs an invalid parameter that gets created from

"createFromString" the Editor locks-up until the user explicitly calls "restore

Default Value".

Is this the expected behavior or could something better be done? For

instance if an exception is thrown restore the value back to what it was before

after displaying a pop-up error message.

I understand that for DataTypes defined by the user he/she should take care

of catching the exceptions but for the default ones like BigInteger/BigDecimal

I think the EMF runtime could do some of the grunt work...

If you think this is something worth pursuing I could post an entry in

Bugzilla.

Regards,

Patrick Sodre

Below is the stack trace that I got from the Editor...

java.lang.NumberFormatException
    at java.math.BigDecimal.<init>(BigDecimal.java:368)
    at org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
    at org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
    at org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
    at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
    at org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
    at org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)

at

------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------

The fix has been committed to CVS. Thanks for reporting this problem.

------- Comment #2 From Marcelo Paternostro 2006-04-27 10:44:24 -0400 -------

Fixed in the I200604270000 built

------- Comment #3 From Nick Boldt 2008-01-28 16:46:51 -0400 -------

Move to verified as per bug 206558.

Information items highlighted in Bug 137808: Source Code, Metadata, Patch, Stack Trace

Discussion CONTENT

Social STRUCTURES

# PARTICIPANTS IN DISCUSSION

ROLE OF PARTICIPANTS

CENTRALITY


REPUTATION

# MESSAGES

MESSAGE LENGTH

INTERESTINGNESS

REPLY TIME


Figure 2.1: The life cycle of a bug in the BUGZILLA system. States: UNCONFIRMED, NEW / REOPEN, ASSIGNED, RESOLVED, VERIFIED, CLOSED. Resolutions: FIXED, WORKSFORME, WONTFIX, INVALID, DUPLICATE.

2.2 Life Cycle of Bug Reports

In order to manage bug reports and changes effectively, reports have a life cycle. We illustrate the life cycle of a bug for the MOZILLA project. For other projects like ECLIPSE the states may vary slightly (e.g. ECLIPSE has no UNCONFIRMED state). Figure 2.1 shows the series of states bugs move through over their lifetime. We will now describe these states in detail:

When a new report is filed by a user, it is assigned an initial status UNCONFIRMED. Once it has either been confirmed by someone with appropriate rights, or it has received enough votes through the BUGZILLA system, its status is changed to NEW. Once the responsibility for a bug report has been decided, the status of the report is changed to ASSIGNED.

When the bug has been addressed, the bug report’s status is set to RESOLVED. The bug report can be further marked to reflect the internal treatment: when the fix has been verified by the quality assurance of the project, it will be marked as VERIFIED. If it has been closed for good, its state is set to CLOSED.

The resolution status is used to record how a report was resolved. If the developer made a change to the source code that fixes the bug, the resolution status is set to FIXED. If the reported bug was already filed previously, the report is marked as being identical to another report and the resolution status is set to DUPLICATE. If the developer was unable to reproduce the reported problem, the resolution status is set to WORKSFORME. If the described issue is not considered to be an actual bug, the report’s resolution status is set to INVALID.

If a previously resolved bug report needs to be addressed again later on, it can be reopened. The bug report’s status is changed accordingly to REOPENED and the cycle starts again until its next resolution.
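The life cycle described above can be read as a small state machine. The following is a minimal sketch under that reading: the transition table is a simplification assumed from the prose (a real BUGZILLA installation may allow more moves), and the `BugReport` class and its method names are hypothetical.

```python
# Status transitions assumed from the description above (a simplification).
TRANSITIONS = {
    "UNCONFIRMED": {"NEW"},
    "NEW": {"ASSIGNED"},
    "REOPENED": {"ASSIGNED"},
    "ASSIGNED": {"RESOLVED"},
    "RESOLVED": {"VERIFIED", "CLOSED", "REOPENED"},
    "VERIFIED": {"CLOSED", "REOPENED"},
    "CLOSED": {"REOPENED"},
}

# Resolution values that record how a report was resolved.
RESOLUTIONS = {"FIXED", "DUPLICATE", "WORKSFORME", "INVALID", "WONTFIX"}


class BugReport:
    """Hypothetical report object that only tracks status and resolution."""

    def __init__(self, status="UNCONFIRMED"):
        self.status = status
        self.resolution = None

    def move_to(self, new_status, resolution=None):
        """Apply one status change, rejecting moves the table does not allow."""
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        if new_status == "RESOLVED":
            if resolution not in RESOLUTIONS:
                raise ValueError("a resolved report needs a valid resolution")
            self.resolution = resolution
        elif new_status == "REOPENED":
            # Reopening clears the earlier resolution; the cycle starts again.
            self.resolution = None
        self.status = new_status


if __name__ == "__main__":
    report = BugReport(status="NEW")  # e.g. a project without UNCONFIRMED
    report.move_to("ASSIGNED")
    report.move_to("RESOLVED", resolution="FIXED")
    report.move_to("VERIFIED")
    print(report.status, report.resolution)
```

Keeping the resolution separate from the status mirrors the distinction the text draws between where a report is in its life cycle and how it was resolved.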

REGRESSION MODEL

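the amount of post-release defects into relation of our pre-release measures. The complete linear regression model has the form

\[
\mathit{Defects} = \sum_{i} \alpha_i \cdot \mathit{ContentMeasure}_i
+ \sum_{j} \beta_j \cdot \mathit{StructuralMeasure}_j
+ \sum_{k} \gamma_k \cdot \mathit{ComDynMeasure}_k
+ \sum_{l} \delta_l \cdot \mathit{WorkflowMeasure}_l + \epsilon
\]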

Based on this model, we will investigate the statistical relationships between the social interaction measures, which are represented by the regression variables in the model, and post-release defects, represented by the dependent variable in the model. We start with a preliminary analysis of the regression variables using descriptive statistics, to illustrate general properties of the measurements we collected. Next, we perform a correlation analysis to consider possible inter-relations between measurements. We then construct several logistic regression models to investigate the relative impact that each of the four dimensions of social measures has on post-release defects. Our approach is similar to the work by Cataldo and Mockus [10], [20].

We follow a hierarchical modelling approach when creating these models: we start out with a baseline model that uses classical defect predictors as regression variables. We then build subsequent models to which we step-by-step add our content, structure, communication dynamics and workflow measures, and report for each statistical model the explanatory power, χ2, of the model and the percentage of deviance explained. The deviance for each model Mi is defined as D(Mi) = −2 · LL(Mi), with LL(Mi) denoting the log-likelihood of the model, and the deviance explained as a ratio between D(M0) = D(Defects ∼ Intercept) and D(Mi). In addition, we test for each subsequent model whether the difference from the model it is derived from is statistically significant and present the corresponding p-level. A hierarchical modelling approach has the advantage over a step-wise modelling approach that it minimizes artificial inflation of errors and thus overfitting.
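The deviance bookkeeping described above can be sketched in a few lines. This is only an illustration. It assumes that "deviance explained" means the relative reduction 1 − D(Mi)/D(M0), which is one common reading of the ratio mentioned in the text. The function names and the log-likelihood values are made up for the example and are not results from the study.

```python
def deviance(log_likelihood):
    """D(M) = -2 * LL(M)."""
    return -2.0 * log_likelihood


def deviance_explained(ll_null, ll_model):
    """Share of the intercept-only deviance removed by the model (assumed definition)."""
    return 1.0 - deviance(ll_model) / deviance(ll_null)


def delta_chi2(ll_smaller, ll_larger):
    """Drop in deviance when moving from a nested model to its extension."""
    return deviance(ll_smaller) - deviance(ll_larger)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: intercept-only, baseline, first extension.
    ll_null, ll_base, ll_ext = -2609.75, -2330.25, -2260.50
    print(round(deviance_explained(ll_null, ll_base), 4))
    print(round(deviance_explained(ll_null, ll_ext), 4))
    print(delta_chi2(ll_base, ll_ext))
```

To decide whether an extension is statistically significant, the deviance drop would be compared against a chi-square distribution with as many degrees of freedom as variables were added. That step is left out here.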

B. Preliminary Analysis of Social Interaction Measures

Our four groups of social interaction measures (content, structure, dynamics, workflow) represent different characteristics of collaborative development on work items that are represented by issue reports. While content measures are more explicit in capturing the information exchanged between developers and users, our measures of social structure are more implicit and capture the latent relationships and roles of stakeholders. Table I presents a summary of our measures in the form of descriptive statistics.

Table I
DESCRIPTIVE STATISTICS OF SOCIAL INTERACTION MEASURES

Measure    Mean     SD       Min    Max       Skew
POST       1.16     2.28     0.00   35.00     5.00
NSOURCE    0.86     2.48     0.00   48.00     7.14
NSCOM      0.27     0.49     0.00   5.00      2.77
NPATCH     0.02     0.24     0.00   5.00      17.17
PATCHS     0.01     0.11     0.00   3.00      13.26
NTRACE     0.14     0.44     0.00   9.00      7.82
TRACES     3.56     10.73    0.00   175.00    5.04
NLINK      0.20     0.91     0.00   8.00      7.02
NPART      3.61     3.89     1.00   40.00     7.48
NDEVS      2.94     1.46     1.00   12.00     2.78
NUSERS     0.67     2.81     0.00   28.00     8.44
SNACENT    0.19     0.07     0.00   0.51      0.43
NMSG       7.32     5.92     2.00   67.00     3.13
REPLY      122.32   206.99   0.00   3239.00   5.17
REPLYE     0.10     0.09     0.00   1.00      1.29
DLEN       337.00   441.75   2.00   6259.00   4.60
DLENE      0.23     0.10     0.00   1.00      0.08
INT        3.80     8.42     0.00   55.00     4.94
INTE       0.14     0.26     0.00   1.00      1.74
WA         9.33     6.36     0.00   49.00     1.68
WAE        0.17     0.19     0.00   1.00      0.65

Due to a relatively high amount of skew, we apply a standard log transformation to each social interaction measurement to even out the skewing effects during modelling [9]. Figure 1 summarizes the pairwise correlations between our 20 regression variables and our dependent variable in a correlogram visualization [14]. A correlogram reports for each unique pair of variables the strength of the correlation as a colour-coded field (red for positive correlation, blue for negative correlation) and the p-level at which the correlation is significant. This visualization technique allows us to identify “hotspots” that need our attention.

We identify the following types of intercorrelations in our dataset that could pose problems in our statistical modelling. First, we observe correlations between measures from different concepts. For example, the measure of interestingness (INT) has a moderate to high correlation with our measures for number of users (NUSERS), number of participants (NPART), number of developers (NDEVS), and number of links (NLINK). We believe the first three of these intercorrelations stem from a default setting in the issue tracking systems that puts contributors automatically on the notification lists, but we cannot offer an explanation for the correlation between the URLs provided by users in the discussion contents and interestingness.

Second, we observe correlations through redundancy. We expected such correlations when designing social interaction measures: naturally the number of participants (NPART) is highly correlated with the number of users (NUSERS) and number of developers (NDEVS). However, our motivation for incorporating such redundancy is to investigate whether splitting up the information into more specialized representations helps to improve our model. The same intuition holds for the measure of centrality (SNACENT).

The third type of observed hotspots are moderate correlations between quantitative measures and qualitative measures, e.g., between the number of patches and the average size of patches.

Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated by colour intensities; negative correlations are marked with a dashed outline.

Since we observe a substantial number of high correlations among regression variables, we have to examine potential issues due to multicollinearity among the variables. To investigate potential problems, we compute the variance inflation factors for each variable. Variance inflation factors are widely used to measure the degree of multicollinearity between variables in regression models [17]. Following Kutner et al. [17], we remove those variables from the model that have a variance inflation factor greater than 10. We start our analysis with a regression model that contains all our variables. The variance inflation factors for this model are presented in Table II, Model 1. We observe two variables that have a variance inflation factor greater than 10. We remove the highest one (NMSG) from the regression model and recompute the variance inflation factors with the reduced set of variables. The resulting model (Model 2 in Table II) contains only one more variable with an inflation factor larger than 10. We remove this regression variable (SNACENT) from the model and recompute the inflation factors. In the resulting model (Model 3 in Table II), no variables have an inflation factor larger than 5 and we finish our analysis of multicollinearity.

Table II
STEP-WISE ANALYSIS OF MULTICOLLINEARITY

Variance Inflation Factor
log(Yi)    Model 1   Model 2   Model 3
NSOURCE    3.38      3.38      3.40
NSCOM      3.34      3.34      3.36
NPATCH     3.94      3.88      3.90
PATCHS     3.84      3.82      3.84
NTRACE     4.62      4.60      4.57
TRACES     4.78      4.75      4.70
NLINK      2.24      2.22      1.90
NDEVS      9.32      9.27      1.91
NUSERS     4.55      4.54      2.30
SNACENT    10.66     10.65     -
NMSG       11.63     -         -
REPLY      1.17      1.17      1.17
REPLYE     2.04      1.91      1.90
DLEN       4.21      1.91      1.87
DLENE      4.65      1.98      1.96
INT        2.82      2.82      2.60
INTE       1.71      1.71      1.71
WA         2.26      1.99      1.96
WAE        2.08      2.06      2.02

REGRESSION MODEL

Dependent Variable (what is predicted)

Regression Coefficients

Independent Variables

APPROACH: Hierarchical Analysis

(1) Eliminate variables with high correlation.

(2) Build linear model step-by-step, each time adding new independent variables.

(3) At each step, evaluate how well the set of independent variables explains the data.

Goal: Investigate the effect of each variable on the model.

Successfully used in the past! (Cataldo, Mockus)
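The elimination of collinear variables can be sketched as a loop that recomputes variance inflation factors after each removal. This is a minimal pure-Python sketch, not the tooling used in the study. It relies on the standard definition VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing variable j on all other variables. The helper names and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def r_squared(y, columns):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations (X^T X | X^T y).
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    # Exact linear dependence between columns is not handled.
    for c in range(k):
        pivot = max(range(c, k), key=lambda idx: abs(a[idx][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    beta = [a[p][k] / a[p][p] for p in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns, j):
    """Variance inflation factor of column j with respect to the other columns."""
    others = [col for i, col in enumerate(columns) if i != j]
    return 1.0 / (1.0 - r_squared(columns[j], others))


def drop_high_vif(data, threshold=10.0):
    """Remove the variable with the highest VIF until all VIFs are <= threshold."""
    data = dict(data)
    while len(data) > 1:
        names = list(data)
        cols = [data[name] for name in names]
        vifs = {name: vif(cols, i) for i, name in enumerate(names)}
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:
            break
        del data[worst]
    return list(data)


def demo_data(seed=0, n=200):
    """Synthetic variables: c is almost exactly a + b, d is independent noise."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    d = [rng.gauss(0, 1) for _ in range(n)]
    c = [x + y + rng.gauss(0, 0.01) for x, y in zip(a, b)]
    return {"a": a, "b": b, "c": c, "d": d}


if __name__ == "__main__":
    print(drop_high_vif(demo_data()))
```

Recomputing all factors after every removal matters because dropping one variable can lower the inflation factors of the others, as the change in NDEVS between Model 2 and Model 3 in Table II suggests.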

Correlogram shows correlation “hotspots”

(1) Build model with all independent variables

(2) Measure VIF of each independent variable

(3) Remove variable with highest VIF

Repeat until all VIF below threshold (t=10)

C. Hierarchical Analysis Design

After having determined the reduced set of regression variables with low multicollinearity, we proceed by investigating the relative impact of each of the four dimensions of social measures on the post-release defects.

The results of our hierarchical analysis are presented in Table III. To make the interpretation of the coefficients of the regression variables easier, we report the odds ratios [12] of each measure, rather than the coefficients themselves. An odds ratio greater than one indicates a positive relation between the dependent variable (post-release defects) and the independent variables (social interaction measures), whereas an odds ratio smaller than one indicates a negative relation. As we are working in a log-transformed space, the odds-ratios have to be interpreted accordingly: a single unit change in the log-transformed space corresponds to a change from 1 to 2.71 (= e^1) units in untransformed space.

(2) Build linear models step-by-step

Table III
HIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES

log(Yi)     MB           M1           M2           M3           M4           M5
CHURN       4.996 ***    4.631 ***    4.658 ***    5.303 ***    3.688 ***    4.470 ***
NSOURCE                  1.694 ***    1.698 ***    1.772 ***    1.769 ***    1.667 ***
NTRACE                   0.79         0.768        0.864        0.881        1.115
NPATCH                   0.209 *      0.210 *      0.284 +      0.231 *      0.291
NSCOM                    1.218        1.194        1.246        1.208        1.244
PATCHS                   12.607 *     12.626 *     11.200 *     12.736 *     18.207 **
TRACES                   1.016        1.012        1.004        0.989        0.975
NLINK                    1.764 ***    1.613 **     1.600 **     1.666 **     1.596 +
NPART                                 2.481        2.888        4.480        4.542
NDEVS                                 0.475        0.582        0.385        0.274
NUSERS                                0.749        0.803        0.692        0.792
REPLY                                              1.019        0.986        0.982
REPLYE                                             0.117 ***    0.082 ***    0.044 ***
DLEN                                               0.936        0.898 *      0.876 +
DLENE                                              2.499        1.251        2.044
INT                                                0.829 **     0.821 **     0.963
INTE                                               1.109        1.013        1.306
WA                                                              1.432 ***    1.224 +
WAE                                                             2.718 *      2.169
CON1-3                                                                       Fig. 2 ***
χ2          559.01 ***   698.5 ***    700.15       731.5 ***    752.3 ***    1055.19 ***
Dev. Expl.  10.71%       13.38%       13.41%       14.02%       14.41%       26.07%
∆χ2                      139.48       1.652        31.357       20.28        302.87

*** p<0.001, ** p<0.01, * p<0.05, + p<0.1
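The log-space reading of odds ratios can be made concrete with a small calculation. The sketch below assumes a plain natural-log transform of the measure (no offset), so that a one-unit step in log space equals multiplying the raw measure by e. The function names and the example numbers are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio of a logistic regression coefficient: exp(beta)."""
    return math.exp(coefficient)


def odds_multiplier(ratio, raw_factor):
    """Change in odds when the raw (untransformed) measure is multiplied by raw_factor.

    Multiplying the raw measure by raw_factor moves the log-transformed
    measure by ln(raw_factor) units, so the odds change by ratio ** ln(raw_factor).
    """
    return ratio ** math.log(raw_factor)


if __name__ == "__main__":
    # Hypothetical odds ratio of 2.0 for some log-transformed measure.
    print(odds_multiplier(2.0, math.e))  # one full unit in log space
    print(odds_multiplier(2.0, 2.0))     # doubling the raw measure
```

Under this assumption, doubling a raw measure changes the odds by less than the reported odds ratio, because doubling is less than one full unit in log space.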

We start our hierarchical analysis with a baseline model which relates churn [21] to post-release defects. Churn has been shown in the past to be a valuable code-based predictor of defects [22], [23] even when used across projects [33]. We obtained a measure of churn by mining the change histories of each file in the project’s version control system. The results for the baseline model are presented in column MB of Table III and show that CHURN is positively associated with the failure proneness of a file during the post-release period. As expected, these results are in line with earlier findings [22], [23].

Model M1 introduces the first dimension of social interaction we want to study: the measures of structural information items in the contents of issue report discussions. The results of the logistic regression model show that only specific structural information items are statistically significant. Whereas the number of source code examples, the number of links and the effect size of patches stay significant throughout all models when new variables are introduced, the total number of patches plays only a marginal role, indicating that this measure is unlikely to impact future failure proneness. When looking at the odds ratios, we surprisingly find a positive link between the number of source code samples and future defects, as we initially expected code samples to have a beneficial effect. One possible explanation might be that developers trust user-provided sample solutions and incorporate their proposed (yet possibly flawed) modifications without further verification. The second strongest relationship observed from the results is the positive relation between the number of links provided by users and failure proneness. The third relationship that we observe is a very high, positive relation between the effect size of patches and future failure proneness of files. This result confirms earlier findings on the risk of scattered changes [16]. Overall, the results show that measures of structural elements in issue report discussions are indicators of increased future failure proneness of a file. The explanatory power of the model increases by 2.67% over the baseline model and this change is statistically significant.

Model M2 introduces the second dimension of social interaction measures: structural information. The results show that the role of participants and the overall number of participants in a discussion have no statistically significant impact on the future failure proneness of files. As a result, we see no increase in the explanatory power of the model by introducing the role of participants. We left out the measures of reputation from this model, as we record them as factors with many levels that may disrupt our hierarchical modelling approach. We will revisit these measures later in model M5. Overall, we cannot find a significant relation between the role of participants and post-release defects. The explanatory power of the extended model increases only marginally, and this increase is not statistically significant.

Model M3 introduces measures from the category of communication dynamics. The results show a statistically significant and strongly negative relation between the mea-

Baseline Model: Code Churn (Ball, Nagappan)

Adding first dimension: Discussion Content

Increase in explanatory power statistically significant

Adding second dimension: Social Structures

Increase in explanatory power: not statistically significant!

Adding third dimension

Increase in explanatory power significant!

Model based on Social Structures: 14.41% Variation Explained

Model based on Code Metrics (Zimmermann 2007)

What happens when we combine both?

Traditional Model (Zimmermann, PROMISE 2007)

Combined Model: 30.77% Variation Explained

Strongest Variables (Odds Ratios)

Size of Patches, Workflow, Code Churn, # Code Examples, # Links

Length of the Discussion, Interestingness of Report, # Patches, Reply Time

Correlation, not Causation!
