studying the impact of social structures on software quality
DESCRIPTION
Talk given at the ICPC'10: 18th IEEE International Conference on Program Comprehension, Braga, Portugal.TRANSCRIPT
Studying the Impact of Social Structures on Software Quality
1
Nicolas Bettenburg and Ahmed E. HassanQueen’s University, Kingston, Canada
James LarusMicrosoft Research
“If software developers were angels, debugging would be unnecessary!”
(in ‘Why programs fail’, Elsevier 2009)
2
Software errors cost companies millions of dollars
each year!
3
4
Where will the errors be?
Measure the source code...
complexitycohesioncouplingchurn
...bugs
5
6
...and create a Prediction Model
Measurements
Risk of Bugs
New Source Code
7
We can look inside the Box and study the model!
8
vs8
We can also use a prediction model to understand factors!
In this work: SOCIAL STRUCTURES
9
10
CASE STUDY
300,000 Bug Reports (BugZilla)
224,643 Transactions (CVS)
Linked throughChange-Log Analysis
11
Eclipse 3.0time
6 months
MeasureDiscussionMetrics
6 months
MeasurePost-Release
Bugs
12
DiscussionCONTENT
SocialSTRUCTURES
CommunicationDYNAMICS
Measures of WORKFLOW
4 Dimensionsof Measures
13
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem.
------- Comment #2 From Marcelo Paternostro 2006-04-27 10:44:24 -0400 -------
Fixed in the I200604270000 built
------- Comment #3 From Nick Boldt 2008-01-28 16:46:51 -0400 -------
Move to verified as per bug 206558.
Source Code
Metadata
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem.
------- Comment #2 From Marcelo Paternostro 2006-04-27 10:44:24 -0400 -------
Fixed in the I200604270000 built
------- Comment #3 From Nick Boldt 2008-01-28 16:46:51 -0400 -------
Move to verified as per bug 206558.
Patch
Stack Trace
DiscussionCONTENT
SocialSTRUCTURES
CommunicationDYNAMICS
Measures of WORKFLOW
4 Dimensionsof Measures
16
# PARTICIPANTS IN DISCUSSION
ROLE OF PARTICIPANTS
CENTRALITY
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem.
REPUTATION
17
DiscussionCONTENT
SocialSTRUCTURES
CommunicationDYNAMICS
Measures of WORKFLOW
4 Dimensionsof Measures
18
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem.
# MESSAGES
MESSAGELENGTH
INTERESTINGNESS
REPLY TIME
19
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem. 20
# MESSAGES
MESSAGELENGTH
INTERESTINGNESS
REPLY TIME
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem. 21
# MESSAGES
MESSAGELENGTH
INTERESTINGNESS
REPLY TIME
Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <[email protected]>
Component: Core Assignee: Marcelo Paternostro <[email protected]>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: [email protected]
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description: Opened: 2006-04-20 14:25 -0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at java.math.BigDecimal.<init>(BigDecimal.java:647)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem. 22
# MESSAGES
MESSAGELENGTH
INTERESTINGNESS
REPLY TIME
DiscussionCONTENT
SocialSTRUCTURES
CommunicationDYNAMICS
Measures of WORKFLOW
4 Dimensionsof Measures
23
24
2.3 Life Cycle of Bug Reports 7
UNCONFIRMED
ASSIGNED
FIXED
WORKSFORME
WONTFIX
INVALID
DUPLICATE
NEW / REOPEN VERIFIEDRESOLVED
CLOSED
Figure 2.1: The life cycle of a bug in the BUGZILLA system.
2.2 Life Cycle of Bug Reports
In order to manage bug reports and changes effectively, reports have a lifecycle. We illustrate the life cycle of a bug for the MOZILLA project. For
other projects like ECLIPSE the states may vary slightly (e.g. ECLIPSE has
no UNCONFIRMED state). Figure 2.1 shows the series of states bugs move
through over their lifetime. We will now describe these states in detail:
When a new report is filed from a user, it is assigned an initial status
UNCONFIRMED. Once it has either been confirmed by someone with appro-
priate rights, or it has received enough votes through the BUGZILLA system,
its status is changed to NEW. Once the responsibility for a bug report has
been decided, the status of the report is changed to ASSIGNED.
When the bug has been addressed, the bug report’s status is set to RESOLVED.
The bug report can be further marked to reflect the internal treatment: when
the fix has been verified by the quality assurance of the project, it will be
marked as VERIFIED. If it has been closed for good, its state is set to CLOSED.
The resolution status is used to record how a report was resolved. If the
developer made a change to the source code that fixes the bug, the resolution
status is set to FIXED. If the reported bug was already filed previously, the
report is marked as being identical to another report and the resolution status
is set to DUPLICATE. If the developer was unable to reproduce the reported
problem, the resolution status is set to WORKSFORME. If the described issue is
not considered to be an actual bug, the report’s resolution status is set to
INVALID.
If a previously resolved bug report needs to be addressed again later on, it
can be reopened. The bug report’s status is changed accordingly to REOPENED
and the cycle starts again until its next resolution.
REGRESSION MODEL
the amount of post-release defects into relation of our pre-release measures. The complete linear regression model hasthe form
Defects =�
i
αi · ContentMeasurei
+�
j
βj · StructuralMeasurej
+�
k
γk · ComDynMeasurek
+�
l
δl ·WorkflowMeasurel + �
Based on this model, we will investigate the statisticalrelationships between the social interaction measures, whichare represented by the regression variables in the model, andpost release defects, represented by the dependent variablein the model. We start with a preliminary analysis of theregression variables using descriptive statistics, to illustrategeneral properties of the measurements we collected. Next,we perform a correlation analysis to consider possible inter-relations between measurements. We then construct severallogistic regression models to investigate the relative impactthat each of the four dimensions of social measures has onpost-release defects. Our approach is similar to the work byCataldo and Mockus [10], [20].
We follow a hierarchical modelling approach when cre-ating these models: we start out with a baseline modelthat uses classical defect predictors as regression variables.We then build subsequent models to which we step-by-stepadd our content, structure, communication dynamics andworkflow measures, and report for each statistical modelthe explanatory power, χ2, of the model and the percentageof deviance explained. The deviance for each model Mi isdefined as D(Mi) = −2 · LL(Mi), with LL(Mi) denotingthe log-likelihood of the model, and the deviance explainedas a ratio between D(M0) = D(Defects ∼ Intercept)and D(Mi). In addition we test for each subsequent model,whether the difference from the model it is derived from isstatistically significant and present the corresponding p-level.A hierarchical modelling approach has the advantage overa step-wise modelling approach that it minimizes artificialinflation of errors and thus overfitting.
B. Preliminary Analysis of Social Interaction MeasuresOur four groups of social interaction measures (content,
structure, dynamics, workflow) represent different charac-teristics of collaborative development on work items thatare represented by issue reports. While content measuresare more explicit in capturing the information exchangedbetween developers and users, our measures of social struc-ture are more implicit and capture the latent relationshipsand roles of stakeholders. Table I presents a summary ofour measures in the form of descriptive statistics.
Due to a relatively high amount of skew, we apply a stan-dard log transformation to each social interaction measure-
Mean SD Min Max SkewPOST 1.16 2.28 0.00 35.00 5.00
NSOURCE 0.86 2.48 0.00 48.00 7.14NSCOM 0.27 0.49 0.00 5.00 2.77NPATCH 0.02 0.24 0.00 5.00 17.17PATCHS 0.01 0.11 0.00 3.00 13.26
NTRACE 0.14 0.44 0.00 9.00 7.82TRACES 3.56 10.73 0.00 175.00 5.04
NLINK 0.20 0.91 0.00 8.00 7.02NPART 3.61 3.89 1.00 40.00 7.48NDEVS 2.94 1.46 1.00 12.00 2.78
NUSERS 0.67 2.81 0.00 28.00 8.44SNACENT 0.19 0.07 0.00 0.51 0.43
NMSG 7.32 5.92 2.00 67.00 3.13REPLY 122.32 206.99 0.00 3239.00 5.17
REPLYE 0.10 0.09 0.00 1.00 1.29DLEN 337.00 441.75 2.00 6259.00 4.60
DLENE 0.23 0.10 0.00 1.00 0.08INT 3.80 8.42 0.00 55.00 4.94
INTE 0.14 0.26 0.00 1.00 1.74WA 9.33 6.36 0.00 49.00 1.68
WAE 0.17 0.19 0.00 1.00 0.65
Table IDESCRIPTIVE STATISTICS OF SOCIAL INTERACTION MEASURES
ment to even out the skewing effects during modelling [9].Figure 1 summarizes the pairwise correlations between our20 regression variables and our dependent variable in acorrelogram visualization [14]. A correlogram reports foreach unique pair of variables the strength of the correlationas a colour-coded field (red for positive correlation, blue fornegative correlation) and the p-level at which the correlationis significant. This visualization technique allows us toidentify “hotspots” that need our attention.
We identify the following types of intercorrelations inour dataset that could pose problems in our statisticalmodelling. First, we observe correlations between measuresfrom different concepts. For example, the measure of in-terestingness (INT) has a moderate to high correlation withour measures for number of users (NUSERS), number ofparticipants (NPART), number of developers (NDEV), andnumber of links (NLINK). We believe the first of thesethree intercorrelations stem from a default setting in theissue tracking systems that puts contributors automaticallyon the notification lists, but can not offer an explanation forthe correlation between the URLs provided by users in thediscussion contents and interestingness.
Second, we observe correlations through redundancy. Weexpected such correlations when designing social interactionmeasures: naturally the number of participants (NPART) ishighly correlated with the number of users (NUSERS) andnumber of developers (NDEVS). However, our motivationfor incorporating such redundancy is to investigate, whethersplitting up the information into more specialized represen-tations helps to improve our model. The same intuition holdsfor the measure of centrality (SNACENT).
The third type of observed hotspots are moderate corre-lations between quantitative measures and qualitative mea-sures, e.g., between the number of patches and the average
25
REGRESSION MODEL
the amount of post-release defects into relation of our pre-release measures. The complete linear regression model hasthe form
Defects =�
i
αi · ContentMeasurei
+�
j
βj · StructuralMeasurej
+�
k
γk · ComDynMeasurek
+�
l
δl ·WorkflowMeasurel + �
Based on this model, we will investigate the statisticalrelationships between the social interaction measures, whichare represented by the regression variables in the model, andpost release defects, represented by the dependent variablein the model. We start with a preliminary analysis of theregression variables using descriptive statistics, to illustrategeneral properties of the measurements we collected. Next,we perform a correlation analysis to consider possible inter-relations between measurements. We then construct severallogistic regression models to investigate the relative impactthat each of the four dimensions of social measures has onpost-release defects. Our approach is similar to the work byCataldo and Mockus [10], [20].
We follow a hierarchical modelling approach when cre-ating these models: we start out with a baseline modelthat uses classical defect predictors as regression variables.We then build subsequent models to which we step-by-stepadd our content, structure, communication dynamics andworkflow measures, and report for each statistical modelthe explanatory power, χ2, of the model and the percentageof deviance explained. The deviance for each model Mi isdefined as D(Mi) = −2 · LL(Mi), with LL(Mi) denotingthe log-likelihood of the model, and the deviance explainedas a ratio between D(M0) = D(Defects ∼ Intercept)and D(Mi). In addition we test for each subsequent model,whether the difference from the model it is derived from isstatistically significant and present the corresponding p-level.A hierarchical modelling approach has the advantage overa step-wise modelling approach that it minimizes artificialinflation of errors and thus overfitting.
B. Preliminary Analysis of Social Interaction MeasuresOur four groups of social interaction measures (content,
structure, dynamics, workflow) represent different charac-teristics of collaborative development on work items thatare represented by issue reports. While content measuresare more explicit in capturing the information exchangedbetween developers and users, our measures of social struc-ture are more implicit and capture the latent relationshipsand roles of stakeholders. Table I presents a summary ofour measures in the form of descriptive statistics.
Due to a relatively high amount of skew, we apply a stan-dard log transformation to each social interaction measure-
Mean SD Min Max SkewPOST 1.16 2.28 0.00 35.00 5.00
NSOURCE 0.86 2.48 0.00 48.00 7.14NSCOM 0.27 0.49 0.00 5.00 2.77NPATCH 0.02 0.24 0.00 5.00 17.17PATCHS 0.01 0.11 0.00 3.00 13.26
NTRACE 0.14 0.44 0.00 9.00 7.82TRACES 3.56 10.73 0.00 175.00 5.04
NLINK 0.20 0.91 0.00 8.00 7.02NPART 3.61 3.89 1.00 40.00 7.48NDEVS 2.94 1.46 1.00 12.00 2.78
NUSERS 0.67 2.81 0.00 28.00 8.44SNACENT 0.19 0.07 0.00 0.51 0.43
NMSG 7.32 5.92 2.00 67.00 3.13REPLY 122.32 206.99 0.00 3239.00 5.17
REPLYE 0.10 0.09 0.00 1.00 1.29DLEN 337.00 441.75 2.00 6259.00 4.60
DLENE 0.23 0.10 0.00 1.00 0.08INT 3.80 8.42 0.00 55.00 4.94
INTE 0.14 0.26 0.00 1.00 1.74WA 9.33 6.36 0.00 49.00 1.68
WAE 0.17 0.19 0.00 1.00 0.65
Table IDESCRIPTIVE STATISTICS OF SOCIAL INTERACTION MEASURES
ment to even out the skewing effects during modelling [9].Figure 1 summarizes the pairwise correlations between our20 regression variables and our dependent variable in acorrelogram visualization [14]. A correlogram reports foreach unique pair of variables the strength of the correlationas a colour-coded field (red for positive correlation, blue fornegative correlation) and the p-level at which the correlationis significant. This visualization technique allows us toidentify “hotspots” that need our attention.
We identify the following types of intercorrelations inour dataset that could pose problems in our statisticalmodelling. First, we observe correlations between measuresfrom different concepts. For example, the measure of in-terestingness (INT) has a moderate to high correlation withour measures for number of users (NUSERS), number ofparticipants (NPART), number of developers (NDEV), andnumber of links (NLINK). We believe the first of thesethree intercorrelations stem from a default setting in theissue tracking systems that puts contributors automaticallyon the notification lists, but can not offer an explanation forthe correlation between the URLs provided by users in thediscussion contents and interestingness.
Second, we observe correlations through redundancy. Weexpected such correlations when designing social interactionmeasures: naturally the number of participants (NPART) ishighly correlated with the number of users (NUSERS) andnumber of developers (NDEVS). However, our motivationfor incorporating such redundancy is to investigate, whethersplitting up the information into more specialized represen-tations helps to improve our model. The same intuition holdsfor the measure of centrality (SNACENT).
The third type of observed hotspots are moderate corre-lations between quantitative measures and qualitative mea-sures, e.g., between the number of patches and the average
Dependent Variable(what is predicted)
25
REGRESSION MODEL
the amount of post-release defects into relation of our pre-release measures. The complete linear regression model hasthe form
Defects =�
i
αi · ContentMeasurei
+�
j
βj · StructuralMeasurej
+�
k
γk · ComDynMeasurek
+�
l
δl ·WorkflowMeasurel + �
Based on this model, we will investigate the statisticalrelationships between the social interaction measures, whichare represented by the regression variables in the model, andpost release defects, represented by the dependent variablein the model. We start with a preliminary analysis of theregression variables using descriptive statistics, to illustrategeneral properties of the measurements we collected. Next,we perform a correlation analysis to consider possible inter-relations between measurements. We then construct severallogistic regression models to investigate the relative impactthat each of the four dimensions of social measures has onpost-release defects. Our approach is similar to the work byCataldo and Mockus [10], [20].
We follow a hierarchical modelling approach when cre-ating these models: we start out with a baseline modelthat uses classical defect predictors as regression variables.We then build subsequent models to which we step-by-stepadd our content, structure, communication dynamics andworkflow measures, and report for each statistical modelthe explanatory power, χ2, of the model and the percentageof deviance explained. The deviance for each model Mi isdefined as D(Mi) = −2 · LL(Mi), with LL(Mi) denotingthe log-likelihood of the model, and the deviance explainedas a ratio between D(M0) = D(Defects ∼ Intercept)and D(Mi). In addition we test for each subsequent model,whether the difference from the model it is derived from isstatistically significant and present the corresponding p-level.A hierarchical modelling approach has the advantage overa step-wise modelling approach that it minimizes artificialinflation of errors and thus overfitting.
B. Preliminary Analysis of Social Interaction MeasuresOur four groups of social interaction measures (content,
structure, dynamics, workflow) represent different charac-teristics of collaborative development on work items thatare represented by issue reports. While content measuresare more explicit in capturing the information exchangedbetween developers and users, our measures of social struc-ture are more implicit and capture the latent relationshipsand roles of stakeholders. Table I presents a summary ofour measures in the form of descriptive statistics.
Due to a relatively high amount of skew, we apply a stan-dard log transformation to each social interaction measure-
Mean SD Min Max SkewPOST 1.16 2.28 0.00 35.00 5.00
NSOURCE 0.86 2.48 0.00 48.00 7.14NSCOM 0.27 0.49 0.00 5.00 2.77NPATCH 0.02 0.24 0.00 5.00 17.17PATCHS 0.01 0.11 0.00 3.00 13.26
NTRACE 0.14 0.44 0.00 9.00 7.82TRACES 3.56 10.73 0.00 175.00 5.04
NLINK 0.20 0.91 0.00 8.00 7.02NPART 3.61 3.89 1.00 40.00 7.48NDEVS 2.94 1.46 1.00 12.00 2.78
NUSERS 0.67 2.81 0.00 28.00 8.44SNACENT 0.19 0.07 0.00 0.51 0.43
NMSG 7.32 5.92 2.00 67.00 3.13REPLY 122.32 206.99 0.00 3239.00 5.17
REPLYE 0.10 0.09 0.00 1.00 1.29DLEN 337.00 441.75 2.00 6259.00 4.60
DLENE 0.23 0.10 0.00 1.00 0.08INT 3.80 8.42 0.00 55.00 4.94
INTE 0.14 0.26 0.00 1.00 1.74WA 9.33 6.36 0.00 49.00 1.68
WAE 0.17 0.19 0.00 1.00 0.65
Table IDESCRIPTIVE STATISTICS OF SOCIAL INTERACTION MEASURES
ment to even out the skewing effects during modelling [9].Figure 1 summarizes the pairwise correlations between our20 regression variables and our dependent variable in acorrelogram visualization [14]. A correlogram reports foreach unique pair of variables the strength of the correlationas a colour-coded field (red for positive correlation, blue fornegative correlation) and the p-level at which the correlationis significant. This visualization technique allows us toidentify “hotspots” that need our attention.
We identify the following types of intercorrelations inour dataset that could pose problems in our statisticalmodelling. First, we observe correlations between measuresfrom different concepts. For example, the measure of in-terestingness (INT) has a moderate to high correlation withour measures for number of users (NUSERS), number ofparticipants (NPART), number of developers (NDEV), andnumber of links (NLINK). We believe the first of thesethree intercorrelations stem from a default setting in theissue tracking systems that puts contributors automaticallyon the notification lists, but can not offer an explanation forthe correlation between the URLs provided by users in thediscussion contents and interestingness.
Second, we observe correlations through redundancy. Weexpected such correlations when designing social interactionmeasures: naturally the number of participants (NPART) ishighly correlated with the number of users (NUSERS) andnumber of developers (NDEVS). However, our motivationfor incorporating such redundancy is to investigate, whethersplitting up the information into more specialized represen-tations helps to improve our model. The same intuition holdsfor the measure of centrality (SNACENT).
The third type of observed hotspots are moderate corre-lations between quantitative measures and qualitative mea-sures, e.g., between the number of patches and the average
Dependent Variable(what is predicted)
RegressionCoefficients
25
REGRESSION MODEL
the amount of post-release defects into relation of our pre-release measures. The complete linear regression model hasthe form
Defects =�
i
αi · ContentMeasurei
+�
j
βj · StructuralMeasurej
+�
k
γk · ComDynMeasurek
+�
l
δl ·WorkflowMeasurel + �
Based on this model, we will investigate the statisticalrelationships between the social interaction measures, whichare represented by the regression variables in the model, andpost release defects, represented by the dependent variablein the model. We start with a preliminary analysis of theregression variables using descriptive statistics, to illustrategeneral properties of the measurements we collected. Next,we perform a correlation analysis to consider possible inter-relations between measurements. We then construct severallogistic regression models to investigate the relative impactthat each of the four dimensions of social measures has onpost-release defects. Our approach is similar to the work byCataldo and Mockus [10], [20].
We follow a hierarchical modelling approach when cre-ating these models: we start out with a baseline modelthat uses classical defect predictors as regression variables.We then build subsequent models to which we step-by-stepadd our content, structure, communication dynamics andworkflow measures, and report for each statistical modelthe explanatory power, χ2, of the model and the percentageof deviance explained. The deviance for each model Mi isdefined as D(Mi) = −2 · LL(Mi), with LL(Mi) denotingthe log-likelihood of the model, and the deviance explainedas a ratio between D(M0) = D(Defects ∼ Intercept)and D(Mi). In addition we test for each subsequent model,whether the difference from the model it is derived from isstatistically significant and present the corresponding p-level.A hierarchical modelling approach has the advantage overa step-wise modelling approach that it minimizes artificialinflation of errors and thus overfitting.
B. Preliminary Analysis of Social Interaction MeasuresOur four groups of social interaction measures (content,
structure, dynamics, workflow) represent different charac-teristics of collaborative development on work items thatare represented by issue reports. While content measuresare more explicit in capturing the information exchangedbetween developers and users, our measures of social struc-ture are more implicit and capture the latent relationshipsand roles of stakeholders. Table I presents a summary ofour measures in the form of descriptive statistics.
Due to a relatively high amount of skew, we apply a stan-dard log transformation to each social interaction measure-
Mean SD Min Max SkewPOST 1.16 2.28 0.00 35.00 5.00
NSOURCE 0.86 2.48 0.00 48.00 7.14NSCOM 0.27 0.49 0.00 5.00 2.77NPATCH 0.02 0.24 0.00 5.00 17.17PATCHS 0.01 0.11 0.00 3.00 13.26
NTRACE 0.14 0.44 0.00 9.00 7.82TRACES 3.56 10.73 0.00 175.00 5.04
NLINK 0.20 0.91 0.00 8.00 7.02NPART 3.61 3.89 1.00 40.00 7.48NDEVS 2.94 1.46 1.00 12.00 2.78
NUSERS 0.67 2.81 0.00 28.00 8.44SNACENT 0.19 0.07 0.00 0.51 0.43
NMSG 7.32 5.92 2.00 67.00 3.13REPLY 122.32 206.99 0.00 3239.00 5.17
REPLYE 0.10 0.09 0.00 1.00 1.29DLEN 337.00 441.75 2.00 6259.00 4.60
DLENE 0.23 0.10 0.00 1.00 0.08INT 3.80 8.42 0.00 55.00 4.94
INTE 0.14 0.26 0.00 1.00 1.74WA 9.33 6.36 0.00 49.00 1.68
WAE 0.17 0.19 0.00 1.00 0.65
Table IDESCRIPTIVE STATISTICS OF SOCIAL INTERACTION MEASURES
ment to even out the skewing effects during modelling [9].Figure 1 summarizes the pairwise correlations between our20 regression variables and our dependent variable in acorrelogram visualization [14]. A correlogram reports foreach unique pair of variables the strength of the correlationas a colour-coded field (red for positive correlation, blue fornegative correlation) and the p-level at which the correlationis significant. This visualization technique allows us toidentify “hotspots” that need our attention.
We identify the following types of intercorrelations inour dataset that could pose problems in our statisticalmodelling. First, we observe correlations between measuresfrom different concepts. For example, the measure of in-terestingness (INT) has a moderate to high correlation withour measures for number of users (NUSERS), number ofparticipants (NPART), number of developers (NDEV), andnumber of links (NLINK). We believe the first of thesethree intercorrelations stem from a default setting in theissue tracking systems that puts contributors automaticallyon the notification lists, but can not offer an explanation forthe correlation between the URLs provided by users in thediscussion contents and interestingness.
Second, we observe correlations through redundancy. Weexpected such correlations when designing social interactionmeasures: naturally the number of participants (NPART) ishighly correlated with the number of users (NUSERS) andnumber of developers (NDEVS). However, our motivationfor incorporating such redundancy is to investigate, whethersplitting up the information into more specialized represen-tations helps to improve our model. The same intuition holdsfor the measure of centrality (SNACENT).
The third type of observed hotspots are moderate corre-lations between quantitative measures and qualitative mea-sures, e.g., between the number of patches and the average
Dependent Variable(what is predicted)
RegressionCoefficients
IndependentVariables
25
APPROACHHierarchical Analysis
(2) Build linear model step-by-step, each time adding new independent variables.
(3) At each step, evaluate how well the set of independent variables explains the data.
Goal: Investigate the effect of each variable on the model.
Successfully used in the past! (Cataldo, Mockus)
(1) Eliminate variables with high correlation.
26
APPROACHHierarchical Analysis(1) Eliminate variables with high correlation.
!"
!#
$%&'
()*+,-.
()-*!
(/01-2
/01-2)
(1,0-.
1,0-.)
(34(5
(/0,1
(6.7)
(+).,)
)(0-.(1
(!)8
,./39
,./39.
63.(
63.(.
4(1
4(1.
:0
:0.
;;;;;;;;;;;
;;;;;;;;
;;;;;;;;;
;;;;;;
;;;;;;;;; ;;; ;;;;;;;;; ;;;;;;
;;;;;;;;;;;; ;;;; ;;;;;;;;;;; ;;;;;;;
;;;;;;;;;;;; ;;;;;;;;;
;;;;;;;; ;;;;;; ;;;;;; ;;;;
;;;;; ;;;;;; ;;;;;; ;;;;;;
;;;;;; ;;;;;; ; ;;;;;;;;; ;;;;;;;;;
;;;;;; ;;; ;;;; ;;;;; ;;;;;;;;;
;;;;;;;;; ;;; ;;;;;; ;;;;;;;;;;;; ;;; ;;;;
;;;;;;;;; ;;;;;;; ;;;;;;;
;;;;; ;;;; ;;;; ;;;;;;;;
;;;;;; ; ;;; ;;;;;;;;;;;;;
;;;;;;;;;;;; ;;;;;;;;
;;;;;;;;;;;;
;;;;;;;;; ;;;;;;;;;;;;;
;;;;;;;;; ;;; ;;;;;;;;;;;;;;; ;;;
;;;
:0. :0 4(1. 4(1 63.(. 63.( ,./39. ,./39 (!)8 )(0-.(1 (+).,) (6.7) (/0,1 (34(5 1,0-.) (1,0-. /01-2) (/01-2 ()-*! ()*+,-. $%&'
!"##"<=
=<>
=<=
Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated bycolour intensities; negative correlations are marked with a dashed outline.
Variance Inflation Factorlog(Yi) Model 1 Model 2 Model 3NSOURCE 3.38 3.38 3.40NSCOM 3.34 3.34 3.36NPATCH 3.94 3.88 3.90PATCHS 3.84 3.82 3.84NTRACE 4.62 4.60 4.57TRACES 4.78 4.75 4.70NLINK 2.24 2.22 1.90NDEVS 9.32 9.27 1.91NUSERS 4.55 4.54 2.30SNACENT 10.66 10.65 —NMSG 11.63 — —REPLY 1.17 1.17 1.17REPLYE 2.04 1.91 1.90DLEN 4.21 1.91 1.87DLENE 4.65 1.98 1.96INT 2.82 2.82 2.60INTE 1.71 1.71 1.71WA 2.26 1.99 1.96WAE 2.08 2.06 2.02
Table IISTEP-WISE ANALYSIS OF MULTICOLLINEARITY.
size of patches.Since we observe a substantial number of high correla-
tions among regression variables, we have to examine po-tential issues due to multi-collinearity among the variables.To investigate potential problems, we compute the varianceinflation factors for each variable. Variance inflation factorsare widely used to measure the degree of multi-collinearitybetween variables in regression models [17]. FollowingKutner et al. [17], we remove those variables from the
model that have a variance inflation factor greater than 10.We start our analysis with a regression model that containsall our variables. The variance inflation factors for thismodel are presented in Table II, Model 1. We observe twovariables that have a variance inflation factors greater than10. We remove the highest one (NMSG) from the regressionmodel and recompute the variance inflation factors with thereduced set of variables. The resulting model, (Model 2 inTable II) contains only one more variable with an inflationfactor larger than 10. We remove the regression variable(SNACENT) from the model and recompute the inflationfactors. In the resulting model (Model 3 in Table II), novariables have an inflation factor larger than 5 and we finishour analysis of multicollinearity.
C. Hierarchical Analysis Design
After having determined the reduced set of regressionvariables with low multicollinearity, we proceed by inves-tigating the relative impact of each of the four dimensionsof social measures on the post-release defects.
The results of our hierarchical analysis are presented inTable III. To make the interpretation of the coefficients ofthe regression variables easier, we report the odds ratios [12]of each measure, rather than the coefficients themselves.An odds ratio greater than one indicates a positive rela-tion between the dependent variable (post-release defects)and the independent variables (social interaction measures),whereas an odds ratio smaller than one indicates a negativerelation. As we are working in a log-transformed space, the
27
APPROACHHierarchical Analysis(1) Eliminate variables with high correlation.
!"
!#
$%&'
()*+,-.
()-*!
(/01-2
/01-2)
(1,0-.
1,0-.)
(34(5
(/0,1
(6.7)
(+).,)
)(0-.(1
(!)8
,./39
,./39.
63.(
63.(.
4(1
4(1.
:0
:0.
;;;;;;;;;;;
;;;;;;;;
;;;;;;;;;
;;;;;;
;;;;;;;;; ;;; ;;;;;;;;; ;;;;;;
;;;;;;;;;;;; ;;;; ;;;;;;;;;;; ;;;;;;;
;;;;;;;;;;;; ;;;;;;;;;
;;;;;;;; ;;;;;; ;;;;;; ;;;;
;;;;; ;;;;;; ;;;;;; ;;;;;;
;;;;;; ;;;;;; ; ;;;;;;;;; ;;;;;;;;;
;;;;;; ;;; ;;;; ;;;;; ;;;;;;;;;
;;;;;;;;; ;;; ;;;;;; ;;;;;;;;;;;; ;;; ;;;;
;;;;;;;;; ;;;;;;; ;;;;;;;
;;;;; ;;;; ;;;; ;;;;;;;;
;;;;;; ; ;;; ;;;;;;;;;;;;;
;;;;;;;;;;;; ;;;;;;;;
;;;;;;;;;;;;
;;;;;;;;; ;;;;;;;;;;;;;
;;;;;;;;; ;;; ;;;;;;;;;;;;;;; ;;;
;;;
:0. :0 4(1. 4(1 63.(. 63.( ,./39. ,./39 (!)8 )(0-.(1 (+).,) (6.7) (/0,1 (34(5 1,0-.) (1,0-. /01-2) (/01-2 ()-*! ()*+,-. $%&'
!"##"<=
=<>
=<=
Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated bycolour intensities; negative correlations are marked with a dashed outline.
Variance Inflation Factorlog(Yi) Model 1 Model 2 Model 3NSOURCE 3.38 3.38 3.40NSCOM 3.34 3.34 3.36NPATCH 3.94 3.88 3.90PATCHS 3.84 3.82 3.84NTRACE 4.62 4.60 4.57TRACES 4.78 4.75 4.70NLINK 2.24 2.22 1.90NDEVS 9.32 9.27 1.91NUSERS 4.55 4.54 2.30SNACENT 10.66 10.65 —NMSG 11.63 — —REPLY 1.17 1.17 1.17REPLYE 2.04 1.91 1.90DLEN 4.21 1.91 1.87DLENE 4.65 1.98 1.96INT 2.82 2.82 2.60INTE 1.71 1.71 1.71WA 2.26 1.99 1.96WAE 2.08 2.06 2.02
Table IISTEP-WISE ANALYSIS OF MULTICOLLINEARITY.
size of patches.Since we observe a substantial number of high correla-
tions among regression variables, we have to examine po-tential issues due to multi-collinearity among the variables.To investigate potential problems, we compute the varianceinflation factors for each variable. Variance inflation factorsare widely used to measure the degree of multi-collinearitybetween variables in regression models [17]. FollowingKutner et al. [17], we remove those variables from the
model that have a variance inflation factor greater than 10.We start our analysis with a regression model that containsall our variables. The variance inflation factors for thismodel are presented in Table II, Model 1. We observe twovariables that have a variance inflation factors greater than10. We remove the highest one (NMSG) from the regressionmodel and recompute the variance inflation factors with thereduced set of variables. The resulting model, (Model 2 inTable II) contains only one more variable with an inflationfactor larger than 10. We remove the regression variable(SNACENT) from the model and recompute the inflationfactors. In the resulting model (Model 3 in Table II), novariables have an inflation factor larger than 5 and we finishour analysis of multicollinearity.
C. Hierarchical Analysis Design
After having determined the reduced set of regressionvariables with low multicollinearity, we proceed by inves-tigating the relative impact of each of the four dimensionsof social measures on the post-release defects.
The results of our hierarchical analysis are presented inTable III. To make the interpretation of the coefficients ofthe regression variables easier, we report the odds ratios [12]of each measure, rather than the coefficients themselves.An odds ratio greater than one indicates a positive rela-tion between the dependent variable (post-release defects)and the independent variables (social interaction measures),whereas an odds ratio smaller than one indicates a negativerelation. As we are working in a log-transformed space, the
Correlogram showscorrelation “hotspots”
27
APPROACHHierarchical Analysis(1) Eliminate variables with high correlation.
!"
!#
$%&'
()*+,-.
()-*!
(/01-2
/01-2)
(1,0-.
1,0-.)
(34(5
(/0,1
(6.7)
(+).,)
)(0-.(1
(!)8
,./39
,./39.
63.(
63.(.
4(1
4(1.
:0
:0.
;;;;;;;;;;;
;;;;;;;;
;;;;;;;;;
;;;;;;
;;;;;;;;; ;;; ;;;;;;;;; ;;;;;;
;;;;;;;;;;;; ;;;; ;;;;;;;;;;; ;;;;;;;
;;;;;;;;;;;; ;;;;;;;;;
;;;;;;;; ;;;;;; ;;;;;; ;;;;
;;;;; ;;;;;; ;;;;;; ;;;;;;
;;;;;; ;;;;;; ; ;;;;;;;;; ;;;;;;;;;
;;;;;; ;;; ;;;; ;;;;; ;;;;;;;;;
;;;;;;;;; ;;; ;;;;;; ;;;;;;;;;;;; ;;; ;;;;
;;;;;;;;; ;;;;;;; ;;;;;;;
;;;;; ;;;; ;;;; ;;;;;;;;
;;;;;; ; ;;; ;;;;;;;;;;;;;
;;;;;;;;;;;; ;;;;;;;;
;;;;;;;;;;;;
;;;;;;;;; ;;;;;;;;;;;;;
;;;;;;;;; ;;; ;;;;;;;;;;;;;;; ;;;
;;;
:0. :0 4(1. 4(1 63.(. 63.( ,./39. ,./39 (!)8 )(0-.(1 (+).,) (6.7) (/0,1 (34(5 1,0-.) (1,0-. /01-2) (/01-2 ()-*! ()*+,-. $%&'
!"##"<=
=<>
=<=
Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated bycolour intensities; negative correlations are marked with a dashed outline.
Variance Inflation Factorlog(Yi) Model 1 Model 2 Model 3NSOURCE 3.38 3.38 3.40NSCOM 3.34 3.34 3.36NPATCH 3.94 3.88 3.90PATCHS 3.84 3.82 3.84NTRACE 4.62 4.60 4.57TRACES 4.78 4.75 4.70NLINK 2.24 2.22 1.90NDEVS 9.32 9.27 1.91NUSERS 4.55 4.54 2.30SNACENT 10.66 10.65 —NMSG 11.63 — —REPLY 1.17 1.17 1.17REPLYE 2.04 1.91 1.90DLEN 4.21 1.91 1.87DLENE 4.65 1.98 1.96INT 2.82 2.82 2.60INTE 1.71 1.71 1.71WA 2.26 1.99 1.96WAE 2.08 2.06 2.02
Table IISTEP-WISE ANALYSIS OF MULTICOLLINEARITY.
size of patches.Since we observe a substantial number of high correla-
tions among regression variables, we have to examine po-tential issues due to multi-collinearity among the variables.To investigate potential problems, we compute the varianceinflation factors for each variable. Variance inflation factorsare widely used to measure the degree of multi-collinearitybetween variables in regression models [17]. FollowingKutner et al. [17], we remove those variables from the
model that have a variance inflation factor greater than 10.We start our analysis with a regression model that containsall our variables. The variance inflation factors for thismodel are presented in Table II, Model 1. We observe twovariables that have a variance inflation factors greater than10. We remove the highest one (NMSG) from the regressionmodel and recompute the variance inflation factors with thereduced set of variables. The resulting model, (Model 2 inTable II) contains only one more variable with an inflationfactor larger than 10. We remove the regression variable(SNACENT) from the model and recompute the inflationfactors. In the resulting model (Model 3 in Table II), novariables have an inflation factor larger than 5 and we finishour analysis of multicollinearity.
C. Hierarchical Analysis Design
After having determined the reduced set of regressionvariables with low multicollinearity, we proceed by inves-tigating the relative impact of each of the four dimensionsof social measures on the post-release defects.
The results of our hierarchical analysis are presented inTable III. To make the interpretation of the coefficients ofthe regression variables easier, we report the odds ratios [12]of each measure, rather than the coefficients themselves.An odds ratio greater than one indicates a positive rela-tion between the dependent variable (post-release defects)and the independent variables (social interaction measures),whereas an odds ratio smaller than one indicates a negativerelation. As we are working in a log-transformed space, the
(1) Build model with all independent variables
(2) Measure VIF of each independent variable
(3) Remove variable with highest VIF
Repeat until all VIF below threshold (t=10)
28
APPROACHHierarchical Analysis(1) Eliminate variables with high correlation.
!"
!#
$%&'
()*+,-.
()-*!
(/01-2
/01-2)
(1,0-.
1,0-.)
(34(5
(/0,1
(6.7)
(+).,)
)(0-.(1
(!)8
,./39
,./39.
63.(
63.(.
4(1
4(1.
:0
:0.
;;;;;;;;;;;
;;;;;;;;
;;;;;;;;;
;;;;;;
;;;;;;;;; ;;; ;;;;;;;;; ;;;;;;
;;;;;;;;;;;; ;;;; ;;;;;;;;;;; ;;;;;;;
;;;;;;;;;;;; ;;;;;;;;;
;;;;;;;; ;;;;;; ;;;;;; ;;;;
;;;;; ;;;;;; ;;;;;; ;;;;;;
;;;;;; ;;;;;; ; ;;;;;;;;; ;;;;;;;;;
;;;;;; ;;; ;;;; ;;;;; ;;;;;;;;;
;;;;;;;;; ;;; ;;;;;; ;;;;;;;;;;;; ;;; ;;;;
;;;;;;;;; ;;;;;;; ;;;;;;;
;;;;; ;;;; ;;;; ;;;;;;;;
;;;;;; ; ;;; ;;;;;;;;;;;;;
;;;;;;;;;;;; ;;;;;;;;
;;;;;;;;;;;;
;;;;;;;;; ;;;;;;;;;;;;;
;;;;;;;;; ;;; ;;;;;;;;;;;;;;; ;;;
;;;
:0. :0 4(1. 4(1 63.(. 63.( ,./39. ,./39 (!)8 )(0-.(1 (+).,) (6.7) (/0,1 (34(5 1,0-.) (1,0-. /01-2) (/01-2 ()-*! ()*+,-. $%&'
!"##"<=
=<>
=<=
Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated bycolour intensities; negative correlations are marked with a dashed outline.
Variance Inflation Factorlog(Yi) Model 1 Model 2 Model 3NSOURCE 3.38 3.38 3.40NSCOM 3.34 3.34 3.36NPATCH 3.94 3.88 3.90PATCHS 3.84 3.82 3.84NTRACE 4.62 4.60 4.57TRACES 4.78 4.75 4.70NLINK 2.24 2.22 1.90NDEVS 9.32 9.27 1.91NUSERS 4.55 4.54 2.30SNACENT 10.66 10.65 —NMSG 11.63 — —REPLY 1.17 1.17 1.17REPLYE 2.04 1.91 1.90DLEN 4.21 1.91 1.87DLENE 4.65 1.98 1.96INT 2.82 2.82 2.60INTE 1.71 1.71 1.71WA 2.26 1.99 1.96WAE 2.08 2.06 2.02
Table IISTEP-WISE ANALYSIS OF MULTICOLLINEARITY.
size of patches.Since we observe a substantial number of high correla-
tions among regression variables, we have to examine po-tential issues due to multi-collinearity among the variables.To investigate potential problems, we compute the varianceinflation factors for each variable. Variance inflation factorsare widely used to measure the degree of multi-collinearitybetween variables in regression models [17]. FollowingKutner et al. [17], we remove those variables from the
model that have a variance inflation factor greater than 10.We start our analysis with a regression model that containsall our variables. The variance inflation factors for thismodel are presented in Table II, Model 1. We observe twovariables that have a variance inflation factors greater than10. We remove the highest one (NMSG) from the regressionmodel and recompute the variance inflation factors with thereduced set of variables. The resulting model, (Model 2 inTable II) contains only one more variable with an inflationfactor larger than 10. We remove the regression variable(SNACENT) from the model and recompute the inflationfactors. In the resulting model (Model 3 in Table II), novariables have an inflation factor larger than 5 and we finishour analysis of multicollinearity.
C. Hierarchical Analysis Design
After having determined the reduced set of regressionvariables with low multicollinearity, we proceed by inves-tigating the relative impact of each of the four dimensionsof social measures on the post-release defects.
The results of our hierarchical analysis are presented inTable III. To make the interpretation of the coefficients ofthe regression variables easier, we report the odds ratios [12]of each measure, rather than the coefficients themselves.An odds ratio greater than one indicates a positive rela-tion between the dependent variable (post-release defects)and the independent variables (social interaction measures),whereas an odds ratio smaller than one indicates a negativerelation. As we are working in a log-transformed space, the
(1) Build model with all independent variables
(2) Measure VIF of each independent variable
(3) Remove variable with highest VIF
Repeat until all VIF below threshold (t=10)
28
APPROACHHierarchical Analysis(1) Eliminate variables with high correlation.
!"
!#
$%&'
()*+,-.
()-*!
(/01-2
/01-2)
(1,0-.
1,0-.)
(34(5
(/0,1
(6.7)
(+).,)
)(0-.(1
(!)8
,./39
,./39.
63.(
63.(.
4(1
4(1.
:0
:0.
;;;;;;;;;;;
;;;;;;;;
;;;;;;;;;
;;;;;;
;;;;;;;;; ;;; ;;;;;;;;; ;;;;;;
;;;;;;;;;;;; ;;;; ;;;;;;;;;;; ;;;;;;;
;;;;;;;;;;;; ;;;;;;;;;
;;;;;;;; ;;;;;; ;;;;;; ;;;;
;;;;; ;;;;;; ;;;;;; ;;;;;;
;;;;;; ;;;;;; ; ;;;;;;;;; ;;;;;;;;;
;;;;;; ;;; ;;;; ;;;;; ;;;;;;;;;
;;;;;;;;; ;;; ;;;;;; ;;;;;;;;;;;; ;;; ;;;;
;;;;;;;;; ;;;;;;; ;;;;;;;
;;;;; ;;;; ;;;; ;;;;;;;;
;;;;;; ; ;;; ;;;;;;;;;;;;;
;;;;;;;;;;;; ;;;;;;;;
;;;;;;;;;;;;
;;;;;;;;; ;;;;;;;;;;;;;
;;;;;;;;; ;;; ;;;;;;;;;;;;;;; ;;;
;;;
:0. :0 4(1. 4(1 63.(. 63.( ,./39. ,./39 (!)8 )(0-.(1 (+).,) (6.7) (/0,1 (34(5 1,0-.) (1,0-. /01-2) (/01-2 ()-*! ()*+,-. $%&'
!"##"<=
=<>
=<=
Figure 1. Pairwise correlations of social interaction measures with levels * p<0.05, ** p<0.01, *** p<0.001. Strength of correlations is indicated bycolour intensities; negative correlations are marked with a dashed outline.
Variance Inflation Factorlog(Yi) Model 1 Model 2 Model 3NSOURCE 3.38 3.38 3.40NSCOM 3.34 3.34 3.36NPATCH 3.94 3.88 3.90PATCHS 3.84 3.82 3.84NTRACE 4.62 4.60 4.57TRACES 4.78 4.75 4.70NLINK 2.24 2.22 1.90NDEVS 9.32 9.27 1.91NUSERS 4.55 4.54 2.30SNACENT 10.66 10.65 —NMSG 11.63 — —REPLY 1.17 1.17 1.17REPLYE 2.04 1.91 1.90DLEN 4.21 1.91 1.87DLENE 4.65 1.98 1.96INT 2.82 2.82 2.60INTE 1.71 1.71 1.71WA 2.26 1.99 1.96WAE 2.08 2.06 2.02
Table IISTEP-WISE ANALYSIS OF MULTICOLLINEARITY.
size of patches.Since we observe a substantial number of high correla-
tions among regression variables, we have to examine po-tential issues due to multi-collinearity among the variables.To investigate potential problems, we compute the varianceinflation factors for each variable. Variance inflation factorsare widely used to measure the degree of multi-collinearitybetween variables in regression models [17]. FollowingKutner et al. [17], we remove those variables from the
model that have a variance inflation factor greater than 10.We start our analysis with a regression model that containsall our variables. The variance inflation factors for thismodel are presented in Table II, Model 1. We observe twovariables that have a variance inflation factors greater than10. We remove the highest one (NMSG) from the regressionmodel and recompute the variance inflation factors with thereduced set of variables. The resulting model, (Model 2 inTable II) contains only one more variable with an inflationfactor larger than 10. We remove the regression variable(SNACENT) from the model and recompute the inflationfactors. In the resulting model (Model 3 in Table II), novariables have an inflation factor larger than 5 and we finishour analysis of multicollinearity.
C. Hierarchical Analysis Design
After having determined the reduced set of regressionvariables with low multicollinearity, we proceed by inves-tigating the relative impact of each of the four dimensionsof social measures on the post-release defects.
The results of our hierarchical analysis are presented inTable III. To make the interpretation of the coefficients ofthe regression variables easier, we report the odds ratios [12]of each measure, rather than the coefficients themselves.An odds ratio greater than one indicates a positive rela-tion between the dependent variable (post-release defects)and the independent variables (social interaction measures),whereas an odds ratio smaller than one indicates a negativerelation. As we are working in a log-transformed space, the
(1) Build model with all independent variables
(2) Measure VIF of each independent variable
(3) Remove variable with highest VIF
Repeat until all VIF below threshold (t=10)
28
APPROACHHierarchical Analysis
(2) Build linear models step-by-step
log(Yi) MB M1 M2 M3 M4 M5CHURN 4.996 *** 4.631 *** 4.658 *** 5.303 *** 3.688 *** 4.470 ***NSOURCE 1.694 *** 1.698 *** 1.772 *** 1.769 *** 1.667 ***NTRACE 0.79 0.768 0.864 0.881 1.115NPATCH 0.209 * 0.210 * 0.284 + 0.231 * 0.291NSCOM 1.218 1.194 1.246 1.208 1.244PATCHS 12.607 * 12.626 * 11.200 * 12.736 * 18.207 **TRACES 1.016 1.012 1.004 0.989 0.975NLINK 1.764 *** 1.613 ** 1.600 ** 1.666 ** 1.596 +NPART 2.481 2.888 4.480 4.542NDEVS 0.475 0.582 0.385 0.274NUSERS 0.749 0.803 0.692 0.792REPLY 1.019 0.986 0.982REPLYE 0.117 *** 0.082 *** 0.044 ***DLEN 0.936 0.898 * 0.876 +DLENE 2.499 1.251 2.044INT 0.829 ** 0.821 ** 0.963INTE 1.109 1.013 1.306WA 1.432 *** 1.224 +WAE 2.718 * 2.169CON1-3 Fig. 2 ***χ2 559.01 *** 698.5 *** 700.15 731.5 *** 752.3 *** 1055.19 ***Dev. Expl. 10.71% 13.38 % 13.41 % 14.02 % 14.41 % 26.07 %∆χ2 139.48 1.652 31.357 20.28 302.87*** p<0.001, ** p<0.01, * p<0.05, + p <0.1
Table IIIHIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES.
odds-ratios have to be interpreted accordingly: a single unitchange in the log-transformed space corresponds to a changefrom 1 to 2.71 (= e1) units in untransformed space.
We start our hierarchical analysis with a baseline modelwhich relates churn [21] to post-release defects. Churn hasbeen shown in the past to be a valuable code-based predictorof defects [22], [23] even when used across projects [33]. Weobtained a measure of churn by mining the change historiesof each file in the project’s version control system. Theresults for the baseline model are presented in column MBof Table III and show that CHURN is positively associatedto the failure proneness of a file during the post-releaseperiod. As expected, these results are in line with earlierfindings [22], [23].
Model M1 introduces the first dimension of social in-teraction we want to study: the measures of structuralinformation items in the contents of issue report discussions.The results of the logistic regression model show, that onlyspecific structural information items are statistically signif-icant. Whereas the number of source code examples, thenumber of links and the effect size of patches stay significantthroughout all models when new variables are introduced,the total number of patches plays only a marginal role, indi-cating that this measure is unlikely to impact future failureproneness. When looking at the odds ratios, we surprisinglyfind a positive link between the number of source codesamples and future defects, as we initially expected codesamples to have a beneficial effect. One possible explanationmight be that developers trust user provided sample solutionsand incorporate their proposed (yet possibly flawed) mod-
ifications without further verification. The second strongestrelationship observed from the results is the positive relationbetween the number of links provided by users and failureproneness. The third relationship that we observe is a veryhigh, positive relation between the effect size of patches andfuture failure proneness of files. This result confirms earlierfindings on the risk of scattered changes [16]. Overall theresults show that measures of structural elements in issuereport discussions are indicators of increased future failureproneness of a file. The explanatory power of the modelincreases by 2.67% over the baseline model and this changeis statistically significant.
Model M2 introduces the second dimension of social in-teraction measures: structural information. The results showthat the role of a participants and the overall amount ofparticipants in a discussion have no statistically significantimpact on the future failure proneness of files. As a resultwe see no increase in the explanatory power of the modelby introducing the role of participants. We left out themeasures of reputation from this model, as we record themas factors with many levels that may disrupt our hierarchicalmodelling approach. We will revisit these measures later inmodel M5. Overall, we cannot find a significant relationbetween the role of participants and post-release defects.The explanatory power of the extended model increasesonly marginally, however this increase is not statisticallysignificant.
Model M3 introduces measures form the category ofcommunication dynamics. The results show a statisticallysignificant and strongly negative relation between the mea-
Baseline Model: Code Churn
(Ball, Naggapan)
29
APPROACHHierarchical Analysis
(2) Build linear models step-by-step
log(Yi) MB M1 M2 M3 M4 M5CHURN 4.996 *** 4.631 *** 4.658 *** 5.303 *** 3.688 *** 4.470 ***NSOURCE 1.694 *** 1.698 *** 1.772 *** 1.769 *** 1.667 ***NTRACE 0.79 0.768 0.864 0.881 1.115NPATCH 0.209 * 0.210 * 0.284 + 0.231 * 0.291NSCOM 1.218 1.194 1.246 1.208 1.244PATCHS 12.607 * 12.626 * 11.200 * 12.736 * 18.207 **TRACES 1.016 1.012 1.004 0.989 0.975NLINK 1.764 *** 1.613 ** 1.600 ** 1.666 ** 1.596 +NPART 2.481 2.888 4.480 4.542NDEVS 0.475 0.582 0.385 0.274NUSERS 0.749 0.803 0.692 0.792REPLY 1.019 0.986 0.982REPLYE 0.117 *** 0.082 *** 0.044 ***DLEN 0.936 0.898 * 0.876 +DLENE 2.499 1.251 2.044INT 0.829 ** 0.821 ** 0.963INTE 1.109 1.013 1.306WA 1.432 *** 1.224 +WAE 2.718 * 2.169CON1-3 Fig. 2 ***χ2 559.01 *** 698.5 *** 700.15 731.5 *** 752.3 *** 1055.19 ***Dev. Expl. 10.71% 13.38 % 13.41 % 14.02 % 14.41 % 26.07 %∆χ2 139.48 1.652 31.357 20.28 302.87*** p<0.001, ** p<0.01, * p<0.05, + p <0.1
Table IIIHIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES.
odds-ratios have to be interpreted accordingly: a single unitchange in the log-transformed space corresponds to a changefrom 1 to 2.71 (= e1) units in untransformed space.
We start our hierarchical analysis with a baseline modelwhich relates churn [21] to post-release defects. Churn hasbeen shown in the past to be a valuable code-based predictorof defects [22], [23] even when used across projects [33]. Weobtained a measure of churn by mining the change historiesof each file in the project’s version control system. Theresults for the baseline model are presented in column MBof Table III and show that CHURN is positively associatedto the failure proneness of a file during the post-releaseperiod. As expected, these results are in line with earlierfindings [22], [23].
Model M1 introduces the first dimension of social in-teraction we want to study: the measures of structuralinformation items in the contents of issue report discussions.The results of the logistic regression model show, that onlyspecific structural information items are statistically signif-icant. Whereas the number of source code examples, thenumber of links and the effect size of patches stay significantthroughout all models when new variables are introduced,the total number of patches plays only a marginal role, indi-cating that this measure is unlikely to impact future failureproneness. When looking at the odds ratios, we surprisinglyfind a positive link between the number of source codesamples and future defects, as we initially expected codesamples to have a beneficial effect. One possible explanationmight be that developers trust user provided sample solutionsand incorporate their proposed (yet possibly flawed) mod-
ifications without further verification. The second strongestrelationship observed from the results is the positive relationbetween the number of links provided by users and failureproneness. The third relationship that we observe is a veryhigh, positive relation between the effect size of patches andfuture failure proneness of files. This result confirms earlierfindings on the risk of scattered changes [16]. Overall theresults show that measures of structural elements in issuereport discussions are indicators of increased future failureproneness of a file. The explanatory power of the modelincreases by 2.67% over the baseline model and this changeis statistically significant.
Model M2 introduces the second dimension of social in-teraction measures: structural information. The results showthat the role of a participants and the overall amount ofparticipants in a discussion have no statistically significantimpact on the future failure proneness of files. As a resultwe see no increase in the explanatory power of the modelby introducing the role of participants. We left out themeasures of reputation from this model, as we record themas factors with many levels that may disrupt our hierarchicalmodelling approach. We will revisit these measures later inmodel M5. Overall, we cannot find a significant relationbetween the role of participants and post-release defects.The explanatory power of the extended model increasesonly marginally, however this increase is not statisticallysignificant.
Model M3 introduces measures form the category ofcommunication dynamics. The results show a statisticallysignificant and strongly negative relation between the mea-
Adding first dimensionDiscussion Content
Increase in explanatory powerstatistically significant
30
APPROACHHierarchical Analysis
(2) Build linear models step-by-step
log(Yi) MB M1 M2 M3 M4 M5CHURN 4.996 *** 4.631 *** 4.658 *** 5.303 *** 3.688 *** 4.470 ***NSOURCE 1.694 *** 1.698 *** 1.772 *** 1.769 *** 1.667 ***NTRACE 0.79 0.768 0.864 0.881 1.115NPATCH 0.209 * 0.210 * 0.284 + 0.231 * 0.291NSCOM 1.218 1.194 1.246 1.208 1.244PATCHS 12.607 * 12.626 * 11.200 * 12.736 * 18.207 **TRACES 1.016 1.012 1.004 0.989 0.975NLINK 1.764 *** 1.613 ** 1.600 ** 1.666 ** 1.596 +NPART 2.481 2.888 4.480 4.542NDEVS 0.475 0.582 0.385 0.274NUSERS 0.749 0.803 0.692 0.792REPLY 1.019 0.986 0.982REPLYE 0.117 *** 0.082 *** 0.044 ***DLEN 0.936 0.898 * 0.876 +DLENE 2.499 1.251 2.044INT 0.829 ** 0.821 ** 0.963INTE 1.109 1.013 1.306WA 1.432 *** 1.224 +WAE 2.718 * 2.169CON1-3 Fig. 2 ***χ2 559.01 *** 698.5 *** 700.15 731.5 *** 752.3 *** 1055.19 ***Dev. Expl. 10.71% 13.38 % 13.41 % 14.02 % 14.41 % 26.07 %∆χ2 139.48 1.652 31.357 20.28 302.87*** p<0.001, ** p<0.01, * p<0.05, + p <0.1
Table IIIHIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES.
odds-ratios have to be interpreted accordingly: a single unitchange in the log-transformed space corresponds to a changefrom 1 to 2.71 (= e1) units in untransformed space.
We start our hierarchical analysis with a baseline modelwhich relates churn [21] to post-release defects. Churn hasbeen shown in the past to be a valuable code-based predictorof defects [22], [23] even when used across projects [33]. Weobtained a measure of churn by mining the change historiesof each file in the project’s version control system. Theresults for the baseline model are presented in column MBof Table III and show that CHURN is positively associatedto the failure proneness of a file during the post-releaseperiod. As expected, these results are in line with earlierfindings [22], [23].
Model M1 introduces the first dimension of social in-teraction we want to study: the measures of structuralinformation items in the contents of issue report discussions.The results of the logistic regression model show, that onlyspecific structural information items are statistically signif-icant. Whereas the number of source code examples, thenumber of links and the effect size of patches stay significantthroughout all models when new variables are introduced,the total number of patches plays only a marginal role, indi-cating that this measure is unlikely to impact future failureproneness. When looking at the odds ratios, we surprisinglyfind a positive link between the number of source codesamples and future defects, as we initially expected codesamples to have a beneficial effect. One possible explanationmight be that developers trust user provided sample solutionsand incorporate their proposed (yet possibly flawed) mod-
ifications without further verification. The second strongestrelationship observed from the results is the positive relationbetween the number of links provided by users and failureproneness. The third relationship that we observe is a veryhigh, positive relation between the effect size of patches andfuture failure proneness of files. This result confirms earlierfindings on the risk of scattered changes [16]. Overall theresults show that measures of structural elements in issuereport discussions are indicators of increased future failureproneness of a file. The explanatory power of the modelincreases by 2.67% over the baseline model and this changeis statistically significant.
Model M2 introduces the second dimension of social in-teraction measures: structural information. The results showthat the role of a participants and the overall amount ofparticipants in a discussion have no statistically significantimpact on the future failure proneness of files. As a resultwe see no increase in the explanatory power of the modelby introducing the role of participants. We left out themeasures of reputation from this model, as we record themas factors with many levels that may disrupt our hierarchicalmodelling approach. We will revisit these measures later inmodel M5. Overall, we cannot find a significant relationbetween the role of participants and post-release defects.The explanatory power of the extended model increasesonly marginally, however this increase is not statisticallysignificant.
Model M3 introduces measures form the category ofcommunication dynamics. The results show a statisticallysignificant and strongly negative relation between the mea-
Adding second dimension
Discussion Content
Increase in explanatory power:
not statistically significant!
31
APPROACHHierarchical Analysis
(2) Build linear models step-by-step
log(Yi) MB M1 M2 M3 M4 M5CHURN 4.996 *** 4.631 *** 4.658 *** 5.303 *** 3.688 *** 4.470 ***NSOURCE 1.694 *** 1.698 *** 1.772 *** 1.769 *** 1.667 ***NTRACE 0.79 0.768 0.864 0.881 1.115NPATCH 0.209 * 0.210 * 0.284 + 0.231 * 0.291NSCOM 1.218 1.194 1.246 1.208 1.244PATCHS 12.607 * 12.626 * 11.200 * 12.736 * 18.207 **TRACES 1.016 1.012 1.004 0.989 0.975NLINK 1.764 *** 1.613 ** 1.600 ** 1.666 ** 1.596 +NPART 2.481 2.888 4.480 4.542NDEVS 0.475 0.582 0.385 0.274NUSERS 0.749 0.803 0.692 0.792REPLY 1.019 0.986 0.982REPLYE 0.117 *** 0.082 *** 0.044 ***DLEN 0.936 0.898 * 0.876 +DLENE 2.499 1.251 2.044INT 0.829 ** 0.821 ** 0.963INTE 1.109 1.013 1.306WA 1.432 *** 1.224 +WAE 2.718 * 2.169CON1-3 Fig. 2 ***χ2 559.01 *** 698.5 *** 700.15 731.5 *** 752.3 *** 1055.19 ***Dev. Expl. 10.71% 13.38 % 13.41 % 14.02 % 14.41 % 26.07 %∆χ2 139.48 1.652 31.357 20.28 302.87*** p<0.001, ** p<0.01, * p<0.05, + p <0.1
Table IIIHIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES.
odds-ratios have to be interpreted accordingly: a single unitchange in the log-transformed space corresponds to a changefrom 1 to 2.71 (= e1) units in untransformed space.
We start our hierarchical analysis with a baseline modelwhich relates churn [21] to post-release defects. Churn hasbeen shown in the past to be a valuable code-based predictorof defects [22], [23] even when used across projects [33]. Weobtained a measure of churn by mining the change historiesof each file in the project’s version control system. Theresults for the baseline model are presented in column MBof Table III and show that CHURN is positively associatedto the failure proneness of a file during the post-releaseperiod. As expected, these results are in line with earlierfindings [22], [23].
Model M1 introduces the first dimension of social in-teraction we want to study: the measures of structuralinformation items in the contents of issue report discussions.The results of the logistic regression model show, that onlyspecific structural information items are statistically signif-icant. Whereas the number of source code examples, thenumber of links and the effect size of patches stay significantthroughout all models when new variables are introduced,the total number of patches plays only a marginal role, indi-cating that this measure is unlikely to impact future failureproneness. When looking at the odds ratios, we surprisinglyfind a positive link between the number of source codesamples and future defects, as we initially expected codesamples to have a beneficial effect. One possible explanationmight be that developers trust user provided sample solutionsand incorporate their proposed (yet possibly flawed) mod-
ifications without further verification. The second strongestrelationship observed from the results is the positive relationbetween the number of links provided by users and failureproneness. The third relationship that we observe is a veryhigh, positive relation between the effect size of patches andfuture failure proneness of files. This result confirms earlierfindings on the risk of scattered changes [16]. Overall theresults show that measures of structural elements in issuereport discussions are indicators of increased future failureproneness of a file. The explanatory power of the modelincreases by 2.67% over the baseline model and this changeis statistically significant.
Model M2 introduces the second dimension of social in-teraction measures: structural information. The results showthat the role of a participants and the overall amount ofparticipants in a discussion have no statistically significantimpact on the future failure proneness of files. As a resultwe see no increase in the explanatory power of the modelby introducing the role of participants. We left out themeasures of reputation from this model, as we record themas factors with many levels that may disrupt our hierarchicalmodelling approach. We will revisit these measures later inmodel M5. Overall, we cannot find a significant relationbetween the role of participants and post-release defects.The explanatory power of the extended model increasesonly marginally, however this increase is not statisticallysignificant.
Model M3 introduces measures form the category ofcommunication dynamics. The results show a statisticallysignificant and strongly negative relation between the mea-
32
Addingthird
dimension
Increase in explanatory
power significant!
APPROACHHierarchical Analysis
(2) Build linear models step-by-step
log(Yi) MB M1 M2 M3 M4 M5CHURN 4.996 *** 4.631 *** 4.658 *** 5.303 *** 3.688 *** 4.470 ***NSOURCE 1.694 *** 1.698 *** 1.772 *** 1.769 *** 1.667 ***NTRACE 0.79 0.768 0.864 0.881 1.115NPATCH 0.209 * 0.210 * 0.284 + 0.231 * 0.291NSCOM 1.218 1.194 1.246 1.208 1.244PATCHS 12.607 * 12.626 * 11.200 * 12.736 * 18.207 **TRACES 1.016 1.012 1.004 0.989 0.975NLINK 1.764 *** 1.613 ** 1.600 ** 1.666 ** 1.596 +NPART 2.481 2.888 4.480 4.542NDEVS 0.475 0.582 0.385 0.274NUSERS 0.749 0.803 0.692 0.792REPLY 1.019 0.986 0.982REPLYE 0.117 *** 0.082 *** 0.044 ***DLEN 0.936 0.898 * 0.876 +DLENE 2.499 1.251 2.044INT 0.829 ** 0.821 ** 0.963INTE 1.109 1.013 1.306WA 1.432 *** 1.224 +WAE 2.718 * 2.169CON1-3 Fig. 2 ***χ2 559.01 *** 698.5 *** 700.15 731.5 *** 752.3 *** 1055.19 ***Dev. Expl. 10.71% 13.38 % 13.41 % 14.02 % 14.41 % 26.07 %∆χ2 139.48 1.652 31.357 20.28 302.87*** p<0.001, ** p<0.01, * p<0.05, + p <0.1
Table IIIHIERARCHICAL ANALYSIS OF LOGISTIC REGRESSION MODELS ALONG THE FOUR DIMENSIONS OF SOCIAL INTERACTION MEASURES.
odds-ratios have to be interpreted accordingly: a single unitchange in the log-transformed space corresponds to a changefrom 1 to 2.71 (= e1) units in untransformed space.
We start our hierarchical analysis with a baseline modelwhich relates churn [21] to post-release defects. Churn hasbeen shown in the past to be a valuable code-based predictorof defects [22], [23] even when used across projects [33]. Weobtained a measure of churn by mining the change historiesof each file in the project’s version control system. Theresults for the baseline model are presented in column MBof Table III and show that CHURN is positively associatedto the failure proneness of a file during the post-releaseperiod. As expected, these results are in line with earlierfindings [22], [23].
Model M1 introduces the first dimension of social in-teraction we want to study: the measures of structuralinformation items in the contents of issue report discussions.The results of the logistic regression model show, that onlyspecific structural information items are statistically signif-icant. Whereas the number of source code examples, thenumber of links and the effect size of patches stay significantthroughout all models when new variables are introduced,the total number of patches plays only a marginal role, indi-cating that this measure is unlikely to impact future failureproneness. When looking at the odds ratios, we surprisinglyfind a positive link between the number of source codesamples and future defects, as we initially expected codesamples to have a beneficial effect. One possible explanationmight be that developers trust user provided sample solutionsand incorporate their proposed (yet possibly flawed) mod-
ifications without further verification. The second strongestrelationship observed from the results is the positive relationbetween the number of links provided by users and failureproneness. The third relationship that we observe is a veryhigh, positive relation between the effect size of patches andfuture failure proneness of files. This result confirms earlierfindings on the risk of scattered changes [16]. Overall theresults show that measures of structural elements in issuereport discussions are indicators of increased future failureproneness of a file. The explanatory power of the modelincreases by 2.67% over the baseline model and this changeis statistically significant.
Model M2 introduces the second dimension of social in-teraction measures: structural information. The results showthat the role of a participants and the overall amount ofparticipants in a discussion have no statistically significantimpact on the future failure proneness of files. As a resultwe see no increase in the explanatory power of the modelby introducing the role of participants. We left out themeasures of reputation from this model, as we record themas factors with many levels that may disrupt our hierarchicalmodelling approach. We will revisit these measures later inmodel M5. Overall, we cannot find a significant relationbetween the role of participants and post-release defects.The explanatory power of the extended model increasesonly marginally, however this increase is not statisticallysignificant.
Model M3 introduces measures form the category ofcommunication dynamics. The results show a statisticallysignificant and strongly negative relation between the mea-
33
Model based on Social Structures
14.41% Variation Explained
Model based onCode Metrics
17.04% Variation Explained
34
(Zimmermann 2007)
Model based on Social Structures
14.41% Variation Explained
Model based onCode Metrics
17.04% Variation Explained
What happens when we combine both?
34
(Zimmermann 2007)
Model based on Social Structures
14.41% Variation Explained
Traditional Model(Zimmermann, PROMISE 2007)
17.04% Variation Explained
Combined Model30.77% Variation
Explained
35
Strongest Variables(Odds Ratios)
Size of PatchesWorkflowCode Churn# Code Examples# Links
36
Length of the DiscussionInterestigness of Report# PatchesReply Time
Strongest Variables(Odds Ratios)
Size of PatchesWorkflowCode Churn# Code Examples# Links
37
Length of the DiscussionInterestigness of Report# PatchesReply Time
Correlation,
not Causatio
n!
38
38
38
38
38