empirical research results for the evolution of a data-intensive software system and the gnome...
DESCRIPTION
Presentation by Mathieu Goeminne of joint empirical research (with Tom Mens, Alexandre Decan, Alexander Serebrenik, Bogdan Vasilescu) on evolving software systems during the EOSESE 2014 European Symposium in Lille, France, 30 June 2014. Part 1 of the presentation focuses on data-intensive software systems; part 2 focuses on the Gnome ecosystemTRANSCRIPT
Co-‐Evolving Changes in a Data-‐Intensive So3ware System
Mathieu Goeminne, Alexandre Decan, Tom Mens Service de Génie Logiciel, Université de Mons
hDp://informaIque.umons.ac.be/genlog/projects/disse
EOSESE 2014 European Open Symposium on Empirical So6ware Engineering -‐ July 2014
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Context
• FNRS Projet de Recherche “Data-‐Intensive So3ware System EvoluIon” – Interuniversity collaboraIon with Anthony Cleve and Loup Meurice (University of Namur)
• Overall goal – Expand empirical MSR research to include database-‐related acIviIes – Analyse and support co-‐evoluIon between program code and database (schema) in data-‐intensive so3ware systems
• Approach – Develop generic framework – Implement dedicated analysis and visualisaIon tools – Carry out empirical case studies
2
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Research QuesIons
• Code-‐centric focus – RQ1: How does the relaIon between how source code files and database-‐related files evolve?
– RQ2: What’s the effect of introducinga parIcular persistency technology (Hibernate, JPA)?
• Social focus – RQ3: How do developers divide theirwork and how does this evolve over Ime?
3
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Case Study
4
characteris/c value
duraIon 3,939 days ( > 129 months)
dates from Nov 2002 Ill Aug 2013number of commits 18,727
number of disInct files 20,718 (of which 54% code files)
number of file touches 93,721
number of disInct developers 100
Official repository hDps://github.com/scoophealth/oscar.git
A Java Electronical Medical Record system
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCAR Code dimension
• 3-‐Ier web applicaIon wriDen in Java and JSP
5
0%#
10%#
20%#
30%#
40%#
50%#
60%#
70%#
80%#
90%#
100%#
2003-07#
2004-01#
2004-07#
2005-01#
2005-07#
2006-01#
2006-07#
2007-01#
2007-07#
2008-01#
2008-07#
2009-01#
2009-07#
2010-01#
2010-07#
2011-01#
2011-07#
2012-01#
2012-07#
2013-01#
jsp#
java#
Monthly aggregated proporIon of JSP and Java files
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCARSocial Dimension
• Monthly number of disInct acIve developers for OSCAR (a3er idenIty merging)
6
0"
5"
10"
15"
20"
25"
2003'07"
2004'01"
2004'07"
2005'01"
2005'07"
2006'01"
2006'07"
2007'01"
2007'07"
2008'01"
2008'07"
2009'01"
2009'07"
2010'01"
2010'07"
2011'01"
2011'07"
2012'01"
2012'07"
2013'01"
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCARCode dimension
• Number of source code files w.r.t. database-‐related files
7
0"
1000"
2000"
3000"
4000"
5000"
6000"
2003)07"
2004)01"
2004)07"
2005)01"
2005)07"
2006)01"
2006)07"
2007)01"
2007)07"
2008)01"
2008)07"
2009)01"
2009)07"
2010)01"
2010)07"
2011)01"
2011)07"
2012)01"
2012)07"
2013)01"
pure"
sql"
sql = code files containing embedded SQL statements
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCAR Social Dimension
• How does the acIvity of developers evolve over Ime?
8
Develope
r
monthly aggregated number of file touches
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Introducing PersistencyProvider in OSCAR
• Hibernate – introduced in OSCAR since July 2006 – Java object-‐relaIonal mapping (ORM) library
• XML files map Java classes to database tables and Java data types to SQL data types
• facilitates data query and retrieval • generates SQL calls and relieves the developer from manual result set handling and object conversion
• Java Persistency Architecture (JPA) – introduced in OSCAR since July 2008 – industry standard ORM persistency API – Uses Java annotaIons instead of XML files
9
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Introducing Persistency ProviderCode dimension
SQL = code file containing embedded SQL query HIB = Java file targeted by Hibernate XML file JPA = Java file containing JPA annotaIon pure = code files not containing any of these
!!!!!!
10
0%#10%#20%#30%#40%#50%#60%#70%#80%#90%#100%#
2003-07-01#
2004-01-01#
2004-07-01#
2005-01-01#
2005-07-01#
2006-01-01#
2006-07-01#
2007-01-01#
2007-07-01#
2008-01-01#
2008-07-01#
2009-01-01#
2009-07-01#
2010-01-01#
2010-07-01#
2011-01-01#
2011-07-01#
2012-01-01#
2012-07-01#
2013-01-01#
2013-07-01#
JPA# HIB#
SQL# pure#
Monthly aggregated number of acIve code files
0"
200"
400"
600"
800"
1000"
1200"
1400"
1600"
2003)07)01"
2003)11)01"
2004)03)01"
2004)07)01"
2004)11)01"
2005)03)01"
2005)07)01"
2005)11)01"
2006)03)01"
2006)07)01"
2006)11)01"
2007)03)01"
2007)07)01"
2007)11)01"
2008)03)01"
2008)07)01"
2008)11)01"
2009)03)01"
2009)07)01"
2009)11)01"
2010)03)01"
2010)07)01"
2010)11)01"
2011)03)01"
2011)07)01"
2011)11)01"
2012)03)01"
2012)07)01"
2012)11)01"
2013)03)01"
2013)07)01"
JPA" HIB" SQL"
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Introducing Persistency ProviderSocial dimension
• Who is involved in introducing changes in database-‐related code?
11
Develope
r
Bubble size = log(monthly aggregated number of touched files)
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCARSocial Dimension
• How do developers divide their work?
12
Java$(87)$ JSP$(86)$
OSCAR$developers$(100)$3"
2"
5" 1"
6" 9"
SQL$(89)$
HIB$
31"
JPA$16" 15" 12"
Number of OSCAR developers involved in file touches per activity type
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
EvoluIon of OSCARSocial Dimension
• How do developers divide their work?
13
Number of developers that introduce database-related code in some file for the first time
Java$(87)$ JSP$(86)$
OSCAR$developers$(100)$3"
24"
10" 10"
1" 0"
SQL$(53)$
HIB$
24"
JPA$9" 8" 11"
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Preliminary Conclusions
RQ1: Code-‐related and database-‐related files evolve together (no “phased” co-‐evoluIon) !RQ2: IntroducIon of Hibernate, then JPA which takes over Hibernate, but embedded SQL sIll remains very important !!RQ3: No clear separaIon of acIviIes between developers Majority of developers changes both db-‐related and db-‐unrelated code No observed periods dedicated to a specific acIvity
14
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Future Work
• Analyse file changes at finer granularity • Study of other data-‐intensive so3ware systems • Study the evoluIon of DISS quality
– Unit tests involving database-‐related classes – Revisited modularity, coupling, cohesion – Database inconsistencies
• Study the evoluIon of social aspects – Are there disInct sub-‐communiIes? – How is the effort distributed in each community? Are there different teamwork paDerns in these communiIes?
15
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
References
• M. Goeminne, A. Decan, T. Mens, Co-‐evolving Code-‐Related and Database-‐Related Changes in a Data-‐Intensive So6ware System, CSMR-‐WCRE 2014 ERA track
• L. Meurice, A. Cleve, DAHLIA: A Visual Analyzer of Database Schema EvoluKon, CSMR-‐WCRE 2014 Tool Demo
• A. Cleve, T. Mens, J.-‐L. Hainaut, Data-‐Intensive System EvoluIon, IEEE Computer 43(8): 110-‐112 (2010)
• A. Cleve, M. Gobert, L. Meurice, J. Maes, J. Weber, Understanding database schema evoluIon: A case study, Science of Computer Programming (2013)
16
A Historical Dataset for the Gnome Ecosystem
Mathieu Goeminne, Tom Mens Service de Génie Logiciel, Université de Mons
hDp://informaIque.umons.ac.be/genlog/projects/disse
EOSESE 2014 European Open Symposium on Empirical So6ware Engineering -‐ July 2014
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome ecosystem Historical Dataset
• Goal – Study the evoluIon of social aspects of the Gnome ecosystem (1,418 projects, 11,094 contributors, 1,315,997 commits).
• Methodology 1. Clone the source code repository of each Gnome
project 2. Store its history in a MySQL database 3. Add extra informaIon to facilitate empirical studies
18
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome ecosystem About the dataset
19
FLOSSMetrics MySQL datase: hDps://bitbucket.org/mgoeminne/sgl-‐flossmetric-‐dbmerge
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome ecosystem Extra informaIon
20
• IdenIty merging • CSVAnalY2 hack • Semi-‐automaIc idenIty merging based on name and e-‐mail
• 5,923 / 11,094 contributors a3er merging • AcIvity types
• Tool for associaIng an acIvity type (coding, translaIon, documentaIon, etc.) to each file.
• Regular expressions on file extension, file name and path.
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome: On the variaIon and specialisaIon of workload
21
Empir Software EngDOI 10.1007/s10664-013-9244-1
On the variation and specialisation of workload—Acase study of the GNOME ecosystem community
Bogdan Vasilescu · Alexander Serebrenik ·Mathieu Goeminne · Tom Mens
© Springer Science+Business Media New York 2013
Abstract Most empirical studies of open source software repositories focus on theanalysis of isolated projects, or restrict themselves to the study of the relationshipsbetween technical artifacts. In contrast, we have carried out a case study that focuseson the actual contributors to software ecosystems, being collections of softwareprojects that are maintained by the same community. To this aim, we defined anew series of workload and involvement metrics, as well as a novel approach—!T-graphs—for reporting the results of comparing multiple distributions. We usedthese techniques to statistically study how workload and involvement of ecosystemcontributors varies across projects and across activity types, and we explored towhich extent projects and contributors specialise in particular activity types. UsingGnome as a case study we observed that, next to coding, the activities of localization,development documentation and building are prevalent throughout the ecosystem.We also observed notable differences between frequent and occasional contributorsin terms of the activity types they are involved in and the number of projects theycontribute to. Occasional contributors and contributors that are involved in manydifferent projects tend to be more involved in the localization activity, while frequent
Communicated by: Margaret-Anne Storey
B. Vasilescu · A. SerebrenikMDSE, Eindhoven University of Technology, PO Box 513, 5600 MB, Eindhoven,The Netherlands
B. Vasilescue-mail: [email protected]
A. Serebrenike-mail: [email protected]
M. Goeminne · T. Mens (B)COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgiume-mail: [email protected]
M. Goeminnee-mail: [email protected]
����
����
���
���
������ ��� ��
����
���
� ����
����
����� ��
�
����
��
�� � ���
����
���
���
���
�����
RATW < 14 RATW >= 14
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome: On the variaIon and specialisaIon of workload
22
Empir Software EngDOI 10.1007/s10664-013-9244-1
On the variation and specialisation of workload—Acase study of the GNOME ecosystem community
Bogdan Vasilescu · Alexander Serebrenik ·Mathieu Goeminne · Tom Mens
© Springer Science+Business Media New York 2013
Abstract Most empirical studies of open source software repositories focus on theanalysis of isolated projects, or restrict themselves to the study of the relationshipsbetween technical artifacts. In contrast, we have carried out a case study that focuseson the actual contributors to software ecosystems, being collections of softwareprojects that are maintained by the same community. To this aim, we defined anew series of workload and involvement metrics, as well as a novel approach—!T-graphs—for reporting the results of comparing multiple distributions. We usedthese techniques to statistically study how workload and involvement of ecosystemcontributors varies across projects and across activity types, and we explored towhich extent projects and contributors specialise in particular activity types. UsingGnome as a case study we observed that, next to coding, the activities of localization,development documentation and building are prevalent throughout the ecosystem.We also observed notable differences between frequent and occasional contributorsin terms of the activity types they are involved in and the number of projects theycontribute to. Occasional contributors and contributors that are involved in manydifferent projects tend to be more involved in the localization activity, while frequent
Communicated by: Margaret-Anne Storey
B. Vasilescu · A. SerebrenikMDSE, Eindhoven University of Technology, PO Box 513, 5600 MB, Eindhoven,The Netherlands
B. Vasilescue-mail: [email protected]
A. Serebrenike-mail: [email protected]
M. Goeminne · T. Mens (B)COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgiume-mail: [email protected]
M. Goeminnee-mail: [email protected]
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
Gnome ecosystem References
• M. Goeminne, M. Claes, and T. Mens. A historical dataset for the Gnome ecosystem, MSR 2013, pp. 225–228
hDps://bitbucket.org/mgoeminne/sgl-‐flossmetric-‐dbmerge
• B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens. On the variaKon and specialisaKon of workload—a case study of the Gnome ecosystem community, Empirical So3ware Engineering, pp. 955–1008, 2014
hDp://dx.doi.org/10.1007/s10664-‐013-‐9244-‐1
23
European Open Symposium on Empirical So3ware Engineering — Lille, France, July 2014
References
24
!!Evolving Software Systems Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.) 2014, XXIII, 404 p. !Springer, ISBN 978-3-642-45398-4