empirical research results for the evolution of a data-intensive software system and the gnome...

24
CoEvolving Changes in a DataIntensive So3ware System Mathieu Goeminne, Alexandre Decan, Tom Mens Service de Génie Logiciel, Université de Mons hDp://informaIque.umons.ac.be/genlog/projects/disse EOSESE 2014 European Open Symposium on Empirical So6ware Engineering July 2014

Upload: tom-mens

Post on 15-Jan-2015

212 views

Category:

Science


1 download

DESCRIPTION

Presentation by Mathieu Goeminne of joint empirical research (with Tom Mens, Alexandre Decan, Alexander Serebrenik, Bogdan Vasilescu) on evolving software systems during the EOSESE 2014 European Symposium in Lille, France, 30 June 2014. Part 1 of the presentation focuses on data-intensive software systems; part 2 focuses on the Gnome ecosystem

TRANSCRIPT

Page 1: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

Co-­‐Evolving  Changes  in  a Data-­‐Intensive  So3ware  System

Mathieu  Goeminne,  Alexandre  Decan,  Tom  Mens  Service  de  Génie  Logiciel,  Université  de  Mons

hDp://informaIque.umons.ac.be/genlog/projects/disse

EOSESE  2014  European  Open  Symposium  on  Empirical  So6ware  Engineering  -­‐  July  2014

Page 2: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Context

• FNRS  Projet  de  Recherche  “Data-­‐Intensive  So3ware  System  EvoluIon”  – Interuniversity  collaboraIon  with  Anthony  Cleve  and  Loup  Meurice  (University  of  Namur)  

• Overall  goal  – Expand  empirical  MSR  research  to  include  database-­‐related  acIviIes    – Analyse  and  support  co-­‐evoluIon  between  program  code  and  database  (schema)  in  data-­‐intensive  so3ware  systems  

• Approach  – Develop  generic  framework  – Implement  dedicated  analysis  and  visualisaIon  tools  – Carry  out  empirical  case  studies

2

Page 3: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Research  QuesIons

• Code-­‐centric  focus  – RQ1:  How  does  the  relaIon  between how  source  code  files  and database-­‐related  files  evolve?  

– RQ2:  What’s  the  effect  of  introducinga  parIcular  persistency  technology  (Hibernate,  JPA)?  

• Social  focus  – RQ3:  How  do  developers  divide  theirwork  and  how  does  this  evolve  over  Ime?

3

Page 4: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Case  Study

4

characteris/c value

duraIon 3,939  days  (  >  129  months)

dates from  Nov  2002  Ill  Aug  2013number  of  commits 18,727

number  of  disInct  files 20,718  (of  which  54%  code  files)

number  of  file  touches 93,721

number  of  disInct  developers 100

Official  repository hDps://github.com/scoophealth/oscar.git

A  Java  Electronical  Medical  Record  system

Page 5: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCAR  Code  dimension

• 3-­‐Ier  web  applicaIon  wriDen  in  Java  and  JSP

5

0%#

10%#

20%#

30%#

40%#

50%#

60%#

70%#

80%#

90%#

100%#

2003-07#

2004-01#

2004-07#

2005-01#

2005-07#

2006-01#

2006-07#

2007-01#

2007-07#

2008-01#

2008-07#

2009-01#

2009-07#

2010-01#

2010-07#

2011-01#

2011-07#

2012-01#

2012-07#

2013-01#

jsp#

java#

Monthly  aggregated  proporIon  of  JSP  and  Java  files

Page 6: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCARSocial  Dimension

• Monthly  number  of  disInct  acIve  developers  for  OSCAR     (a3er  idenIty  merging)

6

0"

5"

10"

15"

20"

25"

2003'07"

2004'01"

2004'07"

2005'01"

2005'07"

2006'01"

2006'07"

2007'01"

2007'07"

2008'01"

2008'07"

2009'01"

2009'07"

2010'01"

2010'07"

2011'01"

2011'07"

2012'01"

2012'07"

2013'01"

Page 7: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCARCode  dimension

• Number  of  source  code  files  w.r.t. database-­‐related  files

7

0"

1000"

2000"

3000"

4000"

5000"

6000"

2003)07"

2004)01"

2004)07"

2005)01"

2005)07"

2006)01"

2006)07"

2007)01"

2007)07"

2008)01"

2008)07"

2009)01"

2009)07"

2010)01"

2010)07"

2011)01"

2011)07"

2012)01"

2012)07"

2013)01"

pure"

sql"

sql  =  code  files  containing  embedded  SQL  statements

Page 8: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCAR  Social  Dimension

• How  does  the  acIvity  of  developers  evolve  over  Ime?

8

Develope

r

monthly  aggregated  number  of  file  touches  

Page 9: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Introducing  PersistencyProvider  in  OSCAR

• Hibernate  – introduced  in  OSCAR  since  July  2006  – Java  object-­‐relaIonal  mapping  (ORM)  library  

• XML  files  map  Java  classes  to  database  tables  and  Java  data  types  to  SQL  data  types  

• facilitates  data  query  and  retrieval  • generates  SQL  calls  and  relieves  the  developer  from  manual  result  set  handling  and  object  conversion  

• Java  Persistency  Architecture  (JPA)  – introduced  in  OSCAR  since  July  2008  – industry  standard  ORM  persistency  API  – Uses  Java  annotaIons  instead  of  XML  files

9

Page 10: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Introducing  Persistency  ProviderCode  dimension

SQL  =  code  file  containing  embedded  SQL  query  HIB  =  Java  file  targeted  by  Hibernate  XML  file  JPA  =  Java  file  containing  JPA  annotaIon  pure  =  code  files  not  containing  any  of  these  

!!!!!!

10

0%#10%#20%#30%#40%#50%#60%#70%#80%#90%#100%#

2003-07-01#

2004-01-01#

2004-07-01#

2005-01-01#

2005-07-01#

2006-01-01#

2006-07-01#

2007-01-01#

2007-07-01#

2008-01-01#

2008-07-01#

2009-01-01#

2009-07-01#

2010-01-01#

2010-07-01#

2011-01-01#

2011-07-01#

2012-01-01#

2012-07-01#

2013-01-01#

2013-07-01#

JPA# HIB#

SQL# pure#

Monthly  aggregated  number  of  acIve  code  files

0"

200"

400"

600"

800"

1000"

1200"

1400"

1600"

2003)07)01"

2003)11)01"

2004)03)01"

2004)07)01"

2004)11)01"

2005)03)01"

2005)07)01"

2005)11)01"

2006)03)01"

2006)07)01"

2006)11)01"

2007)03)01"

2007)07)01"

2007)11)01"

2008)03)01"

2008)07)01"

2008)11)01"

2009)03)01"

2009)07)01"

2009)11)01"

2010)03)01"

2010)07)01"

2010)11)01"

2011)03)01"

2011)07)01"

2011)11)01"

2012)03)01"

2012)07)01"

2012)11)01"

2013)03)01"

2013)07)01"

JPA" HIB" SQL"

Page 11: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Introducing  Persistency  ProviderSocial  dimension

• Who  is  involved  in  introducing  changes  in  database-­‐related  code?

11

Develope

r

Bubble  size  =  log(monthly  aggregated  number  of  touched  files)

Page 12: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCARSocial  Dimension

• How  do  developers  divide  their  work?

12

Java$(87)$ JSP$(86)$

OSCAR$developers$(100)$3"

2"

5" 1"

6" 9"

SQL$(89)$

HIB$

31"

JPA$16" 15" 12"

Number of OSCAR developers involved in file touches per activity type

Page 13: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

EvoluIon  of  OSCARSocial  Dimension

• How  do  developers  divide  their  work?

13

Number of developers that introduce database-related code in some file for the first time

Java$(87)$ JSP$(86)$

OSCAR$developers$(100)$3"

24"

10" 10"

1" 0"

SQL$(53)$

HIB$

24"

JPA$9" 8" 11"

Page 14: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Preliminary  Conclusions

RQ1:  Code-­‐related  and  database-­‐related  files  evolve  together  (no  “phased”  co-­‐evoluIon)  !RQ2:  IntroducIon  of  Hibernate,  then  JPA  which  takes  over  Hibernate,  but  embedded  SQL  sIll  remains  very  important  !!RQ3:  No  clear  separaIon  of  acIviIes  between  developers  Majority  of  developers  changes  both  db-­‐related  and  db-­‐unrelated  code  No  observed  periods  dedicated  to  a  specific  acIvity

14

Page 15: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Future  Work

• Analyse  file  changes  at  finer  granularity    • Study  of  other  data-­‐intensive  so3ware  systems  • Study  the  evoluIon  of  DISS  quality  

– Unit  tests  involving  database-­‐related  classes  – Revisited  modularity,  coupling,  cohesion  – Database  inconsistencies  

• Study  the  evoluIon  of  social  aspects  – Are  there  disInct  sub-­‐communiIes?  – How  is  the  effort  distributed  in  each  community?  Are  there  different  teamwork  paDerns  in  these  communiIes?

15

Page 16: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

References

• M.  Goeminne,  A.  Decan,  T.  Mens,  Co-­‐evolving  Code-­‐Related  and  Database-­‐Related  Changes  in  a  Data-­‐Intensive  So6ware  System,  CSMR-­‐WCRE  2014  ERA  track    

• L.  Meurice,  A.  Cleve,  DAHLIA:  A  Visual  Analyzer  of  Database  Schema  EvoluKon,  CSMR-­‐WCRE  2014  Tool  Demo  

• A.  Cleve,  T.  Mens,  J.-­‐L.  Hainaut,  Data-­‐Intensive  System  EvoluIon,  IEEE  Computer  43(8):  110-­‐112  (2010)  

• A.  Cleve,  M.  Gobert,  L.  Meurice,  J.  Maes,  J.  Weber,  Understanding  database  schema  evoluIon:  A  case  study,  Science  of  Computer  Programming  (2013)

16

Page 17: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

A  Historical  Dataset  for  the  Gnome  Ecosystem

Mathieu  Goeminne,  Tom  Mens  Service  de  Génie  Logiciel,  Université  de  Mons

hDp://informaIque.umons.ac.be/genlog/projects/disse

EOSESE  2014  European  Open  Symposium  on  Empirical  So6ware  Engineering  -­‐  July  2014

Page 18: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome  ecosystem  Historical  Dataset

• Goal  – Study  the  evoluIon  of  social  aspects  of  the  Gnome  ecosystem  (1,418  projects,  11,094  contributors,  1,315,997  commits).  

• Methodology  1. Clone  the  source  code  repository  of  each  Gnome  

project  2. Store  its  history  in  a  MySQL  database  3. Add  extra  informaIon  to  facilitate  empirical  studies  

18

Page 19: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome  ecosystem  About  the  dataset

19

FLOSSMetrics  MySQL  datase:  hDps://bitbucket.org/mgoeminne/sgl-­‐flossmetric-­‐dbmerge

Page 20: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome  ecosystem  Extra  informaIon

20

• IdenIty  merging  • CSVAnalY2  hack  • Semi-­‐automaIc  idenIty  merging  based  on  name  and  e-­‐mail  

• 5,923  /  11,094  contributors  a3er  merging  • AcIvity  types  

• Tool  for  associaIng  an  acIvity  type  (coding,  translaIon,  documentaIon,  etc.)  to  each  file.  

• Regular  expressions  on  file  extension,  file  name  and  path.

Page 21: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome:  On  the  variaIon  and  specialisaIon  of  workload

21

Empir Software EngDOI 10.1007/s10664-013-9244-1

On the variation and specialisation of workload—Acase study of the GNOME ecosystem community

Bogdan Vasilescu · Alexander Serebrenik ·Mathieu Goeminne · Tom Mens

© Springer Science+Business Media New York 2013

Abstract Most empirical studies of open source software repositories focus on theanalysis of isolated projects, or restrict themselves to the study of the relationshipsbetween technical artifacts. In contrast, we have carried out a case study that focuseson the actual contributors to software ecosystems, being collections of softwareprojects that are maintained by the same community. To this aim, we defined anew series of workload and involvement metrics, as well as a novel approach—!T-graphs—for reporting the results of comparing multiple distributions. We usedthese techniques to statistically study how workload and involvement of ecosystemcontributors varies across projects and across activity types, and we explored towhich extent projects and contributors specialise in particular activity types. UsingGnome as a case study we observed that, next to coding, the activities of localization,development documentation and building are prevalent throughout the ecosystem.We also observed notable differences between frequent and occasional contributorsin terms of the activity types they are involved in and the number of projects theycontribute to. Occasional contributors and contributors that are involved in manydifferent projects tend to be more involved in the localization activity, while frequent

Communicated by: Margaret-Anne Storey

B. Vasilescu · A. SerebrenikMDSE, Eindhoven University of Technology, PO Box 513, 5600 MB, Eindhoven,The Netherlands

B. Vasilescue-mail: [email protected]

A. Serebrenike-mail: [email protected]

M. Goeminne · T. Mens (B)COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgiume-mail: [email protected]

M. Goeminnee-mail: [email protected]

����

����

���

���

������ ��� ��

����

���

� ����

����

����� ��

����

��

�� � ���

����

���

���

���

�����

RATW  <  14 RATW  >=  14

Page 22: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome:  On  the  variaIon  and  specialisaIon  of  workload

22

Empir Software EngDOI 10.1007/s10664-013-9244-1

On the variation and specialisation of workload—Acase study of the GNOME ecosystem community

Bogdan Vasilescu · Alexander Serebrenik ·Mathieu Goeminne · Tom Mens

© Springer Science+Business Media New York 2013

Abstract Most empirical studies of open source software repositories focus on theanalysis of isolated projects, or restrict themselves to the study of the relationshipsbetween technical artifacts. In contrast, we have carried out a case study that focuseson the actual contributors to software ecosystems, being collections of softwareprojects that are maintained by the same community. To this aim, we defined anew series of workload and involvement metrics, as well as a novel approach—!T-graphs—for reporting the results of comparing multiple distributions. We usedthese techniques to statistically study how workload and involvement of ecosystemcontributors varies across projects and across activity types, and we explored towhich extent projects and contributors specialise in particular activity types. UsingGnome as a case study we observed that, next to coding, the activities of localization,development documentation and building are prevalent throughout the ecosystem.We also observed notable differences between frequent and occasional contributorsin terms of the activity types they are involved in and the number of projects theycontribute to. Occasional contributors and contributors that are involved in manydifferent projects tend to be more involved in the localization activity, while frequent

Communicated by: Margaret-Anne Storey

B. Vasilescu · A. SerebrenikMDSE, Eindhoven University of Technology, PO Box 513, 5600 MB, Eindhoven,The Netherlands

B. Vasilescue-mail: [email protected]

A. Serebrenike-mail: [email protected]

M. Goeminne · T. Mens (B)COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgiume-mail: [email protected]

M. Goeminnee-mail: [email protected]

Page 23: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

Gnome  ecosystem  References

• M.  Goeminne,  M.  Claes,  and  T.  Mens.  A  historical  dataset  for  the  Gnome  ecosystem,  MSR  2013,  pp.  225–228  

hDps://bitbucket.org/mgoeminne/sgl-­‐flossmetric-­‐dbmerge  

• B.  Vasilescu,  A.  Serebrenik,  M.  Goeminne,  and  T.  Mens.  On  the  variaKon  and  specialisaKon  of  workload—a  case  study  of  the  Gnome  ecosystem  community,  Empirical  So3ware  Engineering,  pp.  955–1008,  2014  

hDp://dx.doi.org/10.1007/s10664-­‐013-­‐9244-­‐1  

23

Page 24: Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem

European  Open  Symposium  on  Empirical  So3ware  Engineering  —  Lille,  France,  July  2014

References

24

!!Evolving Software Systems Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.) 2014, XXIII, 404 p. !Springer, ISBN 978-3-642-45398-4