

Source: laser.inf.ethz.ch/2014/material/gall/gall-lecture-5-retrospective.pdf

software evolution & architecture lab

University of Zurich, Switzerland http://seal.ifi.uzh.ch @ LASER summer school 2014

Harald Gall

Systematic Mining of Software Repositories

Lecture 5 - Retrospective


2009 Roundtable on the Future of Mining Software Archives


2009 Future of Mining Software Repos

Vision statement                          Author              Status open
Answer Commonly Asked Project Questions   Michael W. Godfrey  partly
Software Repositories: A Strategic Asset  Ahmed E. Hassan     yes
Create Centralized Data Repositories      James Herbsleb      yes
Embed Mining in Developer Tools           Gail C. Murphy      partly
Help Developers Search for Information    Martin Robillard    partly
Deploy Mining to Industry                 Audris Mockus       ongoing
Let Us Not Mine for Fool’s Gold           David Notkin        ongoing

based on a Software Roundtable published in IEEE Software 2009


2013 MSRconf revisited


MSRconf.org: Status in 2013

‣ A Trend Analysis on Past MSR Papers, by Serge Demeyer et al., MSR 2013
  ‣ RQ 1: Which are the popular and outdated research topics? (by text analysis, with n-grams)
  ‣ RQ 2: Which are the frequently and less frequently cited cases?
  ‣ RQ 3: Which is the popular and emerging mining infrastructure?
  ‣ RQ 4: What is the “actionable information” which we are deemed to uncover?
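RQ 1 mentions text analysis with n-grams. As a minimal sketch of that idea (not the procedure used by Demeyer et al.), the following counts word n-grams across a set of texts. The helper names `ngrams` and `top_ngrams` and the sample titles are hypothetical and serve only as illustration.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return the word n-grams of `text` as tuples of lowercase tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(documents, n=2, k=3):
    """Count n-grams over all documents and return the k most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    return counts.most_common(k)


# Hypothetical paper titles, used only to illustrate the counting step.
titles = [
    "Predicting bugs from code churn",
    "Code churn and bug reports",
    "Mining bug reports for duplicates",
]
print(top_ngrams(titles, n=2, k=2))
```

Running the same count per publication year would show which n-grams gain or lose frequency over time, which is one plausible way to read "popular and outdated" topics.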


RQ 1: Popularity of topics


RQ 2: Frequently cited cases


RQ 3: SCMs


CfP of MSRconf:

‣ Goal is “to uncover interesting and actionable information about software systems and projects”


The LNCS book chapter


LNCS book chapter

‣ Revisiting Mining Studies
  Katja Kevic (UZH), Stefanie Beyer (AAU), Ilias Rousinopoulos (AAU), Sven Amann (TUD)
  ‣ what’s a mining study: setup, resources, machinery, ...
  ‣ what sources (archives) can be used for what kind of study (a catalog)
  ‣ what questions have been addressed so far
  ‣ what questions and conclusions (answers) so far
  ‣ which studies can be automated in terms of tooling and infrastructure
  ‣ what is a benchmark for mining studies


A retrospective overview of topics addressed in the lectures


The Screening Plant of a SW Miner


Which data sources?

‣ Evolution analysis data repositories à la PROMISE
  ‣ Flossmole, Sourcerer, Ultimate Debian DB
  ‣ Provide benchmark (raw) data
‣ Interactive online web platforms that provide various analyses
  ‣ Boa, FOSSology, Alitheia core, Ohloh
  ‣ Analyses offered by design
  ‣ Data produced is best used within the system
‣ Industrial project data (not widely accessible!)


What kind of studies?

‣ Source code
  ‣ Which entities co-evolve/co-change?
  ‣ How to identify code smells or design disharmonies?
‣ Bugs and changes
  ‣ Who should fix this bug? How long will it take?
  ‣ When do changes induce fixes?
  ‣ Predicting bugs and their components?
‣ Project and process
  ‣ Do code and comments co-evolve?
  ‣ Who are the experts of a piece of code?
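The co-change question above can be approached by counting how often two files are modified in the same commit. The sketch below assumes the change sets have already been extracted from the version history as lists of file paths. The function name `co_change_counts` and the sample commits are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count, for each unordered file pair, the commits touching both files."""
    pairs = Counter()
    for files in commits:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical change sets: one list of changed files per commit.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "README"],
    ["Parser.java", "Ast.java"],
]
counts = co_change_counts(commits)
for pair, n in counts.most_common(1):
    print(pair, n)
```

Raw counts favour files that change often. A real study would likely normalise them, for example by the number of commits touching either file, and filter out very large commits.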


Example: Bug Prediction

Using Code Churn vs. Fine-Grained Changes

Using the Gini Coefficient for Bug Prediction

Predicting the Method

Predicting the Types of Code Changes

Using developer networks for Bug Prediction
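For the Gini coefficient item, a minimal sketch of the computation itself may help. It assumes the input is the number of changes each developer made to one file, so that a high value means a few developers made most of the changes. Both this reading and the sample numbers are assumptions for illustration, not taken from the slides.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means the values are evenly spread; the maximum for n values
    is (n - 1) / n, reached when one value holds the whole total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical change counts per developer for two files.
evenly_shared = [5, 5, 5, 5]
single_owner = [0, 0, 0, 20]
print(gini(evenly_shared), gini(single_owner))
```

Computed per file, such a value could serve as one input feature of a bug prediction model.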


• Learn a prediction model from historic data

• Predict defects for the same project

• Hundreds of prediction models / learners exist

• Models work fairly well with precision and recall of up to 80%.

Predictor          Precision  Recall
Pre-Release Bugs   73.80%     62.90%
Test Coverage      83.80%     54.40%
Dependencies       74.40%     69.90%
Code Complexity    79.30%     66.00%
Code Churn         78.60%     79.90%
Org. Structure     86.20%     84.00%

From: N. Nagappan, B. Murphy, and V. Basili. The influence of organizational structure on software quality. ICSE 2008.

Performance of bug prediction
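Precision and recall, as reported in the table, compare predicted defect labels against the actual ones. The sketch below shows the computation on per-file boolean labels. The label vectors are hypothetical and unrelated to the Nagappan et al. data.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for boolean defect labels per file."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    # Precision: share of files flagged as defective that really are.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of truly defective files that were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for eight files: True means "defective".
actual = [True, True, True, False, False, True, False, False]
predicted = [True, True, False, True, False, True, False, False]
precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```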


Example: Code Ownership

C. Bird, N. Nagappan, B. Murphy, H. Gall, P. Devanbu. Don't touch my code! Examining the effects of ownership on software quality. ESEC/FSE ’11
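One common way to quantify ownership is the share of a component's commits made by its top contributor, together with counts of minor and major contributors. The sketch below follows that idea. The 5% cutoff is an assumed parameter here, and the function name and commit data are hypothetical, so the paper should be consulted for the exact definitions.

```python
from collections import Counter


def ownership_metrics(authors, minor_threshold=0.05):
    """Summarise ownership of one component from its commit authors.

    Returns (ownership, minor, major): the top contributor's share of
    commits, and the number of contributors below / at or above the
    threshold share.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0, 0
    shares = [c / total for c in counts.values()]
    ownership = max(shares)
    minor = sum(1 for s in shares if s < minor_threshold)
    major = len(shares) - minor
    return ownership, minor, major


# Hypothetical commit log for one component: one author name per commit.
authors = ["alice"] * 30 + ["bob"] * 9 + ["carol"]
print(ownership_metrics(authors))
```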


Performance/Time variance

J. Ekanayake, J. Tappolet, H. Gall, A. Bernstein, Time variance and defect prediction in software projects, Empirical Software Engineering, Vol. 17 (4-5), 2012


Workflows & Mashups


Conclusions

‣ Bug predictions do work
‣ Cross-project predictions do not really work
‣ Data sets (systems) need to be “harmonized”
‣ Data preprocessing and learners need to be calibrated
‣ Studies need to be replicable (systematically)
‣ Periods of stability vs. drift