Mining Software Repositories

What to do? And where to get data?

Israel Herraiz, Universidad Alfonso X el Sabio

June 18th 2010

Outline

What is Mining Software Repositories? What are repositories?

Conferences and journals of interest

And some words about trending topics

Tools for Mining Software Repositories

Datasets for Mining Software Repositories

For replicable and verifiable empirical studies

1. What is Mining Software Repositories?

What is Mining Software Repositories?

MSR analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.

Popular topic since 2004

MSR workshop, co-located with ICSE

Working Conference since 2008

What are repositories?

Anything that leaves a trail about any software development or maintenance activities

Also includes any software artifact

Typically:

Version control systems

Bug tracking systems

Public communication tools (mailing lists)

Differences between artifact and repository

Artifact: source code file (hello.c)

    #include <stdio.h>

    int main() {
        printf("Hello world");
        return 0;
    }

Repository: change to an artifact, plus meta-information (hello.c.diff)

    - printf("Hello world");
    + printf("Hello world\n");
    Author: rms
    Date: 20100618 04:34 UTC
    Change: +1 -1
    Log: Forgot to add new line
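The distinction can be made concrete with a small sketch. It treats two versions of hello.c as the artifact and builds the kind of record a repository keeps about the change between them. The `Change` class and the `record_change` function are hypothetical names chosen for illustration; real version control systems store this information in their own formats.

```python
import difflib
from dataclasses import dataclass

# Two versions of the artifact (the source file hello.c).
OLD = '#include <stdio.h>\n\nint main() {\n    printf("Hello world");\n    return 0;\n}\n'
NEW = OLD.replace('printf("Hello world");', 'printf("Hello world\\n");')


@dataclass
class Change:
    """Hypothetical record of what a repository keeps about one change."""
    author: str
    date: str
    log: str
    added: int
    removed: int
    diff: str


def record_change(old, new, author, date, log):
    """Compute the diff between two versions and attach meta-information."""
    lines = list(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="hello.c", tofile="hello.c"))
    # Count changed lines, skipping the '+++' / '---' file headers.
    added = sum(1 for l in lines if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in lines if l.startswith("-") and not l.startswith("---"))
    return Change(author, date, log, added, removed, "".join(lines))


change = record_change(OLD, NEW, "rms", "20100618 04:34 UTC",
                       "Forgot to add new line")
print(f"Change: +{change.added} -{change.removed}")
```

The artifact is only the file content; the repository adds who changed it, when, why, and how much.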

2. Conferences and journals of interest

Working conferences of interest

IEEE Int. Working Conf. on Source Code Analysis & Manipulation (SCAM)

http://www.ieee-scam.org

Deadline: April

Acceptance rate: 26% (2007), 38% (2008), 45% (2009)

Journal possibilities: JSS, SCP

IEEE Int. Working Conf. on Mining Software Repositories (MSR)

http://msr.uwaterloo.ca

Deadline: January (February for the challenge)

Acceptance rate: 19% (2008), 31% (2010)

Journal possibilities: EMSE, IEEE TSE

Conferences of interest

IEEE Int. Conf. on Software Engineering (ICSE)

http://www.sbs.co.za/ICSE2010/

Deadline: August/September

Acceptance rate: 15% (2008), 12% (2009), 14% (2010)

Journal possibilities: no special issues

IEEE Int. Conf. on Software Maintenance (ICSM)

http://icsm2010.upt.ro/

Deadline: April

Acceptance rate: 21% (2007), 26% (2008), 22% (2009)

Journal possibilities: no special issues

Empirical Software Eng. & Measurement (ESEM)

http://www.esem-conferences.org/

Deadline: March

Acceptance rate: ?

Journal possibilities: EMSE

Other interesting conferences

Working Conference on Reverse Engineering (WCRE)

http://web.soccerlab.polymtl.ca/wcre2010/

International Conference on Predictive Models and Software Engineering (PROMISE)

http://promisedata.org/

European Conference on Software Maintenance and Reengineering (CSMR)

http://www.sait.escet.urjc.es/csmr2010/

Journals of interest

IEEE Transactions on Software Engineering (TSE)

http://www.computer.org/tse/

ACM Transactions on Software Engineering and Methodology (TOSEM)

http://tosem.acm.org/

Empirical Software Engineering (EMSE)

http://www.springerlink.com/content/1382-3256

Journal of Systems and Software (JSS)

http://www.elsevier.com/locate/jss

Journal of Software Maintenance and Evolution (JSME)

http://eu.wiley.com/WileyCDA/WileyTitle/productCd-SMR.html

Handy links

Software Engineering Conferences

Verification, Formal Methods, Programming Lang. and Compilers, Web, Security

http://people.engr.ncsu.edu/txie/seconferences.htm

Upcoming Software Engineering Conferences Map

http://research.csc.ncsu.edu/ase/semap/

Trending topics

Replication of empirical studies

The replication package

Recommendation systems

Automated Software Engineering

3. Tools for Mining Software Repositories

Tools for Mining Software Repositories

Mining tools

Libresoft Tools

http://tools.libresoft.es/

CVSAnalY: log parser for CVS/SVN/Git repositories

MLStats: parser for Mailman archives and mbox files

Bicho: parser for Bugzilla and SF.net trackers
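As a rough idea of what a repository log parser does, here is a minimal sketch that turns log text into structured records. It is not how CVSAnalY is implemented. The sample text is shaped like the output of `git log --numstat --date=iso`, and the commit ids, names and addresses in it are made up for illustration, as are the function and field names.

```python
import re
from collections import Counter

# Made-up sample, shaped like the output of `git log --numstat --date=iso`.
SAMPLE_LOG = """\
commit 1111111111111111111111111111111111111111
Author: Alice Example <alice@example.org>
Date:   2010-06-18 04:34:00 +0000

    Forgot to add new line

1\t1\thello.c

commit 2222222222222222222222222222222222222222
Author: Bob Example <bob@example.org>
Date:   2010-06-17 10:00:00 +0000

    Initial import

6\t0\thello.c
2\t0\tREADME
"""

COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})$")
NUMSTAT_RE = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def parse_log(text):
    """Return one dict per commit with author, date, log and touched files."""
    commits = []
    current = None
    for line in text.splitlines():
        m = COMMIT_RE.match(line)
        if m:
            current = {"id": m.group(1), "author": None, "date": None,
                       "log": [], "files": []}
            commits.append(current)
        elif current is None:
            continue  # ignore anything before the first commit header
        elif line.startswith("Author:"):
            current["author"] = line[len("Author:"):].strip()
        elif line.startswith("Date:"):
            current["date"] = line[len("Date:"):].strip()
        elif line.startswith("    "):
            current["log"].append(line.strip())
        else:
            n = NUMSTAT_RE.match(line)
            if n:
                added, removed, path = n.groups()
                # Binary files are reported with "-" instead of counts.
                current["files"].append((path,
                                         0 if added == "-" else int(added),
                                         0 if removed == "-" else int(removed)))
    return commits


commits = parse_log(SAMPLE_LOG)
commits_per_author = Counter(c["author"] for c in commits)
for c in commits:
    print(c["id"][:7], c["author"], len(c["files"]), "file(s)")
```

Once the log is in this form, questions such as commits per author or lines changed per file reduce to simple aggregations.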

Software Architecture Group (SWAG), University of Waterloo

http://www.swag.uwaterloo.ca/tools.html

4. Datasets for Mining Software Repositories

MSR Mining Challenge

Mirrors of the version archives and bug databases for Mozilla Firefox and Eclipse

http://msr.uwaterloo.ca/msr2008/challenge/

Repository logs of more than 500 Gnome projects, XML dump of the bug databases, and the complete SVN repositories of 69 Gnome projects

http://msr.uwaterloo.ca/msr2009/challenge/

Ultimate Debian Database

Database with information about packages and bug reports of Debian and Ubuntu

http://udd.debian.org/

Eclipse bug database

Saarland University

Datasheets, databases, and scripts with information about Eclipse bug reports for several releases

http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/

FLOSSMetrics

Databases about ~5000 open source projects

Version control repositories, mailing list archives, bug tracking databases

MySQL dumps

Not very user friendly

Obtained using the Libresoft Tools

http://www.flossmetrics.org/

FLOSSMole

Database with information about all the SourceForge.net projects

~150,000 projects

Mainly meta-information, obtained by parsing the web pages of the projects

No low-level or fine-grained information

http://flossmole.org

PROMISE repository

Authors of PROMISE papers must also submit a package with the data used in the paper

http://promisedata.org/

101 datasets

Defect prediction (58)

Effort prediction (18)

General (9)

Model-based SE (7)

Text mining (9)

http://www.uax.es