
Visualization and Analysis of Open Source Software Evolution using An Evolution Curve Method

Dr. Robertas Damaševičius
Software Engineering Department, Kaunas University of Technology
Studentų 50-415, Kaunas, Lithuania
Email: [email protected]

http://soften.ktu.lt/~damarobe


Eighth International Baltic Conference on Databases and Information Systems, June 2-5, 2008, Tallinn, Estonia

Context and Problem

Software systems are:
  designed, constructed and used by people
  components in larger socio-technical systems

Software design is:
  a social process embedded within organizational and cultural structures
  influenced by social processes such as programmer collaboration in teams

Open source software systems:
  Free to use
  Free availability of source code
  Developed by many programmers
  Continuously evolve

Aim: analysis of open source software evolution using metrics


What is software evolution?

Definition:
  a continuing process in time during which some essential software properties are changed

Activities:
  modification, adaptation, maintenance, and other activities which occur after the delivery of the first operational release to the users

Importance:
  costs devoted to system maintenance and evolution account for more than 90% of total software costs (Erlikh, 1990)


Forces and factors of open source software evolution

Evolution of open source systems:
  less strict control and management model
  usually started by a single developer (seed)
  attracted users become co-developers
  governed by the needs of users and spontaneous collaboration of co-developers

Evolution mechanisms:
  natural selection, competition
  variation-increasing & variation-decreasing
  influenced by psychological, intellectual, social and cultural, economic and business factors


Software metrics

Common:
  Source lines of code
  Cyclomatic complexity
  Halstead metrics
  Number of classes and interfaces
  R.C. Martin’s software package metrics
  Cohesion, Coupling, …

Specific software evolution metrics:
  SDI metric
  L-metric
  AICC metric
  G-metric

Software development models:
  Statistical models
  Rayleigh model
  Halstead’s Software Science model
  COCOMO model


Lehman’s “Laws of Software Evolution”

Formulated by M.M. Lehman in the 1980s:
  Law of Continuing Change
  Law of Increasing Complexity
  Law of Statistically Smooth Growth
  Law of Organisational Stability
  Law of Conservation of Familiarity
  Law of Continuing Growth
  Law of Declining Quality
  Law of Feedback System

Evolution forces:
  Growth
  Maintenance


Transition-based model of evolution

Stages: many, often overlapping

Transitions: breakpoints between stages, which represent significant changes. Transitions occur because, as a system evolves, its structure must be regularly adapted to the changing requirements and environment.

Gradual change: a slow process of incremental change caused by accumulating maintenance steps or gradual decay

Sudden change: significant changes in the evolving system or in the process by which it is evolved

[Figure: a software characteristic plotted against time, with transitions, sudden change and gradual change marked]


Information-theoretic methods

Shannon entropy:
  A measure of the uncertainty associated with a random variable.
  If an information source generates a series of symbols x_i belonging to an alphabet of size N according to a known probability distribution p(x_i), the entropy H of a sequence X is defined as:

  $$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$$

  High entropy: higher complexity of the system’s code
  Low entropy: there are some repeated patterns of source code; code maintenance is required

Kolmogorov Complexity:
  Measures the ‘complexity’ (i.e., information content) of an object by the length of the smallest program that generates it.
  The Kolmogorov Complexity K_φ(x) of an object x in the description system φ is the length of the shortest program w capable of producing x:

  $$K_\varphi(x) = \min_{w} \{\, |w| : \varphi(w) = x \,\}$$
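As a rough illustration of the two measures on this slide, the sketch below computes the Shannon entropy of a byte string and a compression-based stand-in for Kolmogorov Complexity. Two things here are assumptions rather than part of the talk: each byte is treated as one symbol x_i, and the zlib-compressed length is used as a computable proxy for K(x), since Kolmogorov Complexity itself is not computable. The function names `shannon_entropy` and `compressed_size` are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol; each byte is one symbol (assumption)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H(X) = -sum p(x_i) * log2 p(x_i), with p estimated from byte frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a proxy for K(x) (assumption)."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    repetitive = b"a" * 1024
    varied = bytes(range(256)) * 4
    print(shannon_entropy(repetitive), compressed_size(repetitive))
    print(shannon_entropy(varied), compressed_size(varied))
```

In this sketch a highly repetitive input yields both low entropy and a small compressed size, which matches the slide's reading of low entropy as a sign of repeated patterns.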


Evolution curve method (1)

Motivation: the addition of new features to a software system leads to a change of basic software characteristics (complexity/entropy) in the system.

Idea: use the change of software size and complexity as a means to determine different stages of evolution of a software system

Inspiration: Z-curve [1] and DNA walk [2] methods used in analyzing complex genetic sequences

[1] R. Zhang, C.T. Zhang. Z Curves, an Intuitive Tool for Visualizing and Analyzing DNA sequences. J. Biomol. Struc. Dynamics 11, 767–782, 1994.
[2] S. Paxia, A. Rudra, Y. Zhou, B. Mishra. A Random Walk down the Genomes: DNA Evolution in VALIS. IEEE Computer 35(7):73-79, 2002.


Evolution curve method (2)

The E-curve is composed of a series of nodes $E_i$, whose coordinates are $x_i$ and $y_i$ (i = 1,2,...,N), where N is the number of versions of the analyzed software system:

$$E_i = (x_i, y_i)$$

The nodes are connected sequentially with straight segments. The coordinates $x_i$ and $y_i$ are calculated iteratively:

$$x_i = \begin{cases} x_{i-1} + 1, & \text{if } K_i > K_{i-1} \\ x_{i-1}, & \text{if } K_i = K_{i-1} \\ x_{i-1} - 1, & \text{if } K_i < K_{i-1} \end{cases}$$

$$y_i = \begin{cases} y_{i-1} + 1, & \text{if } H_i > H_{i-1} \\ y_{i-1}, & \text{if } H_i = H_{i-1} \\ y_{i-1} - 1, & \text{if } H_i < H_{i-1} \end{cases}$$

$K_i$ is the Kolmogorov Complexity of the i-th version of a software system;

$H_i$ is the Shannon entropy of the i-th version of a system
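The iterative construction of the E-curve can be sketched as a unit-step walk: each coordinate moves up by one when its measure rises between consecutive versions, down by one when it falls, and stays put when it is unchanged. This is a minimal sketch, not the author's implementation. The starting point is an assumption (the slide does not give initial values, so the first version is placed at the origin), and the names `evolution_curve` and `unit_step` are hypothetical.

```python
from typing import List, Sequence, Tuple


def unit_step(previous: float, current: float) -> int:
    """Return +1 if the measure increased, -1 if it decreased, 0 if unchanged."""
    if current > previous:
        return 1
    if current < previous:
        return -1
    return 0


def evolution_curve(
    k_values: Sequence[float],
    h_values: Sequence[float],
    origin: Tuple[int, int] = (0, 0),  # assumption: first version sits at the origin
) -> List[Tuple[int, int]]:
    """Build E-curve nodes (x_i, y_i) from per-version K_i and H_i values."""
    if len(k_values) != len(h_values):
        raise ValueError("K and H must have one value per version")
    if not k_values:
        return []
    x, y = origin
    nodes = [(x, y)]
    for i in range(1, len(k_values)):
        x += unit_step(k_values[i - 1], k_values[i])  # x follows Kolmogorov Complexity
        y += unit_step(h_values[i - 1], h_values[i])  # y follows Shannon entropy
        nodes.append((x, y))
    return nodes


if __name__ == "__main__":
    # Made-up illustrative values for four versions, not measurements from the talk
    print(evolution_curve([10, 12, 12, 11], [4.0, 4.5, 4.2, 4.2]))
```

Because only the sign of each change is used, the curve shows the direction of evolution between versions and discards the magnitude of each change.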


Evolution curve method (3)

The two dimensions of the Evolution curve, x (relative information content) and y (relative complexity), represent two independent (orthogonal) characteristics of a software system:
  x-dimension: the amount of information contained in a software system; an estimate of software size
  y-dimension: the information entropy of a software system; an estimate of software complexity


Software evolution stages

Software Growth: system is actively developed

Software Maintenance: system becomes simpler often at a cost of its size

Software Improvement: system becomes more complex and generic

Software Shrink: functionality of a system is reduced

[Figure: evolution stages on Size and Complexity axes, with regions labeled GROWTH, MAINTENANCE, IMPROVEMENT and SHRINK]
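One way to read the four stages is as the four sign combinations of a change in size and a change in complexity between two versions. The sketch below encodes that reading. The mapping is an interpretation of the stage descriptions in this talk (growth: both increase; maintenance: simpler at a cost of size; improvement: more complex with size contained; shrink: both decrease), and the handling of a zero change on either axis is an assumption, since the slides do not address it. The name `classify_stage` is hypothetical.

```python
def classify_stage(size_change: float, complexity_change: float) -> str:
    """Map the signs of size and complexity changes to an evolution stage."""
    if size_change > 0 and complexity_change > 0:
        return "growth"
    if size_change > 0 and complexity_change < 0:
        return "maintenance"
    if size_change < 0 and complexity_change > 0:
        return "improvement"
    if size_change < 0 and complexity_change < 0:
        return "shrink"
    # Assumption: a zero change on either axis is left unclassified
    return "unclassified"


if __name__ == "__main__":
    for step in [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 1)]:
        print(step, classify_stage(*step))
```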


Trends of Evolution curve

[Figure: two Evolution curve panels on Size and Complexity axes, titled "Actively Developed Systems" and "Mature Systems"]

Actively developed systems: long upward trends of growth
Mature, stable systems: long downward trends of maintenance


Case studies

Source: SourceForge

7-zip: archiver; 82 versions, 5 years, 160K LOC

Grip: CD player/ripper; 36 versions, 14K LOC

eMule: P2P file sharing client


Case study: eMule

eMule: one of the biggest P2P file sharing clients
  coded in Microsoft Visual C++ using MFC
  Free software, released under the GNU GPL
  Source code first released at version 0.02 on July 6, 2002
  Latest release contains 222,680 lines of code
  Actively developed by 5 developers
  Current development status is “Production/Stable”
  For analysis, 68 versions of eMule source code were used


eMule: Entropy

[Figure: eMule entropy plot; marked versions: 015a, 018a, 030a]


eMule: Size

Fitted model: $y = A + B \cdot x + C \cdot x^2$

A = 7676.17, B = 4324.67, C = 177.488; r = 0.9935
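The quadratic size model can be evaluated directly from the coefficients on this slide. The sketch below does that and also computes the increase from one x to the next, which for this model is B + C(2x + 1). The slide does not state what x and y measure, so treating x as a version index and y as a size measure is an assumption, and the names `size_model` and `size_increment` are hypothetical.

```python
# Coefficients as reported on the slide
A = 7676.17
B = 4324.67
C = 177.488


def size_model(x: float) -> float:
    """Evaluate y = A + B*x + C*x^2 (x assumed to be a version index)."""
    return A + B * x + C * x * x


def size_increment(x: float) -> float:
    """Difference y(x + 1) - y(x), which simplifies to B + C * (2x + 1)."""
    return B + C * (2 * x + 1)


if __name__ == "__main__":
    for version_index in (0, 10, 20):
        print(version_index, size_model(version_index), size_increment(version_index))
```

Since C is positive, the increment grows with x, so under this model each step adds more than the previous one.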


eMule’s Evolution curve

[Figure: eMule Evolution curve; marked versions: 23b, 25b, 30e, 44b, 47c]


What does the changelog say?


Conclusions

The software evolution process can be divided into 4 stages:
  software growth: the size and complexity of the developed software are increasing
  software maintenance: the aim is to contain complexity and fix software bugs
  software improvement: the aim is to contain software system size at a cost of increasing complexity
  software shrink: both software size and complexity are trimmed

The Evolution curve method:
  can identify software evolution stages
  can identify the initial development status of the analyzed software system:
    actively developed systems show long growth trends
    mature systems show maintenance and improvement trends
  is independent of the software implementation language


Ongoing Research and Further Work

Analysis of other entropy measures such as block entropy and Rényi entropies
  paper submitted to Journal of Software Maintenance and Evolution

Dynamic models of software evolution
  Differential equations, etc.

More case studies
  paper submitted to Computing and Information Systems Journal


Thank You. Any Questions?


7-zip: Evolution curve


Grip: Evolution curve