Visualization and Analysis of Open Source Software Evolution using An Evolution Curve Method
Dr. Robertas Damaševičius
Software Engineering Department, Kaunas University of Technology
Studentų 50-415, Kaunas, Lithuania
Email: [email protected]
http://soften.ktu.lt/~damarobe
Eighth International Baltic Conference on Databases and Information Systems, June 2-5, 2008, Tallinn, Estonia
Context and Problem
Software systems are:
  designed, constructed and used by people
  components in larger socio-technical systems
Software design is:
  a social process
  embedded within organizational and cultural structures
  influenced by social processes such as programmer collaboration in teams
Open source software systems:
  free to use
  source code is freely available
  developed by many programmers
  continuously evolve
Aim: analysis of open source software evolution using metrics
What is software evolution?
Definition:
  a continuing process in time during which some essential software properties are changed
Activities:
  modification, adaptation, maintenance, and other activities which occur after the delivery of the first operational release to the users
Importance:
  costs devoted to system maintenance and evolution account for more than 90% of total software costs (Erlikh, 1990)
Forces and factors of open source software evolution
Evolution of open source systems:
  less strict control and management model
  usually started by a single developer (seed)
  attracted users become co-developers
  governed by the needs of users and spontaneous collaboration of co-developers
Evolution mechanisms:
  natural selection, competition
  variation-increasing & variation-decreasing
  influenced by psychological, intellectual, social and cultural, economic and business factors
Software metrics
Common:
  Source lines of code
  Cyclomatic complexity
  Halstead metrics
  Number of classes and interfaces
  R.C. Martin’s software package metrics
  Cohesion, Coupling, …
Specific software evolution metrics:
  SDI metric
  L-metric
  AICC metric
  G-metric
Software development models:
  Statistical models
  Rayleigh model
  Halstead’s Software Science model
  COCOMO model
Lehman’s “Laws of Software Evolution”
Formulated by M.M. Lehman in the 1980s:
  Law of Continuing Change
  Law of Increasing Complexity
  Law of Statistically Smooth Growth
  Law of Organisational Stability
  Law of Conservation of Familiarity
  Law of Continuing Growth
  Law of Declining Quality
  Law of Feedback System
Evolution forces:
  Growth
  Maintenance
Transition-based model of evolution
Stages: many, often overlapping
Transitions: breakpoints between stages, which represent significant changes. Transitions occur because, as a system evolves, its structure must be regularly adapted to the changing requirements and environment
Gradual change: a slow process of incremental change caused by accumulating maintenance steps or gradual decay
Sudden change: significant changes in the evolving system or in the process by which it is evolved
[Figure: a software characteristic plotted against time, with transitions, sudden change and gradual change marked]
Information-theoretic methods
Shannon entropy:
  A measure of the uncertainty associated with a random variable.
  If an information source generates a series of symbols $x_i$ belonging to an alphabet of size $n$ according to a known probability distribution $p(x_i)$, the entropy function $H$ of a sequence $X$ can be defined as:
  $$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
  High entropy: higher complexity of the system’s code
  Low entropy: there are some repeated patterns of source code; code maintenance is required
Kolmogorov Complexity:
  Measures the ‘complexity’ (i.e., information content) of an object by the length of the smallest program that generates it.
  Kolmogorov Complexity $K_\varphi(x)$ of an object $x$ in the description system $\varphi$ is the length of the shortest program $w$ capable of producing $x$:
  $$K_\varphi(x) = \min_{w} \{ |w| : \varphi(w) = x \}$$
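A minimal sketch of how the two measures on this slide might be estimated for one version of a source tree. Two assumptions here are not from the slides: each byte is treated as one symbol of the alphabet, and the zlib-compressed length is used as a computable stand-in for Kolmogorov complexity, which cannot be computed exactly. The function names are illustrative.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy H(X) in bits per symbol, treating each byte as a symbol."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H(X) = -sum p(x_i) * log2 p(x_i), with p estimated from byte frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_size(data: bytes) -> int:
    """Proxy for Kolmogorov complexity (assumption): length of zlib-compressed data."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    sample = b"int main() { return 0; }\n" * 20
    print(shannon_entropy(sample), compressed_size(sample))
```

Highly repetitive input compresses well and has a small `compressed_size`, which matches the slide's remark that low values point to repeated patterns in the code.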
Evolution curve method (1)
Motivation: the addition of new features to a software system leads to the change of basic software characteristics (complexity/entropy) in the system.
Idea: use the change of software size and complexity as a means to determine different stages of evolution of a software system
Inspiration: Z-curve [1] and DNA walk [2] methods used in analyzing complex genetic sequences
[1] R. Zhang, C.T. Zhang. Z Curves, an Intuitive Tool for Visualizing and Analyzing DNA Sequences. J. Biomol. Struc. Dynamics 11, 767–782, 1994.
[2] S. Paxia, A. Rudra, Y. Zhou, B. Mishra. A Random Walk down the Genomes: DNA Evolution in VALIS. IEEE Computer 35(7):73-79, 2002.
Evolution curve method (2)
The E-curve is composed of a series of nodes $E_i$ with coordinates $x_i$ and $y_i$ ($i = 1, 2, \ldots, N$), where $N$ is the number of versions of the analyzed software system.
The nodes $E_i(x_i, y_i)$ are connected sequentially with straight segments. The coordinates $x_i$ and $y_i$ are calculated iteratively:

$$x_i = \begin{cases} x_{i-1} + 1, & \text{if } K_i > K_{i-1} \\ x_{i-1}, & \text{if } K_i = K_{i-1} \\ x_{i-1} - 1, & \text{if } K_i < K_{i-1} \end{cases}$$

$$y_i = \begin{cases} y_{i-1} + 1, & \text{if } H_i > H_{i-1} \\ y_{i-1}, & \text{if } H_i = H_{i-1} \\ y_{i-1} - 1, & \text{if } H_i < H_{i-1} \end{cases}$$

$K_i$ is the Kolmogorov Complexity of the i-th version of a software system;
$H_i$ is the Shannon entropy of the i-th version of a system
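The iterative construction of the E-curve can be sketched as follows. This is an illustration under assumptions, not the author's implementation: a coordinate moves by +1 when the metric rises between consecutive versions, by -1 when it falls, and stays put when it is unchanged; the first version is placed at the origin, which the slides do not specify. The per-version values of K and H are taken as already measured.

```python
from typing import List, Sequence, Tuple


def step(current: float, previous: float) -> int:
    """Return +1, 0 or -1 when the metric rose, stayed equal or fell."""
    return (current > previous) - (current < previous)


def evolution_curve(K: Sequence[float], H: Sequence[float]) -> List[Tuple[int, int]]:
    """Build E-curve nodes (x_i, y_i) from per-version K_i and H_i values."""
    if len(K) != len(H):
        raise ValueError("K and H must have one value per version")
    nodes: List[Tuple[int, int]] = []
    x = y = 0  # assumption: the first version sits at the origin
    for i in range(len(K)):
        if i > 0:
            x += step(K[i], K[i - 1])  # size axis follows Kolmogorov complexity
            y += step(H[i], H[i - 1])  # complexity axis follows Shannon entropy
        nodes.append((x, y))
    return nodes


if __name__ == "__main__":
    # Hypothetical measurements for five versions
    K = [100, 120, 150, 150, 140]
    H = [4.0, 4.2, 4.1, 4.3, 4.3]
    print(evolution_curve(K, H))  # [(0, 0), (1, 1), (2, 0), (2, 1), (1, 1)]
```

Only the sign of each change is used, so the curve shows the direction of evolution rather than its magnitude.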
Evolution curve method (3)
The two dimensions of the Evolution curve, x (relative information content) and y (relative complexity), represent two independent (orthogonal) characteristics of a software system:
  x-dimension: the amount of information contained in a software system; an estimate of software size
  y-dimension: the information entropy of a software system; an estimate of software complexity
Software evolution stages
Software Growth: system is actively developed
Software Maintenance: system becomes simpler often at a cost of its size
Software Improvement: system becomes more complex and generic
Software Shrink: functionality of a system is reduced
[Figure: EVOLUTION diagram with axes Size and Complexity, divided into four regions: GROWTH, MAINTENANCE, IMPROVEMENT, SHRINK]
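One possible reading of the four stages is as a classification of each step of the curve by the signs of its change in size and in complexity. The mapping below is an interpretation of the stage descriptions, and the "undetermined" label for steps where either value is unchanged is an added assumption; `classify_stage` is an illustrative name.

```python
def classify_stage(d_size: int, d_complexity: int) -> str:
    """Map the direction of one E-curve step to an evolution stage."""
    if d_size > 0 and d_complexity > 0:
        return "growth"        # size and complexity both increase
    if d_size > 0 and d_complexity < 0:
        return "maintenance"   # simpler system, at a cost of its size
    if d_size < 0 and d_complexity > 0:
        return "improvement"   # size contained, complexity increases
    if d_size < 0 and d_complexity < 0:
        return "shrink"        # size and complexity both reduced
    return "undetermined"      # assumption: no label when a value is unchanged


if __name__ == "__main__":
    for step in [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 1)]:
        print(step, classify_stage(*step))
```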
Trends of Evolution curve
[Figure: two Evolution curves on Size and Complexity axes; panels: Actively Developed Systems, Mature Systems]
Actively developed systems: long upward trends of growth
Mature, stable systems: long downward trends of maintenance
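A "long trend" could be approximated as the longest run of consecutive steps that fall into the same stage. This definition is an assumption made for illustration, not one given in the slides, and the stage labels in the example are hypothetical input.

```python
from itertools import groupby
from typing import Iterable, Tuple


def longest_trend(stages: Iterable[str]) -> Tuple[str, int]:
    """Return (stage, length) of the longest run of identical consecutive stages."""
    best = ("", 0)
    for label, run in groupby(stages):
        length = sum(1 for _ in run)
        if length > best[1]:  # ties keep the earliest run
            best = (label, length)
    return best


if __name__ == "__main__":
    steps = ["growth", "growth", "growth", "maintenance", "growth"]
    print(longest_trend(steps))  # ('growth', 3)
```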
Case studies
Source: SourceForge
7-zip: archiver; 82 versions, 5 years, 160K LOC
Grip: CD player/ripper; 36 versions, 14K LOC
eMule: P2P file sharing client
Case study: eMule
eMule:
  one of the biggest P2P file sharing clients
  coded in Microsoft Visual C++ using MFC
  free software, released under the GNU GPL
  source code first released at version 0.02 on July 6, 2002
  latest release contains 222,680 lines of code
  actively developed by 5 developers
  current development status is “Production/Stable”
  for analysis, 68 versions of eMule source code were used
eMule: Entropy
[Figure: entropy of eMule across versions; labeled points: Version 015a, Version 018a, Version 030a]
eMule: Size
$y = A + Bx + Cx^2$
A = 7676.17, B = 4324.67, C = 177.488, r = 0.9935
eMule’s Evolution curve
[Figure: Evolution curve of eMule; labeled versions: 23b, 25b, 30e, 44b, 47c]
What does the changelog say?
Conclusions
Software evolution process can be divided into 4 stages:
  software growth: the size and complexity of the developed software are increasing
  software maintenance: the aim is to contain complexity and fix software bugs
  software improvement: the aim is to contain software system size at a cost of increasing complexity
  software shrink: both software size and complexity are trimmed
Evolution curve method:
  can identify software evolution stages
  can identify the initial development status of the analyzed software system:
    actively developed systems show long growth trends
    mature systems show maintenance and improvement trends
  is independent of the software implementation language
Ongoing Research and Further Work
Analysis of other entropy measures such as block entropy and Rényi entropies
  paper submitted to Journal of Software Maintenance and Evolution
Dynamic models of software evolution
  differential equations, etc.
More case studies
  paper submitted to Computing and Information Systems Journal
Thank You. Any Questions?
7-zip: Evolution curve
Grip: Evolution curve