
Investigating of Evolution for Spreadsheet

Application: A Case Study

Azin Ashkan and Ladan Tahvildari

Electrical & Computer Engineering Department

University of Waterloo, Canada

Tech. Report #2004-27

December 2004


Abstract

With the approach of the new millennium, one of the main focuses of the software engineering discipline involves issues related to upgrading, migrating, and evolving existing software systems [18]. To support software system evolution, it is necessary to understand and study the various changes made to software systems [24].

Spreadsheets are nowadays among the most popular software systems. This project studies the evolution of a Linux spreadsheet system, namely KSpread, and concentrates on nine releases of this case study. To extract and analyze the evolution of the system, we use information from two main sources: change notes provided by developers, and the source code of KSpread. The SWAGKit tool [4] has been used to extract the architecture of the system.

In a nutshell, our proposed approach performs Retrospective and Predictive analysis on the selected releases of KSpread in terms of metrics such as size, function calls, cohesion, and coupling, together with some heuristics based on structural dependencies. The project aims to show how KSpread has evolved during the last five years and how stable the system has been in terms of changes among the major releases.


Table of Contents

1 Introduction
2 Proposed Approach
3 Overview of the Case Study: KSpread
4 The Architecture of KSpread
  4.1. Concrete Architecture
  4.2. Architecture Style of KSpread Application
  4.3. Subsystems Description
5 Retrospective Analysis of KSpread
  5.1. System Growth
    5.1.1 KSpread Growth: Size Parameters
    5.1.2 KSpread Growth: Number of Functions
    5.1.3 KSpread Growth: Number of Function Calls
  5.2. System Changes
    5.2.1 KSpread Subsystems Changes: Changed Functions in Subsystems
    5.2.2 KSpread System Changes: Changed Functions in Releases
    5.2.3 KSpread System Changes in terms of Function Calls
  5.3. Cohesion and Coupling
    5.3.1 Cohesion
    5.3.2 Coupling
  5.4. Discussion on Results
    5.4.1 Growing Rate
    5.4.2 Changing Rate
    5.4.3 Cohesion and Coupling
6 Predictive Analysis of KSpread Application
  6.1. Analysis in terms of Structural Dependency
    6.1.1 A Catalogue of Heuristics
    6.1.2 The Algorithm
    6.1.3 Entity-based Structures using Dependency and Layout
    6.1.4 Results
  6.2. Analysis in terms of Size
  6.3. Analysis in terms of Coupling
7 Conclusions and Future Works
8 Acknowledgement
9 References
Appendix A – Details of Heuristic Analysis


List of Figures

Figure 1. Types of Evolution Support Techniques according to the Time of Evolution
Figure 2. BFX-Based Pipeline
Figure 3. KSpread 1.1.0 Top-Level Concrete Architecture
Figure 4. KSpread 1.3.4 Top-Level Concrete Architecture
Figure 5. KSpread Concrete Architecture in More Details
Figure 6. Script to Extract the KSpread Growth in Terms of Number of Functions
Figure 7. Script to Extract the KSpread Growth in Terms of Number of Function Calls
Figure 8. Script to Extract the KSpread Subsystems Changes in Terms of Number of Functions
Figure 9. Script to Extract the KSpread System Changes in Terms of Number of Functions
Figure 10. Script to Extract the KSpread Changes in Terms of Number of Function Calls
Figure 11. Script for Cohesion Analysis on KSpread Releases
Figure 12. Script for Coupling Analysis on KSpread System
Figure 13. KSpread Growing Rate in terms of Number of Files, cLinks, Subsystems, and Sub-subsystems
Figure 14. Growing Rate of Different Subsystems in terms of Number of Functions in the Nine Releases
Figure 15. Growing Rate of KSpread according to # of Functions during the Evolution Releases
Figure 16. Growing Rate of KSpread according to # of Function Calls during Evolution
Figure 17. Changing Rate of KSpread Subsystems in terms of Function during the Evolution
Figure 18. Changing Rate of KSpread in terms of Function during the Evolution
Figure 19. Changing Rate of KSpread in terms of Function Calls during the Evolution
Figure 20. Cohesion in each subsystem of KSpread during the evolution
Figure 21. Coupling for Controller and Document
Figure 22. Model of the Change Propagation Process
Figure 23. KSpread Change in terms of Number of Lines of Code
Figure 24. Some Lower Level Entities with High Coupling


List of Tables

Table 1. Size of Each Release in Terms of Some Parameters
Table 2. Number of Functions in Each Subsystem of Each Release
Table 3. Number of Functions in Each Release
Table 4. Number of Function Calls in Each Release
Table 5. Number of Functions Added and Deleted within Each Subsystem of Each Release
Table 6. Number of Functions Added and Deleted in Each Release
Table 7. Number of Function Calls Added and Deleted between Two Successive Releases
Table 8. Number of cLinks Within Each Subsystem of Each Release
Table 9. Coupling of Controller and Document with the whole system
Table 10. Some Lower Entities with High Coupling
Table 11. Results of the Heuristics for the Studied Releases
Table 12. Number of Lines of Code during the Evolution of KSpread


1 Introduction

Software systems continually change and increase in size and complexity. After many changes during maintenance (adding new features, fixing faults, and restructuring), further modifications may become hard to make. This aging of software products impedes the further evolution of the system [23].

Software evolution is the stage of the software lifecycle in which substantial changes are made. One of the standard definitions of this term is presented in [18]: the dynamic behavior of programming systems as they are maintained and enhanced over their lifetimes. The elementary building block of software evolution is therefore software change [14]. Understanding the dependencies among the system modules helps to evaluate the impact of changes, and several methods based on various metrics are used for this purpose.

The investigation in this project concentrates on nine releases of a Linux spreadsheet, namely KSpread [1], that were delivered over a period of five years. KSpread is the spreadsheet application of KOffice [4]. It is released under GPL-compatible open source licenses (e.g. GPL, LGPL, and BSD), is scriptable, and provides both table-oriented sheets and complex mathematical formulas and statistics. Several versions of KSpread are available and some are still under development. A subset of these releases is analyzed in this project in order to determine the dimensions of the architectural evolution of KSpread. The selected releases are: 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.3.3, and 1.3.4.

In this report, we use the terms subsystems, sub-subsystems, and modules to refer to different levels of abstraction and to the entities shown in the system architecture. To investigate some of the new features or fixes mentioned in the release notes, we have also analyzed lower levels of the architecture tree, down to the leaves (object files). Release information such as developers' change notes, the source code, and the SWAGKit [4] tool have been used together to extract the architecture of the system. The project also performs Retrospective and Predictive analysis on the nine releases in terms of metrics such as size, function calls, cohesion, and coupling, together with some heuristics based on structural dependencies.

The report is organized as follows. Section 2 describes the proposed approach, followed by Section 3, which gives a brief overview of the case study. Section 4 presents the process of extracting the architecture and the resulting concrete architecture of the system, with a short description of each high-level subsystem. Section 5 and Section 6 describe how we carried out the Retrospective and Predictive analysis, respectively, including our findings. Finally, Section 7 concludes the report and discusses the overall findings and future work.


2 Proposed Approach

When investigating the characteristics of evolving software systems, some attributes are useful for classifying the metrics and the applicability of evolution support techniques. One important classification is by the time at which evolution support is provided, which may be after, during, or before the evolution: the corresponding types are Retrospective, Curative¹, and Predictive [10], as depicted in Figure 1.

Figure 1. Types of Evolution Support Techniques according to the Time of Evolution

In this project, both retrospective and predictive analysis are performed on KSpread. Both are carried out under the assumption that the goal of software architecture is software evolution. Retrospective analysis includes techniques and tools that allow analyzing where, how, and why a software system has evolved in the past. Its goal is to check whether the architecture is consistent across releases of the software by identifying improvements and degradations in quality from release to release. It can be State-based, comprising techniques and tools that compare intermediate stages (e.g., the UNIX facility diff), or Change-based, comprising techniques and tools that analyze the changes themselves (e.g., inspecting the change log). One of the goals of this project is to perform change-based retrospective analysis on the releases of a spreadsheet.

Another direction of this project is to do the predictive analysis on the spreadsheet. Generally, this category of analysis covers all techniques and tools that allow maintainers to make decisions concerning the parts of the software that should be improved. It can be Evolution-critical, Evolution-prone, or Evolution-sensitive which are elaborated as follows [11]:

• Evolution-critical identifies those parts of the software that need to be evolved due to a lack of quality (e.g., quality metrics)
• Evolution-prone assesses which parts are likely to evolve in the future (e.g., by visualizing the number of changes)
• Evolution-sensitive distinguishes those parts of the software that will suffer from evolution (e.g., impact analysis)

¹ Curative concerns techniques and tools that support the actual changes to the software system.

Therefore, we carry out a historical analysis on several releases of a system in the spreadsheet family. We concentrate mainly on change-based Historical Analysis and then use its results to perform some Predictive Analysis. In this way, it is possible to study the system evolution in terms of the collected metrics [9, 15, 18, 21] and to make predictions about the evolution-critical and evolution-sensitive parts of the system.


3 Overview of the Case Study: KSpread

To study software evolution, we need to concentrate on different releases of a software system. For this purpose, it is useful to gather a family of software releases and compare some properties of each release with those of the next one. By comparing those properties, the stability of the architecture and its ability to support evolution can be evaluated.

One of the most common forms of software in use today is the spreadsheet language program [13]. Commercial spreadsheet systems, one of the most widely used PC business applications, are a subclass of spreadsheet languages. The relative simplicity of the spreadsheet paradigm lets end-users with little or no formal programming background quickly automate a wide variety of computational tasks. The spreadsheets these users create play an influential role in decisions about budgets, investments, student grades, taxes, and many other important issues. Spreadsheets are not just mechanisms for organizing and displaying data; rather, they are programs that use formulas to transform inputs into outputs. Moreover, like programs in imperative languages, spreadsheets often contain errors, which can often be traced to problems in the development of the spreadsheet [22]. This project therefore concentrates on a case study from the spreadsheet family.

The project investigates the results of analyzing the proposed metrics in the context of a spreadsheet from the Linux family, namely KSpread. It has been part of KDE (as part of KOffice) since KDE 2 [2]. It is released under GPL-compatible open source licenses (e.g. GPL, LGPL, and BSD) and has been developed over the last five years. KSpread is a scriptable spreadsheet application that provides both table-oriented sheets and complex mathematical formulas and statistics. Some of its features include:

• Multiple tables/sheets per document
• Templates
• Multiple chart formats for displaying data graphically
• Headers and footers
• Over 100 formulas, including standard deviation, variance, present value of annuities and much more
• Sorting
• Scripting
• Lists
• Cell data validity checking with configurable warnings/actions
• Comments
• Series (days of week, months of year, numbers, etc.)
• Conditional coloring of cells
• Hyperlinks
• Row and column customization (size, show/hide, font type, style and size, etc.)
• Cell customization (data/number format, precision, border, alignment, rotation, background color and pattern, font type, style and size, etc.)


Several versions of KSpread are available and some are still under development. The stable releases of this spreadsheet are analyzed in this project to determine the dimensions of KSpread's evolution. The nine selected releases are: 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.3.3, and 1.3.4.

Before applying the historical analysis, it is necessary to extract the architecture of KSpread. This step extracts the relations from the source code and makes it possible to visualize the architectural entities of each release of KSpread. The following section presents this process and the resulting concrete architecture of the last release, which provides a roadmap for the rest of the work.


4 The Architecture of KSpread

4.1. Concrete Architecture

A BFX-based pipeline [4] has been used to extract relations from the source code and to visualize the architectural entities of KSpread, as shown in Figure 2 [6].

Figure 2. BFX-Based Pipeline

The extraction and clustering of the KSpread concrete architecture consists of the following steps:

• Step 1: Building the KSpread system using GCC and makefiles. (Obtaining a correct compilation posed many challenges, because each release of KSpread needs different versions of its libraries.)
• Step 2: Using the BFX utility of the BFX pipeline to extract the facts from the object files generated during the build. The result of this step is the TA file kspread.bfx.ta.
• Step 3: Creating a hierarchy file for the KSpread system as follows:
  - The KSpread raw file, kspread.raw.ta, has been extracted by running a jGrok script.
  - The kspread.contain.ta file (the mapping file) has been created in RSF format.
• Step 4: Adding the schema part to the raw file to produce kspread.ls.ta for LsEdit.
• Step 5: Clustering the high-level modules of KSpread using the extracted architecture visualized by LsEdit and the available source code of KSpread [1].
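
To make the role of the mapping file in Step 3 more concrete, the following minimal Python sketch treats a containment file as a list of RSF triples of the form `relation source target` and rebuilds the hierarchy from them. The relation name `contain` and all entity names in the sample are illustrative assumptions; they are not taken from the actual kspread.contain.ta file, and the sketch is not part of the SWAGKit tooling.

```python
from collections import defaultdict

# Hypothetical RSF triples: "<relation> <parent> <child>" (illustrative names only).
SAMPLE_RSF = """\
contain KSpread Document
contain KSpread Controller
contain Document Workbook
contain Document DS
contain Controller GUI
"""


def parse_containment(text, relation="contain"):
    """Return {parent: [children]} for every triple whose verb equals `relation`."""
    children = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != relation:
            continue  # skip blank, malformed, or unrelated lines
        _, parent, child = parts
        children[parent].append(child)
    return dict(children)


def depth_first(children, root, level=0):
    """Yield (level, entity) pairs in a depth-first walk of the hierarchy."""
    yield level, root
    for child in children.get(root, []):
        yield from depth_first(children, child, level + 1)


hierarchy = parse_containment(SAMPLE_RSF)
for level, name in depth_first(hierarchy, "KSpread"):
    print("  " * level + name)
```

The indented listing printed at the end mirrors the kind of subsystem tree that a viewer such as LsEdit displays graphically.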


The system hierarchy in the source model mainly comes from the source directory hierarchy, but at the top level there are so many details and highly dependent modules that some of them can be grouped into clusters. As a result of the process described above, the top-level concrete architectures of the first (1.1.0) and last (1.3.4) releases of KSpread are shown in Figure 3 and Figure 4, respectively.

Figure 3. KSpread 1.1.0 Top-Level Concrete Architecture

These two figures show that the last release has more modules and more complex relations than the first release: two top-level subsystems, namely Value and Formatting, have been added during the evolution of KSpread.


Figure 4. KSpread 1.3.4 Top-Level Concrete Architecture

In the two following sub-sections, the architecture style of the system and a short description of each top-level subsystem and its sub-subsystems are presented.

4.2. Architecture Style of KSpread Application

At first glance, the MVC (Model-View-Controller) architecture of the KSpread application seems obvious. In brief, the MVC architectural pattern divides an interactive application into three components.

The Model contains the core functionality and data. The View displays information to the user. The Controller handles user input. The View and Controller together comprise the user interface. A change-propagation mechanism ensures consistency between the user interface and the Model [19]. KSpread behaves in a similar way. However, like other office applications, KSpread uses the Document/View architecture, a slightly different variant of MVC in which the View and Controller are combined into one part. If a user is performing calculations in the spreadsheet application, the interface the user sees is made of small boxes called cells. Another user may be working on a graphic document, drawing lines and other geometric figures. The object the user is looking at and making changes to is called a view. The view also allows the user to print a document.
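
The Document/View variant described above can be illustrated with a small Python sketch. It is only a schematic illustration under stated assumptions: the class and method names (`Document`, `View`, `set_cell`, `on_user_edit`, `refresh`) are hypothetical and do not correspond to actual KSpread classes. The document plays the Model role and propagates changes to every attached view, while each view combines the View and Controller roles.

```python
class Document:
    """Model role: holds the cell data and notifies attached views on change."""

    def __init__(self):
        self._cells = {}
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def set_cell(self, ref, value):
        self._cells[ref] = value
        for view in self._views:  # change propagation to all views
            view.refresh(ref, value)

    def get_cell(self, ref):
        return self._cells.get(ref)


class View:
    """Combined View and Controller roles: shows cells and handles user input."""

    def __init__(self, document):
        self.document = document
        self.displayed = {}
        document.attach(self)

    def on_user_edit(self, ref, value):
        # Controller role: forward the user's input to the document.
        self.document.set_cell(ref, value)

    def refresh(self, ref, value):
        # View role: update what is shown to the user.
        self.displayed[ref] = value


doc = Document()
first, second = View(doc), View(doc)
first.on_user_edit("A1", 42)
print(second.displayed)  # prints {'A1': 42}
```

An edit made through one view reaches the other view only via the document, which is the consistency guarantee that the change-propagation mechanism provides.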


We have found that the GUI module works as the controller and, together with the View module, is contained in the Controller subsystem. The Document, on the other hand, holds the user's data. For example, after doing some work, the user may want to save the file. Such an action creates a document, and this document must reside somewhere. In the same way, to use an existing file, the user must locate it, open it, and make it available to the application. These two jobs and many others are handled behind the scenes as a document [7]. Figure 4 shows this structure in an architectural view, together with the relations among the subsystems mentioned above.

4.3. Subsystems Description

KSpread consists of seven top-level subsystems. Some of them contain lower-level subsystems, which are called sub-subsystems. Figure 5 depicts the architecture of KSpread (last release) in more detail, showing these second-level modules.

Based on Figure 5, a brief description of each top-level subsystem and its sub-subsystems follows.

Figure 5. KSpread Concrete Architecture in More Details

Document: This subsystem is the Model part of the MVC (Model/View/Controller) architecture of the system. It contains two sub-subsystems, as follows:


• Workbook: This part contains Row/Column, Worksheet, and Map (a single container for all tables) which form some of the main parts of the spreadsheet

• DS: This part is the Data Structure part which holds the data structures and objects.

Functions: The Functions subsystem includes the Built-in Functions sub-subsystem, which contains functions such as conversion, math, text, date/time, financial, and statistical functions. It also contains two other sub-subsystems named FormulaEngine and Dependency. A brief description of each follows:

• FormulaEngine: It works as an expression evaluator. To offer a better performance, the expression is first compiled into byte codes which will be executed later by a virtual machine.

• Dependency: It manages the maintenance and handling of dependencies for each worksheet.
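
The compile-then-execute idea behind FormulaEngine can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not KSpread's implementation: it borrows Python's own `ast` parser in place of a spreadsheet formula parser, and the opcode names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`, `DIV`) are hypothetical. An expression is first compiled into a flat list of byte-code-like instructions, which a small stack-based virtual machine then executes.

```python
import ast
import operator

# Binary operators supported by this sketch, mapped to (opcode, implementation).
BINARY_OPS = {
    ast.Add: ("ADD", operator.add),
    ast.Sub: ("SUB", operator.sub),
    ast.Mult: ("MUL", operator.mul),
    ast.Div: ("DIV", operator.truediv),
}
APPLY = {name: fn for name, fn in BINARY_OPS.values()}


def compile_expr(source):
    """Compile an arithmetic expression into a list of (opcode, argument) pairs."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))  # cell reference such as A1
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)   # operands first (postfix order)
            emit(node.right)
            code.append((BINARY_OPS[type(node.op)][0], None))
        else:
            raise ValueError("unsupported expression element")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code, cells):
    """Execute compiled code on a stack machine; `cells` maps references to values."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(cells[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[opcode](left, right))
    return stack.pop()


program = compile_expr("A1 * 2 + B1")
print(program)
print(run(program, {"A1": 10, "B1": 5}))  # prints 25
```

Because compilation happens once, the same instruction list can be re-run whenever the referenced cells change, which is the performance benefit the description above alludes to.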

Value: It provides a wrapper for cell value so that each cell in a worksheet must hold a value, either as entered by user or as a result of formula evaluation.

Controller: The Controller subsystem includes the View and Controller parts of the MVC (or, more precisely, Document/View) architecture of the system. The GUI subsystem acts as the Controller, and the other entities together make up the View part. The GUI itself contains some user interface entities and two sub-subsystems named SelectionHandling and dlg. SelectionHandling maintains a list of cells, ranges, rows, and similar items that make up the current selection. This allows the implementation of CTRL-selections, which each operation then supports automatically. The dlg sub-subsystem handles the actions related to dialog boxes.

Formatting: This subsystem specifies how a cell should look. For example, it involves font attributes such as bold or italics, vertical and horizontal alignment, rotation angle, shading, background color, and so on.

Plug-in: One of the Plug-in sub-subsystems is Calculations. It works as a basic calculator for the spreadsheet. The other modules of Plug-in handle configuration tasks used by the calculator and are contained in the Config sub-subsystem.

Utility: Lib is one of its sub-subsystems. It contains libraries that are used frequently by different parts of the system, and it also supports the undo and redo actions. The other sub-subsystem is Init, which contains the initialization code for the system.


5 Retrospective Analysis of KSpread

The retrospective analysis looks through the successive releases of the software to see how smoothly the evolution has happened. It is based on comparing various properties of the software across releases in order to check the consistency of the architecture.

To this end, a subset of the metrics available in the software evolution domain has been applied to KSpread to compare adjacent releases with each other. The metrics relate to System Growth, System Changes, Cohesion, and Coupling, which are studied in the following subsections. As noted earlier, the releases under study are 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.3.3, and 1.3.4.

5.1. System Growth

The growing rate is defined as the rate of increase in the number of functions from one release to the next [20]. In this part of the analysis, the historical development of KSpread is studied in terms of size parameters, number of functions, and number of function calls. As a result, the growing rate for each of these metrics is obtained.

5.1.1 KSpread Growth: Size Parameters

One of the metrics used to show the growing rate of the system is its size in terms of several parameters. As systems grow in size, it becomes increasingly difficult to add new code unless explicit steps are taken to reorganize the overall design [16]. The parameters used here are the number of files, cLinks, subsystems, and sub-subsystems. Table 1 shows the result of querying the working releases of the KSpread system in terms of these parameters.

Table 1. Size of Each Release in Terms of Some Parameters

Release #               1.1.0  1.1.1  1.2.0  1.2.1  1.3.0  1.3.1  1.3.2  1.3.3  1.3.4
# of files                 66     66     86     86     99     99     99     99     99
# of cLinks              6015   6018   7815   9357  13410  13404  13404  13407  13626
# of subsystems             5      5      5      5      7      7      7      7      7
# of sub-subsystems        12     12     16     16     17     17     17     17     17

5.1.2 KSpread Growth: Number of Functions

Another view of the growing rate of the system is its growth in terms of the number of functions in each subsystem and in the system as a whole. These two sets of data have been extracted by running the ql/Grok [5] script shown in Figure 6 on the .ls.ta file of each working release.


Figure 6. Script to Extract the KSpread Growth in Terms of Number of Functions

The results are shown in Table 2 and Table 3 in the context of each subsystem and the whole KSpread system respectively for each release.

Table 2. Number of Functions in Each Subsystem of Each Release

Subsystem / Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

Controller 1264 1266 2045 2045 2375 2375 2375 2377 2368

Document 1378 1376 1605 1610 1545 1545 1545 1545 1544

Functions 250 250 449 449 692 692 692 692 684

Formatting - - - - 362 362 362 362 360

Plugin 311 311 317 317 312 312 312 312 306

Utility 200 200 261 261 294 294 294 294 293

Value - - - - 124 124 124 124 123

Table 3. Number of Functions in Each Release

Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

# of functions 3403 3403 4677 4682 5704 5704 5704 5706 5678
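The growing rate between successive releases can be illustrated with the totals in Table 3. The sketch below assumes that the rate is the relative increment in the number of functions, measured against the earlier release of each pair; the report does not give an explicit formula, so this normalization and the helper name `growth_rates` are assumptions.

```python
# Total number of functions per release, copied from Table 3.
RELEASES = ["1.1.0", "1.1.1", "1.2.0", "1.2.1", "1.3.0",
            "1.3.1", "1.3.2", "1.3.3", "1.3.4"]
FUNCTIONS = [3403, 3403, 4677, 4682, 5704, 5704, 5704, 5706, 5678]


def growth_rates(releases, counts):
    """Return (transition, absolute increment, relative increment) triples.

    Assumption: the relative increment is measured against the earlier
    release of each pair.
    """
    rows = []
    for i in range(1, len(counts)):
        delta = counts[i] - counts[i - 1]
        rows.append((f"{releases[i - 1]}-{releases[i]}", delta,
                     delta / counts[i - 1]))
    return rows


if __name__ == "__main__":
    for transition, delta, rate in growth_rates(RELEASES, FUNCTIONS):
        print(f"{transition}: {delta:+d} functions ({rate:+.1%})")
```

With the Table 3 values, the two largest increments fall on the 1.1.1-1.2.0 and 1.2.1-1.3.0 transitions (+1274 and +1022 functions).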

5.1.3 KSpread Growth: Number of Function Calls

The number of function calls has been selected as a further growth metric. Figure 7 contains the script that has been run on the .ls.ta file of each release for this purpose.

Figure 7. Script to Extract the KSpread Growth in Terms of Number of Function Calls

The result of this part is shown in Table 4 for the nine releases of KSpread.

Table 4. Number of Function Calls in Each Release

Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

# of function calls 4318 4321 7189 7196 10519 10522 10522 10525 10723

5.2. System Changes

In this part, the changing rate is considered as the number of functions and function calls changed from one release to another [20]. The changing rate for each pair of successive releases is represented by the relative number of changed items. Here, a change means an addition or a deletion; changes are studied both for the whole KSpread system and for each of its subsystems, in terms of number of functions and function calls.

5.2.1 KSpread Subsystems Changes: Changed Functions in Subsystems

For the change analysis in terms of functions, the script in Figure 8 has been run on the .ls.ta files; the results are given in Table 5.

Figure 8. Script to Extract the KSpread Subsystems Changes in Terms of Number of Functions

Table 5. Number of Functions Added and Deleted within Each Subsystem of Each Release

Each cell lists A/D/T: functions added (A), deleted (D), and total changes (T = A + D).

Subsystem / Release 1.1.0-1.1.1 1.1.1-1.2.0 1.2.0-1.2.1 1.2.1-1.3.0 1.3.0-1.3.1 1.3.1-1.3.2 1.3.2-1.3.3 1.3.3-1.3.4

Document 2/4/6 1402/1173/2575 5/0/5 828/893/1721 0/0/0 0/0/0 0/0/0 23/24/47

Functions 0/0/0 447/248/695 0/0/0 313/70/383 0/0/0 0/0/0 0/0/0 9/17/26

Value - - - - 0/0/0 0/0/0 0/0/0 2/3/5

Controller 2/0/2 1987/1208/3195 1/1/2 794/464/1258 0/0/0 0/0/0 2/0/2 31/40/71

Formatting - - - - 0/0/0 0/0/0 0/0/0 1/3/4

Plugin 2/2/4 238/232/470 0/0/0 7/12/19 0/0/0 0/0/0 0/0/0 0/6/6

Utility 2/2/4 258/197/455 0/0/0 100/67/167 0/0/0 0/0/0 0/0/0 3/4/7

5.2.2 KSpread System Changes: Changed Functions in Releases

Running the script in Figure 9 on the .ls.ta files of each pair of successive releases produced the data in Table 6.

Figure 9. Script to Extract the KSpread System Changes in Terms of Number of Functions

Table 6. Number of Functions Added and Deleted in Each Release

Release # 1.1.0-1.1.1 1.1.1-1.2.0 1.2.0-1.2.1 1.2.1-1.3.0 1.3.0-1.3.1 1.3.1-1.3.2 1.3.2-1.3.3 1.3.3-1.3.4

# of added functions 8 4332 6 2528 0 0 2 69

# of deleted functions 8 3058 1 1506 0 0 0 97

# of total changes 16 7390 7 4034 0 0 2 166
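The added and deleted counts in Table 6 can be read as a set difference between the function sets of two successive releases. The following is a minimal sketch of that idea, not the ql/Grok script of Figure 9; the helper name `function_changes` and the function names in the example are hypothetical.

```python
def function_changes(old_functions, new_functions):
    """Count functions added and deleted between two releases.

    Both arguments are iterables of function identifiers. A function
    present only in the newer release counts as added; one present only
    in the older release counts as deleted.
    """
    old, new = set(old_functions), set(new_functions)
    added = new - old
    deleted = old - new
    return {"added": len(added), "deleted": len(deleted),
            "total": len(added) + len(deleted)}


if __name__ == "__main__":
    # Hypothetical function names, for illustration only.
    release_a = {"cell_text", "table_insert_row", "doc_save"}
    release_b = {"cell_text", "sheet_insert_row", "doc_save", "doc_export"}
    print(function_changes(release_a, release_b))
```

Under this definition a renamed function is counted once as deleted and once as added, so renames inflate both counts.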

5.2.3 KSpread System Changes in terms of Function Calls

The number of function calls added or deleted between each pair of successive releases of KSpread is another metric for studying change. It has been computed by running the script in Figure 10 on the .ls.ta files.

Figure 10. Script to Extract the KSpread Changes in Terms of Number of Function Calls

Table 7 shows how the number of function calls changes in the whole KSpread system across the working releases.

Table 7. Number of Function Calls Added and Deleted between Two Successive Releases

Release # 1.1.0-1.1.1 1.1.1-1.2.0 1.2.0-1.2.1 1.2.1-1.3.0 1.3.0-1.3.1 1.3.1-1.3.2 1.3.2-1.3.3 1.3.3-1.3.4

# of added function calls 18 6997 18 7098 12 0 7 210

# of deleted function calls 15 4129 11 3775 9 0 4 12

# of total changes 33 11126 29 10873 21 0 11 222

5.3. Cohesion and Coupling

Cohesion is a measure of the strength of functional relatedness of the elements within a module. Coupling addresses connections between subsystems, and it can be applied at different levels of abstraction in an architecture model [12]. Together they can be used as two basic metrics of software system architecture.

To investigate them for each release, the number of internal relations within each subsystem and the number of interactions among subsystems have been calculated. The former is used for the cohesion analysis and the latter for the coupling analysis. The following sections give more detail.

5.3.1 Cohesion

The script shown in Figure 11 counts the number of internal relationships (cLinks) within each subsystem. This count is used to measure cohesion for each top-level subsystem.

Figure 11. Script for Cohesion Analysis on KSpread Releases

Table 8 contains the results of running the Figure 11 script on the .ls.ta file of each release. As mentioned earlier, it expresses cohesion by the number of cLinks relations within each subsystem: the more cLinks, the higher the cohesion inside the subsystem.

Table 8. Number of cLinks Within Each Subsystem of Each Release

Subsystem / Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

Document 2086 2081 2972 2980 2824 2824 2824 2824 2908

Functions 198 198 748 748 1624 1624 1624 1624 1665

Value - - - - 367 367 367 367 367

Controller 1718 1723 2631 2632 3441 3436 3436 3439 3515

Formatting - - - - 672 672 672 672 676

Plugin 769 769 769 769 770 770 770 770 770

Utility 287 287 695 695 745 745 745 745 759
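The cohesion count can be sketched as the number of links whose two endpoints belong to the same top-level subsystem. This is only an illustration of the metric, not the ql/Grok script of Figure 11; the helper name `cohesion_by_subsystem` and the entity names in the example are hypothetical.

```python
from collections import Counter


def cohesion_by_subsystem(links, subsystem_of):
    """Count links whose two endpoints lie in the same subsystem.

    links        -- iterable of (source, target) entity pairs
    subsystem_of -- dict mapping each entity to its top-level subsystem
    """
    internal = Counter()
    for source, target in links:
        src_sub = subsystem_of.get(source)
        dst_sub = subsystem_of.get(target)
        if src_sub is not None and src_sub == dst_sub:
            internal[src_sub] += 1
    return dict(internal)


if __name__ == "__main__":
    # Hypothetical containment and link facts, for illustration only.
    subsystem_of = {"doc.o": "Document", "cell.o": "Document",
                    "view.o": "Controller", "canvas.o": "Controller"}
    links = [("doc.o", "cell.o"), ("cell.o", "doc.o"),
             ("view.o", "canvas.o"), ("view.o", "doc.o")]
    print(cohesion_by_subsystem(links, subsystem_of))
```

In this toy input the link from view.o to doc.o crosses a subsystem boundary, so it does not contribute to the cohesion of either subsystem.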

5.3.2 Coupling

The number of external relationships between a module and all other modules has been used to express the coupling of that module. In this project, the investigation of coupling has focused mainly on the two largest and most important subsystems of KSpread, Document and Controller (the main parts of the View/Controller architecture of the system). Their coupling values are shown in Table 9.

Table 9. Coupling of Controller and Document with the whole system

Subsystem / Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

Controller 1441 1507 2247 2393 3351 3310 3059 3313 3385

Document 2786 2785 4096 4225 4193 4191 4191 4191 4353

The results have been obtained by running the script of Figure 12 on the .ls.ta files of the nine releases.
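Coupling can be sketched in the same spirit, as the number of links that cross a subsystem boundary. The code below is an illustration under stated assumptions, not the ql/Grok script of Figure 12: it assumes that a crossing link counts toward both subsystems it connects, and the helper name `coupling_by_subsystem` and the entity names are hypothetical.

```python
from collections import Counter


def coupling_by_subsystem(links, subsystem_of):
    """Count links that cross a subsystem boundary.

    Assumption: a crossing link contributes to the coupling of both the
    subsystem it leaves and the subsystem it enters.
    """
    external = Counter()
    for source, target in links:
        src_sub = subsystem_of.get(source)
        dst_sub = subsystem_of.get(target)
        if src_sub is None or dst_sub is None or src_sub == dst_sub:
            continue  # unknown entity or internal link
        external[src_sub] += 1
        external[dst_sub] += 1
    return dict(external)


if __name__ == "__main__":
    # Hypothetical containment and link facts, for illustration only.
    subsystem_of = {"doc.o": "Document", "cell.o": "Document",
                    "view.o": "Controller", "util.o": "Utility"}
    links = [("view.o", "doc.o"), ("util.o", "doc.o"), ("doc.o", "cell.o")]
    print(coupling_by_subsystem(links, subsystem_of))
```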

Figure 12. Script for Coupling Analysis on KSpread System

After running the script, some lower-level entities also stood out because of their high coupling with other modules. They are listed in Table 10 with their coupling counts during the evolution of KSpread.

Table 10. Some Lower Entities with High Coupling

Entity / Release # 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

KSpread_Table.o 1285 1285 1557 1555 0 0 0 0 0

KSpread_canvas.o 437 442 417 418 450 450 450 450 454

KSpread_cell.o 364 364 818 824 1095 1095 1095 1095 1108

KSpread_dlg_layout.o 265 265 324 324 391 391 391 391 395

KSpread_view.o 271 271 565 563 759 756 756 759 766

KSpread_doc.o 132 132 213 213 508 507 507 507 520

KSpread_sheet.o 0 0 0 0 1674 1674 1674 1674 1702

KSpread_sheetprint.o 0 0 0 0 230 230 230 230 232

The sudden drop of the table entity to 0 is due to a rename carried out during the project's development; it has no technical cause. The development team decided to stop using the term table, which had been adopted incorrectly from the German term Tabelle (literally, table). The correct term is sheet or worksheet: the English version of Microsoft uses sheet, while the German version uses Tabelle. The developers therefore replaced the word “table” with “sheet” in various places, and the corresponding appearance of the sheet entities is visible in Table 10.
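The effect of the rename on Table 10 can be made explicit by treating the old and new object files as a single entity. The sketch below merges the two rows under the assumption that KSpread_sheet.o is the direct successor of KSpread_Table.o, as the explanation above suggests; the alias mapping `RENAMED` and the helper name `merge_renamed` are illustrative.

```python
# Coupling values copied from Table 10 (nine releases, 1.1.0 to 1.3.4).
COUPLING = {
    "KSpread_Table.o": [1285, 1285, 1557, 1555, 0, 0, 0, 0, 0],
    "KSpread_sheet.o": [0, 0, 0, 0, 1674, 1674, 1674, 1674, 1702],
}

# Assumed rename mapping: old entity name -> new entity name.
RENAMED = {"KSpread_Table.o": "KSpread_sheet.o"}


def merge_renamed(series, renamed):
    """Add each renamed entity's values onto its successor's series."""
    merged = {}
    for name, values in series.items():
        target = renamed.get(name, name)
        if target in merged:
            merged[target] = [a + b for a, b in zip(merged[target], values)]
        else:
            merged[target] = list(values)
    return merged


if __name__ == "__main__":
    print(merge_renamed(COUPLING, RENAMED))
```

Merged this way, the entity shows a continuous series rising from 1285 to 1702 instead of a drop to 0.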

5.4. Discussion on Results

Graphs are a powerful aid in discussing measurement results because they make the findings easier to understand. This section discusses the measurements used for the retrospective analysis of KSpread evolution (by the proposed metrics). The results are shown as 2D or 3D graphs, depending on the case.

5.4.1 Growing Rate

Table 1 shows the result of the growth analysis on KSpread releases in terms of four size metrics: number of files, cLinks, subsystems, and sub-subsystems. Figure 13 depicts the growth of the releases for these four parameters.

Figure 13. KSpread Growing Rate in terms of Number of Files, cLinks, Subsystems, and Sub-subsystems

It is clear that a significant number of files were added at the transitions from 1.1.X to 1.2.X and from 1.2.X to 1.3.X. In all graphs, the large growth from releases 1.1.X to 1.2.X stands out most, and the same largely holds for the transition from 1.2.X to 1.3.X. We map these major changes to the developers’ notes in [25], which state that KSpread 1.3.0 and KSpread 1.2.0 are two stable releases that offer a number of important feature additions and improvements compared to their previous versions. The releases in between, on the other hand, show no noticeable changes or improvements.

The next graph results from the data in Table 2. It is the 3D graph in Figure 14, which shows the growing rate of KSpread across the releases in terms of the number of functions in each subsystem. It tells the same story as the previous one: the growing rate is highest from 1.1.1 to 1.2.0 and, to a lesser extent, from 1.2.1 to 1.3.0. The Functions subsystem shows the most nonlinear growth between releases 1.1.1 and 1.2.0.

We map this change to the part of the developers’ notes stating that several built-in functions were added in 1.2.0. Controller and Formatting follow the same pattern because of the UI and formatting features added in 1.2.0, respectively. It is also noticeable that Controller and Functions have grown the most between their creation and the last release: according to Table 2, they have become roughly 2 and 3 times larger, respectively, in terms of number of functions during the evolution of KSpread.

At the subsystem level, we also see that the major subsystems of the system are Document and Controller, which contain a large number of functions. This is reasonable given that these two subsystems act as the Document and View parts of the Document/View system architecture, as mentioned before.

Figure 14. Growing Rate of Different Subsystems in terms of Number of Functions in the Nine Releases

The same trend can be seen from another view, the growing rate of the system according to the total number of functions (Table 3). As Figure 15 shows, the same story holds here: nonlinear growth from 1.1.X to 1.2.X and from 1.2.X to 1.3.X.

We have also analyzed the system according to the number of function calls (Table 4). This metric is the total number of function calls within each release. Figure 16 shows that the system has become more complex during the evolution of KSpread: the later the release, the greater the number of function calls and, consequently, the greater the complexity of the system.

Figure 15. Growing Rate of KSpread according to # of Functions during the Evolution Releases

Figure 16. Growing Rate of KSpread according to # of Function Calls during Evolution

5.4.2 Changing Rate

The first parameter in this category is the change in functions. We have measured this metric at both the system and the subsystem level; the results are shown in Table 5 and Table 6. We define the number of changes as the sum of the numbers of added and deleted functions.

Figure 17, derived from Table 5, shows the changing rate of the seven top-level subsystems of KSpread in terms of function changes across the nine releases.

Figure 17. Changing Rate of KSpread Subsystems in terms of Function during the Evolution

We see that evolution happened mostly in the Document, Controller, and Functions subsystems. Plugin also evolves during releases 1.2.0 to 1.3.0. Again, these noticeable changes occurred mostly during the intervals mentioned above, 1.1.X to 1.2.X and 1.2.X to 1.3.X. This observation again reflects that KSpread 1.3.0 and KSpread 1.2.0 offer a number of important feature additions and improvements compared to their previous versions. Formatting and Value, on the other hand, are two subsystems with almost no changes (nearly 0), probably because they became stable during the evolution. Note, however, that their evolution only starts at release 1.3.0, when these two subsystems were created.

Figure 18 is the result of Table 6. As the graph shows, additions, deletions, and total changes follow the same trend across the nine releases.

Figure 18. Changing Rate of KSpread in terms of Function during the Evolution

The three types of changes (addition, deletion, and total) peak at the transitions from 1.1.1 to 1.2.0 and from 1.2.1 to 1.3.0, which reflects the same observation as before, and the change is roughly linear elsewhere. It is also noticeable that the changes are very small from 1.1.0 to 1.1.1, 1.2.0 to 1.2.1, and 1.3.3 to 1.3.4. This suggests that the system reaches a stable state after a large amount of change in its features (and consequently in the number of function calls).

The last metric of this category is the change in function calls. This metric has been analyzed to show how many resources in total were used in each release. Figure 19 shows the result of the analysis, based on Table 7. The changes in this parameter closely resemble the function changes in the previous graph. The significant changes in function calls occur between releases 1.1.1 and 1.2.0 and between 1.2.1 and 1.3.0, and they were probably made mostly in Document and Controller.

Figure 19. Changing Rate of KSpread in terms of Function Calls during the Evolution

Comparing this graph with the previous one (Figure 18), the peaks correspond to around 11000 changes in function calls against around 7400 changes in functions.

5.4.3 Cohesion and Coupling

For the cohesion parameter, the graph in Figure 20 is derived from Table 8. We have taken the number of cLinks relations within each subsystem as a metric of the cohesion inside that subsystem. As mentioned before, the more cLinks a subsystem has, the more cohesive it is.

Figure 20. Cohesion in each subsystem of KSpread during the evolution

Figure 20 shows that Controller and Document are the two highly cohesive parts of KSpread. Of these two, Document is the more stable according to this parameter, because it changes more smoothly. Among the other subsystems, Formatting, Plugin, and Utility appear stable during the evolution.

The last metric of this part, and of the retrospective analysis, is coupling. The investigation of Document and Controller coupling resulted in Table 9, and Figure 21 shows the results visually.

The two main subsystems, Controller and Document, are the highly coupled subsystems. This is not surprising, since they are the two parts of the Document/View architecture of the system, and every other subsystem needs them for controlling and structuring work. The rapid increase of coupling in these two subsystems across the major releases therefore seems logical. We map it to the large feature improvements in these releases.

Figure 21. Coupling for Controller and Document

It is also noticeable that release 1.2.1 appears to be a stable version for the Document subsystem with respect to this parameter, since its changes are smooth after release 1.2.1. Controller, on the other hand, behaves inconsistently with respect to this parameter and shows dramatic changes.

6 Predictive Analysis of KSpread Application

6.1. Analysis in terms of Structural Dependency

This approach is presented in [8] to determine how changes propagate through the source code. We take an entity that is about to change and try to determine the other entities that must change because of it. Figure 22 shows the model. Once the initial entity is changed, the developer analyzes the source code to determine whether there are other entities to which the change must be propagated, and the propagation process is repeated for each such entity. When the developer cannot locate any further entities to change, a Guru is consulted. The Guru can be a senior developer, a software development tool, or even a test suite. Several heuristics exist to generate the set of entities that should be changed in response to a changed entity. Ideally, a heuristic would correctly suggest all the entities of a change set without asking the Guru for advice. Referring back to the model in Figure 22, this method tries to minimize the number of times the Guru is consulted for an entity to change.

Figure 22. Model of the Change Propagation Process
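The propagation process can be sketched as a worklist loop that falls back to the Guru only when the heuristic has nothing more to suggest. This is a minimal illustration of the model, not an implementation from [8]; the function name `propagate_change` and its three callback parameters are assumptions introduced for the example.

```python
def propagate_change(initial_entity, suggest, needs_change, ask_guru):
    """Simulate the change propagation loop.

    suggest(entity)      -- heuristic: entities that might also change
    needs_change(entity) -- developer's decision for a suggested entity
    ask_guru(changed)    -- returns one more entity to change, or None
    Returns the final change set and the number of Guru consultations.
    """
    changed = {initial_entity}
    worklist = [initial_entity]
    guru_calls = 0
    while True:
        while worklist:
            entity = worklist.pop()
            for candidate in suggest(entity):
                if candidate not in changed and needs_change(candidate):
                    changed.add(candidate)
                    worklist.append(candidate)
        # The heuristic found nothing more: consult the Guru.
        guru_calls += 1
        extra = ask_guru(changed)
        if extra is None or extra in changed:
            return changed, guru_calls
        changed.add(extra)
        worklist.append(extra)


if __name__ == "__main__":
    # Hypothetical dependency data, for illustration only.
    deps = {"a": ["b", "c"], "b": ["d"]}
    required = {"a", "b", "d", "e"}

    def guru(changed):
        remaining = sorted(required - changed)
        return remaining[0] if remaining else None

    print(propagate_change("a", lambda e: deps.get(e, []),
                           lambda e: e in required, guru))
```

A better heuristic lowers the number of Guru consultations returned by the function, which is the quantity the method tries to minimize.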

Two concepts used in the approach are Recall and Precision, which are defined in detail in [8]. We give a short description of these two concepts and some basic definitions needed to follow the process:

Predicted Set (P) = Total set of suggested entities

Change Set (C) = Entities in P which need to be changed

Occurred Set (O) = Set of entities that needed to be predicted

Initial Entity (IE) = Selected entity by developer initially

Predicted-Occurred (PO) = Intersection of Predicted and Occurred

In consequence of these definitions, we have the following rules:

O = C - {IE}

Recall = |PO| / |O|

Precision = |PO| / |P|
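As a worked illustration of these rules, the sketch below computes Recall and Precision for a single change set. The function name `recall_precision` is hypothetical, and returning 0.0 for an empty denominator is an assumption that the definitions above do not address.

```python
def recall_precision(predicted, change_set, initial_entity):
    """Compute recall and precision for one change set.

    predicted      -- P, the set of entities suggested by a heuristic
    change_set     -- C, the change set, which includes IE
    initial_entity -- IE, the entity the developer changed first
    """
    occurred = set(change_set) - {initial_entity}   # O = C - {IE}
    predicted = set(predicted)
    hits = predicted & occurred                     # PO = P intersect O
    # Assumption: an empty denominator yields 0.0 rather than an error.
    recall = len(hits) / len(occurred) if occurred else 0.0
    precision = len(hits) / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical entities: C = {A, B, C, D}, IE = A, P = {B, C, X, Y}.
    print(recall_precision({"B", "C", "X", "Y"}, {"A", "B", "C", "D"}, "A"))
```

In this example O = {B, C, D} and PO = {B, C}, so Recall is 2/3 and Precision is 2/4.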

The rest of the approach uses these definitions of Recall and Precision. To measure performance over time, the recall and precision values are each summed over all change sets and divided by the number of change sets (M) in the history of the project [8]:

$$\text{Average Recall} = \frac{1}{M} \times \sum_{i=1}^{M} \text{Recall}_i$$

$$\text{Average Precision} = \frac{1}{M} \times \sum_{i=1}^{M} \text{Precision}_i$$
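The averaging step amounts to taking the mean of the per-change-set values. A minimal sketch follows; the function name `average_recall_precision` and the sample values are hypothetical.

```python
def average_recall_precision(per_change_set):
    """Average (recall, precision) pairs over the M change sets."""
    pairs = list(per_change_set)
    m = len(pairs)
    if m == 0:
        raise ValueError("at least one change set is required")
    avg_recall = sum(r for r, _ in pairs) / m
    avg_precision = sum(p for _, p in pairs) / m
    return avg_recall, avg_precision


if __name__ == "__main__":
    # Hypothetical per-change-set results, for illustration only.
    print(average_recall_precision([(1.0, 0.5), (0.5, 0.25), (0.0, 0.0)]))
```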

6.1.1 A Catalogue of Heuristics

Several heuristics can be used to predict change propagation by suggesting entities that should change based on an entity that has changed. Each heuristic is characterized by two main aspects: Heuristic Data Source and Pruning Technique. The heuristics proposed in [8] are Entity-based Historical Co-change, Entity-based Code Structure, Call Layout, Developer-based, Process-based, Name Similarity, and Random Data. The proposed pruning techniques are Frequency, Recency, and Hybrid.

Among the proposed heuristics, we have studied two: Entity-based Code Structure using Relationships (CUD) and Entity-based Code Structure using Code Layout (FIL).
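The two heuristics can be sketched as simple suggestion functions. The code is an illustration under stated assumptions rather than the procedure used in [8]: it assumes that FIL suggests the other entities defined in the same file, that CUD follows call, use, and define relations (an expansion of the acronym that the text does not spell out), and that relations are followed in both directions. All names in the example are hypothetical.

```python
def suggest_fil(entity, file_of):
    """FIL: suggest every other entity defined in the same file."""
    target_file = file_of[entity]
    return {other for other, path in file_of.items()
            if path == target_file and other != entity}


def suggest_cud(entity, relations):
    """CUD: suggest entities linked to `entity` by a code relation.

    relations -- iterable of (source, kind, target) triples, where kind
                 is assumed to be "call", "use", or "define".
    Assumption: a relation is followed in both directions.
    """
    related = set()
    for source, _kind, target in relations:
        if source == entity and target != entity:
            related.add(target)
        elif target == entity and source != entity:
            related.add(source)
    return related


if __name__ == "__main__":
    # Hypothetical entities, files, and relations.
    file_of = {"f": "cell.cc", "g": "cell.cc", "h": "view.cc"}
    relations = [("f", "call", "h"), ("g", "use", "f")]
    print(suggest_fil("f", file_of))
    print(suggest_cud("f", relations))
```

Either function can serve as the predicted set P when computing Recall and Precision for a change set.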

6.1.2 The Algorithm

The performance of the two heuristics is measured using the historical change data stored in the source control repository of KSpread [25]. These heuristics do not employ any pruning technique. A change set is derived from the modification records stored in the source control repository of the project. By studying the detailed message attached to each modification record throughout the history of the project, the General Maintenance (GM) modifications have been identified and removed, because they do not reflect the implementation of a particular feature; they are therefore not considered in our analysis of the change propagation process. Based on the findings of the previous sections, we applied the heuristics to two pairs of successive releases, the transitions from 1.1.X to 1.2.X and from 1.2.X to 1.3.X, because these two pairs (1.1.1 with 1.2.0 and 1.2.1 with 1.3.0) form the main evolving parts of the release history.

6.1.3 Entity-based Structures using Dependency and Layout

Applying these two heuristics to the feature improvements from 1.1.1 to 1.2.0 and from 1.2.1 to 1.3.0 produced Table 11. Details of the calculations are presented in Appendix A; the averages of the results are calculated as described there and are shown in Table 11.

Table 11. Results of the Heuristics for the Studied Releases

Release Interval CUD Recall CUD Precision FIL Recall FIL Precision

1.1.1-1.2.0 0.5 0.57 0.06 0.14

1.2.1-1.3.0 0.34 0.48 0.05 0.08

6.1.4 Results

With these two heuristics, we have focused on structural dependency to evaluate the system, trying to trace changes from other changes according to the structural dependencies that exist within the system. We see that FIL has both lower recall and lower precision than CUD. This suggests that the system’s source code files do not represent a coherent conceptual grouping of related items, so changes have propagated to entities in different objects during the evolution of the software. The resulting prediction is that change propagation in KSpread is not in a stable state and that changes may spread in ways that are hard to control.

Further examination of the results in Table 11 shows that the code structure dependency relation (CUD) captures change propagation more effectively during the evolution of the software. It reveals that a noticeable number of entities are related to the just-changed entities through a dependency relationship. This suggests that any restructuring of the system is risky: removing an entity from its place and putting it somewhere else may introduce inconsistencies in the structure of the system and, in consequence, put at risk the way the system tolerates changes.

6.2. Analysis in terms of Size

One of the popular size metrics is the number of lines of code. We have investigated this metric as well to obtain further results on size, so that a better prediction can be made from this category of metrics. Table 12 gives the result for the nine releases of KSpread.

Table 12. Number of Lines of Code during the Evolution of KSpread

Release# 1.1.0 1.1.1 1.2.0 1.2.1 1.3.0 1.3.1 1.3.2 1.3.3 1.3.4

# of Lines of Code 61713 61722 78710 78927 102239 102238 102259 102275 102276

As Figure 23 shows, there are again noticeable increases between the major releases (1.1.1 to 1.2.0 and 1.2.1 to 1.3.0). All the size parameters in the retrospective analysis show the same.

Figure 23. KSpread Change in terms of Number of Lines of Code

It can therefore be predicted that future major releases of KSpread may bring a significant change in size. As the size of software grows dramatically, maintenance becomes more costly and complex, and this may apply to KSpread as well.

6.3. Analysis in terms of Coupling

One of the results of the coupling analysis, Table 10, is shown in Figure 24. These entities may be among the modules called Evolution Sensitive Parts of the system: they provide many resources to other subsystems, so any change to them may force a significant number of changes in the entities they are linked with. This result is closely related to our findings in the structural dependency analysis.

Figure 24. Some Lower Level Entities with High Coupling

7 Conclusions and Future Works

In this project, we have carried out a historical analysis of several releases of KSpread. The study covers nine releases of the system developed over a period of five years.

Release information such as developers’ change notes and source code, together with the SWAGKit tool, has been used to extract the architecture of the system. KSpread appears to follow the MVC architecture or, more precisely, the Document/View architecture. After extracting the architecture for all nine releases, we focused on the evolution process of the project. This involved analyzing the nine releases in terms of metrics such as size, function calls, cohesion, and coupling, and of some heuristics based on structural dependencies. Using scripts written in ql/Grok [5], the historical data from all the releases were extracted and tabulated by metric category. 2D and 3D diagrams helped us to better understand what happened during the evolution in terms of the proposed metrics.

Software evolution is software change. Understanding the dependencies among the system modules helps to evaluate the impact of changes. We have therefore performed the Retrospective analysis, especially from the change-based view, and compared the releases with each other. Using the retrospective results and some heuristics, we have also attempted a predictive analysis of the releases. To do so, we prioritized new features and bug fixes [3] in the release notes and rough change logs, and then applied the heuristics to the major release transitions.

All in all, the analysis showed that KSpread 1.3.0 and KSpread 1.2.0 are two stable releases that offer a number of important feature additions and improvements compared to their previous versions. The releases in between, on the other hand, show no noticeable changes or improvements. The major transitions are from 1.1.1 to 1.2.0 and from 1.2.1 to 1.3.0.

We have also found that the major subsystems of the system are Document and Controller, which contain a large number of functions. This is reasonable given that these two subsystems act as the Document and View parts of the Document/View system architecture of KSpread.

Another notable point is that the system becomes more complex during the evolution of KSpread. The later the release, the greater the number of function calls and the size of the system (in terms of the various size parameters). Consequently, more complexity is imposed on the system as it evolves.

Moreover, we could identify the parts of the system that have evolved the most. Evolution happened mostly in the Document, Controller, and Functions subsystems, and Plugin also evolves during releases 1.2.0 to 1.3.0. We therefore predict that these parts may evolve further in the future, so they may be the Evolution-critical parts of the system.

During the study, we could also find the most cohesive parts of the system. Controller and Document appear to be the two highly cohesive parts of KSpread. Of these two, Document is the more stable according to this parameter, because it changes more smoothly. Among the other subsystems, Formatting, Plugin, and Utility also appear stable during the evolution. The two main subsystems, Controller and Document, also appear to be the highly coupled subsystems. We think this is logical, because they are the two parts of the Document/View architecture of the system and act as its main subsystems, and every other subsystem needs them for controlling and structuring work. The rapid increase of coupling in these two subsystems across the major releases therefore seems logical as well. We map it to the large feature improvements in these releases.

Using two structural entity-based heuristics, we have also examined structural dependencies to evaluate the system. We studied change propagation in order to reach a better prediction. We see that, in contrast to CUD, the FIL heuristic has low recall and also sacrifices precision. This suggests that the system’s source code files do not represent a coherent conceptual grouping of related items, so changes have propagated to entities in different objects during the evolution of the software. This leads to the prediction that change propagation in KSpread is not in a stable state and that changes may spread in an uncontrolled way.

Finally, using the coupling metric, we have found some lower-level entities (the graph in Figure 24) to be the Evolution-sensitive parts of the system, which must be taken into account during any further changes to the system.

In this investigation, we have selected several metrics from different categories. We think that this set of metrics forms a good basis for studying the evolution of a system, especially a spreadsheet. One direction for future work is to apply these metrics to a family of spreadsheets and compare them during their evolution, that is, to build a framework for comparing metrics across releases. A degree of change could be defined (by means of several thresholds) to distinguish slight changes from sensible changes. Fuzzy variables with a specific fuzzy membership function for each metric, together with a fuzzy rule set, may help to handle this work effectively.
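The threshold and fuzzy-membership idea for the degree of change can be illustrated with a minimal sketch. The two thresholds (5% and 20%), the linear ramp between them, and all function names are hypothetical choices made for illustration; they are not values or rules from this study.

```python
# Sketch only: the thresholds below are hypothetical, not taken from the study.
SLIGHT_MAX = 0.05    # assumed: a relative change up to 5% is fully "slight"
SENSIBLE_MIN = 0.20  # assumed: a relative change of 20% or more is fully "sensible"


def relative_change(old: float, new: float) -> float:
    """Relative change of one metric value between two releases."""
    if old == 0:
        raise ValueError("metric value of the older release must be non-zero")
    return abs(new - old) / abs(old)


def membership_sensible(change: float) -> float:
    """Degree (0..1) to which a relative change counts as 'sensible'."""
    if change <= SLIGHT_MAX:
        return 0.0
    if change >= SENSIBLE_MIN:
        return 1.0
    # Linear ramp between the two thresholds.
    return (change - SLIGHT_MAX) / (SENSIBLE_MIN - SLIGHT_MAX)


def membership_slight(change: float) -> float:
    """Degree (0..1) to which a relative change counts as 'slight'."""
    return 1.0 - membership_sensible(change)


def classify(old: float, new: float) -> str:
    """A single rule: pick the label with the larger membership degree."""
    change = relative_change(old, new)
    if membership_sensible(change) > membership_slight(change):
        return "sensible"
    return "slight"
```

With these assumed thresholds, a metric moving from 100 to 102 between two releases would be labelled slight, and one moving from 100 to 150 would be labelled sensible. A real rule set would likely need one membership function per metric, as suggested above.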


8 Acknowledgement

We would like to express our gratitude to Professor Richard C. Holt for his guidance and for the Grok language, and to Jingwei Wu for his help and for the QLDX pipeline.


9 References

[1] KSpread documentation at: http://www.kde.org/

[2] KOffice available at http://www.koffice.org/

[3] Bugs notes available at: http://bugs.kde.org/

[4] SWAGKit tool available at: http://www.swag.uwaterloo.ca/swagkit/

[5] Ql/Grok Language available at: http://swag.uwaterloo.ca/~nsynytskyy/grokdoc/index.html

[6] J. Wu, “CS746 Software Architecture Organizational Meeting”, 2004, Available at: http://plg.uwaterloo.ca/~holt/cs/746/04f/slides/organizationalmeeting.ppt

[7] The Document/View Architecture: http://www.functionx.com/MFCFundamentals/Lesson05.htm

[8] Hassan A. E., Holt R. C., “Predicting Change Propagation in Software Systems”, In proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM’04), Pages: 284-293, 2004.

[9] Wu J., Holt R.C., Hassan A.E., “Exploring software evolution using spectrographs”, In proceedings of WCRE: Working Conference on Reverse Engineering, 2004.

[10] Demeyer S., Mens T., Wermelinger M., “Towards a Software Evolution Benchmark”, In Proceedings IWPSE2001 (4th International Workshop on Principles of Software Evolution), Pages: 174-177, 2002

[11] Mens T., Demeyer S., “Future trends in software evolution metrics”, International Conference on Software Engineering, Pages: 83-86, 2001.

[12] Mens T., Demeyer S., “Future Trends in Software Evolution Metrics,” In proceedings of IWPSE’01, 2001.

[13] Vijay B. Krishna et al, “Incorporating Incremental Validation and Impact Analysis into Spreadsheet Maintenance: An Empirical Study”, In proceedings of the IEEE International Conference on Software Maintenance, Pages: 72-81, 2001.

[14] Rajlich V. T., “Software Evolution: A Road Map”, In Proceedings of IEEE International Conference, Page 6, 2001.

[15] Ramil J.F., Lehman M.M., “Metrics of software evolution as effort predictors – a case study”, In Proceedings of International Conference on Software Maintenance, Pages: 163-172, 2000.

[16] Godfrey M. W., Tu Q., “Evolution in Open Source Software: A Case Study”, International Conference on Software Maintenance, Pages: 131-142, 2000.

[17] Briand L.C., Wust J., Lounis H., “Using coupling measurement for impact analysis in object-oriented systems”, In Proceedings of 15th IEEE International Conference on Software Maintenance, Pages: 475–482, 1999.


[18] Kemerer C.F., Slaughter S., “An empirical approach to studying software evolution”, IEEE Transactions on Software Engineering, Volume: 25, Issue: 4, July-Aug. 1999.

[19] G. Krasner and S. Pope, “A Cookbook for Using the Model-View-Controller User Interface Paradigm in Smalltalk-80,” Journal of Object Oriented Programming, Vol. 1, pp. 26-49, 1988.

[20] Gall H., Jazayeri M., Klosch R. R., Trausmuth G., “Software Evolution Observations Based on Product Release History”, In proceedings of the International Conference on Software Maintenance, Pages: 160-166, 1997.

[21] Lehman M.M., Wernick P. D., Perry D. E., Turski W.M., “Metrics and Laws of Software Evolution – The Nineties View”, In proceedings of fourth Software Metrics Symposium, 1997

[22] R. Panko and R. Halverson Jr., “Spreadsheets on trial: A framework for research on spreadsheet risks”, 29th Hawaii Intl. Conf. on System Sciences, Vol. II, Pages: 326-335, 1996.

[23] Parnas D.L., “Software aging”, In proceedings of 16th International Conference on Software Engineering, ICSE-16, Pages: 279-287, 16-21 May 1994.

[24] Garlan D., Shaw M., “An Introduction to Software Architecture”, Advances in Software Engineering and Knowledge Engineering, Volume: I, World Scientific Publishing Company, 1993.

[25] Release Notes available at: http://webcvs.kde.org/koffice/kspread/DESIGN.html?rev=1.4&view=log


Appendix A – Details of Heuristic Analysis

CUD and FIL Heuristics Calculations on Feature Sets Improvement from 1.1.1 to 1.2.0

CUD: Entity-based Code Structure using Dependencies. FIL: Entity-based Code Structure using Layout.

| Feature | EI | CUD P | CUD O | CUD PO | CUD Recall | CUD Precision | FIL P | FIL O | FIL PO | FIL Recall | FIL Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fixed a small bug in Cell::calc to calculate non delayed, it is possible to do this with the cell it depends on | Ks_cell | 19 | 8 | 6 | 0.75 | 0.31 | 5 | 8 | 0 | 0 | 0 |
| paper layout becomes a property of sheet | Ks_Table | 32 | 14 | 9 | 0.64 | 0.28 | 1 | 14 | 1 | 0.07 | 1 |
| Some sort enhancement | Ks_dlg_sort | 1 | 2 | 1 | 0.5 | 1 | 24 | 2 | 0 | 0 | 0 |
| Improved Data Consolidate with more choices: Sum, Average, Count, Min, Max, Product, Standard Deviation, Variance | Ks_dlg_cons | 1 | 1 | 1 | 1 | 1 | 24 | 1 | 0 | 0 | 0 |
| fixed some functions to be Excel-compatible (find, replace) | Ks_view | 9 | 2 | 1 | 0.5 | 0.11 | 2 | 2 | 0 | 0 | 0 |
| Relocated selection information to the View | Ks_view | 9 | 6 | 2 | 0.33 | 0.22 | 2 | 6 | 0 | 0 | 0 |
| Multiple views work for spreadsheets (change tables, select various parts of the sheet in each open view) | Ks_view | 9 | 13 | 6 | 0.46 | 0.66 | 2 | 13 | 1 | 0.07 | 0.5 |
| DCOP interface for a table has changed. | Ks_TableIface | 2 | 10 | 1 | 0.1 | 0.5 | 2 | 10 | 1 | 0.1 | 0.5 |
| Fix #42456: merged cells with centered text survive insert/delete rows | Ks_cell | 13 | 3 | 3 | 1 | 0.23 | 7 | 3 | 1 | 0.33 | 0.14 |
| Crash after "money format" fixed (#45943) | Ks_view | 9 | 6 | 2 | 0.33 | 0.22 | 2 | 6 | 0 | 0 | 0 |
| Serious errors in formulas fixed (#46045). | Ks_inerpretor | 2 | 7 | 2 | 0.28 | 1 | 2 | 7 | 0 | 0 | 0 |
| Fix embedded chart changes titles when opening again | Ks_handler | 1 | 3 | 1 | 0.33 | 1 | 3 | 3 | 0 | 0 | 0 |
| Precision problem in calculations fixed (#40150). | Ks_cell | 13 | 6 | 3 | 0.5 | 0.23 | 7 | 6 | 1 | 0.16 | 0.17 |
| Formula editors tooltip problem fixed (#29524) | Ks_canvas | 15 | 3 | 2 | 0.66 | 0.13 | 3 | 3 | 0 | 0 | 0 |
| Crash on selecting validity fixed (#46530) | Ks_dlg_validity | 1 | 3 | 1 | 0.33 | 1 | 24 | 3 | 0 | 0 | 0 |
| Consolidate function result reference area selecting unintuitive (#45324) | Ks_dlg_cons | 1 | 3 | 1 | 0.33 | 1 | 24 | 3 | 1 | 0.33 | 0.04 |
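The Recall and Precision columns of the table above can be reproduced from the P, O, and PO columns. The sketch below assumes that P is the size of the predicted set, O the size of the set of entities that actually changed, and PO the size of their intersection, so that Recall = PO/O and Precision = PO/P. The truncation to two decimals is also an assumption, inferred from rows such as 6/9 being reported as 0.66. The function names are illustrative.

```python
def truncate2(numerator: int, denominator: int) -> float:
    """Ratio truncated (not rounded) to two decimals; 0.0 for an empty set."""
    if denominator == 0:
        return 0.0
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (numerator * 100 // denominator) / 100


def recall(po: int, o: int) -> float:
    """Share of the entities that actually changed that were predicted."""
    return truncate2(po, o)


def precision(po: int, p: int) -> float:
    """Share of the predicted entities that actually changed."""
    return truncate2(po, p)


# First row of the table (Ks_cell), CUD columns: P=19, O=8, PO=6
print(recall(6, 8), precision(6, 19))  # 0.75 0.31
# Same row, FIL columns: P=5, O=8, PO=0
print(recall(0, 8), precision(0, 5))   # 0.0 0.0
```

Most rows agree with this reading, although a few reported values differ slightly from what the formulas give.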


CUD and FIL Heuristics Calculations on Feature Sets Improvement from 1.2.1 to 1.3.0

CUD: Entity-based Code Structure using Dependencies. FIL: Entity-based Code Structure using Layout.

| Feature | EI | CUD P | CUD O | CUD PO | CUD Recall | CUD Precision | FIL P | FIL O | FIL PO | FIL Recall | FIL Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| enhanced the CSV import dlg to support "ignore double delimiters" and adjust cell width to imported cell content | Ks_dlg_csv | 1 | 4 | 1 | 0.25 | 1 | 28 | 4 | 0 | 0 | 0 |
| Goal Seek message fixes and it is possible to select the cells with the mouse | Ks_dlg_goalseek | 1 | 3 | 0 | 0 | 0 | 28 | 3 | 0 | 0 | 0 |
| Paper layout can now be applied to all sheets | Ks_table | 37 | 13 | 5 | 0.38 | 0.13 | 6 | 13 | 1 | 0.07 | 0.16 |
| Autoscroll while selecting available for columns and row headers | Ks_canvas | 12 | 2 | 1 | 0.5 | 0.08 | 4 | 2 | 0 | 0 | 0 |
| High resolution printing (600dpi) | Ks_doc | 16 | 12 | 7 | 0.58 | 0.43 | 8 | 12 | 0 | 0 | 0 |
| New direction mode when pressing Enter: possible to jump to the first cell of the next row | Ks_cell | 19 | 4 | 3 | 0.75 | 0.15 | 6 | 4 | 2 | 0.5 | 0.33 |
| more powerful conditional cell attributes (possible to assign a whole style if condition matches) | Ks_dlg_conditional | 1 | 6 | 1 | 0.16 | 1 | 24 | 6 | 0 | 0 | 0 |
| multiple steps undo/redo | Ks_undo | 12 | 4 | 4 | 1 | 0.33 | 2 | 4 | 0 | 0 | 0 |
| fix problem where hidden sheet is simply appended in the tab bar when it is shown again | Ks_tabbar | 1 | 3 | 0 | 0 | 0 | 24 | 3 | 0 | 0 | 0 |
| fix bug #87369: ctrl+C in formula bar now copies contents | Ks_doc | 16 | 15 | 9 | 0.6 | 0.56 | 8 | 15 | 2 | 0.13 | 0.25 |
| fix condition cell attribute (multi condition) | Ks_dlg_conditional | 1 | 9 | 1 | 0.11 | 1 | 24 | 9 | 1 | 0.11 | 0.04 |
| fix enable/disable action into validation dialog box | Ks_dlg_validity | 1 | 5 | 1 | 0.2 | 1 | 24 | 1 | 0 | 0 | 0 |
| fix bug #77844: undo now works after deleting multiple cells | Ks_undo | 12 | 5 | 4 | 0.8 | 0.33 | 2 | 5 | 0 | 0 | 0 |
| indicator of chosen/selected cells is shown now (bug #58098) | Ks_canvas | 12 | 9 | 2 | 0.22 | 0.16 | 4 | 9 | 1 | 0.11 | 0.25 |
| possible crash with conditional formatting prevented (#58713) | Ks_condition | 1 | 6 | 1 | 0.16 | 1 | 8 | 6 | 0 | 0 | 0 |
| Dependency problem with automatic recalculation fixed (#58097) | Ks_cell | 19 | 2 | 1 | 0.5 | 0.05 | 6 | 2 | 0 | 0 | 0 |
| unsorted zoom values fixed (#64154) | Ks_view | 13 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| The calculator plugin saves its configuration now (#49954) | kcalc | 2 | 14 | 2 | 0.14 | 1 | 4 | 14 | 2 | 0.14 | 0.5 |
| conditions can be used for text and numbers now (was numbers only before) | Ks_condition | 1 | 8 | 1 | 0.12 | 1 | 8 | 8 | 0 | 0 | 0 |
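One possible way to summarise such rows when comparing CUD with FIL is to pool the counts over several features before dividing. The sketch below does this for the last five rows listed above (Ks_condition, Ks_cell, Ks_view, kcalc, Ks_condition). Pooling is an assumption made for illustration, not necessarily the aggregation used in this study, and the tuple layout and function name are hypothetical.

```python
# Each tuple: (entity, O, CUD P, CUD PO, FIL P, FIL PO),
# copied from the last five rows of the 1.2.1 to 1.3.0 data.
ROWS = [
    ("Ks_condition", 6, 1, 1, 8, 0),
    ("Ks_cell", 2, 19, 1, 6, 0),
    ("Ks_view", 1, 13, 0, 2, 0),
    ("kcalc", 14, 2, 2, 4, 2),
    ("Ks_condition", 8, 1, 1, 8, 0),
]


def pooled(rows, p_index, po_index):
    """Pooled (recall, precision): sum the counts first, then divide."""
    total_o = sum(row[1] for row in rows)
    total_p = sum(row[p_index] for row in rows)
    total_po = sum(row[po_index] for row in rows)
    return total_po / total_o, total_po / total_p


cud_recall, cud_precision = pooled(ROWS, 2, 3)
fil_recall, fil_precision = pooled(ROWS, 4, 5)
print(f"CUD recall={cud_recall:.3f} precision={cud_precision:.3f}")
print(f"FIL recall={fil_recall:.3f} precision={fil_precision:.3f}")
```

For these five rows the pooled CUD recall is 5/31 and the pooled FIL recall is 2/31, which agrees with the observation that FIL has lower recall than CUD.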