Moose Tutorial at WCRE 2008

DESCRIPTION
I used this set of slides for the Moose tutorial at WCRE 2008.

TRANSCRIPT
forward engineering
actual development
reverse engineering
built in Berne
used in several research groups
> 100 person-years of effort
~ 150 publications
since 1997
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
McCabe = 21
LOC = 75
NOM = 102
3,000 classes
classes select: #isGodClass
Metrics compress the system into numbers
NOM, LOC, TCC, WMC, CYCLO, ATFD, HNL,
NOC, NOCmts, NOPA, WLOC, WNOC, WOC, MSG,
DUPLINES, NAI, NOA, NI, ...
Lanza, Marinescu 2006
Detection Strategies are metric-based queries to detect design flaws
METRIC 1 > Threshold 1
Rule 1
METRIC 2 < Threshold 2
Rule 2
AND Quality problem
Lanza, Marinescu 2006
Example: a God Class centralizes too much intelligence in the system
ATFD > FEW
Class uses directly more than a
few attributes of other classes
WMC ≥ VERY HIGH
Functional complexity of the
class is very high
TCC < ONE THIRD
Class cohesion is low
AND GodClass
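A detection strategy like the one above is just a conjunction of metric conditions. The following is a minimal sketch in Python; the `ClassMetrics` record and the concrete threshold values are assumptions for illustration, not Moose's actual API or the book's exact thresholds.

```python
# Sketch of a metric-based detection strategy (in the style of
# Lanza & Marinescu 2006). Thresholds are illustrative placeholders.
from dataclasses import dataclass

FEW = 5            # ATFD: "uses more than a few" foreign attributes
VERY_HIGH = 47     # WMC: very high functional complexity
ONE_THIRD = 1 / 3  # TCC: low cohesion

@dataclass
class ClassMetrics:
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion

def is_god_class(c: ClassMetrics) -> bool:
    """GodClass := (ATFD > FEW) AND (WMC >= VERY HIGH) AND (TCC < ONE THIRD)."""
    return c.atfd > FEW and c.wmc >= VERY_HIGH and c.tcc < ONE_THIRD

classes = [
    ClassMetrics('Server', atfd=12, wmc=81, tcc=0.1),
    ClassMetrics('Visitor', atfd=0, wmc=9, tcc=0.6),
]
god_classes = [c.name for c in classes if is_god_class(c)]
```

This mirrors the slide's `classes select: #isGodClass` query: a filter over the model, with the rule encoded once.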
Polymetric views show up to 5 metrics
Color metric
Width metric
Height metric
Position metrics
Lanza 2003
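The mapping behind a polymetric view can be sketched as follows; the function names are assumptions of this sketch (the real rendering is done by Mondrian in Smalltalk), the point being only that one node carries up to five metrics at once.

```python
# Sketch: a polymetric view maps up to five metrics onto one node
# (width, height, color, x position, y position).
def polymetric_node(entity, width_metric, height_metric, color_metric,
                    x_metric, y_metric):
    return {
        'width': width_metric(entity),
        'height': height_metric(entity),
        'color': color_metric(entity),  # typically mapped to a gray scale
        'x': x_metric(entity),
        'y': y_metric(entity),
    }

# example: a class rendered with NOA as width, NOM as height, LOC as color
cls = {'NOA': 4, 'NOM': 12, 'LOC': 230, 'depth': 2, 'order': 7}
node = polymetric_node(cls,
                       width_metric=lambda c: c['NOA'],
                       height_metric=lambda c: c['NOM'],
                       color_metric=lambda c: c['LOC'],
                       x_metric=lambda c: c['order'],
                       y_metric=lambda c: c['depth'])
```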
System Complexity shows class hierarchies
Lanza, Ducasse 2003
Class Blueprint shows class internals
Ducasse, Lanza 2005
Package Blueprint shows package usage
Ducasse etal 2007
Ducasse etal 2006
Distribution Map shows properties over structure
Semantic Clustering reveals implementation topics
user, run, load, message, file, buffer, util
property, AWT, edit, show, update, sp, set
start, buffer, end, text, length, line, count
action, box, component, event, button, layout, GUI
start, length, integer, end, number, pre, count
XML, dispatch, microstar, reader, XE, register, receive
current, buffer, idx, review, archive, endr, TAR
BSH, simple, invocation, assign, untype, general, arbitrary
maximum, label, link, item, code, put, vector
Kuhn etal 2006
Kuhn etal 2008
Software Map gives software space a meaning
Softwarenaut explores the package structure
Lungu etal 2006
Wettel, Lanza 2007
CodeCity shows where your code lives
Trace Signals reveal similar execution traces
Kuhn, Greevy 2006
Greevy etal 2006
Feature Views show how features cover classes
addFolder addPage
Greevy etal 2007
Feature Map relates features to code
Lienhard 2009
Object Flow captures object aliases
Lienhard etal 2007
Object Flow shows how objects move
Object Dependencies reveal feature dependencies
Open Connect
Join Channel
Send Message
Lienhard etal 2007
Girba etal 2005
Hierarchy Evolution reveals evolution patterns
D’Ambros, Lanza 2006
Evolution Radar shows co-change relationships
Girba etal 2006
Ownership Map reveals patterns in CVS
Junker 2008
Kumpel shows how developers work on files
Balint etal 2006
Clone Evolution shows who copied from whom
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
[Diagram: the FAMIX core — Package, Namespace, Class, Method, Attribute; methods and attributes belongTo a class, classes are packagedIn packages; Inheritance links a subclass to its superclass; Access links a method to the attributes it accesses; Invocation links a method (invokedBy) to its candidate methods.]
FAMIX is a language independent meta-model
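As an illustration only (the real FAMIX meta-model is richer and lives in Smalltalk), the core entities and their associations might be sketched like this:

```python
# A toy, language-independent sketch of FAMIX core entities.
# All names here are illustrative simplifications, not Moose's API.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Package:
    name: str

@dataclass
class Class_:
    name: str
    package: Optional[Package] = None       # packagedIn
    superclass: Optional['Class_'] = None   # Inheritance: subclass -> superclass
    methods: list = field(default_factory=list)
    attributes: list = field(default_factory=list)

@dataclass
class Method:
    name: str
    parent: Optional[Class_] = None         # belongsTo

@dataclass
class Attribute:
    name: str
    parent: Optional[Class_] = None         # belongsTo

@dataclass
class Access:                               # a method accesses an attribute
    accessor: Method
    variable: Attribute

# a tiny model: Server inherits from Object and defines one method
base = Class_('Object')
server = Class_('Server', package=Package('net'), superclass=base)
accept = Method('accept', parent=server)
server.methods.append(accept)
```

The point of the meta-model is exactly this kind of navigation: from a class to its methods, package, and superclass, independently of the source language.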
[Diagram: Hismo — a SystemHistory contains SystemVersions, a ClassHistory contains ClassVersions; in general, a History is a sequence of Versions.]
Hismo is the history meta-model
Gîrba 2005
2 4 3 5 7
2 2 3 4 9
2 2 1 2 3
2 2 2 2 2
1 5 3 4 4
What changed? When did it change? ...
1 5 3 4 4
ENOM(C) = 4 + 2 + 1 + 0 = 7
LENOM(C) = ∑ |NOMi(C) − NOMi−1(C)| · 2^(i−n)   (Latest Evolution of Number of Methods)
EENOM(C) = ∑ |NOMi(C) − NOMi−1(C)| · 2^(2−i)   (Earliest Evolution of Number of Methods)
Gîrba etal 2004
For the history 1 5 3 4 4:
LENOM(C) = 4·2^−3 + 2·2^−2 + 1·2^−1 + 0·2^0 = 1.5
EENOM(C) = 4·2^0 + 2·2^−1 + 1·2^−2 + 0·2^−3 = 5.25
History      ENOM  LENOM  EENOM
2 4 3 5 7      7    3.5    3.25
2 2 3 4 9      7    5.75   1.37
2 2 1 2 3      3    1      2
2 2 2 2 2      0    0      0
1 5 3 4 4      7    1.5    5.25
Gîrba etal 2004
balanced changer
late changer
dead stable
early changer
Gîrba etal 2004
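The three history measurements can be reproduced directly from their definitions. This is a plain-Python sketch (function names are mine, not Moose's), checked against the worked example in the slides, where versions are numbered 1..n and changes occur from version 2 on.

```python
# ENOM sums the absolute changes in number of methods over a class history;
# LENOM weights late changes more (factor 2^(i-n));
# EENOM weights early changes more (factor 2^(2-i)).
def changes(history):
    return [abs(b - a) for a, b in zip(history, history[1:])]

def enom(history):
    return sum(changes(history))

def lenom(history):
    n = len(history)
    return sum(c * 2 ** (i - n) for i, c in enumerate(changes(history), start=2))

def eenom(history):
    return sum(c * 2 ** (2 - i) for i, c in enumerate(changes(history), start=2))

h = [1, 5, 3, 4, 4]                 # the worked example from the slides
print(enom(h), lenom(h), eenom(h))  # -> 7 1.5 5.25
```

A "dead stable" class yields zeros on all three; a "late changer" has LENOM well above EENOM, an "early changer" the reverse.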
FAMIX Core: Class, Method, ...
Dynamix: Activation, Instance, ...
ObjectFlow: Alias, ...
Hismo: ClassHistory, MethodHistory, ...
CVS: FileHistory, FileVersion, ...
Subversion: FileHistory, FileVersion, ...
BugsLife: Bug, Activity, ...
Dude: Duplication, ...
...
FAMIX is a family of meta-models
FM3 is the meta-meta-model
Kuhn, Verwaest 2008
[Diagram: FM3 — FM3.Element (name: String, fullName: String) is the root; FM3.Class (with a superclass) and FM3.Package (with extensions) specialize it; FM3.Property (derived, keyed, multivalued: Boolean) has a type and an opposite.]
MSE is the exchange format
Kuhn, Verwaest 2008
(FAMIX.Class (id: 100)
  (name 'Server')
  (container (ref: 82))
  (isAbstract false)
  (isInterface false)
  (package (ref: 624))
  (stub false)
  (NOM 9)
  (WLOC 124))
(FAMIX.Method (id: 101)
  (name 'accept')
  (signature 'accept(Visitor v)')
  (parentClass (ref: 100))
  (accessControl 'public')
  (hasClassScope false)
  (stub false)
  (LOC 7)
  (CYCLO 3))
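Because MSE is a nested s-expression format, a few lines suffice to read it into a generic tree. This is a toy reader for illustration only, not the Fame parser: it ignores string escaping and does not interpret the `id:`/`ref:` semantics.

```python
# Minimal MSE-style reader: parses "(FAMIX.Class (id: 100) ...)" into
# nested Python lists of strings.
import re

TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()]+")

def parse_mse(text):
    tokens = TOKEN.findall(text)
    pos = 0
    def parse():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok != '(':
            return tok.strip("'")     # atom; drop surrounding quotes
        node = []
        while tokens[pos] != ')':
            node.append(parse())
        pos += 1                      # consume ')'
        return node
    return parse()

elem = parse_mse("(FAMIX.Class (id: 100) (name 'Server') (NOM 9))")
# elem -> ['FAMIX.Class', ['id:', '100'], ['name', 'Server'], ['NOM', '9']]
```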
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
Meyer etal 2006
Mondrian scripts graph visualizations
view nodes: classes forEach: [ :each |
    view nodes: each methods.
    view gridLayout ].
view edgesFrom: #superclass.
view treeLayout.
Junker, Hofstetter 2007
EyeSee scripts charts
Wettel 2008
CodeCity scripts 3D visualizations
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
[Diagram: the Moose platform — a model Repository built on Fame and the FAMIX meta-models, with Mondrian, EyeSee, and a generic UI on top; models arrive as MSE from importers for Smalltalk, Java and C++ (via iPlasma), CVS, SVN, and J-Wiretap traces; tools such as Hapax, DynaMoose, Chronia, SmallDude, CodeCity, Yellow Submarine, Softwarenaut, BugsLife, Clustering, and Metanool build on the platform.]
[Diagram: splitting an application into Model, Helpers, and GUI layers.]
Murphy etal 1995
Brühlmann etal 2008
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
INRIA Lille
Politehnica University of Timisoara
University of Berne
University of Lugano
Université Catholique de Louvain
Current Team: Stéphane Ducasse, Tudor Gîrba
Current Contributors: Hani Abdeen, Philipp Bunge, Alexandre Bergel, Johan Brichau, Marco D'Ambros, Simon Denier, Orla Greevy, Matthias Junker, Jannik Laval, Adrian Lienhard, Mircea Lungu, Oscar Nierstrasz, Damien Pollet, Jorge Ressia, Toon Verwaest, Richard Wettel
Previous Team: Serge Demeyer, Adrian Kuhn, Michele Lanza, Sander Tichelaar
Previous Contributors: Tobias Aebi, Ilham Alloui, Gabriela Arevalo, Mihai Balint, Frank Buchli, Thomas Bühler, Calogero Butera, Daniel Frey, Georges Golomingi, David Gurtner, Reinout Heeck, Markus Hofstetter, Markus Kobel, Michael Locher, Martin von Löwis, Pietro Malorgio, Michael Meer, Michael Meyer, Laura Ponisio, Daniel Ratiu, Matthias Rieger, Azadeh Razavizadeh, Andreas Schlapbach, Daniel Schweizer, Mauricio Seeberger, Lukas Steiger, Daniele Talerico, Herve Verjus, Violeta Voinescu, Sara Sellos, Lucas Streit, Roel Wuyts
> 100 person-years
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
is an idea
Scripting Visualizations with Mondrian
Michael Meyer and Tudor Gîrba
Software Composition Group, University of Berne, Switzerland

Abstract

Most visualization tools focus on a finite set of dedicated visualizations that are adjustable via a user-interface. In this demo, we present Mondrian, a new visualization engine designed to minimize the time-to-solution. We achieve this by working directly on the underlying data, by making nesting an integral part of the model and by defining a powerful scripting language that can be used to define visualizations. We support exploring data in an interactive way by providing hooks for various events. Users can register actions for these events in the visualization script.

1 Introduction

Visualization is an established tool to reason about data. Given a wanted visualization, we can typically find tools that take as input a certain format and that provide the needed visualization [4].

One drawback of the approach is that, when a deep reasoning is required, we need to refer back to the capabilities of the original tool that manipulates the original data. Another drawback is that it actually duplicates the required resources unnecessarily: the data is present both in the original tool, and in the visualization tool. Several tools take a middle ground approach and choose to work close with the data by either offering integration with other services [1], or providing the services themselves [2]. However, when another type of service is required, the integration is lost.

We present Mondrian, a visualization engine that implements a radically different approach. Instead of providing a required data format, we provide a simple interface through which the programmer can easily script the visualization in a declarative fashion (more information can be found in [3]). That is, our solution works directly with the objects in the data model, and instead of duplicating the objects by model transformation, we transform the messages sent to the original objects via meta-model transformations.

2 Mondrian by example

In this section we give a simple step-by-step example of how to script a visualization using Mondrian. The example builds on a small model of a source code with 32 classes. The task we propose is to provide an overview of the hierarchies.

Creating a view and adding nodes. Suppose we can ask the model object for the classes. We can add those classes to a newly created view by creating a node for each class, where each node is represented as a Rectangle. In the case above, NOA, NOM and LOC are methods in the object representing a class and return the value of the corresponding metric.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view open.

Adding edges and layouting. To show how classes inherit from each other, we can add an edge for each inheritance relationship. In our example, supposing that we can ask the model for all the inheritance objects, and given an inheritance object, we will create an edge between the node holding the superclass and the node holding the subclass. We layout the nodes in a tree.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view edges: model inheritances from: #superclass to: #subclass.
view treeLayout.
view open.

Nesting. To obtain more details for the classes, we would like to see which are the methods inside. To nest we specify for each node the view that goes inside. Supposing that we can ask each class in the model about its methods, we can add those methods to the class by specifying the view for each class.
Test Blueprints — Exposing Side Effects in Execution Traces to Support Writing Unit Tests

Adrian Lienhard*, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland
{lienhard, girba, greevy, oscar}@iam.unibe.ch

Abstract

Writing unit tests for legacy systems is a key maintenance task. When writing tests for object-oriented programs, objects need to be set up and the expected effects of executing the unit under test need to be verified. If developers lack internal knowledge of a system, the task of writing tests is non-trivial. To address this problem, we propose an approach that exposes side effects detected in example runs of the system and uses these side effects to guide the developer when writing tests. We introduce a visualization called Test Blueprint, through which we identify what the required fixture is and what assertions are needed to verify the correct behavior of a unit under test. The dynamic analysis technique that underlies our approach is based on both tracing method executions and on tracking the flow of objects at runtime. To demonstrate the usefulness of our approach we present results from two case studies.

Keywords: Dynamic Analysis, Object Flow Analysis, Software Maintenance, Unit Testing

1 Introduction

Creating automated tests for legacy systems is a key maintenance task [9]. Tests are used to assess if legacy behavior has been preserved after performing modifications or extensions to the code. Unit testing (i.e., tests based on the XUnit frameworks [1]) is an established and widely used testing technique. It is now generally recognized as an essential phase in the software development life cycle to ensure software quality, as it can lead to early detection of defects, even if they are subtle and well hidden [2].

The task of writing a unit test involves (i) choosing an appropriate program unit, (ii) creating a fixture, (iii) executing the unit under test within the context of the fixture, and (iv) verifying the expected behavior of the unit using assertions [1]. All these actions require detailed knowledge of the system. Therefore, the task of writing unit tests may prove difficult as developers are often faced with unfamiliar legacy systems.

Implementing a fixture and all the relevant assertions required can be challenging if the code is the only source of information. One reason is that the gap between static structure and runtime behavior is particularly large with object-oriented programs. Side effects1 make program behavior more difficult to predict. Often, encapsulation and complex chains of method executions hide where side effects are produced [2]. Developers usually resort to using debuggers to obtain detailed information about the side effects, but this implies low level manual analysis that is tedious and time consuming [25].

Thus, the underlying research question of the work we present in this paper is: how can we support developers faced with the task of writing unit tests for unfamiliar legacy code? The approach we propose is based on analyzing runtime executions of a program. Parts of a program execution, selected by the developer, serve as examples for new unit tests. Rather than manually stepping through the execution with a debugger, we perform dynamic analysis to derive information to support the task of writing tests without requiring a detailed understanding of the source code.

In our experimental tool, we present a visual representation of the dynamic information in a diagram similar to the UML object diagram [11]. We call this diagram a Test Blueprint as it serves as a plan for implementing a test. It reveals the minimal required fixture and the side effects that are produced during the execution of a particular program unit. Thus, the Test Blueprint reveals the exact information that should be verified with a corresponding test.

To generate a Test Blueprint, we need to accurately analyze object usage, object reference transfers, and the side effects that are produced as a result of a program execution. To do so, we perform a dynamic Object Flow Analysis in conjunction with conventional execution tracing [17].

Object Flow Analysis is a novel dynamic analysis which tracks the transfer of object references in a program execution. In previous work, we demonstrated how we success-

1 We refer to side effect as the program state modifications produced by a behavior. We consider the term program state to be limited to the scope of the application under analysis (i.e., excluding socket or display updates).
Practical Object-Oriented Back-in-Time Debugging
Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland

Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the "omniscient" approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counterintuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.

1 Introduction

When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug's cause [2].

Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object's fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].

The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor 100 or more for non-trivial programs.
Enriching Reverse Engineering with Annotations

Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland
http://scg.unibe.ch/

Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.

1 Introduction

Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.

Only few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views which make use of rules that encode external constraints and are checked against the actual source code [3].

In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach

Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.
Semantic Clustering: Identifying Topics in Source Code

Adrian Kuhn (a), Stéphane Ducasse (b), Tudor Gîrba (a)

(a) Software Composition Group, University of Berne, Switzerland
(b) Language and Software Evolution Group, LISTIC, Université de Savoie, France

Abstract

Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing formal information the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it on several case studies, two of which we present in this paper.

Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.

Key words: reverse engineering, clustering, latent semantic indexing, visualization

Email addresses: [email protected] (Adrian Kuhn), [email protected] (Stéphane Ducasse), [email protected] (Tudor Gîrba).
1 We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project "Recast: Evolution of Object-Oriented Applications" (SNF 2000-061655.00/1)
2 We gratefully acknowledge the financial support of the French ANR for the project "Cook: Réarchitecturisation des applications à objets"

Preprint submitted to Elsevier Science 11 October 2006
The Story of Moose: an Agile Reengineering Environment
Oscar Nierstrasz, Stéphane Ducasse, Tudor Gîrba
Software Composition Group, University of Berne, Switzerland
www.iam.unibe.ch/~scg

ABSTRACT
Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Maintenance: Restructuring, reverse engineering, and reengineering

General Terms
Measurement, Design, Experimentation

Keywords
Reverse engineering, Reengineering, Metrics, Visualization

1. INTRODUCTION
Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11].

The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].

What may not be clear from this very simplified picture is that various kinds of documents are available to the software

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings ESEC-FSE'05, pp. 1-10, ISBN 1-59593-014-0. September 5–9, 2005, Lisbon, Portugal.
Copyright 2005 ACM 1-59593-014-0/05/0009 ...$5.00.
[Figure 1: The Reengineering life cycle, relating Requirements, Designs, and Code through model capture and analysis, problem assessment, and migration.]
reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].

Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we require to apply to understand this data. These techniques range from data mining, to data presentation and to data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.

Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core are provided various services that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.

Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution
Scripting Visualizations with Mondrian
Michael Meyer and Tudor GırbaSoftware Composition Group, University of Berne, Switzerland
Abstract
Most visualization tools focus on a finite set of dedicatedvisualizations that are adjustable via a user-interface. Inthis demo, we present Mondrian, a new visualization enginedesigned to minimize the time-to-solution. We achieve thisby working directly on the underlying data, by making nest-ing an integral part of the model and by defining a powerfulscripting language that can be used to define visualizations.We support exploring data in an interactive way by provid-ing hooks for various events. Users can register actions forthese events in the visualization script.
1 Introduction
Visualization is an established tool to reason about data. Given a desired visualization, we can typically find tools that take a certain format as input and provide the needed visualization [4].
One drawback of this approach is that, when deeper reasoning is required, we need to refer back to the capabilities of the original tool that manipulates the original data. Another drawback is that it duplicates the required resources unnecessarily: the data is present both in the original tool and in the visualization tool. Several tools take a middle-ground approach and choose to work closely with the data by either offering integration with other services [1], or providing the services themselves [2]. However, when another type of service is required, the integration is lost.
We present Mondrian, a visualization engine that implements a radically different approach. Instead of imposing a required data format, we provide a simple interface through which the programmer can easily script the visualization in a declarative fashion (more information can be found in [3]). That is, our solution works directly with the objects in the data model; instead of duplicating the objects by model transformation, we transform the messages sent to the original objects via meta-model transformations.
2 Mondrian by example
In this section we give a simple step-by-step example of how to script a visualization using Mondrian. The example builds on a small model of source code with 32 classes. The task we propose is to provide an overview of the hierarchies.
Creating a view and adding nodes. Suppose we can ask the model object for the classes. We can add those classes to a newly created view by creating a node for each class, where each node is represented as a Rectangle. In the script below, NOA, NOM and LOC are methods in the object representing a class; each returns the value of the corresponding metric.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view open.
Adding edges and layout. To show how classes inherit from each other, we can add an edge for each inheritance relationship. In our example, supposing that we can ask the model for all the inheritance objects, and given an inheritance object, we create an edge between the node holding the superclass and the node holding the subclass. We lay out the nodes in a tree.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view edges: model inheritances from: #superclass to: #subclass.
view treeLayout.
view open.
Nesting. To obtain more details for the classes, we would like to see which methods are inside. To nest, we specify for each node the view that goes inside it. Supposing that we can ask each class in the model about its methods, we can add those methods to the class node by specifying the view for each class.
Test Blueprints — Exposing Side Effects in Execution Traces to Support Writing Unit Tests
Adrian Lienhard*, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland
{lienhard, girba, greevy, oscar}@iam.unibe.ch
Abstract
Writing unit tests for legacy systems is a key maintenance task. When writing tests for object-oriented programs, objects need to be set up and the expected effects of executing the unit under test need to be verified. If developers lack internal knowledge of a system, the task of writing tests is non-trivial. To address this problem, we propose an approach that exposes side effects detected in example runs of the system and uses these side effects to guide the developer when writing tests. We introduce a visualization called Test Blueprint, through which we identify what the required fixture is and what assertions are needed to verify the correct behavior of a unit under test. The dynamic analysis technique that underlies our approach is based on both tracing method executions and on tracking the flow of objects at runtime. To demonstrate the usefulness of our approach we present results from two case studies.
Keywords: Dynamic Analysis, Object Flow Analysis, Software Maintenance, Unit Testing
1 Introduction
Creating automated tests for legacy systems is a key maintenance task [9]. Tests are used to assess if legacy behavior has been preserved after performing modifications or extensions to the code. Unit testing (i.e., tests based on the XUnit frameworks [1]) is an established and widely used testing technique. It is now generally recognized as an essential phase in the software development life cycle to ensure software quality, as it can lead to early detection of defects, even if they are subtle and well hidden [2].
The task of writing a unit test involves (i) choosing an appropriate program unit, (ii) creating a fixture, (iii) executing the unit under test within the context of the fixture, and (iv) verifying the expected behavior of the unit using assertions [1]. All these actions require detailed knowledge of the system. Therefore, the task of writing unit tests may prove difficult as developers are often faced with unfamiliar legacy systems.
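The four steps above can be sketched as a minimal xUnit-style test. This is an illustrative sketch only: the Account class below is a hypothetical stand-in for a legacy unit, not an example from the paper.

```python
import unittest

class Account:
    """Hypothetical legacy unit under test."""
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

class AccountDepositTest(unittest.TestCase):
    # (i) the chosen program unit is Account.deposit
    def test_deposit_increases_balance(self):
        # (ii) create a fixture: an account in a known state
        account = Account(balance=100)
        # (iii) execute the unit under test within the context of the fixture
        account.deposit(50)
        # (iv) verify the expected behavior using assertions
        self.assertEqual(account.balance, 150)
```

Steps (ii) and (iv) are exactly where the Test Blueprint is meant to help: it shows which objects the fixture must contain and which state changes the assertions should check.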
Implementing a fixture and all the relevant assertions required can be challenging if the code is the only source of information. One reason is that the gap between static structure and runtime behavior is particularly large with object-oriented programs. Side effects¹ make program behavior more difficult to predict. Often, encapsulation and complex chains of method executions hide where side effects are produced [2]. Developers usually resort to using debuggers to obtain detailed information about the side effects, but this implies low-level manual analysis that is tedious and time consuming [25].
Thus, the underlying research question of the work we present in this paper is: how can we support developers faced with the task of writing unit tests for unfamiliar legacy code? The approach we propose is based on analyzing runtime executions of a program. Parts of a program execution, selected by the developer, serve as examples for new unit tests. Rather than manually stepping through the execution with a debugger, we perform dynamic analysis to derive information to support the task of writing tests without requiring a detailed understanding of the source code.
In our experimental tool, we present a visual representation of the dynamic information in a diagram similar to the UML object diagram [11]. We call this diagram a Test Blueprint as it serves as a plan for implementing a test. It reveals the minimal required fixture and the side effects that are produced during the execution of a particular program unit. Thus, the Test Blueprint reveals the exact information that should be verified with a corresponding test.
To generate a Test Blueprint, we need to accurately analyze object usage, object reference transfers, and the side effects that are produced as a result of a program execution. To do so, we perform a dynamic Object Flow Analysis in conjunction with conventional execution tracing [17].
Object Flow Analysis is a novel dynamic analysis which tracks the transfer of object references in a program execution. In previous work, we demonstrated how we success-
¹We refer to a side effect as the program state modifications produced by a behavior. We consider the term program state to be limited to the scope of the application under analysis (i.e., excluding socket or display updates).
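The core idea of exposing side effects can be illustrated by diffing an object's state before and after executing a unit. This is a deliberately simplified sketch: the Cart class and the snapshot-based diffing below are illustrative assumptions, whereas the paper's Object Flow Analysis tracks object references during execution tracing at a much finer grain.

```python
def snapshot(obj):
    # record the object's fields at one point in time
    return dict(vars(obj))

def side_effects(obj, action):
    # run `action` and report each field of `obj` it modified as (old, new)
    before = snapshot(obj)
    action()
    after = snapshot(obj)
    return {name: (before.get(name), value)
            for name, value in after.items()
            if before.get(name) != value}

class Cart:
    """Hypothetical unit whose side effects we want to expose."""
    def __init__(self):
        self.items = 0
        self.total = 0.0

    def add(self, price):
        self.items += 1
        self.total += price

cart = Cart()
effects = side_effects(cart, lambda: cart.add(9.99))
# `effects` now lists exactly the state changes a unit test should assert
```

The resulting diff corresponds to the information a Test Blueprint makes visible: which objects the fixture needs and which state changes the assertions must verify.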
Practical Object-Oriented Back-in-Time Debugging
Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland
Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the “omniscient” approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counter-intuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore, since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.
1 Introduction
When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure, as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug’s cause [2].
Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object’s fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].
The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of a factor of 100 or more for non-trivial programs.
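The key design point, storing an object's history alongside the object itself so that both are garbage collected together, can be sketched in a few lines. This is an illustrative Python analogy, not the paper's virtual-machine-level implementation.

```python
class Recorded:
    """Object that remembers the previous values of its fields.

    The history lives in the instance's own __dict__, so when the
    object becomes unreachable, its recorded past is collected with it.
    """
    def __init__(self):
        self.__dict__['_history'] = {}

    def __setattr__(self, name, value):
        # remember the value being overwritten (None if the field was unset)
        self._history.setdefault(name, []).append(self.__dict__.get(name))
        self.__dict__[name] = value

    def past_values(self, name):
        return self._history.get(name, [])

p = Recorded()
p.x = 1
p.x = 2
p.x = 3
# p.past_values('x') is now [None, 1, 2]; p.x is 3
```

Once `p` itself becomes unreachable, its history list is unreachable too and is reclaimed by the ordinary garbage collector, which is the paper's argument for keeping memory consumption within practical bounds.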
Enriching Reverse Engineering with Annotations*
Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland
http://scg.unibe.ch/
Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.
1 Introduction
Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not, however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.
Only a few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views, which make use of rules that encode external constraints and are checked against the actual source code [3].
In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach
* Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.
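A minimal sketch of such an annotation framework, assuming only that model entities can be used as dictionary keys. The entity names, annotation keys and the AnnotationStore API below are all hypothetical, not Metanool's actual interface.

```python
from collections import defaultdict

class AnnotationStore:
    """Attach free-form, typed annotations to model entities
    without committing to a fixed meta-model in advance."""
    def __init__(self):
        self._annotations = defaultdict(dict)

    def annotate(self, entity, key, value):
        self._annotations[entity][key] = value

    def get(self, entity, key, default=None):
        return self._annotations[entity].get(key, default)

    def entities_where(self, key, predicate):
        # query the model through the human-supplied knowledge
        return [entity for entity, notes in self._annotations.items()
                if key in notes and predicate(notes[key])]

store = AnnotationStore()
store.annotate('PaymentService', 'layer', 'domain')       # architectural layering
store.annotate('LoginController', 'layer', 'presentation')
store.annotate('LoginController', 'task', 'needs refactoring')  # reengineering task
presentation = store.entities_where('layer', lambda v: v == 'presentation')
```

Because annotations are just key-value pairs attached to entities, new annotation types (layers, tasks, flaws, features) can be introduced iteratively, which is the point the abstract makes about not fixing the meta-model up front.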
Semantic Clustering: Identifying Topics in Source Code
Adrian Kuhn a,1, Stéphane Ducasse b,2, Tudor Gîrba a,1
a Software Composition Group, University of Berne, Switzerland
b Language and Software Evolution Group, LISTIC, Université de Savoie, France
Abstract
Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing only formal information, the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it on several case studies, two of which we present in this paper.
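The pipeline the abstract describes can be approximated in a toy sketch: split identifiers into terms, build term-frequency vectors, and greedily group artifacts whose vocabulary is similar. Plain term counts and a similarity threshold stand in here for Latent Semantic Indexing and proper clustering, and all class names and bodies below are made up.

```python
import math
import re
from collections import Counter

def tokenize(source):
    # split camelCase identifiers into lowercase terms
    return [w.lower() for w in re.findall(r'[A-Z]?[a-z]+', source)]

def cosine(a, b):
    # cosine similarity between two term-frequency vectors
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_clusters(artifacts, threshold=0.5):
    # greedily group source artifacts that use similar vocabulary
    vectors = {name: Counter(tokenize(text)) for name, text in artifacts.items()}
    clusters = []
    for name in vectors:
        for cluster in clusters:
            if cosine(vectors[name], vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

artifacts = {
    'Parser': 'parseToken nextToken tokenStream',
    'Lexer': 'nextToken scanToken tokenStream',
    'Button': 'drawWidget paintWidget widgetColor',
}
```

Here Parser and Lexer end up in one cluster because they share the token/stream vocabulary, while Button forms a GUI cluster of its own, mirroring the kind of topics listed in the slides (parsing terms vs. widget terms).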
Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.
Key words: reverse engineering, clustering, latent semantic indexing, visualization
Email addresses: [email protected] (Adrian Kuhn), [email protected] (Stéphane Ducasse), [email protected] (Tudor Gîrba).
1 We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “Recast: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1).
2 We gratefully acknowledge the financial support of the French ANR for the project “Cook: Réarchitecturisation des applications à objets”.
Preprint submitted to Elsevier Science 11 October 2006
The Story of Moose: an Agile Reengineering Environment
Oscar Nierstrasz, Stéphane Ducasse, Tudor Gîrba
Software Composition Group, University of Berne, Switzerland
www.iam.unibe.ch/~scg
ABSTRACT
Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.
Categories and Subject Descriptors
D.2.7 [Software Engineering]: Maintenance—Restructuring, reverse engineering, and reengineering
General Terms
Measurement, Design, Experimentation
Keywords
Reverse engineering, Reengineering, Metrics, Visualization
1. INTRODUCTION
Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11].
The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].
What may not be clear from this very simplified picture is that various kinds of documents are available to the software
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings ESEC-FSE’05, pp. 1-10, ISBN 1-59593-014-0. September 5–9, 2005, Lisbon, Portugal.
Copyright 2005 ACM 1-59593-014-0/05/0009.
[Figure 1: The Reengineering life cycle. Requirements, Designs, and Code, connected through model capture and analysis, problem assessment, and migration.]
reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].
Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we need to apply to understand this data. These techniques range from data mining to data presentation and data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.
Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core, various services are provided that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.
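To give a flavor of the querying service, here is a hedged sketch of a metric-based query in the style of Lanza and Marinescu's God Class detection strategy. The metric attributes, thresholds and class data below are illustrative assumptions, not Moose's actual API.

```python
# Illustrative thresholds; real detection strategies calibrate these statistically.
FEW = 5
VERY_HIGH = 47
ONE_THIRD = 1 / 3

class ClassMetrics:
    """Hypothetical record of precomputed metrics for one class."""
    def __init__(self, name, atfd, wmc, tcc):
        self.name, self.atfd, self.wmc, self.tcc = name, atfd, wmc, tcc

def is_god_class(cls):
    # class uses many foreign attributes, is very complex, and has low cohesion
    return cls.atfd > FEW and cls.wmc >= VERY_HIGH and cls.tcc < ONE_THIRD

model = [
    ClassMetrics('ReportManager', atfd=12, wmc=60, tcc=0.10),
    ClassMetrics('Point', atfd=0, wmc=5, tcc=0.90),
]
god_classes = [c.name for c in model if is_god_class(c)]
```

Combining several metric rules with AND, as this query does, is what lets such environments turn raw measurements back into design-level findings rather than bare numbers.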
Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution
Research should be more than papers
Scripting Visualizations with Mondrian
Michael Meyer and Tudor GırbaSoftware Composition Group, University of Berne, Switzerland
Abstract
Most visualization tools focus on a finite set of dedicatedvisualizations that are adjustable via a user-interface. Inthis demo, we present Mondrian, a new visualization enginedesigned to minimize the time-to-solution. We achieve thisby working directly on the underlying data, by making nest-ing an integral part of the model and by defining a powerfulscripting language that can be used to define visualizations.We support exploring data in an interactive way by provid-ing hooks for various events. Users can register actions forthese events in the visualization script.
1 Introduction
Visualization is an established tool to reason about data.Given a wanted visualization, we can typically find toolsthat take as input a certain format and that provide theneeded visualization [4].
One drawback of the approach is that, when a deep rea-soning is required, we need to refer back to the capabili-ties of the original tool that manipulates the original data.Another drawback is that it actually duplicates the requiredresources unnecessarily: the data is present both in the orig-inal tool, and in the visualization tool. Several tools take amiddle ground approach and choose to work close with thedata by either offering integration with other services [1],or providing the services themselves [2]. However, whenanother type of service is required, the integration is lost.
We present Mondrian, a visualization engine that imple-ments a radically different approach. Instead of provid-ing a required data format, we provide a simple interfacethrough which the programmer can easily script the visu-alization in a declarative fashion (more information can befound in [3]). That is, our solution works directly with theobjects in the data model, and instead of duplicating the ob-jects by model transformation, we transform the messagessent to the original objects via meta-model transformations.
2 Mondrian by example
In this section we give a simple step-by-step example ofhow to script a visualization using Mondrian. The examplebuilds on a small model of a source code with 32 classes.The task we propose is to provide a on overview of the hi-erarchies.
Creating a view and adding nodes. Suppose we canask the model object for the classes. We can add thoseclasses to a newly created view by creating a node for eachclass, where each node is represented as a Rectangle. In thecase above, NOA, NOM and LOC are methods in the objectrepresenting a class and return the value of the correspond-ing metric.view := ViewRenderer new.view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.view nodes: model classes.view open.
Adding edges and layouting. To show how classes in-herit from each other, we can add an edge for each inheri-tance relationship. In our example, supposing that we canask the model for all the inheritance objects, and given aninheritance object, we will create an edge between the nodeholding the superclass and the node holding the subclass.We layout the nodes in a tree.view := ViewRenderer new.view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.view nodes: model classes.view edges: model inheritances from: #superclass to: #subclass.view treeLayout.view open.
Nesting. To obtain more details for the classes, wewould like to see which are the methods inside. To nest wespecify for each node the view that goes inside. Supposingthat we can ask each class in the model about its methods,we can add those methods to the class by specifying theview for each class.
1
Test Blueprints — Exposing Side Effects inExecution Traces to Support Writing Unit Tests
Adrian Lienhard*, Tudor Gırba, Orla Greevy and Oscar NierstraszSoftware Composition Group, University of Bern, Switzerland
{lienhard, girba, greevy, oscar}@iam.unibe.ch
Abstract
Writing unit tests for legacy systems is a key maintenancetask. When writing tests for object-oriented programs, ob-jects need to be set up and the expected effects of executingthe unit under test need to be verified. If developers lackinternal knowledge of a system, the task of writing tests isnon-trivial. To address this problem, we propose an ap-proach that exposes side effects detected in example runs ofthe system and uses these side effects to guide the developerwhen writing tests. We introduce a visualization called TestBlueprint, through which we identify what the required fix-ture is and what assertions are needed to verify the correctbehavior of a unit under test. The dynamic analysis tech-nique that underlies our approach is based on both tracingmethod executions and on tracking the flow of objects atruntime. To demonstrate the usefulness of our approach wepresent results from two case studies.
Keywords: Dynamic Analysis, Object Flow Analysis,Software Maintenance, Unit Testing
1 IntroductionCreating automated tests for legacy systems is a key
maintenance task [9]. Tests are used to assess if legacy be-havior has been preserved after performing modifications orextensions to the code. Unit testing (i.e., tests based on theXUnit frameworks [1]) is an established and widely usedtesting technique. It is now generally recognized as an es-sential phase in the software development life cycle to en-sure software quality, as it can lead to early detection ofdefects, even if they are subtle and well hidden [2].
The task of writing a unit test involves (i) choosing anappropriate program unit, (ii) creating a fixture, (iii) execut-ing the unit under test within the context of the fixture, and(iv) verifying the expected behavior of the unit using asser-tions [1]. All these actions require detailed knowledge ofthe system. Therefore, the task of writing unit tests mayprove difficult as developers are often faced with unfamiliarlegacy systems.
Implementing a fixture and all the relevant assertions re-quired can be challenging if the code is the only source ofinformation. One reason is that the gap between static struc-ture and runtime behavior is particularly large with object-oriented programs. Side effects1 make program behaviormore difficult to predict. Often, encapsulation and complexchains of method executions hide where side effects are pro-duced [2]. Developers usually resort to using debuggers toobtain detailed information about the side effects, but thisimplies low level manual analysis that is tedious and timeconsuming [25].
Thus, the underlying research question of the work wepresent in this paper is: how can we support developersfaced with the task of writing unit tests for unfamiliar legacycode? The approach we propose is based on analyzing run-time executions of a program. Parts of a program execu-tion, selected by the developer, serve as examples for newunit tests. Rather than manually stepping through the ex-ecution with a debugger, we perform dynamic analysis toderive information to support the task of writing tests with-out requiring a detailed understanding of the source code.
In our experimental tool, we present a visual represen-tation of the dynamic information in a diagram similar tothe UML object diagram [11]. We call this diagram a TestBlueprint as it serves as a plan for implementing a test. Itreveals the minimal required fixture and the side effects thatare produced during the execution of a particular programunit. Thus, the Test Blueprint reveals the exact informationthat should be verified with a corresponding test.
To generate a Test Blueprint, we need to accurately an-alyze object usage, object reference transfers, and the sideeffects that are produced as a result of a program execution.To do so, we perform a dynamic Object Flow Analysis inconjunction with conventional execution tracing [17].
Object Flow Analysis is a novel dynamic analysis whichtracks the transfer of object references in a program execu-tion. In previous work, we demonstrated how we success-
1We refer to side effect as the program state modifications produced bya behavior. We consider the term program state to be limited to the scopeof the application under analysis (i.e., excluding socket or display updates).
Practical Object-Oriented Back-in-Time Debugging
Adrian Lienhard, Tudor Gırba and Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland
Abstract. Back-in-time debuggers are extremely useful tools for identifying thecauses of bugs, as they allow us to inspect the past states of objects that are nolonger present in the current execution stack. Unfortunately the “omniscient” ap-proaches that try to remember all previous states are impractical because theyeither consume too much space or they are far too slow. Several approaches relyon heuristics to limit these penalties, but they ultimately end up throwing outtoo much relevant information. In this paper we propose a practical approachto back-in-time debugging that attempts to keep track of only the relevant pastdata. In contrast to other approaches, we keep object history information togetherwith the regular objects in the application memory. Although seemingly counter-intuitive, this approach has the effect that past data that is not reachable from cur-rent application objects (and hence, no longer relevant) is automatically garbagecollected. In this paper we describe the technical details of our approach, andwe present benchmarks that demonstrate that memory consumption stays withinpractical bounds. Furthermore since our approach works at the virtual machinelevel, the performance penalty is significantly less than with other approaches.
1 Introduction
When debugging object-oriented systems, the hardest task is to find the actual rootcause of the failure as this can be far from where the bug actually manifests itself [1].In a recent study, Liblit et al. examined bug symptoms for various programs and foundthat in 50% of the cases the execution stack contains essentially no information aboutthe bug’s cause [2].
Classical debuggers are not always up to the task, since they only provide access toinformation that is still in the run-time stack. In particular, the information needed totrack down these difficult bugs includes (1) how an object reference got here, and (2)the previous values of an object’s fields. For this reason it is helpful to have previous ob-ject states and object reference flow information at hand during debugging. Techniquesand tools like back-in-time debuggers, which allow one to inspect previous programstates and step backwards in the control flow, have gained increasing attention recently[3,4,5,6].
The ideal support for a back-in-time debugger is provided by an omniscient imple-mentation that remembers the complete object history, but such solutions are imprac-tical because they generate enormous amounts of information. Storing the data to diskinstead of keeping it in memory can alleviate the problem, but it only postpones theend, and it has the drawback of further increasing the runtime overhead. Current imple-mentations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor100 or more for non-trivial programs.
Enriching Reverse Engineering withAnnotations!
Andrea Bruhlmann, Tudor Gırba, Orla Greevy, Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerlandhttp://scg.unibe.ch/
Abstract. Much of the knowledge about software systems is implicit,and therefore di!cult to recover by purely automated techniques. Archi-tectural layers and the externally visible features of software systems aretwo examples of information that can be di!cult to detect from sourcecode alone, and that would benefit from additional human knowledge.Typical approaches to reasoning about data involve encoding an explicitmeta-model and expressing analyses at that level. Due to its informal na-ture, however, human knowledge can be di!cult to characterize up-frontand integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineeringprocess. Annotation types can be iteratively defined, refined and trans-formed, without requiring a fixed meta-model to be defined in advance.We show how our approach supports reverse engineering by implement-ing it in a tool called Metanool and by applying it to (i) analyzing archi-tectural layering, (ii) tracking reengineering tasks, (iii) detecting designflaws, and (iv) analyzing features.
1 Introduction
Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not, however, important external information is available (e.g., developer knowledge or domain-specific knowledge) which would greatly enhance analyses if it could be taken into account.
Only a few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views, which make use of rules that encode external constraints and are checked against the actual source code [3].
In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach
Published in: Models 2008, Krzysztof Czarnecki et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.
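As a rough illustration of the idea (the class and method names below are invented, not Metanool's actual API), annotations can be kept in a side table attached to arbitrary model entities, so that no fixed meta-model has to be committed to up front:

```python
class AnnotationStore:
    """Hypothetical sketch: free-form annotations attached to reverse
    engineered model entities, queryable by automated analyses."""

    def __init__(self):
        self._store = {}  # entity -> {annotation name: value}

    def annotate(self, entity, name, value):
        self._store.setdefault(entity, {})[name] = value

    def get(self, entity, name, default=None):
        return self._store.get(entity, {}).get(name, default)

    def entities_where(self, name, predicate):
        """Let an analysis select entities by annotated knowledge."""
        return [e for e, anns in self._store.items()
                if name in anns and predicate(anns[name])]

# A developer records architectural knowledge the code alone cannot reveal:
store = AnnotationStore()
store.annotate("PaymentService", "layer", "domain")
store.annotate("CheckoutDialog", "layer", "UI")
store.annotate("CheckoutDialog", "todo", "split into smaller widgets")
print(store.entities_where("layer", lambda v: v == "UI"))  # -> ['CheckoutDialog']
```

Entities are plain strings here for brevity; in a real model they would be the reverse engineered classes, methods and packages themselves, and annotation types could carry more structure than bare values.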
Semantic Clustering: Identifying Topics in Source Code
Adrian Kuhn a,1, Stéphane Ducasse b,2, Tudor Gîrba a,1
a Software Composition Group, University of Berne, Switzerland
b Language and Software Evolution Group, LISTIC, Université de Savoie, France
Abstract
Many of the existing approaches in software comprehension focus on program structure or external documentation. However, by analyzing only formal information, the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it to several case studies, two of which we present in this paper.
Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.
Key words: reverse engineering, clustering, latent semantic indexing, visualization
Email addresses: [email protected] (Adrian Kuhn), [email protected] (Stéphane Ducasse), [email protected] (Tudor Gîrba).
1 We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “Recast: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1).
2 We gratefully acknowledge the financial support of the French ANR for the project “Cook: Réarchitecturisation des applications à objets”.
Preprint submitted to Elsevier Science 11 October 2006
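The core of the technique described above can be sketched in a few lines. This is a toy reconstruction, not the authors' implementation: real identifier splitting, term weighting, and hierarchical clustering are omitted, and the documents and thresholds below are invented for illustration:

```python
import numpy as np

def semantic_clusters(docs, k=2, threshold=0.5):
    """Group documents (token lists) that use similar vocabulary,
    in the spirit of Semantic Clustering: LSI, then clustering."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    # term-document matrix (rows: terms, columns: documents)
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            A[index[t], j] += 1.0
    # Latent Semantic Indexing: rank-k SVD approximation
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # documents in concept space
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    # greedy clustering on cosine similarity in the reduced space
    clusters = []
    for j, v in enumerate(doc_vecs):
        for c in clusters:
            if float(v @ doc_vecs[c[0]]) > threshold:
                c.append(j)
                break
        else:
            clusters.append([j])
    return clusters

docs = [
    "parse xml tag reader file".split(),   # topic: XML parsing
    "xml tag parse element".split(),
    "button click window layout".split(),  # topic: GUI
    "window button event layout".split(),
]
print(semantic_clusters(docs))  # -> [[0, 1], [2, 3]]
```

In the real approach the "documents" are source artifacts (classes or methods) whose identifiers and comments supply the vocabulary, and the resulting clusters are labeled with their most characteristic terms.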
The Story of Moose: an Agile Reengineering Environment
Oscar Nierstrasz
Software Composition Group
University of Berne, Switzerland

Stéphane Ducasse
Software Composition Group
University of Berne, Switzerland

Tudor Gîrba
Software Composition Group
University of Berne, Switzerland

www.iam.unibe.ch/~scg
ABSTRACT
Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.
Categories and Subject Descriptors
D.2.7 [Software Engineering]: Maintenance—Restructuring, reverse engineering, and reengineering
General Terms
Measurement, Design, Experimentation
Keywords
Reverse engineering, Reengineering, Metrics, Visualization
1. INTRODUCTION
Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11].
The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].
What may not be clear from this very simplified picture is that various kinds of documents are available to the software
Proceedings ESEC-FSE'05, pp. 1-10, ISBN 1-59593-014-0, September 5–9, 2005, Lisbon, Portugal. Copyright 2005 ACM.
Figure 1: The Reengineering life cycle (model capture and analysis, problem assessment, and migration, spanning Requirements, Designs, and Code).
reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].
Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we need to apply to understand it. These techniques range from data mining to data presentation and data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.
Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core are provided various services that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.
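A drastically simplified sketch of what such a language-independent core plus a generic querying service might look like follows. The class and method names here are invented for illustration; this is not the actual FAMIX/Moose API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Method:
    name: str
    lines_of_code: int = 0

@dataclass
class Class_:
    name: str
    methods: List[Method] = field(default_factory=list)

    def nom(self) -> int:
        """NOM metric: number of methods."""
        return len(self.methods)

    def loc(self) -> int:
        """Total lines of code over all methods."""
        return sum(m.lines_of_code for m in self.methods)

@dataclass
class Model:
    classes: List[Class_] = field(default_factory=list)

    def select(self, predicate: Callable[[Class_], bool]) -> List[Class_]:
        """Generic querying service: filter model entities."""
        return [c for c in self.classes if predicate(c)]

# A parser for any source language only has to populate these entities;
# every analysis built on the meta-model then works unchanged:
model = Model([
    Class_("Invoice", [Method("total", 12), Method("add", 5)]),
    Class_("ReportManager", [Method(f"m{i}", 40) for i in range(30)]),
])
suspects = model.select(lambda c: c.nom() > 20 and c.loc() > 500)
print([c.name for c in suspects])  # -> ['ReportManager']
```

The design point this illustrates is the one the paragraph above makes: tools share one neutral representation, so metrics, queries and visualizations compose instead of each tool re-parsing the source.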
Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution
Research should be more than papers
addFolder addPage
Research is a puzzle
Greevy et al. 2007
Research is a puzzle
Wettel, Lanza 2008
Research should be open
Research should impact industry
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
is an idea
moose.unibe.ch
Tudor Gîrba
www.tudorgirba.com
creativecommons.org/licenses/by/3.0/