Moose Tutorial at WCRE 2008

Tudor Gîrba, www.tudorgirba.com

DESCRIPTION

I used this set of slides for the Moose tutorial at WCRE 2008

TRANSCRIPT

Page 1: Moose Tutorial at WCRE 2008

Tudor Gîrba, www.tudorgirba.com

Moose Tutorial

Page 2: Moose Tutorial at WCRE 2008

forward engineering

actual development

reverse engineering

Page 3: Moose Tutorial at WCRE 2008

built in Berne

Page 4: Moose Tutorial at WCRE 2008

built in Berne

Page 5: Moose Tutorial at WCRE 2008

used in several research groups

> 100 person-years of effort

~ 150 publications

since 1997

Page 6: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 7: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 8: Moose Tutorial at WCRE 2008

...

McCabe = 21

LOC = 75

3,000

NOM = 102

classes select: #isGodClass

Page 9: Moose Tutorial at WCRE 2008

Metrics compress the system into numbers

NOM, LOC, TCC, WMC, CYCLO, ATFD, HNL,

NOC, NOCmts, NOPA, WLOC, WNOC, WOC, MSG,

DUPLINES, NAI, NOANI, ...

Page 10: Moose Tutorial at WCRE 2008

Lanza, Marinescu 2006

Detection Strategies are metric-based queries to detect design flaws

METRIC 1 > Threshold 1

Rule 1

METRIC 2 < Threshold 2

Rule 2

AND Quality problem

Page 11: Moose Tutorial at WCRE 2008

Lanza, Marinescu 2006

Example: a God Class centralizes too much intelligence in the system

ATFD > FEW

Class uses directly more than a few attributes of other classes

WMC ≥ VERY HIGH

Functional complexity of the class is very high

TCC < ONE THIRD

Class cohesion is low

AND GodClass
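The rule above can be sketched as a simple predicate over per-class metric values. This is an illustrative Python sketch (the tutorial's own tooling is Smalltalk); the concrete threshold numbers below are placeholder assumptions, not the exact values from Lanza and Marinescu's book.

```python
# Sketch of the God Class detection strategy: all three symptoms
# must occur together. Threshold values are illustrative assumptions.
FEW = 5          # assumed value for the FEW threshold
VERY_HIGH = 47   # assumed value for a VERY HIGH WMC
ONE_THIRD = 1 / 3

def is_god_class(cls):
    """True when the class shows all three God Class symptoms."""
    return (cls["ATFD"] > FEW            # uses many foreign attributes
            and cls["WMC"] >= VERY_HIGH  # high functional complexity
            and cls["TCC"] < ONE_THIRD)  # low cohesion

classes = [
    {"name": "ReportManager", "ATFD": 12, "WMC": 80, "TCC": 0.1},
    {"name": "Point",         "ATFD": 0,  "WMC": 4,  "TCC": 0.9},
]
god_classes = [c["name"] for c in classes if is_god_class(c)]
```

The conjunction mirrors the slide: each metric captures one symptom, and only their combination flags a quality problem.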

Page 12: Moose Tutorial at WCRE 2008
Page 13: Moose Tutorial at WCRE 2008

Polymetric views show up to 5 metrics

Color metric

Width metric

Height metric

Position metrics

Lanza 2003
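The mapping behind a polymetric view can be sketched in a few lines: up to five metrics of an entity drive the width, height, color, and (x, y) position of its node. This Python sketch is illustrative; the metric names NOA, NOM and LOC follow the tutorial's later Mondrian example, and the clamping of the color value is an assumption.

```python
# Minimal sketch of the polymetric-view idea: node shape attributes
# are computed directly from entity metrics (up to five of them).
def polymetric_node(entity):
    return {
        "width":  entity["NOA"],            # width metric
        "height": entity["NOM"],            # height metric
        "color":  min(entity["LOC"], 255),  # color metric, clamped to a byte
        "x": entity["x"],                   # position metrics
        "y": entity["y"],
    }

node = polymetric_node({"NOA": 3, "NOM": 12, "LOC": 300, "x": 10, "y": 20})
```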

Page 14: Moose Tutorial at WCRE 2008

System Complexity shows class hierarchies

Lanza, Ducasse 2003

Page 15: Moose Tutorial at WCRE 2008

Class Blueprint shows class internals

Ducasse, Lanza 2005

Page 16: Moose Tutorial at WCRE 2008

Package Blueprint shows package usage

Ducasse et al. 2007

Page 17: Moose Tutorial at WCRE 2008

Ducasse et al. 2006

Distribution Map shows properties over structure

Page 18: Moose Tutorial at WCRE 2008

Semantic Clustering reveals implementation topics

user, run, load, message, file, buffer, util

property, AWT, edit, show, update, sp, set

start, buffer, end, text, length, line, count

action, box, component, event, button, layout, GUI

start, length, integer, end, number, pre, count

XML, dispatch, microstar, reader, XE, register, receive

current, buffer, idx, review, archive, endr, TAR

BSH, simple, invocation, assign, untype, general, arbitrary

maximum, label, link, item, code, put, vector

Kuhn et al. 2006

Page 19: Moose Tutorial at WCRE 2008

Kuhn et al. 2008

Software Map gives software space a meaning

Page 20: Moose Tutorial at WCRE 2008

Softwarenaut explores the package structure

Lungu et al. 2006

Page 21: Moose Tutorial at WCRE 2008

Wettel, Lanza 2007

CodeCity shows where your code lives

Page 22: Moose Tutorial at WCRE 2008

Trace Signals reveal similar execution traces

Kuhn, Greevy 2006

Page 23: Moose Tutorial at WCRE 2008

Greevy et al. 2006

Feature Views show how features cover classes

addFolder addPage

Page 24: Moose Tutorial at WCRE 2008

Greevy et al. 2007

Feature Map relates features to code

Page 25: Moose Tutorial at WCRE 2008

Lienhard 2009

Object Flow captures object aliases

Page 26: Moose Tutorial at WCRE 2008

Lienhard 2009

Object Flow captures object aliases

Page 27: Moose Tutorial at WCRE 2008

Lienhard et al. 2007

Object Flow shows how objects move

Page 28: Moose Tutorial at WCRE 2008

Object Dependencies reveal feature dependencies

OpenConnect, Join Channel, Send Message

Lienhard et al. 2007

Page 29: Moose Tutorial at WCRE 2008

Gîrba et al. 2005

Hierarchy Evolution reveals evolution patterns

Page 30: Moose Tutorial at WCRE 2008

D’Ambros, Lanza 2006

Evolution Radar shows co-change relationships

Page 31: Moose Tutorial at WCRE 2008

Gîrba et al. 2006

Ownership Map reveals patterns in CVS

Page 32: Moose Tutorial at WCRE 2008

Junker 2008

Kumpel shows how developers work on files

Page 33: Moose Tutorial at WCRE 2008

Balint et al. 2006

Clone Evolution shows who copied from whom

Page 34: Moose Tutorial at WCRE 2008

...

McCabe = 21

LOC = 75

3,000

NOM = 102

classes select: #isGodClass

Page 35: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 36: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 37: Moose Tutorial at WCRE 2008

Entities: Package, Namespace, Class, Method, Attribute

Containment: belongsTo, packagedIn

Associations: Inheritance (superclass/subclass), Invocation (invokedBy/candidate), Access (accesses/accessedIn)

FAMIX is a language independent meta-model
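The core containment relations of the meta-model can be sketched as plain data classes. This is a heavily simplified, illustrative Python rendition (the real FAMIX has many more attributes, plus the Access, Invocation and Inheritance association entities); the class and field names are chosen to echo the diagram, not to match any concrete FAMIX implementation.

```python
# Hypothetical, minimal sketch of FAMIX-style containment:
# a Package contains Classes, a Class contains Methods.
from dataclasses import dataclass, field

@dataclass
class Package:
    name: str
    classes: list = field(default_factory=list)

@dataclass
class Class_:
    name: str
    package: Package
    methods: list = field(default_factory=list)

    def __post_init__(self):
        self.package.classes.append(self)  # belongsTo / packagedIn

@dataclass
class Method:
    name: str
    parent: Class_

    def __post_init__(self):
        self.parent.methods.append(self)   # belongsTo

pkg = Package("network")
server = Class_("Server", pkg)
Method("accept", server)
```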

Page 38: Moose Tutorial at WCRE 2008

ClassVersion

SystemVersion

Page 39: Moose Tutorial at WCRE 2008

ClassVersion

ClassHistory

SystemVersion

Page 40: Moose Tutorial at WCRE 2008

ClassVersion

ClassHistory

SystemVersion

SystemHistory

Page 41: Moose Tutorial at WCRE 2008

ClassVersion

ClassHistory

SystemVersion

SystemHistory

Page 42: Moose Tutorial at WCRE 2008

Version

History

Hismo is the history meta-model

Gîrba 2005

Page 43: Moose Tutorial at WCRE 2008

Hismo is the history meta-model

History

Version

Gîrba 2005

Page 44: Moose Tutorial at WCRE 2008

2 4 3 5 7

2 2 3 4 9

2 2 1 2 3

2 2 2 2 2

1 5 3 4 4

What changed? When did it change? ...

Page 45: Moose Tutorial at WCRE 2008

1 5 3 4 4

ENOM(C) = 4 + 2 + 1 + 0 = 7   (Evolution of Number of Methods)

LENOM(C) = Σᵢ |NOMᵢ(C) − NOMᵢ₋₁(C)| · 2^(i−n)

Gîrba et al. 2004

Page 46: Moose Tutorial at WCRE 2008

Gîrba et al. 2004

1 5 3 4 4

LENOM(C) = Σᵢ |NOMᵢ(C) − NOMᵢ₋₁(C)| · 2^(i−n)   (Latest Evolution of Number of Methods)

LENOM(C) = 4·2⁻³ + 2·2⁻² + 1·2⁻¹ + 0·2⁰ = 1.5

EENOM(C) = Σᵢ |NOMᵢ(C) − NOMᵢ₋₁(C)| · 2^(2−i)   (Earliest Evolution of Number of Methods)

EENOM(C) = 4·2⁰ + 2·2⁻¹ + 1·2⁻² + 0·2⁻³ = 5.25
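These formulas can be checked with a few lines of Python (a sketch, with the version index i running over the change pairs i = 2..n, as in the slides): ENOM sums the absolute changes, LENOM weights late changes more (factor 2^(i−n)), and EENOM weights early changes more (factor 2^(2−i)).

```python
# Evolution-of-Number-of-Methods metrics over a class's version history.
def deltas(noms):
    """Absolute change in NOM between consecutive versions."""
    return [abs(b - a) for a, b in zip(noms, noms[1:])]

def enom(noms):
    return sum(deltas(noms))

def lenom(noms):
    n = len(noms)
    # i runs 2..n; weight 2^(i-n) emphasizes the latest changes
    return sum(d * 2 ** (i - n) for i, d in enumerate(deltas(noms), start=2))

def eenom(noms):
    # weight 2^(2-i) emphasizes the earliest changes
    return sum(d * 2 ** (2 - i) for i, d in enumerate(deltas(noms), start=2))

history = [1, 5, 3, 4, 4]          # the example class from the slides
print(enom(history), lenom(history), eenom(history))  # 7 1.5 5.25
```

The same functions reproduce the other rows of the table on the next slide, e.g. 2 4 3 5 7 gives ENOM 7, LENOM 3.5, EENOM 3.25.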

Page 47: Moose Tutorial at WCRE 2008

ENOM LENOM EENOM

7 3.5 3.25

7 5.75 1.37

3 1 2

0 0 0

7 1.25 5.25

2 4 3 5 7

2 2 3 4 9

2 2 1 2 3

2 2 2 2 2

1 5 3 4 4

Gîrba et al. 2004

Page 48: Moose Tutorial at WCRE 2008

ENOM LENOM EENOM

7 3.5 3.25

7 5.75 1.37

3 1 2

0 0 0

7 1.25 5.25

balanced changer

late changer

dead stable

early changer

Gîrba et al. 2004

Page 49: Moose Tutorial at WCRE 2008

FAMIX

Class Method...

Page 50: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Page 51: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

ObjectFlow

Alias...

Page 52: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Hismo

Class

History

Method

History...

ObjectFlow

Alias...

Page 53: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Hismo

Class

History

Method

History...

Subversion

File

History

File

Version...

ObjectFlow

Alias...

Page 54: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

ObjectFlow

Alias...

Page 55: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

BugsLife

Bug Activity...

ObjectFlow

Alias...

Page 56: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Dude

Duplication...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

BugsLife

Bug Activity...

ObjectFlow

Alias...

Page 57: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX

Class Method...

Dude

Duplication...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

BugsLife

Bug Activity...

ObjectFlow

Alias...

...

...

Page 58: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX Core

Class Method...

Dude

Duplication...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

BugsLife

Bug Activity...

ObjectFlow

Alias...

...

...

Page 59: Moose Tutorial at WCRE 2008

Dynamix

Activation... Instance

FAMIX Core

Class Method...

Dude

Duplication...

Hismo

Class

History

Method

History...

CVS

File

History

File

Version...

Subversion

File

History

File

Version...

BugsLife

Bug Activity...

ObjectFlow

Alias...

...

...

FAMIX is a family of meta-models

Page 60: Moose Tutorial at WCRE 2008

FM3 is the meta-meta-model

Kuhn, Verwaest 2008

FM3.Element: name: String, fullName: String

FM3.Class, FM3.Package

FM3.Property: derived: Boolean, keyed: Boolean, multivalued: Boolean

Relations: superclass, opposite, type, extensions

Page 61: Moose Tutorial at WCRE 2008

MSE is the exchange format

Kuhn, Verwaest 2008

(FAMIX.Class (id: 100)
    (name 'Server')
    (container (ref: 82))
    (isAbstract false)
    (isInterface false)
    (package (ref: 624))
    (stub false)
    (NOM 9)
    (WLOC 124))
(FAMIX.Method (id: 101)
    (name 'accept')
    (signature 'accept(Visitor v)')
    (parentClass (ref: 100))
    (accessControl 'public')
    (hasClassScope false)
    (stub false)
    (LOC 7)
    (CYCLO 3))
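Since MSE is an s-expression format, a toy reader fits in a few lines. This Python sketch is hypothetical and deliberately incomplete: a real MSE importer would also resolve (ref: n) links against (id: n) elements and map attributes onto a meta-model.

```python
# Toy reader for MSE-style s-expressions: tokenize, then parse
# recursively into nested Python lists.
import re

TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()']+")

def parse_mse(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while tokens[pos] != ")":
                node.append(parse())
            pos += 1              # consume the closing ")"
            return node
        if tok.startswith("'"):
            return tok[1:-1]      # string literal, quotes stripped
        try:
            return int(tok)       # numeric attribute value
        except ValueError:
            return tok            # element or attribute name

    elements = []
    while pos < len(tokens):
        elements.append(parse())
    return elements

doc = "(FAMIX.Class (id: 100) (name 'Server') (NOM 9))"
cls = parse_mse(doc)[0]
```

Parsing the snippet yields a nested list whose head is the element name and whose tail holds the attribute pairs.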

Page 62: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 63: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 64: Moose Tutorial at WCRE 2008

Meyer etal 2006

Mondrian scripts graph visualizations

view nodes: classes forEach: [ :each |
    view nodes: each methods.
    view gridLayout ].
view edgesFrom: #superclass.
view treeLayout.

Page 65: Moose Tutorial at WCRE 2008

Junker, Hofstetter 2007

EyeSee scripts charts

Page 66: Moose Tutorial at WCRE 2008

Wettel 2008

CodeCity scripts 3D visualizations

Page 67: Moose Tutorial at WCRE 2008
Page 68: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 69: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 70: Moose Tutorial at WCRE 2008

Repository, Fame, Mondrian, UI, FAMIX, EyeSee

Page 71: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk

FAMIX, EyeSee

Page 72: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX, EyeSee

Page 73: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX, EyeSee

Page 74: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX

Hapax, DynaMoose, Chronia, SmallDude, EyeSee, CodeCity, Yellow Submarine

Page 75: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX

Hapax, DynaMoose, Chronia, SmallDude, EyeSee, CodeCity, Yellow Submarine

CVS, SVN, MSE Source, J-Wiretap

Page 76: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX

Hapax, DynaMoose, Chronia, SmallDude, EyeSee, CodeCity, Yellow Submarine

CVS, SVN, MSE Source, J-Wiretap

Softwarenaut, BugsLife, Clustering, Metanool, ...

Page 77: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX

Hapax, DynaMoose, Chronia, SmallDude, EyeSee, CodeCity, Yellow Submarine

CVS, SVN, MSE Source, J-Wiretap

Softwarenaut, BugsLife, Clustering, Metanool, ...

Page 78: Moose Tutorial at WCRE 2008
Page 79: Moose Tutorial at WCRE 2008

Model

Page 80: Moose Tutorial at WCRE 2008

GUI

Model

Page 81: Moose Tutorial at WCRE 2008

Helpers GUI

Model

Page 82: Moose Tutorial at WCRE 2008

Murphy et al. 1995

Helpers GUI

Model

Page 83: Moose Tutorial at WCRE 2008
Page 84: Moose Tutorial at WCRE 2008

Helpers

Model

Page 85: Moose Tutorial at WCRE 2008

Brühlmann et al. 2008

Page 86: Moose Tutorial at WCRE 2008

Brühlmann et al. 2008

Page 87: Moose Tutorial at WCRE 2008

MSE

Repository, Fame, Mondrian, UI

Smalltalk, Java, C++, iPlasma

FAMIX

Hapax, DynaMoose, Chronia, SmallDude, EyeSee, CodeCity, Yellow Submarine

CVS, SVN, MSE Source, J-Wiretap

Softwarenaut, BugsLife, Clustering, Metanool, ...

Page 88: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 89: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 90: Moose Tutorial at WCRE 2008

INRIA Lille

Politehnica University of Timisoara

University of Berne

University of Lugano

Université Catholique de Louvain

Page 91: Moose Tutorial at WCRE 2008

Previous Contributors / Current Contributors

Previous Team: Serge Demeyer, Adrian Kuhn, Michele Lanza, Sander Tichelaar

Current Team: Stéphane Ducasse, Tudor Gîrba

Michael Meer, Michael Meyer, Laura Ponisio, Daniel Ratiu, Matthias Rieger, Azadeh Razavizadeh, Andreas Schlapbach, Daniel Schweizer, Mauricio Seeberger, Lukas Steiger, Daniele Talerico, Herve Verjus, Violeta Voinescu, Sara Sellos, Lucas Streit, Roel Wuyts

Jannik Laval, Adrian Lienhard, Mircea Lungu, Oscar Nierstrasz, Damien Pollet, Jorge Ressia, Toon Verwaest, Richard Wettel

Tobias Aebi, Ilham Alloui, Gabriela Arevalo, Mihai Balint, Frank Buchli, Thomas Bühler, Calogero Butera, Daniel Frey, Georges Golomingi, David Gurtner, Reinout Heeck, Markus Hofstetter, Markus Kobel, Michael Locher, Martin von Löwis, Pietro Malorgio

Hani Abdeen, Philipp Bunge, Alexandre Bergel, Johan Brichau, Marco D’Ambros, Simon Denier, Orla Greevy, Matthias Junker

Page 92: Moose Tutorial at WCRE 2008

Previous Contributors / Current Contributors

Previous Team: Serge Demeyer, Adrian Kuhn, Michele Lanza, Sander Tichelaar

Current Team: Stéphane Ducasse, Tudor Gîrba

Michael Meer, Michael Meyer, Laura Ponisio, Daniel Ratiu, Matthias Rieger, Azadeh Razavizadeh, Andreas Schlapbach, Daniel Schweizer, Mauricio Seeberger, Lukas Steiger, Daniele Talerico, Herve Verjus, Violeta Voinescu, Sara Sellos, Lucas Streit, Roel Wuyts

Jannik Laval, Adrian Lienhard, Mircea Lungu, Oscar Nierstrasz, Damien Pollet, Jorge Ressia, Toon Verwaest, Richard Wettel

Tobias Aebi, Ilham Alloui, Gabriela Arevalo, Mihai Balint, Frank Buchli, Thomas Bühler, Calogero Butera, Daniel Frey, Georges Golomingi, David Gurtner, Reinout Heeck, Markus Hofstetter, Markus Kobel, Michael Locher, Martin von Löwis, Pietro Malorgio

Hani Abdeen, Philipp Bunge, Alexandre Bergel, Johan Brichau, Marco D’Ambros, Simon Denier, Orla Greevy, Matthias Junker

> 100 person-years

Page 93: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

Page 94: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration is an idea

Page 95: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration is an idea

Page 96: Moose Tutorial at WCRE 2008

Scripting Visualizations with Mondrian

Michael Meyer and Tudor Gîrba, Software Composition Group, University of Berne, Switzerland

Abstract

Most visualization tools focus on a finite set of dedicated visualizations that are adjustable via a user interface. In this demo, we present Mondrian, a new visualization engine designed to minimize the time-to-solution. We achieve this by working directly on the underlying data, by making nesting an integral part of the model, and by defining a powerful scripting language that can be used to define visualizations. We support exploring data in an interactive way by providing hooks for various events. Users can register actions for these events in the visualization script.

1 Introduction

Visualization is an established tool to reason about data. Given a wanted visualization, we can typically find tools that take as input a certain format and that provide the needed visualization [4].

One drawback of the approach is that, when a deep reasoning is required, we need to refer back to the capabilities of the original tool that manipulates the original data. Another drawback is that it actually duplicates the required resources unnecessarily: the data is present both in the original tool, and in the visualization tool. Several tools take a middle ground approach and choose to work close with the data by either offering integration with other services [1], or providing the services themselves [2]. However, when another type of service is required, the integration is lost.

We present Mondrian, a visualization engine that implements a radically different approach. Instead of providing a required data format, we provide a simple interface through which the programmer can easily script the visualization in a declarative fashion (more information can be found in [3]). That is, our solution works directly with the objects in the data model, and instead of duplicating the objects by model transformation, we transform the messages sent to the original objects via meta-model transformations.

2 Mondrian by example

In this section we give a simple step-by-step example of how to script a visualization using Mondrian. The example builds on a small model of a source code with 32 classes. The task we propose is to provide an overview of the hierarchies.

Creating a view and adding nodes. Suppose we can ask the model object for the classes. We can add those classes to a newly created view by creating a node for each class, where each node is represented as a Rectangle. Here, NOA, NOM and LOC are methods in the object representing a class and return the value of the corresponding metric.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view open.

Adding edges and layouting. To show how classes inherit from each other, we can add an edge for each inheritance relationship. In our example, supposing that we can ask the model for all the inheritance objects, and given an inheritance object, we will create an edge between the node holding the superclass and the node holding the subclass. We layout the nodes in a tree.

view := ViewRenderer new.
view newShape rectangle;
    width: #NOA;
    height: #NOM;
    linearColor: #LOC within: model classes;
    withBorder.
view nodes: model classes.
view edges: model inheritances from: #superclass to: #subclass.
view treeLayout.
view open.

Nesting. To obtain more details for the classes, we would like to see which are the methods inside. To nest, we specify for each node the view that goes inside. Supposing that we can ask each class in the model about its methods, we can add those methods to the class by specifying the view for each class.


Test Blueprints — Exposing Side Effects in Execution Traces to Support Writing Unit Tests

Adrian Lienhard, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz, Software Composition Group, University of Bern, Switzerland

{lienhard, girba, greevy, oscar}@iam.unibe.ch

Abstract

Writing unit tests for legacy systems is a key maintenance task. When writing tests for object-oriented programs, objects need to be set up and the expected effects of executing the unit under test need to be verified. If developers lack internal knowledge of a system, the task of writing tests is non-trivial. To address this problem, we propose an approach that exposes side effects detected in example runs of the system and uses these side effects to guide the developer when writing tests. We introduce a visualization called Test Blueprint, through which we identify what the required fixture is and what assertions are needed to verify the correct behavior of a unit under test. The dynamic analysis technique that underlies our approach is based on both tracing method executions and on tracking the flow of objects at runtime. To demonstrate the usefulness of our approach we present results from two case studies.

Keywords: Dynamic Analysis, Object Flow Analysis,Software Maintenance, Unit Testing

1 Introduction

Creating automated tests for legacy systems is a key maintenance task [9]. Tests are used to assess if legacy behavior has been preserved after performing modifications or extensions to the code. Unit testing (i.e., tests based on the XUnit frameworks [1]) is an established and widely used testing technique. It is now generally recognized as an essential phase in the software development life cycle to ensure software quality, as it can lead to early detection of defects, even if they are subtle and well hidden [2].

The task of writing a unit test involves (i) choosing an appropriate program unit, (ii) creating a fixture, (iii) executing the unit under test within the context of the fixture, and (iv) verifying the expected behavior of the unit using assertions [1]. All these actions require detailed knowledge of the system. Therefore, the task of writing unit tests may prove difficult as developers are often faced with unfamiliar legacy systems.

Implementing a fixture and all the relevant assertions required can be challenging if the code is the only source of information. One reason is that the gap between static structure and runtime behavior is particularly large with object-oriented programs. Side effects¹ make program behavior more difficult to predict. Often, encapsulation and complex chains of method executions hide where side effects are produced [2]. Developers usually resort to using debuggers to obtain detailed information about the side effects, but this implies low level manual analysis that is tedious and time consuming [25].

Thus, the underlying research question of the work we present in this paper is: how can we support developers faced with the task of writing unit tests for unfamiliar legacy code? The approach we propose is based on analyzing runtime executions of a program. Parts of a program execution, selected by the developer, serve as examples for new unit tests. Rather than manually stepping through the execution with a debugger, we perform dynamic analysis to derive information to support the task of writing tests without requiring a detailed understanding of the source code.

In our experimental tool, we present a visual representation of the dynamic information in a diagram similar to the UML object diagram [11]. We call this diagram a Test Blueprint as it serves as a plan for implementing a test. It reveals the minimal required fixture and the side effects that are produced during the execution of a particular program unit. Thus, the Test Blueprint reveals the exact information that should be verified with a corresponding test.

To generate a Test Blueprint, we need to accurately analyze object usage, object reference transfers, and the side effects that are produced as a result of a program execution. To do so, we perform a dynamic Object Flow Analysis in conjunction with conventional execution tracing [17].

Object Flow Analysis is a novel dynamic analysis which tracks the transfer of object references in a program execution. In previous work, we demonstrated how we success-

¹We refer to side effect as the program state modifications produced by a behavior. We consider the term program state to be limited to the scope of the application under analysis (i.e., excluding socket or display updates).

Practical Object-Oriented Back-in-Time Debugging

Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland

Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the “omniscient” approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counterintuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore, since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.

1 Introduction

When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug’s cause [2].

Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object’s fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].

The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor 100 or more for non-trivial programs.

Enriching Reverse Engineering with Annotations

Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland, http://scg.unibe.ch/

Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.

1 Introduction

Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.

Only few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views which make use of rules that encode external constraints and are checked against the actual source code [3].

In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach

Published in: Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.

Semantic Clustering: Identifying Topics in Source Code

Adrian Kuhn (a,1), Stéphane Ducasse (b,2), Tudor Gîrba (a,1)

(a) Software Composition Group, University of Berne, Switzerland
(b) Language and Software Evolution Group, LISTIC, Université de Savoie, France

Abstract

Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing formal information the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it on several case studies, two of which we present in this paper.

Note: Some of the visualizations presented make heavy use of colors. Please obtaina color copy of the article for better understanding.

Key words: reverse engineering, clustering, latent semantic indexing, visualization

Email addresses: [email protected] (Adrian Kuhn), [email protected] (Stéphane Ducasse), [email protected] (Tudor Gîrba).
1 We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “Recast: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1).
2 We gratefully acknowledge the financial support of the French ANR for the project “Cook: Réarchitecturisation des applications à objets”.

Preprint submitted to Elsevier Science 11 October 2006

The Story of Moose: an Agile Reengineering Environment

Oscar Nierstrasz, Stéphane Ducasse, Tudor Gîrba

Software Composition Group, University of Berne, Switzerland

www.iam.unibe.ch/~scg

ABSTRACT
Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Maintenance—Restructuring, reverse engineering, and reengineering

General Terms
Measurement, Design, Experimentation

Keywords
Reverse engineering, Reengineering, Metrics, Visualization

1. INTRODUCTION
Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11].

The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].

What may not be clear from this very simplified picture is that various kinds of documents are available to the software

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings ESEC-FSE'05, pp. 1-10, ISBN 1-59593-014-0. September 5–9, 2005, Lisbon, Portugal.
Copyright 2005 ACM 1-59593-014-0/05/0009 ...$5.00.

Figure 1: The Reengineering life cycle. (Diagram: the Requirements, Designs and Code of the legacy system are transformed, via model capture and analysis, problem assessment, and migration, towards the new system.)

reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].

Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we need to apply to understand this data. These techniques range from data mining, to data presentation, to data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.

Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core are provided various services that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.

Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution

Page 97: Moose Tutorial at WCRE 2008

Scripting Visualizations with Mondrian

Michael Meyer and Tudor Gîrba
Software Composition Group, University of Berne, Switzerland

Abstract

Most visualization tools focus on a finite set of dedicated visualizations that are adjustable via a user-interface. In this demo, we present Mondrian, a new visualization engine designed to minimize the time-to-solution. We achieve this by working directly on the underlying data, by making nesting an integral part of the model and by defining a powerful scripting language that can be used to define visualizations. We support exploring data in an interactive way by providing hooks for various events. Users can register actions for these events in the visualization script.

1 Introduction

Visualization is an established tool to reason about data. Given a wanted visualization, we can typically find tools that take as input a certain format and that provide the needed visualization [4].

One drawback of the approach is that, when a deep reasoning is required, we need to refer back to the capabilities of the original tool that manipulates the original data. Another drawback is that it actually duplicates the required resources unnecessarily: the data is present both in the original tool, and in the visualization tool. Several tools take a middle ground approach and choose to work close with the data by either offering integration with other services [1], or providing the services themselves [2]. However, when another type of service is required, the integration is lost.

We present Mondrian, a visualization engine that implements a radically different approach. Instead of providing a required data format, we provide a simple interface through which the programmer can easily script the visualization in a declarative fashion (more information can be found in [3]). That is, our solution works directly with the objects in the data model, and instead of duplicating the objects by model transformation, we transform the messages sent to the original objects via meta-model transformations.

2 Mondrian by example

In this section we give a simple step-by-step example of how to script a visualization using Mondrian. The example builds on a small model of source code with 32 classes. The task we propose is to provide an overview of the hierarchies.

Creating a view and adding nodes. Suppose we can ask the model object for the classes. We can add those classes to a newly created view by creating a node for each class, where each node is represented as a Rectangle. In the script below, NOA, NOM and LOC are methods in the object representing a class and return the value of the corresponding metric.

view := ViewRenderer new.
view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.
view nodes: model classes.
view open.

Adding edges and layouting. To show how classes inherit from each other, we can add an edge for each inheritance relationship. In our example, supposing that we can ask the model for all the inheritance objects, and given an inheritance object, we will create an edge between the node holding the superclass and the node holding the subclass. We lay out the nodes in a tree.

view := ViewRenderer new.
view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.
view nodes: model classes.
view edges: model inheritances from: #superclass to: #subclass.
view treeLayout.
view open.

Nesting. To obtain more details for the classes, we would like to see which are the methods inside. To nest we specify for each node the view that goes inside. Supposing that we can ask each class in the model about its methods, we can add those methods to the class by specifying the view for each class.
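The nesting script itself is cut off in this excerpt; the following is only a sketch of how it might look, continuing the scripts above. The nodes:forEach: message and the inner shape settings are assumptions for illustration, not taken from the text.

```smalltalk
view := ViewRenderer new.
view newShape rectangle; withBorder.
"assumed API: the block scripts the sub-view painted inside each class node"
view nodes: model classes forEach: [ :class |
	view newShape rectangle; width: 5; height: #LOC.
	view nodes: class methods ].
view treeLayout.
view open.
```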


Test Blueprints — Exposing Side Effects in Execution Traces to Support Writing Unit Tests

Adrian Lienhard*, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz
Software Composition Group, University of Bern, Switzerland

{lienhard, girba, greevy, oscar}@iam.unibe.ch

Abstract

Writing unit tests for legacy systems is a key maintenance task. When writing tests for object-oriented programs, objects need to be set up and the expected effects of executing the unit under test need to be verified. If developers lack internal knowledge of a system, the task of writing tests is non-trivial. To address this problem, we propose an approach that exposes side effects detected in example runs of the system and uses these side effects to guide the developer when writing tests. We introduce a visualization called Test Blueprint, through which we identify what the required fixture is and what assertions are needed to verify the correct behavior of a unit under test. The dynamic analysis technique that underlies our approach is based on both tracing method executions and on tracking the flow of objects at runtime. To demonstrate the usefulness of our approach we present results from two case studies.

Keywords: Dynamic Analysis, Object Flow Analysis, Software Maintenance, Unit Testing

1 Introduction
Creating automated tests for legacy systems is a key maintenance task [9]. Tests are used to assess if legacy behavior has been preserved after performing modifications or extensions to the code. Unit testing (i.e., tests based on the XUnit frameworks [1]) is an established and widely used testing technique. It is now generally recognized as an essential phase in the software development life cycle to ensure software quality, as it can lead to early detection of defects, even if they are subtle and well hidden [2].

The task of writing a unit test involves (i) choosing an appropriate program unit, (ii) creating a fixture, (iii) executing the unit under test within the context of the fixture, and (iv) verifying the expected behavior of the unit using assertions [1]. All these actions require detailed knowledge of the system. Therefore, the task of writing unit tests may prove difficult as developers are often faced with unfamiliar legacy systems.
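Steps (ii)–(iv) map directly onto the body of an xUnit test method. As a minimal sketch in SUnit (the Smalltalk member of the XUnit family) — the Account class and its deposit: and balance messages are hypothetical, introduced only to illustrate the shape of such a test:

```smalltalk
AccountTest >> testDeposit
	| account |
	"(ii) create the fixture"
	account := Account new.
	"(iii) execute the unit under test"
	account deposit: 100.
	"(iv) verify the expected behavior with assertions"
	self assert: account balance = 100
```

Step (i), choosing the unit, is reflected in the choice of deposit: as the method exercised by the test.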

Implementing a fixture and all the relevant assertions required can be challenging if the code is the only source of information. One reason is that the gap between static structure and runtime behavior is particularly large with object-oriented programs. Side effects¹ make program behavior more difficult to predict. Often, encapsulation and complex chains of method executions hide where side effects are produced [2]. Developers usually resort to using debuggers to obtain detailed information about the side effects, but this implies low level manual analysis that is tedious and time consuming [25].

Thus, the underlying research question of the work we present in this paper is: how can we support developers faced with the task of writing unit tests for unfamiliar legacy code? The approach we propose is based on analyzing runtime executions of a program. Parts of a program execution, selected by the developer, serve as examples for new unit tests. Rather than manually stepping through the execution with a debugger, we perform dynamic analysis to derive information to support the task of writing tests without requiring a detailed understanding of the source code.

In our experimental tool, we present a visual representation of the dynamic information in a diagram similar to the UML object diagram [11]. We call this diagram a Test Blueprint as it serves as a plan for implementing a test. It reveals the minimal required fixture and the side effects that are produced during the execution of a particular program unit. Thus, the Test Blueprint reveals the exact information that should be verified with a corresponding test.

To generate a Test Blueprint, we need to accurately analyze object usage, object reference transfers, and the side effects that are produced as a result of a program execution. To do so, we perform a dynamic Object Flow Analysis in conjunction with conventional execution tracing [17].

Object Flow Analysis is a novel dynamic analysis which tracks the transfer of object references in a program execution. In previous work, we demonstrated how we success-

¹ We refer to side effect as the program state modifications produced by a behavior. We consider the term program state to be limited to the scope of the application under analysis (i.e., excluding socket or display updates).

Practical Object-Oriented Back-in-Time Debugging

Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland

Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the "omniscient" approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counterintuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore, since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.

1 Introduction

When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug's cause [2].

Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object's fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].

The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of factor 100 or more for non-trivial programs.

Enriching Reverse Engineering with Annotations*

Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland
http://scg.unibe.ch/

Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.

1 Introduction

Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not, however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.

Only a few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views, which make use of rules that encode external constraints and are checked against the actual source code [3].

In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach

* Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.

Semantic Clustering: Identifying Topics in Source Code

Adrian Kuhn a,1, Stéphane Ducasse b,2, Tudor Gîrba a,1

a Software Composition Group, University of Berne, Switzerland
b Language and Software Evolution Group, LISTIC, Université de Savoie, France



Research should be more than papers

Page 98: Moose Tutorial at WCRE 2008

Scripting Visualizations with Mondrian

Michael Meyer and Tudor GırbaSoftware Composition Group, University of Berne, Switzerland

Abstract

Most visualization tools focus on a finite set of dedicatedvisualizations that are adjustable via a user-interface. Inthis demo, we present Mondrian, a new visualization enginedesigned to minimize the time-to-solution. We achieve thisby working directly on the underlying data, by making nest-ing an integral part of the model and by defining a powerfulscripting language that can be used to define visualizations.We support exploring data in an interactive way by provid-ing hooks for various events. Users can register actions forthese events in the visualization script.

1 Introduction

Visualization is an established tool to reason about data.Given a wanted visualization, we can typically find toolsthat take as input a certain format and that provide theneeded visualization [4].

One drawback of the approach is that, when a deep rea-soning is required, we need to refer back to the capabili-ties of the original tool that manipulates the original data.Another drawback is that it actually duplicates the requiredresources unnecessarily: the data is present both in the orig-inal tool, and in the visualization tool. Several tools take amiddle ground approach and choose to work close with thedata by either offering integration with other services [1],or providing the services themselves [2]. However, whenanother type of service is required, the integration is lost.

We present Mondrian, a visualization engine that imple-ments a radically different approach. Instead of provid-ing a required data format, we provide a simple interfacethrough which the programmer can easily script the visu-alization in a declarative fashion (more information can befound in [3]). That is, our solution works directly with theobjects in the data model, and instead of duplicating the ob-jects by model transformation, we transform the messagessent to the original objects via meta-model transformations.

2 Mondrian by example

In this section we give a simple step-by-step example ofhow to script a visualization using Mondrian. The examplebuilds on a small model of a source code with 32 classes.The task we propose is to provide a on overview of the hi-erarchies.

Creating a view and adding nodes. Suppose we canask the model object for the classes. We can add thoseclasses to a newly created view by creating a node for eachclass, where each node is represented as a Rectangle. In thecase above, NOA, NOM and LOC are methods in the objectrepresenting a class and return the value of the correspond-ing metric.view := ViewRenderer new.view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.view nodes: model classes.view open.

Adding edges and layouting. To show how classes in-herit from each other, we can add an edge for each inheri-tance relationship. In our example, supposing that we canask the model for all the inheritance objects, and given aninheritance object, we will create an edge between the nodeholding the superclass and the node holding the subclass.We layout the nodes in a tree.view := ViewRenderer new.view newShape rectangle; width: #NOA; height: #NOM; linearColor: #LOC within: model classes; withBorder.view nodes: model classes.view edges: model inheritances from: #superclass to: #subclass.view treeLayout.view open.

Nesting. To obtain more details for the classes, wewould like to see which are the methods inside. To nest wespecify for each node the view that goes inside. Supposingthat we can ask each class in the model about its methods,we can add those methods to the class by specifying theview for each class.

1

Test Blueprints — Exposing Side Effects inExecution Traces to Support Writing Unit Tests

Adrian Lienhard*, Tudor Gırba, Orla Greevy and Oscar NierstraszSoftware Composition Group, University of Bern, Switzerland

{lienhard, girba, greevy, oscar}@iam.unibe.ch

Abstract

Writing unit tests for legacy systems is a key maintenancetask. When writing tests for object-oriented programs, ob-jects need to be set up and the expected effects of executingthe unit under test need to be verified. If developers lackinternal knowledge of a system, the task of writing tests isnon-trivial. To address this problem, we propose an ap-proach that exposes side effects detected in example runs ofthe system and uses these side effects to guide the developerwhen writing tests. We introduce a visualization called TestBlueprint, through which we identify what the required fix-ture is and what assertions are needed to verify the correctbehavior of a unit under test. The dynamic analysis tech-nique that underlies our approach is based on both tracingmethod executions and on tracking the flow of objects atruntime. To demonstrate the usefulness of our approach wepresent results from two case studies.

Keywords: Dynamic Analysis, Object Flow Analysis,Software Maintenance, Unit Testing

1 IntroductionCreating automated tests for legacy systems is a key

maintenance task [9]. Tests are used to assess if legacy be-havior has been preserved after performing modifications orextensions to the code. Unit testing (i.e., tests based on theXUnit frameworks [1]) is an established and widely usedtesting technique. It is now generally recognized as an es-sential phase in the software development life cycle to en-sure software quality, as it can lead to early detection ofdefects, even if they are subtle and well hidden [2].

The task of writing a unit test involves (i) choosing anappropriate program unit, (ii) creating a fixture, (iii) execut-ing the unit under test within the context of the fixture, and(iv) verifying the expected behavior of the unit using asser-tions [1]. All these actions require detailed knowledge ofthe system. Therefore, the task of writing unit tests mayprove difficult as developers are often faced with unfamiliarlegacy systems.

Implementing a fixture and all the relevant assertions re-quired can be challenging if the code is the only source ofinformation. One reason is that the gap between static struc-ture and runtime behavior is particularly large with object-oriented programs. Side effects1 make program behaviormore difficult to predict. Often, encapsulation and complexchains of method executions hide where side effects are pro-duced [2]. Developers usually resort to using debuggers toobtain detailed information about the side effects, but thisimplies low level manual analysis that is tedious and timeconsuming [25].

Thus, the underlying research question of the work wepresent in this paper is: how can we support developersfaced with the task of writing unit tests for unfamiliar legacycode? The approach we propose is based on analyzing run-time executions of a program. Parts of a program execu-tion, selected by the developer, serve as examples for newunit tests. Rather than manually stepping through the ex-ecution with a debugger, we perform dynamic analysis toderive information to support the task of writing tests with-out requiring a detailed understanding of the source code.

In our experimental tool, we present a visual represen-tation of the dynamic information in a diagram similar tothe UML object diagram [11]. We call this diagram a TestBlueprint as it serves as a plan for implementing a test. Itreveals the minimal required fixture and the side effects thatare produced during the execution of a particular programunit. Thus, the Test Blueprint reveals the exact informationthat should be verified with a corresponding test.

To generate a Test Blueprint, we need to accurately an-alyze object usage, object reference transfers, and the sideeffects that are produced as a result of a program execution.To do so, we perform a dynamic Object Flow Analysis inconjunction with conventional execution tracing [17].

Object Flow Analysis is a novel dynamic analysis whichtracks the transfer of object references in a program execu-tion. In previous work, we demonstrated how we success-

1We refer to side effect as the program state modifications produced bya behavior. We consider the term program state to be limited to the scopeof the application under analysis (i.e., excluding socket or display updates).

Practical Object-Oriented Back-in-Time Debugging

Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland

Abstract. Back-in-time debuggers are extremely useful tools for identifying the causes of bugs, as they allow us to inspect the past states of objects that are no longer present in the current execution stack. Unfortunately the “omniscient” approaches that try to remember all previous states are impractical because they either consume too much space or they are far too slow. Several approaches rely on heuristics to limit these penalties, but they ultimately end up throwing out too much relevant information. In this paper we propose a practical approach to back-in-time debugging that attempts to keep track of only the relevant past data. In contrast to other approaches, we keep object history information together with the regular objects in the application memory. Although seemingly counterintuitive, this approach has the effect that past data that is not reachable from current application objects (and hence, no longer relevant) is automatically garbage collected. In this paper we describe the technical details of our approach, and we present benchmarks that demonstrate that memory consumption stays within practical bounds. Furthermore, since our approach works at the virtual machine level, the performance penalty is significantly less than with other approaches.
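A minimal sketch of the key idea: if an object's past field values are chained onto the object itself, then whenever the object becomes unreachable its whole history becomes unreachable too, and the ordinary garbage collector reclaims it. The Python classes below are invented stand-ins for the VM-level representation the paper describes.

```python
# Hypothetical illustration: each field write chains the new value onto
# the field's existing history, stored inside the object itself.
class HistoryCell:
    """One recorded value of a field, linked to the value it replaced."""
    def __init__(self, value, previous):
        self.value = value
        self.previous = previous

class TrackedObject:
    def __init__(self):
        object.__setattr__(self, "_history", {})

    def __setattr__(self, name, value):
        # chain the new value onto the field's existing history
        prior = self._history.get(name)
        self._history[name] = HistoryCell(value, prior)
        object.__setattr__(self, name, value)

    def past_values(self, name):
        """Walk the history chain, newest first (back-in-time view)."""
        cell, values = self._history.get(name), []
        while cell is not None:
            values.append(cell.value)
            cell = cell.previous
        return values

obj = TrackedObject()
obj.balance = 10
obj.balance = 25
obj.balance = 5
print(obj.past_values("balance"))  # newest first: [5, 25, 10]
```

Since `_history` lives inside `obj`, dropping the last reference to `obj` makes the entire chain garbage, which mirrors how the approach bounds memory consumption.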

1 Introduction

When debugging object-oriented systems, the hardest task is to find the actual root cause of the failure as this can be far from where the bug actually manifests itself [1]. In a recent study, Liblit et al. examined bug symptoms for various programs and found that in 50% of the cases the execution stack contains essentially no information about the bug's cause [2].

Classical debuggers are not always up to the task, since they only provide access to information that is still in the run-time stack. In particular, the information needed to track down these difficult bugs includes (1) how an object reference got here, and (2) the previous values of an object's fields. For this reason it is helpful to have previous object states and object reference flow information at hand during debugging. Techniques and tools like back-in-time debuggers, which allow one to inspect previous program states and step backwards in the control flow, have gained increasing attention recently [3,4,5,6].

The ideal support for a back-in-time debugger is provided by an omniscient implementation that remembers the complete object history, but such solutions are impractical because they generate enormous amounts of information. Storing the data to disk instead of keeping it in memory can alleviate the problem, but it only postpones the end, and it has the drawback of further increasing the runtime overhead. Current implementations such as ODB [3], TOD [4] or Unstuck [5] can incur a slowdown of a factor of 100 or more for non-trivial programs.

Enriching Reverse Engineering with Annotations

Andrea Brühlmann, Tudor Gîrba, Orla Greevy, Oscar Nierstrasz

Software Composition Group, University of Bern, Switzerland
http://scg.unibe.ch/

Abstract. Much of the knowledge about software systems is implicit, and therefore difficult to recover by purely automated techniques. Architectural layers and the externally visible features of software systems are two examples of information that can be difficult to detect from source code alone, and that would benefit from additional human knowledge. Typical approaches to reasoning about data involve encoding an explicit meta-model and expressing analyses at that level. Due to its informal nature, however, human knowledge can be difficult to characterize up-front and integrate into such a meta-model. We propose a generic, annotation-based approach to capture such knowledge during the reverse engineering process. Annotation types can be iteratively defined, refined and transformed, without requiring a fixed meta-model to be defined in advance. We show how our approach supports reverse engineering by implementing it in a tool called Metanool and by applying it to (i) analyzing architectural layering, (ii) tracking reengineering tasks, (iii) detecting design flaws, and (iv) analyzing features.

1 Introduction

Most reverse engineering techniques focus on automatically extracting information from the source code without taking external human knowledge into consideration. More often than not, however, important external information is available (e.g., developer knowledge or domain specific knowledge) which would greatly enhance analyses if it could be taken into account.

Only a few reverse engineering approaches integrate such external human knowledge into the analysis. For example, reflexion models have been proposed for architecture recovery by capturing developer knowledge and then manually mapping this knowledge to the source code [1,2]. Another example is provided by Intensional Views, which make use of rules that encode external constraints and are checked against the actual source code [3].

In this paper we propose a generic framework based on annotations to enhance a reverse engineered model with external knowledge so that automatic analyses can take this knowledge into account. A key feature of our approach

* Models 2008, Krzysztof Czarnecki, et al. (Eds.), LNCS, vol. 5301, Springer-Verlag, 2008, pp. 660-674.

Semantic Clustering: Identifying Topics in Source Code

Adrian Kuhn a,1, Stéphane Ducasse b,2, Tudor Gîrba a,1

a Software Composition Group, University of Berne, Switzerland
b Language and Software Evolution Group, LISTIC, Université de Savoie, France

Abstract

Many of the existing approaches in Software Comprehension focus on program structure or external documentation. However, by analyzing formal information the informal semantics contained in the vocabulary of source code are overlooked. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the code naming. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it on several case studies, two of which we present in this paper.
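A much-simplified sketch of the vocabulary-based grouping: represent each source artifact by the words in its identifiers, then group artifacts whose vocabularies are similar. The real technique applies Latent Semantic Indexing (an SVD of the term-document matrix) before clustering; this sketch substitutes raw cosine similarity to stay dependency-free, and all artifact names are invented.

```python
# Simplified stand-in for Semantic Clustering: cosine similarity over
# identifier vocabularies instead of the LSI step the paper uses.
import math
import re
from collections import Counter

def vocabulary(source):
    """Split camelCase identifiers into lowercase words."""
    words = re.findall(r"[A-Z]?[a-z]+", source)
    return Counter(w.lower() for w in words)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(artifacts, threshold=0.5):
    """Greedy single-pass grouping by vocabulary similarity."""
    clusters = []  # list of (representative vocabulary, member names)
    for name, src in artifacts.items():
        voc = vocabulary(src)
        for rep, members in clusters:
            if cosine(rep, voc) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((voc, [name]))
    return [members for _, members in clusters]

artifacts = {
    "AccountPrinter": "printAccount formatBalance printHeader",
    "ReportPrinter": "printReport formatBalance printFooter",
    "ParserCore": "parseToken scanInput parseExpression",
}
print(cluster(artifacts))  # [['AccountPrinter', 'ReportPrinter'], ['ParserCore']]
```

The two printing classes end up in one cluster because they share the words "print", "format" and "balance"; the parser lands in its own, which is the kind of linguistic topic the real technique then labels and visualizes.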

Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.

Key words: reverse engineering, clustering, latent semantic indexing, visualization

Email addresses: [email protected] (Adrian Kuhn), [email protected] (Stéphane Ducasse), [email protected] (Tudor Gîrba).
1 We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “Recast: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1)
2 We gratefully acknowledge the financial support of the French ANR for the project “Cook: Réarchitecturisation des applications à objets”

Preprint submitted to Elsevier Science 11 October 2006

The Story of Moose: an Agile Reengineering Environment

Oscar Nierstrasz
Software Composition Group

University of Berne
Switzerland

Stéphane Ducasse
Software Composition Group

University of Berne
Switzerland

www.iam.unibe.ch/~scg

Tudor Gîrba
Software Composition Group

University of Berne
Switzerland

ABSTRACT
Moose is a language-independent environment for reverse- and re-engineering complex software systems. Moose provides a set of services including a common meta-model, metrics evaluation and visualization, a model repository, and generic GUI support for querying, browsing and grouping. The development effort invested in Moose has paid off in precisely those research activities that benefit from applying a combination of complementary techniques. We describe how Moose has evolved over the years, we draw a number of lessons learned from our experience, and we outline the present and future of Moose.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Maintenance—Restructuring, reverse engineering, and reengineering

General Terms
Measurement, Design, Experimentation

Keywords
Reverse engineering, Reengineering, Metrics, Visualization

1. INTRODUCTION

Software systems need to evolve continuously to be effective [41]. As systems evolve, their structure decays, unless effort is undertaken to reengineer them [41, 44, 23, 11].

The reengineering process comprises various activities, including model capture and analysis (i.e., reverse engineering), assessment of problems to be repaired, and migration from the legacy software towards the reengineered system. Although in practice this is an ongoing and iterative process, we can idealize it (see Figure 1) as a transformation through various abstraction layers from legacy code towards a new system [11, 13, 35].

What may not be clear from this very simplified picture is that various kinds of documents are available to the software

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Proceedings ESEC-FSE'05, pp. 1-10, ISBN 1-59593-014-0. September 5-9, 2005, Lisbon, Portugal.
Copyright 2005 ACM 1-59593-014-0/05/0009 ...$5.00.

[Figure 1: The Reengineering life cycle. Artifacts: Requirements, Designs, Code; activities: model capture and analysis, problem assessment, migration.]

reengineer. In addition to the code base, there may be documentation (though often out of sync with the code), bug reports, tests and test data, database schemas, and especially the version history of the code base. Other important sources of information include the various stakeholders (i.e., users, developers, maintainers, etc.), and the running system itself. The reengineer will neither rely on a single source of information, nor on a single technique for extracting and analyzing that information [11].

Reengineering is a complex task, and it usually involves several techniques. The more data we have at hand, the more techniques we need to apply to understand this data. These techniques range from data mining, to data presentation and to data manipulation. Different techniques are implemented in different tools, by different people. An infrastructure is needed for integrating all these tools.

Moose is a reengineering environment that offers a common infrastructure for various reverse- and re-engineering tools [22]. At the core of Moose is a common meta-model for representing software systems in a language-independent way. Around this core are provided various services that are available to the different tools. These services include metrics evaluation and visualization, a repository for storing multiple models, a meta-meta model for tailoring the Moose meta-model, and a generic GUI for browsing, querying and grouping.

Moose has been developed over nearly ten years, and has itself been extensively reengineered during the time that it has evolved. Initially Moose was little more than a common meta-model for integrating various ad hoc tools. As it became apparent that these tools would benefit immensely from a common infrastructure, we invested in the evolution

Research should be more than papers

Page 99: Moose Tutorial at WCRE 2008

addFolder addPage

Research is a puzzle

Page 100: Moose Tutorial at WCRE 2008

Greevy et al. 2007

Page 101: Moose Tutorial at WCRE 2008

Research is a puzzle

Page 102: Moose Tutorial at WCRE 2008

Wettel, Lanza 2008

Page 103: Moose Tutorial at WCRE 2008

Research should be open

Page 104: Moose Tutorial at WCRE 2008

Research should impact industry

Page 105: Moose Tutorial at WCRE 2008

is an analysis tool

is a modeling platform

is a visualization platform

is a tool building platform

is a collaboration

is an idea

Page 106: Moose Tutorial at WCRE 2008

moose.unibe.ch