mdsheet: a framework for model-driven spreadsheet engineering · spreadsheet evolution refinement...

4
MDSheet: A Framework for Model-Driven Spreadsheet Engineering acome Cunha * , Jo˜ ao Paulo Fernandes *† Jorge Mendes * , Jo˜ ao Saraiva * * HASLab / INESC TEC, Universidade do Minho, Portugal Departamento de Engenharia Inform´ atica, Universidade do Porto, Portugal {jacome,jpaulo,jorgemendes,jas}@di.uminho.pt Abstract—In this paper, we present MDSHEET, a framework for the embedding, evolution and inference of spreadsheet models. This framework offers a model-driven software de- velopment mechanism for spreadsheet users. Keywords-Spreadsheets; Model-Driven Engineering (MDE); Software Evolution; Embedded DSLs; Model Inference I. I NTRODUCTION Spreadsheets are widely used by non-professional pro- grammers, the so-called end users, to develop business applications. Spreadsheet systems offer end users a high level of flexibility, making it easier to get started working with them. This freedom, however, comes with a price: spreadsheets are error prone as shown by several industrial case studies reporting that an average of 94% of the analyzed spreadsheets contained errors [1]. As programming systems, spreadsheets lack the support provided by modern programming languages/environments, like for example, higher-level abstractions and powerful type and modular systems. As a result, they are prone to errors. In order to improve end-users productivity, several techniques have been proposed, which guide end users to safely and correctly edit spreadsheets, like, for example, the use of spreadsheet templates [2], ClassSheets [3], [4], and the inclusion of visual objects to provide editing assistance in spreadsheets. All these approaches propose a form of end user model-driven software development: a spreadsheet busi- ness model is defined, from which a customized spreadsheet application is then generated guaranteeing the consistency of the spreadsheet data with the underlying model. In this paper, we present MDSHEET, a unifying frame- work where we have integrated the following modelling, manipulation and co-evolution spreadsheet techniques: Embedding of ClassSheet models: ClassSheets are a powerful and widely used modelling language to define the business logic of a spreadsheet. MDSHEET em- beds this modelling language in a spreadsheet system, providing a coherent environment for model driven spreadsheet engineering, as proposed in [5]. This work is funded by the ERDF through the Programme COMPETE and by the Portuguese Government through FCT - Foundation for Science and Technology, project ref. PTDC/EIA-CCO/108613/2008. The three first authors were also supported by FCT grants SFRH/BPD/73358/2010, SFRH/ BPD/46987/2008, BI4-2011 PTDC/EIA-CCO/108613/2008, respectively. Co-evolution of ClassSheets and instances: Like any other software artifact, spreadsheets evolve over time. MDSHEET uses a formal setting where the co-evolution of the embedded ClassSheet model and the spreadsheet instance is performed [6]. Inference of ClassSheets: Being one of the most used programming languages, there are huge amounts of legacy spreadsheets. In order to provide a MDE envi- ronment for such legacy spreadsheets, we have imple- mented in MDSHEET the model inference technique that we have proposed in [3]. The functionality of MDSHEET has furthermore been integrated in a widely used spreadsheet system. With our extension, we believe to provide a controlled environment for the safe development of spreadsheets, with opportunities for reducing errors in spreadsheets. In the next three Sections, each of this features is briefly explained. The video that accompanies this paper is also divided in these same parts. II. EMBEDDING CLASSSHEETS IN SPREADSHEETS ClassSheets [4] are a high-level, object-oriented formal- ism to specify the business logic of spreadsheets. Class- Sheets allow users to express business object structures within a spreadsheet using concepts from the UML. In fact in [7] we have described a technique to automatically map ClassSheets to UML. Using the ClassSheets model, it is possible to define spreadsheet tables and to give them names, to define labels for the table’s columns, to specify the types of the values such columns may contain and also the way the table expands (e.g., horizontally or vertically). Besides a textual (and formal) definition, ClassSheets also have a visual representation which very much resembles spreadsheets themselves [8]. We have embedded such visual model representation that mimics the well-known embed- ding of a domain specific language in a general purpose one. Like in such embeddings, we inherit all the powerful features of the host language: in our case, the powerful interactive interface offered by the (host) spreadsheet system. This ap- proach has two key advantages: first, we do not have to build and maintain a complex interactive tool. Second, we provide ClassSheet model developers the programming environment they are used to: a spreadsheet environment. Furthermore,

Upload: others

Post on 21-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MDSheet: A Framework for Model-Driven Spreadsheet Engineering · spreadsheet evolution refinement rules. The catalog includes rules such as make it expandable that makes a block

MDSheet: A Framework for Model-Driven Spreadsheet Engineering

Jacome Cunha∗, Joao Paulo Fernandes∗† Jorge Mendes∗, Joao Saraiva∗∗HASLab / INESC TEC, Universidade do Minho, Portugal

†Departamento de Engenharia Informatica, Universidade do Porto, Portugal{jacome,jpaulo,jorgemendes,jas}@di.uminho.pt

Abstract—In this paper, we present MDSHEET, a frameworkfor the embedding, evolution and inference of spreadsheetmodels. This framework offers a model-driven software de-velopment mechanism for spreadsheet users.

Keywords-Spreadsheets; Model-Driven Engineering (MDE);Software Evolution; Embedded DSLs; Model Inference

I. INTRODUCTION

Spreadsheets are widely used by non-professional pro-grammers, the so-called end users, to develop businessapplications. Spreadsheet systems offer end users a highlevel of flexibility, making it easier to get started workingwith them. This freedom, however, comes with a price:spreadsheets are error prone as shown by several industrialcase studies reporting that an average of 94% of the analyzedspreadsheets contained errors [1].

As programming systems, spreadsheets lack the supportprovided by modern programming languages/environments,like for example, higher-level abstractions and powerfultype and modular systems. As a result, they are prone toerrors. In order to improve end-users productivity, severaltechniques have been proposed, which guide end users tosafely and correctly edit spreadsheets, like, for example, theuse of spreadsheet templates [2], ClassSheets [3], [4], andthe inclusion of visual objects to provide editing assistancein spreadsheets. All these approaches propose a form of enduser model-driven software development: a spreadsheet busi-ness model is defined, from which a customized spreadsheetapplication is then generated guaranteeing the consistency ofthe spreadsheet data with the underlying model.

In this paper, we present MDSHEET, a unifying frame-work where we have integrated the following modelling,manipulation and co-evolution spreadsheet techniques:

• Embedding of ClassSheet models: ClassSheets are apowerful and widely used modelling language to definethe business logic of a spreadsheet. MDSHEET em-beds this modelling language in a spreadsheet system,providing a coherent environment for model drivenspreadsheet engineering, as proposed in [5].

This work is funded by the ERDF through the Programme COMPETEand by the Portuguese Government through FCT - Foundation for Scienceand Technology, project ref. PTDC/EIA-CCO/108613/2008. The three firstauthors were also supported by FCT grants SFRH/BPD/73358/2010, SFRH/BPD/46987/2008, BI4-2011 PTDC/EIA-CCO/108613/2008, respectively.

• Co-evolution of ClassSheets and instances: Like anyother software artifact, spreadsheets evolve over time.MDSHEET uses a formal setting where the co-evolutionof the embedded ClassSheet model and the spreadsheetinstance is performed [6].

• Inference of ClassSheets: Being one of the most usedprogramming languages, there are huge amounts oflegacy spreadsheets. In order to provide a MDE envi-ronment for such legacy spreadsheets, we have imple-mented in MDSHEET the model inference techniquethat we have proposed in [3].

The functionality of MDSHEET has furthermore beenintegrated in a widely used spreadsheet system. With ourextension, we believe to provide a controlled environmentfor the safe development of spreadsheets, with opportunitiesfor reducing errors in spreadsheets.

In the next three Sections, each of this features is brieflyexplained. The video that accompanies this paper is alsodivided in these same parts.

II. EMBEDDING CLASSSHEETS IN SPREADSHEETS

ClassSheets [4] are a high-level, object-oriented formal-ism to specify the business logic of spreadsheets. Class-Sheets allow users to express business object structureswithin a spreadsheet using concepts from the UML. In factin [7] we have described a technique to automatically mapClassSheets to UML.

Using the ClassSheets model, it is possible to definespreadsheet tables and to give them names, to define labelsfor the table’s columns, to specify the types of the valuessuch columns may contain and also the way the tableexpands (e.g., horizontally or vertically).

Besides a textual (and formal) definition, ClassSheets alsohave a visual representation which very much resemblesspreadsheets themselves [8]. We have embedded such visualmodel representation that mimics the well-known embed-ding of a domain specific language in a general purpose one.Like in such embeddings, we inherit all the powerful featuresof the host language: in our case, the powerful interactiveinterface offered by the (host) spreadsheet system. This ap-proach has two key advantages: first, we do not have to buildand maintain a complex interactive tool. Second, we provideClassSheet model developers the programming environmentthey are used to: a spreadsheet environment. Furthermore,

Page 2: MDSheet: A Framework for Model-Driven Spreadsheet Engineering · spreadsheet evolution refinement rules. The catalog includes rules such as make it expandable that makes a block

because the ClassSheet model and the spreadsheet data aredefined in the same environment, we now have the powerto ensure that both are synchronized.

In order to illustrate our embedding we present in Figure 1an embedded ClassSheet model (Figure 1b) and one of itspossible instances (Figure 1a).

(a) Planes table. (b) Planes ClassSheet model.

Figure 1. Planes example.

The ClassSheet model represents planes, where a plane isdefined by an N-Number, Model and Name. In row 3, it isdefined the type, and the default value, associated with eachcolumn: in this example all columns hold strings with anempty string as default value. The fourth row of the modelcontains vertical ellipses in all columns. This means that itis possible for these columns to expand vertically: the tablesthat conform to this model can have as many rows as needed.

Reusing the ClassSheet table we built before, we cannow model a table to register concrete flights by the airlinecompany, as shown in the right-hand side of Figure 2.

The colors in the model are used to distinguish thedifferent entities represented, namely, pilots, references topilots in the scheduling table, reference to planes in thescheduling table and the flight scheduling itself.1

A. Generating Spreadsheets from ClassSheet Models

Following the Gencel approach [9], the previously de-scribed models can be translated into initial spreadsheetstogether with tailor-made versions of update operations.These operations are defined to perform the tasks of insertionor deletion in such a way that the spreadsheet correctnessis always preserved. In fact, the spreadsheet in the left-hand side of the Figure 2 was produced by our embeddingmechanism: the initial spreadsheet will contain the labels inbold on the model, the initial formulas and buttons (greyrows/columns labelled with elipsis) to add new vertical orhorizontal blocks of cells.

By using this MDE approach, the end user is guided inthe introduction of data that conforms to the underlyingmodel: for example, rows with type integer only acceptinteger values. Another key feature of this approach is thatblocks of cells are automatically produced, for example, toadd a new flight, which is a relationship between a pilotand a plane, the user must click on the button in row 6. Thesystem will then add a new row, also updating the necessaryformulas: it will update the formulas in cells E7, I7 and K7to include the new added row. This mechanism prevents the

1For space limitations, we refrain from showing again the planes table.

user from editing the spreadsheet without correctly updatingits formulas, and therefore from corrupting it.

III. CO-EVOLUTION OF MODELS AND INSTANCES

One key advantage of using a model-driven softwaredevelopment process is the ability to interact both with themodel (a ClassSheet in our case) and its instance (i.e., thespreadsheet data). This is usually a complex task becausethe model and the instances need to be synchronized! In thissection we present a set of spreadsheet co-evolution rules.Such rules define evolution steps for ClassSheet models andtheir instances and guarantee synchronization. These rulesare specified using data-refinement theory which providesan algebraic framework for calculating with data types andcorresponding values. It consists of type-level coupled withvalue-level transformations. The type-level transformationsdeal with the evolution of the model and the value-leveltransformations with the instances of the model (e.g. values).

We have designed an appropriate representation of spread-sheet models, including the fundamental notions of for-mulæand references [6]. For models and their instances,we have designed coupled transformation rules that coverspecific evolution steps, such as the insertion of columns inall occurrences of a repeated block of cells. The rules aredivided into three categories: combinators, used as helperrules, semantic rules, intended to change the model itself(e.g. add a column), and layout rules, designed to changethe visual arrangement of data (e.g. swap two columns).

Combinator Rules: The first set of rules, combinators, in-clude rules such as after, which means “apply the argumentrule after the argument label”. These combinators receive arule as an argument and apply it in a specific place of themodel, and thus, they are refinements or isomorphisms if theargument rule is a refinement or an isomorphism.

Semantic Rules: In [6] we have introduced a full catalog ofspreadsheet evolution refinement rules. The catalog includesrules such as make it expandable that makes a block ofcells expandable (horizontally or vertically), and split thatmoves a column to a new place and replaces it by referencesto the new locations. Their full definitions and HASKELLimplementations can be found in [6]. As an example, wegraphically present in Figure 3 the insert a column rule:

Sheet 1New Sheet 1Sheet 0

Forward transformation

Backward transformation

Figure 3. Adding/removing a column visually.

Page 3: MDSheet: A Framework for Model-Driven Spreadsheet Engineering · spreadsheet evolution refinement rules. The catalog includes rules such as make it expandable that makes a block

Figure 2. Spreadsheet of an airline company and an abstract model representing it.

Sheet 1 represents the original spreadsheet with an ex-isting column which is transformed in New Sheet 1 whenapplied the forward transformation. Applying the backwardfunction, we get the original spreadsheet, Sheet 1, and anew sheet containing the removed column, Sheet 0. Whenapplying the forward function Sheet 0 is not used, and thus,not necessary to exist at that point in time.

The forward transformation, that is, to add a new columnis available in the spreadsheet environment as the buttonCol+ (e.g., right-hand part of Figure 2) while the backwardfunction, that is, to remove a column, in the Col- button.

Layout Rules: As the name suggests, layout rules areintended to change the arrangement of spreadsheets only,and not to add or remove any particular information. Thisset of rules includes evolution steps for changing theorientation of a spreadsheet from vertical to horizontal or torearrange cells according to some conventions, for example.

Having defined a set of model evolution steps, we cannow evolve the model and have the data automatically co-evolved. For example, in the ClassSheet of Figure 2, if wewish to add an attribute meal to a flight, we can do it inseveral steps: first, we add a column named Meal to theFlights class (in the last column of the dark green area,after column E). Second, we set its default value to ”NO”.Then, these model evolution steps are automatically reflectedin the data, with a new column Meal being inserted by theframework in the two Flights instances in the data (newcolumns are added after column E and I). Also, the formulasof columns K are automatically updated.

IV. CLASSSHEET MODEL INFERENCE

Consider the example spreadsheet shown in Figure 4. Thisspreadsheet represents again flights of a company, but in aless organized (yet probably more common) manner.

The business logic that underlies this spreadsheet is notimmediately clear and is quite difficult to infer for a non-

Figure 4. A spreadsheet to store an airline company flights.

expert. In this section we will describe a strategy to infersuch a business logic from the data in a spreadsheet.

Objects that are contained in such a spreadsheet and therelationships between them are reflected by the presence offunctional dependencies between spreadsheet columns.

It is possible to construct a relational model from a set ofobserved functional dependencies [3]. Such a model consistsof a set of relations (each given by a set of column names)and expresses the basic business model present in the spread-sheet. Each relation of such a model results from group-ing functional dependencies together. For example, for thespreadsheet in Figure 4 we can infer the following relationalmodel (underlined columns indicate those on which theother columns are functionally dependent (primary keys); #indicates attributes referencing other table columns (foreignkeys); tables between symbols <> represent relationships):Pilots (ID, Name, Flight hours)Planes (N-Number, Model, Name)<Flights> (#ID, #N-Number, Depart, Destination, Date, Hours)

The model has two relations, Pilots and Planes, and arelationship connecting them, Flights. A relational model isvery expressive, but it is not quite suitable for spreadsheetsFrom this relational model we can generate the ClassSheetshown in Figure 2, as explained in [3].

The architecture of our approach is sketched in Figure 5.

V. TOOL ARCHITECTURE

In this section we present the architecture of our model-driven spreadsheet environment. In this environment, endusers can interact both with the ClassSheet model andthe spreadsheet data. Our techniques guarantee the syn-chronization of the two representations. In this setting, the

Page 4: MDSheet: A Framework for Model-Driven Spreadsheet Engineering · spreadsheet evolution refinement rules. The catalog includes rules such as make it expandable that makes a block

A ⇀ BC D ⇀ E

are provided through a tool, called Gencel, which is an ex-tension to MS Excel that restricts the definition and pos-sible evolution of a concrete spreadsheet according to thespecification given by the template. Thus, illegal updateoperations like a partial copying or moving of vertical (vex)or horizontal (hex) recurring groups are prevented. Further-more, formulas are updated correctly and automatically. Forexample, in the case of inserting new instances of vex or hexgroups in a concrete generated spreadsheet. For details onthe type system and spreadsheet generation, we refer to [12].

Figures 2 and 3 demonstrate a somewhat more involvedexample of a Vitsl template and a corresponding Gencelspreadsheet application for a budget calculation. We willuse this application to illustrate some limitations of theVitsl/Gencel approach as a motivation for the proposedClassSheet model.

Figure 2 shows that a hex group has been defined by clus-tering columns C, D, and E. The underlying problem domainrequirement was that for each year (and for each category)the values Qnty, Cost and their product form a logical unitand should occur in the spreadsheet. The only way to ex-press this logical clustering of three cells within Vitsl is bythe omission of layout-oriented notations as the two smallvertical bars in the header row between C, D, and E. In Gen-cel, this grouping causes the corresponding insertion anddeletion of groups of three columns as blocks.2

Now imagine that the Vitsl designer would have groupedonly the cells D and E, which would solely be visible inVitsl by an additional bar within the header row betweenthe columns C and D. This notationally minimally differ-ent Vitsl model would have resulted in a completely differ-ent spreadsheet application, in which the horizontal repeti-tion would have been restricted to the two columns D andE. This different grouping would express that the quantityvalue is fixed for all years, while only the cost value mightvary yearly.

Another possible source for an error-prone spreadsheetmodel is due to the indication of references in formu-las by means of cell-oriented addresses like C4*D4. Here,once more, the use of business logic-oriented notations likeQnty*Cost helps to prevent the design of incorrect data com-putations.

Therefore, since Vitsl is limited to the support of layout-oriented clustering constraints and cannot express problem-domain-oriented logical clustering according to business ob-jects explicitly, the semantic gap between problem domainrequirements and a spreadsheet application still forms a ma-jor obstacle to yield trustable spreadsheet applications.

3. CLASS SHEETS

In this section, we will introduce our approach of a high-level, object-oriented model for spreadsheet applications. Aformalization of the approach is presented in the three sub-sequent sections.

In order to motivate the introduction of a businessapplication-oriented structure on top of a layout-orientedVitsl template structure, we discuss in the following threesimple example spreadsheet applications. The first one (seeFigure 5, left), the so-called income sheet, consists of a list

2Note that when merging cells in MS Excel all but the firstcell entries are lost, so that this groupwise operation is notpossible at all in MS Excel.

of data values, which are summed up and the sum of whichis shown in a separate cell. From an object-oriented pointof view, one can see a summation object, which aggregratesa list of objects bearing single data values. Looking at thelayout structure, the list of single value objects, consistingof a header Item and a list of value objects, is embedded intothe layout of the summation object, consisting of a headerentry Income and a footer with the label Total and an ag-gregation formula assigned to an attribute named total. Wecall such an object-oriented extended template a ClassSheetsince it defines classes together with their attributes andaggregational relationships.

1

A

Income

2

3

Item

value = 0

4 Total

5 total = SUM(Item.value)

...

total : Int

Income

value : Int = 0

Item

*

SUM(Item.value)

IncomeˆItemˆ0↓ˆTotalˆSUM((0,−2))

Figure 5: A simple one-dimensional ClassSheet.

Thus, ClassSheets consists of a list of attribute definitionsgrouped by classes and are arranged on a two dimensionalgrid. Additional labels are used to annotate the concreterepresentation. Class names are set in boldface in contrastto attribute names and labels, which are set in normal face.In addition, colored borders are used to depict the differentclasses within a ClassSheet.3

Class parts may be spread over header and footer en-tries, which results in a bracket-like structure indicated by asquare-bracket-like notation of (open) class rectangles. Forexample, in Figure 5, the red class Income is split into aheader and footer part that surrounds the blue class Item.References to other entries, being expressed in Vitsl by co-ordinates, are defined by using attribute names, as shown inthe SUM formula in the example. Summarizing, ClassSheetssubsume all the information of an equivalent Vitsl templateand can thus easily be translated into an equivalent abstractVitsl expression (see Figure 5, bottom). Similarly, a UML-like representation may be derived from a ClassSheet (seeFigure 5, right) by forgetting all layout information. Ag-gregation formulas are added as notes to the correspond-ing attribute definition. Those attributes, called derivedattributes in UML, are tagged by a spreadsheet symbol onthe right of the attribute definition. All other attributes thevalues of which are shown in a spreadsheet are tagged by aspreadsheet symbol on the left of the attribute definition.

Within a spreadsheet design process, it is intended thatthe designer works in the first place with a sophisticatedClassSheet editor. In addition, within a fully-fledged designenvironment, the UML class diagram presentation might beoffered to the (expert) designer, who is knowledgeable inUML, to illustrate her design decisions. In the following we

3We assume in the following that through the use of PDFthis is visible to the reader

are provided through a tool, called Gencel, which is an ex-tension to MS Excel that restricts the definition and pos-sible evolution of a concrete spreadsheet according to thespecification given by the template. Thus, illegal updateoperations like a partial copying or moving of vertical (vex)or horizontal (hex) recurring groups are prevented. Further-more, formulas are updated correctly and automatically. Forexample, in the case of inserting new instances of vex or hexgroups in a concrete generated spreadsheet. For details onthe type system and spreadsheet generation, we refer to [12].

Figures 2 and 3 demonstrate a somewhat more involvedexample of a Vitsl template and a corresponding Gencelspreadsheet application for a budget calculation. We willuse this application to illustrate some limitations of theVitsl/Gencel approach as a motivation for the proposedClassSheet model.

Figure 2 shows that a hex group has been defined by clus-tering columns C, D, and E. The underlying problem domainrequirement was that for each year (and for each category)the values Qnty, Cost and their product form a logical unitand should occur in the spreadsheet. The only way to ex-press this logical clustering of three cells within Vitsl is bythe omission of layout-oriented notations as the two smallvertical bars in the header row between C, D, and E. In Gen-cel, this grouping causes the corresponding insertion anddeletion of groups of three columns as blocks.2

Now imagine that the Vitsl designer would have groupedonly the cells D and E, which would solely be visible inVitsl by an additional bar within the header row betweenthe columns C and D. This notationally minimally differ-ent Vitsl model would have resulted in a completely differ-ent spreadsheet application, in which the horizontal repeti-tion would have been restricted to the two columns D andE. This different grouping would express that the quantityvalue is fixed for all years, while only the cost value mightvary yearly.

Another possible source for an error-prone spreadsheetmodel is due to the indication of references in formu-las by means of cell-oriented addresses like C4*D4. Here,once more, the use of business logic-oriented notations likeQnty*Cost helps to prevent the design of incorrect data com-putations.

Therefore, since Vitsl is limited to the support of layout-oriented clustering constraints and cannot express problem-domain-oriented logical clustering according to business ob-jects explicitly, the semantic gap between problem domainrequirements and a spreadsheet application still forms a ma-jor obstacle to yield trustable spreadsheet applications.

3. CLASS SHEETS

In this section, we will introduce our approach of a high-level, object-oriented model for spreadsheet applications. Aformalization of the approach is presented in the three sub-sequent sections.

In order to motivate the introduction of a businessapplication-oriented structure on top of a layout-orientedVitsl template structure, we discuss in the following threesimple example spreadsheet applications. The first one (seeFigure 5, left), the so-called income sheet, consists of a list

2Note that when merging cells in MS Excel all but the firstcell entries are lost, so that this groupwise operation is notpossible at all in MS Excel.

of data values, which are summed up and the sum of whichis shown in a separate cell. From an object-oriented pointof view, one can see a summation object, which aggregratesa list of objects bearing single data values. Looking at thelayout structure, the list of single value objects, consistingof a header Item and a list of value objects, is embedded intothe layout of the summation object, consisting of a headerentry Income and a footer with the label Total and an ag-gregation formula assigned to an attribute named total. Wecall such an object-oriented extended template a ClassSheetsince it defines classes together with their attributes andaggregational relationships.

1

A

Income

2

3

Item

value = 0

4 Total

5 total = SUM(Item.value)

...

total : Int

Income

value : Int = 0

Item

*

SUM(Item.value)

IncomeˆItemˆ0↓ˆTotalˆSUM((0,−2))

Figure 5: A simple one-dimensional ClassSheet.

Thus, ClassSheets consists of a list of attribute definitionsgrouped by classes and are arranged on a two dimensionalgrid. Additional labels are used to annotate the concreterepresentation. Class names are set in boldface in contrastto attribute names and labels, which are set in normal face.In addition, colored borders are used to depict the differentclasses within a ClassSheet.3

Class parts may be spread over header and footer en-tries, which results in a bracket-like structure indicated by asquare-bracket-like notation of (open) class rectangles. Forexample, in Figure 5, the red class Income is split into aheader and footer part that surrounds the blue class Item.References to other entries, being expressed in Vitsl by co-ordinates, are defined by using attribute names, as shown inthe SUM formula in the example. Summarizing, ClassSheetssubsume all the information of an equivalent Vitsl templateand can thus easily be translated into an equivalent abstractVitsl expression (see Figure 5, bottom). Similarly, a UML-like representation may be derived from a ClassSheet (seeFigure 5, right) by forgetting all layout information. Ag-gregation formulas are added as notes to the correspond-ing attribute definition. Those attributes, called derivedattributes in UML, are tagged by a spreadsheet symbol onthe right of the attribute definition. All other attributes thevalues of which are shown in a spreadsheet are tagged by aspreadsheet symbol on the left of the attribute definition.

Within a spreadsheet design process, it is intended thatthe designer works in the first place with a sophisticatedClassSheet editor. In addition, within a fully-fledged designenvironment, the UML class diagram presentation might beoffered to the (expert) designer, who is knowledgeable inUML, to illustrate her design decisions. In the following we

3We assume in the following that through the use of PDFthis is visible to the reader

...

Generate SS AppDetect FDs

Infer CS To UML

Original SS Improved SS

To RDBTo other

paradigms

Figure 5. ClassSheet inference process.

spreadsheets consists of two sheets: Sheet 0, containingthe embedded ClassSheet model and Sheet 1, containingthe spreadsheet data that conforms to the model. We havedefined an add-on to a widely used spreadsheet system, theOpenOffice.org system, so end users can evolve their modelsby using predefined buttons in the spreadsheet environmentFor each button, we defined a OpenOffice.org BASIC scriptthat interprets the desired functionality, and send the con-tents of the spreadsheet (both the model and the data) tothe MDSHEET framework. The MDSHEET framework wasdeveloped in HASKELL, and implements the co-evolution ofthe spreadsheet models and data. The global architecture ofthe tool we developed is presented in Figure 6.

Sync

OOBasic sends sheet 0 (model) to the MDSheet back-end

Button pressed

Haskell CS data type

Application of evolution rule chosen by the end user

New Haskell CS data type

Forward and backward transformations

New Haskell spreadsheet representation

OOBasic sends sheet 1 (data) to MDSheet the back-end

Haskell spreadsheet representation

Sync

Application of the forward/backward tansformation

Sheet 0Sheet 1

Sheet 0Sheet 1

Sync

Sync

From the model we can generate a template

ClassSheet model inferenceData migration

Figure 6. Spreadsheet model-driven environment.

In this environment users can build ClassSheets fromscratch using the provided buttons. Also, we consider modelinference from spreadsheet data [3]. This is particularlyimportant when considering legacy spreadsheets. Moreover,the generated refactored spreadsheet includes some businesslogic rules (expressed as spreadsheet formulas) that assistend users in the safe and correct introduction/editing of data.

Tool and demonstration video availability: The MD-SHEET tool and a video with a demonstration of its ca-pabilities are available at the SSaaPP project web page:http://ssaapp.di.uminho.pt/twiki/bin/view/Main/Software.

VI. CONCLUSIONS

In this paper, we have proposed a framework for model-driven spreadsheet development. While having models andinstances in the same environment potentiates further analy-sis, we expect most end users to interact with data instancesonly. In order to fully explore this situation, but also torealize the impact of our work in practice, we are alreadypreparing an empirical study with human spreadsheet users.

REFERENCES

[1] R. Panko, “Facing the problem of spreadsheet errors,” DecisionLine, 37(5), 2006.

[2] R. Abraham and M. Erwig, “Inferring templates from spread-sheets,” in Proc. of the 28th Int. Conf. on Software Engineering.New York, NY, USA: ACM, 2006, pp. 182–191.

[3] J. Cunha, M. Erwig, and J. Saraiva, “Automatically inferringClassSheet models from spreadsheets,” in 2010 IEEE Sym-posium on Visual Languages and Human-Centric Computing.IEEE Computer Society, 2010, pp. 93–100.

[4] G. Engels and M. Erwig, “ClassSheets: automatic generation ofspreadsheet applications from object-oriented specifications,”in 20th IEEE/ACM Int. Conf. on Automated Sof. Eng., LongBeach, USA. ACM, 2005, pp. 124–133.

[5] J. Cunha, J. Mendes, J. P. Fernandes, and J. Saraiva, “Em-bedding and evolution of spreadsheet models in spreadsheetsystems,” in IEEE Symp. on Visual Languages and Human-Centric Computing. IEEE CS, 2011, pp. 186–201.

[6] J. Cunha, J. Visser, T. Alves, and J. Saraiva, “Type-safe evolu-tion of spreadsheets,” in Int. Conf. on Fundamental Approachesto Software Engineering, ser. FASE’11/ETAPS’11. Berlin,Heidelberg: Springer-Verlag, 2011, pp. 186–201.

[7] J. Cunha, J. P. Fernandes, and J. Saraiva, “From RelationalClassSheets to UML+OCL,” in SAC’12: the Software Engi-neering Track at the 27th Annual ACM Symposium On AppliedComputing. ACM, March 2012, pp. 1151–1158.

[8] M. Erwig, R. Abraham, I. Cooperstein, and S. Kollmansberger,“Automatic generation and maintenance of correct spread-sheets,” in Proc. of the 27th Int. Conf. on Software Eng. NewYork, NY, USA: ACM, 2005, pp. 136–145.

[9] R. Abraham, M. Erwig, S. Kollmansberger, and E. Seifert,“Visual specifications of correct spreadsheets,” in IEEE Sym-posium on Visual Languages and Human-Centric Computing.IEEE Computer Society, 2005, pp. 189–196.