configuration management with logical structures

Configuration Management with Logical Structures

Yi-Jing Lin

IBM T. J. Watson Research Center

P.O. Box 704

Yorktown Heights, NY 10598, USA

[email protected]

Abstract

When designing sofware, programmers usual~ think interms of modules that are represented as fimctions andclasses. But using existing contguration management sys-tems, programmers have to deal with versions and conrg-urations that are o~anized by files and directories. This isinconvenient and error-prone, since there is a gap betweenhandling source code and managing configurations.

Wepresent afiameworkfor programming environmentsthat handles versions and configurations directly in termsof the functions and classes in source code. We show thatwith this framework, cony$guration management issues insojlware reuse and cooperative programming become eas-iez We also present a prototype environment that has beendeveloped to verifi our ideas.

1 Introduction

Modem programming languages support constructs likefimctions and classes that let programmers decomposesource programs into modules. However, existing pro-gramming environments do not allow programmers tohandle configuration management (CM) directly withthese fimctions and classes. System building and versioncontrol are usually organized by files and directories.When controlling versions, progranuners deal with ver-sions of files and versions of directories. When describingthe system building process, programmers specifies thedependencies between source files and object files.

In this paper we present a ffarnework for programmingenvironments that handles configuration managementdirectly in terms of the fimctions and classes in the sourcecode. We define the operations required for this purpose,study their semantics, and find a general strategy to sup-port them. We show that our framework can handle a largemodule as a single unit, and this ability can make the con-figuration management issues in software reuse and coop-erative pro gramming easier. We also describe a prototypeenvironment we have implemented to verifi our ideas.

Our framework is based on a model with tied supportfor system building and version control. This model iscentered around a set of objects that represent functions

This work is conductedat BrownUniversityandfundedby IBM OraduateFellowship,the National ScienceFoundationunderGrantNumberCCR-9113226,ARPAorder8225,andONRgrantNOO014-91-J-4052

Steven P. Reiss

Department of Computer Science

Box 1910, Brown University

Providence, RI 02912, USA

[email protected]. edu

and classes. Most activities are carried out by the opera-tions of these objects. In contrast, most existing configura-tion management systems carry out their functions with aset of tools that sit on top of a passive database or file sys-tem.

Our framework has a strong object-oriented flavor, but itis not limited to supporting object-oriented programming.Any programming language that supports separate compi-lation can fit into our framework. Currently, we are focus-ing on supporting C and C++, though our framework canalso handle languages like Modula-2, Modula-3, Ada, andPascal without major modtication.

Sotiare Design Configuration Management I

Software Design Configuration Management

Figure 1: FromCMwith files to CMwith logical structures

There are limitations to our approach, though. Since ourfiwnework is based on the modukmization of programs, itcannot properly handle documents generated before aproject is decomposed into modules. Our environment isdesigned to support configuration management in softwaredesign, implementation, and maintenance, but not foractivities like requirement specifications and feasibilitystudies. Also, our framework is aimed to support thedevelopment of new programs. It cannot manage existingcode without doing some significant transformation,

2 The Problem

System building and version control are two of the most

0270-5257/96 $5.00 @ 1996 IEEE 298Proceedings of ICSE-18

important problems in configuration management. Higher-level activities like cooperative progr amming, software

reuse, and software maintenance all rely on good supportto handle these two problems. Although existing configu-ration management systems support a rich and diverse setof functions for system building and version control, theyare similar in that their mechanisms are mostly defied interms of files and directories.

An obvious limitation to handling version control interms of files and directories is that we cannot have ver-sions of sub-file entities like fictions and classes. But amore important problem is that we cannot handle the ver-sions of high-level fimctions and classes with simple oper-ations.

Suppose that we are developing a C program using aversion control tool that supports versions of files and ver-sions of directories, and we want to keep the current statusof a function F in this program. An easy solution to thisproblem is to create a version of the file that contains F.However, a version of the containing file does not guaran-tee a fixed status of F. Since F may use fictions in otherfiles, we have to create versions of the files that F dependson. Then since the files F depends on may depend on otherfiles, we have to create versions for those files too. Fur-thermore, since different versions of F may use differentfimctions and thus depends on different files, we have tocreate versions of the makefiles that manages F and thefiles F depends on. At the same time, since there are muRi-ple versions for each source file and makefile, we have totag them properly so we know which version of F maps towhich versions of source files and makefiles. If we usemismatched versions of files, the result will be unpredict-able.

Determining which files F depends on is not trivial,especially when F is a high-level function that directly orindirectly uses many lower-level functions. The files Fdepends on may scattered in multiple directories. If wemiss a file that F actually depends on, then the behavior ofF is subject to changes when we revert to an earlier ver-sion. A safe bet is to create versions for all the files Fpos-sibly depends on. But this is usually an overkill. Startingwith the intention of keeping the current status of a singlefimction, we may end up in creating a version of the wholedirectory or even the whole project, which contains manyfunctions that F does not use.

The major problem with handling system building interms of files is that the dependencies between files sel-dom match the decomposition stmctures of functions andclasses. Understanding and maintaining the mappingbetween these two structures is usually difficult andtedious.

Since MAKE [9] and its descendants are the most popu-lar system building tools in existing environments, let ustake it as an example. A typical makefile describes thedependencies between files as a three layer directed acy-clic graph (DAG), in which a set of executable depend ona set of object files, then the object files depend on a set ofsource files, This DAG of dependencies is very shallow,and apparently cannot reflect the logical structure of a

source program with more levels of decomposition. Forlarge projects, we can combine object files into libraryfiles before linking them into the executable, and thuscreate more levels in this hierarchy. However, unless wecreate a library for each major functions and classes, anduse many levels of libraries of libraries, this structure stillcannot faithfdly reflect the tme structure of the sourcecode.

The situation is even worse when we are developinglarge programs that use multiple makefiles. Since make-files lack formal communication channels like parametersand return values, the exchange of information betweenmakefiles can only rely on implicit protocols. If we have amakefile A that uses the result of another malcefile B, thenthe author of A has to know which object files are gener-ated by B. If we add, rename, or delete a file from the mod-ule managed by B, then makefile A has to be modifiedcarefidly. Since the mapping between the source code andthe dependencies described in makefiles is riot trivial, thismodification may be difficult. Also, if we have anothermakefile C that relies on the result of makefile A, then theauthor of C has to know not only what -4 generates, butalso what B generates.

As we know, the major benefit of modularization is theseparation of concern. A function or a class encapsulatesits internal complexities, so that its clients can deal onlywith its simplified interface. But as we have shown, thisseparation of concern is not available if we lhandle config-uration management in terms of files. We have to know theinternal structure of a fimction or a class when we are han-dling its versions and configurations.

3 Configuration Management in POEM

To solve the problems listed in the previous section, weare developing a programming environment called POEM(Programmable Object-centered Environment) [16].POEM manage system building and version controldirectly in terms of the fictions and classes in the sourcecode. It lets users interact with the logical structure of theirsoftware via a graphical user interface, and allows multi-ple programmers to develop soflsvare cooperatively over alocal area nelsvork.

Figure 2 shows a typical screen of POEM. There are twoworlazreas windows on the lefl hand side, and three editorson the right hand side. The small boxes iIIl the workareawindows are softiare units. Each of them represent afimction or a class. If we click mouse button on a softwareunit, then POEM will load its interjace, implementation,and documentation into the corresponding editors on theright hand side. The interface specifies the resources pro-vided by a sotlware unit. It is similar to a “h” file in tradi-tional Clc++ programming environments. Theimplementation contains the code that realizes what isspecified in its corresponding interface. It is similar to a“.c” file in traditional C/C++ programming environments.The documentation is a textual description of the softwareunit. Programmers can put anything their think appropriatethere.

299

class Poem Interface <Widget _ws_toplevel;

_s~ighl ight_color:_class.h&d @t.color:-backg?ound_col or:

os-Lmt<WIdget> -wlter-menu.items:

PoemInterface: :PoemInterface( D1spL@ disp, char* us_n\

_ws = new WorkSpace{ws_name, read.only>:if (.open_fal 1 = -ws->open.fal 1( ) ) {

_et-r_msg = -ws-Wr_msg( ):

lass Poem Interface Documentation!Jersmn: 2uthor: Y1-Jmg Lm

Date: 4/21/35

enera 1 Descr I t xon:LThis IS the raphlcal user Interface for POEM

140rkFlreas.It uses motIf widgets.

Versmn Descr@mn:1. Fixed the care dump problem caused by mowing

softkwe units.

!@te /~o/poerWorkspaces/u{yjl/POEtlUI.us.dlr/l/2/ppe

Figure 2: A typical screen of POEM

The dash-line arrows between software units are

uses_interface links. If there is a uses_interface link fromsoftware unit A to software unit B, it means that the imple-mentation of A depends on the interface of B. It is analo-gous to including a “h” file in a “.c” file under traditionalenvironments. The solid-line arrows between softwareunits are t_uses_interface links. If there is at_uses_interface links tl-om software unit A to software

unit B, it means that the interface of A depends on theinterface of B. It is analogous to including a “h” tile inanother “h” file under traditional environments.

A workarea window supports the fimctionality of a typi-cal graphical editor. Programmers can create new softwareunit boxes, move them around, and draw links betweenthem interactively.

POEM uses existing C and C++ compilers in the UNIXoperating system as its backend. It does not require modi-fication to the syntax of C and C++. But since theuses_interface links and t_uses_interface links alreadydescribe the exchanging of definitions between software

units, the “#include” directives should not be used whenthe corresponding links exist. Using links and “#include”directives simultaneously is redundant and may causeintegrity problems.

3.1 System Building

The example in Figure 2 is actually the implementing ofPOEM under POEM itself. The main fonction of POEM is

represented as a sol%vare unit located near the top of theworkarea A. To build the executable code of POEM, weselect this soflware unit by clicking mouse button on it,and choose BuildExe under the Build menu. POEM willcompile all the soflsvare units whose object code is out-dated, and link their object code into a executable pro-gram. To run the program, we can choose Run under the

Build menu.What POEM does about system building is similar to

what MAKE does. It checks the dependencies betweencomponents of a program and compiles only the outdatedones. But there are three differences. First, we do not haveto write a separate makefile. The soflsvare units and theirlinks in the workareas windows work like a graphicalmakefile. They give POEM all the necessary dependencyinformation. Second, we do not have to know the locationsof the object code. The handling of object code is hiddenfrom users. Third, the relations we specified in the workar-eas are the logical relations between fimctions and classes,not the compilation and linking dependencies betweenfiles. By looking at the graph shown in workarea windows,we can easily get a big picture of the general structure ofour program.

3.2 Version Control

If we want to keep the current status of a class or a fimc-tion, we can select its corresponding soi%vare unit andchoose Snapshot under the Version menu. Figure 3 shows

300

the result of applying the snapshot operation on a sotlwareunit poemui. The black boxes are jixed versions of solt-ware units that are created by the snapshot operation.POEM creates a fixed version for all the modified softwareunits that can be reached from poemui, and link themtogether in the same ways as in the current working ver-sion. POEM also makes sure that if a software unit isfixed, then all the soflware units it can reach via links arealso tixed.

----

%!!?!%Figure 3: Making snapshot of a class

Fixed software units are read only, but they can be usedin the same way as other sol%vare hits are. If we click ona fixed soflware unit, POEM will bring up its interface,implementation, and documentation in the editors. Othersoftware units can also use a fixed software unit by point-ing links to it. There is no need for check-in or check-outoperations. POEM internally use RCS [24] to save spaceon fixed software units, but most of those actions are hid-den from users.

POEM allows multiple versions of a software unit toreside in the same workarea. There is no need to create aseparate directory or workspace for each version of aproject. Operations on one version of a software unit willnot interfere with other versions.

3.3 Cooperative Programming

The software units in one workarea may have linkspointing to software units in other workareas. In theworkareas in Figure 2, white boxes are locally definedsotlware units, and grey boxes are actually the shadows ofsoftsvare units defined in other workareas. For example,the grey box labeled poemui:l near the bottom ofworkarea ~ is a shadow of the earlier version of poemui :2,which is located at the top of workarea B. The shadowboxes are created automatically when users try to draw alink across the workarea boundaries.

The software unit poemui:l is a C++ class that usesmany other fi.mctions and classes. Some of those functions

and classes are defined in other workareas that are notshown in Figure 2. But in spite of this complexity, it is rep-resented as a single shadow box in its client, work-area A.When we are using a software unit from another workarea,we can ignore all its internal complexity.

In this example, workarea B is actuallly owned byanother programmer working on a different machme. As aclient, we cannot modi~ its contents, but we can use thesoftware units it contains by pointing links to them. Toreduce interference, programmers are recommended to useonly the older but stable versions of software units devel-oped by other programmers.

3.4 Software Reuse

In POEM, reusable libraries are represented as specialworkareas. Figure 4 shows the workarea for the Motiflibrary. Each software unit in this workarea correspond toa Motif widget class or a Motif fhnction. If we want to useone of them, we can simply establish a link to it. Thesesoftware units behave just like normal ones. If we clickmouse on a library soflware unit, POEM will load itsheader file into the interface editor, and its manual pagesinto the documentation editor.

*,.. . ,,4, .”. W.. ,,.+. ..,,, “.. ,., ,, ..:~1 ‘%

Figure 4: The workarea for Motif library

In existing environments, reusing a library function orclass is more tedious. To reuse a library function or class,programmers have to locate its archive file and its headerfile, and then modify the makefiles to incorporate them. Atthe same time, programmers have to locate the corre-sponding manual pages in some remote directory. This isboth tedious and error-prone. If there are multiple versionsof the same library, then programmers ma~y accidentallyget the header file from one version and the archive filefrom another version. The manual pages read by program-mers may not match the software either.

An more important advantage of POEM is that we candirectly reuse functions and classes that are not made intolibraries. For example, if we want to reuse the class repre-sented by the software unit poemui, which is defined inworkarea B of Figure 2, we can simply establish linksfrom our new software units to poemui. This k shown inFigure 5. Although poemui depends on many other sot%ware units, the reuse can be accomplished with a single

operation. There is no need to copy or modi~ sofbvare

301

units in the original workarea.

F[gure 5: Reusing a class’”

Actually, POEM does not differentiate between a reusedsothvare &it and a locally created one. All software unitsin a project are treated the same. We can follow the linksto a reused software unit just like browsing any other soft-ware units. If we want to modi~ the source code of areused software unit, we can invoke the revise operation ofthe reused software unit to get a modifiable version. Theability to tailor a reused software unit is important whenwe are reusing high-level functions and classes.

In contrast, reusing a t%nction or class that is not madeinto a library in a traditional environment is rather diffi-cult. We have to know the location of all the object files itgenerates, and all the libraries it uses. Besides, modifyingmakefiles to incorporate a large module is usually difficult.

Another problem with existing environments is thatmodules can be reused easily only when they are madeinto libraries. Since making large modules into libraries is ,not trivial, programmers seldom do this with their code.As a result, many chances for software reuse are missed.

4 The Model

The model of POEM has four major concepts - so@wareunits, subsystems, workareas and classes ofsofiare units.In this section, we discuss these basic concepts and theirroles in system building and version control.

/TB1 \

Figure 6: The POEM model

From a programmer’s point of view, POEM consists ofa set of soflsvare units that are interconnected by links. Asillustrated in the inset of Figure 6, a software uriit containsa set of data attributes, a set of operations, and a set oflinks. Data attributes are encapsulated and not directlyaccessible to programmers. Operations can be invoked byother software units or directly by programmers. Links areconsidered as parts of the software units they originatefrom, but can be modified directly by programmers. Tosupport the basic functions of our framework, we onlyneed two kinds of links - uses_interface links andt_uses_interface links.

A software unit usually represent a fiction or a class inC and C++. But if it is desired, we can put a set of closelyrelated fictions or classes into a single software unit.Individual functions and classes in the same software unit,however, cannot be accessedseparately.

The subsystem designated by a software unit X, S(4, isthe set of software units that can be reached from X vialinks. X is called the root sojware unit of S(W. A sub-system can usually be operated as a single unit by invok-ing the operations of its root software unit. For example,invoking the build operation of a software unit X will gen-erate the derived objects of the whole subsystem S(2J, notjust those of the root software unit X,

According to our definition, subsystems may overlapeach other. For example, in Figure 6 the subsystem S(A)contains software units A, C, D, X and X the subsystemS(B) contains B, z and Z. Y appears in both subsystems.

Software units are partitioned into mutually exclusiveworkareas. Workareas serve two purposes in our frame-work. First, they divide the name space of software unitsinto manageable units. Second, they define the boundariesbetween programming tasks. Each workarea has a ownerprogrammer. Only the owner can edit software units in aworkarea. The ownership of a workarea, however, can betransferred between programmers.

In each workarea, there are some mutable software unitsthat are under development, and some immutable ones thatrepresent old versions. The mutable ones are called activeversions; the immutable ones are called jixed versions.Usually, a programmer will edit active software units inone workarea, and access fixed software units in otherworkareas at the same time. Fixed software units in remoteworkareas can be accessed directly without check-in orcheck-out operations.

The concept of workarea is different from the concept ofworkspace, which is used in many existing configurationmanagement systems [1] [7] [17] [22]. In systems that useworkspaces, each programmer creates his or her ownworkspace that copies all the files fmm a common work-space. To save space, some of those systems supply mech-anisms to make virtual copies, so that shared files can belogically but not physically duplicated. In our model,workareas are used to partition the set of software units.Software units are not duplicated either physically or logi-cally.

An advantage of using workareas instead of workspacesis that we can handle the sharing of soflsvare artifacts

302

between programmers more naturally. If two programmersuse the same version of a subsystem, then they will auto-matically shared all the source objects and derived objectsof that subsystem. In DSEE [14] and SHAPE [17], thesharing of derived objects is handled by more complicatedmechanisms.

Different kinds of soflware units need different dataattributes and different implementation of their operations.For example, some software units used in a C programmay contain YACC code instead of plain C code. Thesesofhvare units need an extra data attribute to store theintermediate C code generated by YACC, and differentimplementation of operations to handle this difference.

To meet this requirement, we define a set of classes forcommonly used software units, like those for C, C++, andYACC. A class defines a set of data attributes and a set ofoperations. Users can create new software units asinstances of a class. Upon creation, a software unit inheritsall the data attributes and operations from the class it iscreated from. Users can also create new classes of soft-ware units to meet their special needs.

To insure the compatibility between software units, werequire that all new classes should be created as the sub-classes of existing ones. A subclass can redefine the opera-tions it inherits from its parent class, but cannot removethem. By doing so, we can insure that the operationsdefined in system-defied classes will be available in allclasses.

4.1 System Building

With our framework, we want to achieve three majorgoals in handling system building. First, we want to sim-pli~ the specification of the system building process.After establishing the links between software units, pro-grammers should be able to generate the executable of aprogram without writing a separate makefile. Second, wewant to free programmers from the manipulation ofderived objects. The identification of object files andlibraries should be handled automatically, so that program-mers can use a large subsystem without knowing whatderived objects it generates. Lastly, we want to let pro-grammers customize the system building process in termsof subsystems. For example, programmers should be ableto turn on the debugging flag of a subsystem with a simpleaction, no matter how many software units it contains.

System building in our environment is handled by thebuild operations of software units. To generate the execut-able of a program rooted at software unit X, programmersshould invoke the build operation of X. If the building issuccessful, then the executable file will be returned as theresult of the build operation. If the building fails, then theerror messages will be associated with the software unitsthat contain the erroneous code. Programmers can alsobuild the derived objects of individual subsystems byinvoking the build operations of their root software units.

Internally, this system building process is carried outcollaboratively by the build operations of all soflvmre unitsin a subsystem. Each soflware unit compiles its source

code if its derived objects are outdated, propagates thebuild message along the links, and then collect derivedobjects from the subsystems it uses. As illustrated in Fig-ure 7, the propagation of build messages is essentially adepth-first traversal of the graph formed by software unitsand their links.

(%nterface

fObjectcodeLibraries

D

\ . . . H. . L .,,.

\\

\\

B

message

SI ElFigure 7: The system building process

During the system building process, sollxvare artifactsare passed along the uses_interface links andt_uses_interface links. When compiling, a software unitgets the interfaces of its submodules along its links. In thelinking stage, derived objects are also pawed upstreamalong the links. This flow of software artifacts is illustratedin the inset of Figure 7.

An important problem we have to deal with is the cyclesformed by the links. Cycles of links are sometimesunavoidable when soflware units are mutually dependent.These cycles may cause infinite loops in the propagationof build messages. One way to avoid this problem is topass the set of software units already visited in the propa-gation as an argument to the build operation. The buildoperation is propagated only to those software units thatare not yet in the set.

The default parameters used to build the derived objectsof a soflware unit are stored on each software unit. Theseparameters include the compiler and compilation flagsbeing used. Programmers can change these parameters byinvoking the set_attributes operations of a software unit.For example, programmers can request a software unit toput debugging information in its object code by turning onits debugging flag. By default, the set_attributes opera-tions are also propagated along links, so that the same set-ting will take effect on the whole subsystem. Thispropagation can be prohibited when programmers want toconfine the effects of the new setting to the root so fiware

unit.

303

4.2 Version Control

Our version control system has two major goals. First,we want to provide facilities that allow programmers tocreate, select, and use versions in terms of subsystems.Second, we want to simpli& the access to old versionswhile still minimizing the space they occupy.

In terms of version control, we classi& software unitsinto jiked versions and active versions. A tied version is

an immutable copy, an active version is a writable work-ing copy. An active version provides a snapshot operationthat generates an fixed version as its predecessor. A fixedversion provides a revise operation that creates a newactive version as its successor. In the beginning, every newsoftware unit is created as an active version. By repeatedlyapplying snapshot and revise operations, we can create aversion history tree for each software unit. This is illus-trated in Figure 8.

m------clVersion 1.1 Version 1.2

Snapshot Version 1.2

m----- *------+lzVersion 1.1 Version 1.2 Version 1.3

Revise Version 1.2

----.-*,-----QVersion 1.1 Version 1.2’ * ~verslon 1.3

u( Version 1.2.1.1 J

Figure 8: Version control of a single software unit

In our model, invoking the snapshot operation or reviseoperation on a software unit X will affect the whole sub-system S(’). When the snapshot operation is invoked, ourmodel generates fixed versions for all the modified sotl-ware units in S(X), and connect them together in the sameway as in the original version. When the revise operationof a active software unit is invoked, our model will createa subsystem that contains active software units in the localworkarea and fixed software unit in remote workareas.The revise operation does not create active versions inremote workareas because remote workareas are usuallyowned by other programmers, and usually we do not wantto directly modify the code owned by other programmers.

Figure 9 shows the effect of revise and snapshot opera-tions on a subsystem. Notice that since we do not modifJC and X in Figure 9(c), no new snapshots are created forthem in Figure 9(d). However, although D is not directlymodified either, a new snapshot of D is created because ituses a modified version of Y

Figure 9: Version control of a subsystem

In our model, a fixed soflvvare unit designates a fixedsubsystem. We insure that if a software unit X is fixed,then all the software units in S@) are also fixed. By havingthis property, we can guarantee that given the same argu-ment in build operations, S(X) will always generate thesame derived objects.

Like the system building process, the revise and snap-shot operations on a subsystem are carried out by the col-laboration of individual software units. When revising asubsystem, the revise message is propagated along the linkuntil it hits the boundaries of a workarea. Each softwareunit then uses its own revise operation to process its dataattributes. The snapshot operation on a subsystem is car-ried out similarly by propagating the snapshot message.

Our model to handle composite objects is similar to thatof PCTE [10] [4]. But there are two major differences.First, while all the attributes of a stable object in PCTE areimmutable, a fixed software unit in our framework maymodi@ its attributes internally. For example, it may deleteits derived objects and compress its source objects to savespace. The only requirement is that each Iixed softwareunit should keep enough information so that the samederived objects can be regenerated. Second, instead ofusing the same predefine operations for all objects, weallow different software units to have different implemen-tation of their revise and snapshot operations. This is nec-essary because different classes of software units mayhave different data attributes.

In our framework, the selection of versions is made interms of subsystems. When we make a uses_interface ort_uses_interface link to a certain version of software unitX, we are also choosing a certain version of S(X). We donot have to know which versions of software units SWcontains, or whether different versions of SW use differ-ent sets of software units.

Compared with DSEE and SHAPE, version selection inour framework is more hierarchical. The selection of ver-sions is in terms of subsystems instead of files, and eachsoftware unit hides the selections of its subsystems fromits parent sollsvare unit. In DSEE and SHAPE, versions offiles are selected by selection rules. Selection rules may

304

get files that are not compatible, since two reliable files donot imply that their combination are also reliable. Whilethis problem is less likely to happen in our fmrnework, wehave to insure that only one version of a software unit isused by a subsystem. For example, in Figure 9(b) we haveto insure that C’ and D‘ use the same version of X. A pos-sible solution to this problem is to detect these conflictsduring the system building process, and let users set somerules to resolve them.

Our framework also aims to simpli~ the access to oldversions, while reducing the space they occupy. To savespace, a software unit may delete its derived objects andcheck-in its source code in its snapshot operation. But werequire that all these space conserving actions be hiddenfrom the users, so that a fixed version can be accessed inthe same way as an active version is. For example, if afixed software unit checked in all of its source code intosome version repository, then when its edit operation kinvoked, it should check out its source code internallybefore loading the code into editors. We also require thatthe snapshot operation not change the semantics of a sub-system. In other words, a fixed subsystem should keepenough information so that it can regenerate the samederived objects.

Since a fixed sotlware unit may check out its sourceobjects and regenerate its derived objects internally, weneed an operation to remove these objects when the fixedsoftware unit is no longer used. We define a clean-up oper-ation for this purpose. A clean-up operation reduces thespace occupied by a subsystem without affecting itsbehavior. Again, we propagate the clean-up messagesalong links, and let individual software units decide whatto do with their data attributes. The clean-up operationmay be invoked either explicitly by programmers, or inter-nally by the system. When a workarea is short of diskspace, it may invoked the clean-up operations of its least-recently-used software units.

4.3 Discussion

Our framework draws its power from two principles.First, we organize software artifacts at the conzguration

management level in parallel with the modularization atthe source code level. A software unit plays two roles atthe same time. It represents a fimction or a class in sourcecode, and it corresponds to a configuration in configura-tion management. Since we use the same decompositionstructure, the gap between handling source code and man-aging configurations is narrowed. Additionally, since werepresent the relations between modules explicitly aslinks, programmers can examine and traverse themdirectly without looking into the source code. This helpsprogrammers to understand the general structure of a pro-

w.Second our approach applies the principles of modular-

ization and encapsulation to configuration management. Itis well known that modularization and encapsulation arevery useful in managing source programs. Similarly, theycan help greatly in handling configuration management.

As pointed out by Osterweil [19], software processes canbe considered as special programs that are enacted by bothhuman and computers. The data used by these special pro-grams are software artifacts like source code, derivedobjects, system models, documentation, version historyfiles, etc. By applying the principles of modularization andencapsulation on these sofiware artifacts, we can simpli@configuration management greatly. modularization helpsus to decompose the configuration management tasks intomanageable units. Encapsulation helps us to hide the dif-ferent configuration management requirements of individ-ual subsystems.

5 Implementation

As illustrated in Figure 10, the architecture of POEM iscomposed of a user interface layer and an o[y”ect layer ontop of the file system. The object layer is where the soft-ware units are stored. Currently we are using an obj ect-ori-ented database system called ObjectStore [13] toimplement this layer. ObjectStore is basically an extensionof the C++ language that supports persistent objects. Itallows multiple clients to work on the same databasesimultaneously in a distributed environment, and it sup-ports transaction facilities to serialize the operations onobjects.

Figure 10: The architecture of P’OEM

To increase the flexibility of this environment, we sup-ply an interpretive language that allows users to definenew classes of software units, or change the definition ofindividual software units. Since most activities in POEMare carried out by the operations of software units, defin-ing new classes of software units has the effect of tailoringthe whole environment.

In POEM, users do not directly access files and directo-ries of the underlying file system. But instead of storingeverything in the object layer, POEM still stores large soft-ware artifacts like source code, object cocle, and docu-ments as files in the underlying file system. Doing somakes POEM more open to the outside world. Existingtools like editors, compilers, and debuggers can be

305

invoked by the operations of software units to processthese files. We can also utilize existing code and documen-tation by creating software units that reference existingfiles.

The purpose of the user interface layer is to let program-mers access software units in the object layer. This layer isfairly simple because most fimctions of POEM are alreadysupported by software units in the object layer. Its majorresponsibilities are to display the contents of softwareunits, to let users invoke operations of software units, andto show the relationship between software units. A graphi-cal user interface can easily be built, since software unitsand their links map naturally to icons and edges. Opera-tions of a software unit can also be represented as items ina pop-up menu.

The implementation of POEM involves about 8,000lines of C++ code on top of Motif, ObjectStore, YACC,and LEX, and about 1,000 lines of code in the script lan-guage of POEM. Considering the rich functionality POEMsupplies, the implementation is rather easy. This is possi-ble because our model is rather simple, and because weutilize many existing tools.

6 Related Work

MAKE [9] is the most commonly used system buildingtool, but it has rather weak support for the modularizationof makefiles. There is no formal channel of communica-tion between makefiles. If we want to use multiple make-files in a large project, then the passing of arguments hasto rely on implicit protocols. The lack of formal argumentsalso makes the reuse of makefiles more difficult. TheVESTA configuration management system [15] aims tosolve these problems by the modularization and parame-trization of system models. It uses a functional languageto describe system models, and emphasizes the modular-ization and parameterization of system models. However,since the nature of system building requires lots of param-eters for each system building functions, and since it isimpractical to supply all these parameters on each call tothese fimctions, VESTA has to introduce a complex bind-ing mechanism to manage the parameters of system build-ing fimctions. Our framework shares the same goals withVESTA in system building, but uses an object-orientedapproach instead of a fictional approach. The parametersfor system building are stored on sofhvare units instead ofpassed as arguments to functions.

Early version control systems like SCCS [20] and RCS[24] handle the versions of files only. CVS [3] enhancesRCS by letting programmers handle versions of directo-ries. DSEE [14] allows version selection in terms ofthreads that refers to files or other threads. ClearCase [1]enhances the functionality of DSEE and runs on severalplatforms without special supports from the underlyingoperating system.

POEM controls versions in terms of software units,which are objects. Version control in terms of objects isstudied in the area of software development environmentsas well as in the area of computer aided design (CAD).

Zdonik describes an object-oriented database system thatincludes a built-in version control mechanism [25],PCTE+ [4] supports operations to manage versions ofcomposite objects. Katz surveyed version modeling inengineering databases, and proposed a unified framework[12].

Users of POEM do not have to manage derived objectsby themselves. Several other systems also aim to free pro-grammers from the management of derived objects. Cedar[23] and DSEE [14] use source oriented system models.

Derived objects are not directly managed by programmers,but programmers still have to address them indirectly bysome fictions. VESTA hides the management of derivedobjects by passing them as the results of building func-tions.

Users of POEM do not have to write makefiles. Mostexisting configuration management systems, in contrast,use separate makefiles to describe the system building pro-cess. CaseWare [6] stores the information about how tobuild a particular type of object on the object type defin-ition itself, and places the build context information onobjects. But the compositions of configurations are stilldescribed by separate collections called assemblies.

Our framework aims to narrow the gap between han-dling source code and managing configurations. In Ada[1], since packages and subprograms are units of sourcecode as well as units for compilation, the gap betweenhandling system building and managing source code issmaller than in other languages. Besides, an Ada environ-ment is responsible to decide which units need to be re-compiled based on the logical relations between them [5].However, Ada organizes compiled packages in libraries,which have flat structures and thus cannot directly capturethe relationship between packages and subprograms. Adaalso needs additional mechanisms to handle a mappingbetween versions of libraries and their elements. CMVC[18] presented one of such mechanisms.

Gandalf [ 11] uses a module concept where a set of ver-sions implement a single interface. Adele [8] extends thismodel so that interfaces themselves may have versions. Italso describes the relation between interfaces and imple-mentations as an ANIYOR graph. Compared with ourframework, Adele is more flexible to handle variants ofversions, but it is also more complicated.

Rumbaugh [21] proposed a framework to control thepropagation of operations between objects. The propaga-tion policy is based on attributes associated with relations.Our framework also relies on the propagation of opera-tions to handle composite objects. But instead of using thefull power of that rather complicated model, we found thattreating all operations equally and using workareas to con-trol the propagation is sul%ce to meet our requirements.

7 Conclusion

The major goal of this research is to find a t%meworkthat makes configuration management simple yet power-ful. We have shown that it is possible to provide a pro-gramming environment that handles system building and

306

version control according to the logical structure of sourcecode. This makes programming easier because program-mers no longer have to maintain the mapping between thestructure of source code and that of the physical softsvareartifacts. Our framework also handles derived objectsautomatically, and allow programmers to handle large sub-systems as a single unit.

While the basics of this framework is established, thereare still many problems to be solved. We are developing aformal model to describe the mechanisms in our framew-ork, and prove the properties we claimed. We are alsoworking on mechanisms to manage variants of sub-systems, so that programmers can switch between themmore easily.

We also need more fieldtest on our ideas. Until now, ourexperience with POEM has been more than satisfactory.But limited by the resources we have in a university, wehave not use POEM on real-life projects. The largest pro-gram we have tried to develop under POEM is POEMitself, which contains only several thousand lines of sourcecode. We are especially interested in seeing how the work-ing patterns using workareas in POEM will be differentfrom those using workspaces in existing programmingenvironments.

AcknowledgmentsWe would like to thank David Notkin, Peter Wegner,

Stanley Zdonik, Hsueh-I Lu, and Anthony Cassandra fortheir suggestions and comments.

References[1] Atria Software, Inc., “ClearCase Architecture and Database:’

Atria Software, Inc., Massachusetts, USA, August 1993.[21 J. G. P. Barnes. “An Overview of Ada,” in Softiare Practice. .

[3]

[4]

and Experience, vol. 10, pp. 851-887; 1980. -Brian Berliner, ‘LCVS H: Parallelizing Software Develop-ment,” Prisms, Inc., 5465 Mark Dabling Blvd, ColoradoSprings, CO 80918.

Gerard Boudier, Ferdinand Gallo, Regis Minot, and IanThomas, “An Overview of PCTE and PCTE+/’ in Procee-dingsof the ACM SIGSOFT/SIGPLAN Sof~are Engineen”ng

Symposium on Practical Sofmare Development Environ-ments, (Peter Henderson, cd.), pp. 248-257, November1988. Published as ACM SIGSOFT Software Engineering

Notes, Vol 13, No 5, November 1988, and ACM SIGPLANNotices, Vol 24, No 2, February 1989.

[5] John N. Buxton and larry E. Druffel, “Requirements for AnAda Programming Support Environment: Rationale forSTONEMANU in Proceedings of COMPSAC 80, pp. 66-72, 1980.

[6] Martin R. Cagan, “Software Configuration ManagementRedefined? CaseWare, Inc., 108 Pacifica, Irvine, CA

92718, March 1992.

[7] W. Courington, “The Network Software Environment;’ Tech-nical Report Sun FE1 97-O, Sun Microsystems Inc., Febru-ary 1989.

[8] Jacky Estublier, “A Configuration Manager: The Adele database of programs,” in Workshop on sofwam engineen”ng

environments for programming-in-the-large, June 1985[9] Stuart I. Feldman, “Make - a Program for Maintaining Com-

puter Programs~’ in Sof~are - Practice & Expen”ence, 9(4),

pp. 255-265, April 1979.

[10] Ferdinand Gallo, Regis Minot, and Ian Thomas, “TheObject Management System of PCTE as a Software Engi-neering Database Management System,” in Proceedings of

the ACM SIGSOFT/SIGPLAN Sof~are Engineering Sym-

posium on Practical Softiare Development Environments,(Peter Henderson, cd.), pp. 12-15, December 1986.

[11] A. Nico Habermann and David Notkin, “Gandalfi Software

Development Environments:’ in IEEE Transactions onSoftware Engineering, Vol 12, No 12, pp. 1117-1127,December 1986.

[12] Randy H. Katz, “Toward a Unified Framework for VersionModeling in Engineering Databases: ACM Computing Sur-veys, Vol 22, No 4, pp. 375-408, December 1!)90.

[13] Charles Lamb, Gordon Landis, Jack Orenstein, and DanWeinred, “The ObjectStore Database System;’ in Commu-

nications of the ACM, Vol. 34, No. 10, pp. 30-63, October1991.

[14] David B. Leblang and Robert P. Chase, Jr., “Computer-

Aided Soflware Engineering in a Distributed Workstation

Environment’ in SIGPLAN Notices, vol. 19, No. 5, pp.104-113, April 1984.

[15] Roy Levin and Paul R. McJones, “The Vesta Approach toPrecise Configuration of Large Software Systems,” DECSystem Research Center Research Report “No. 105, June1993.

[16] Yi-Jing Lin and Steven P. Reiss, “Configuration Manage-ment in terms of Modules,” in Proceedings of the F@hInternational Worhxhop on Sof~are Configuration Man-agement, April 1995.

[17] Axel Mahler and Andreas Lampen, “An Integrated Toolsetfor Engineering Software Configurations:’ in Pwceedingsof the ACM SIGSOFT/SIGPLAN Sofiwaze Engineen’ng

Symposium on Practical Sof~are Development Enviro-nments, (Peter Henderson, cd.), pp. 191 -2(D0, November1988. Published as ACM SIGSOFT Software EngineeringNotes, Vol 13, No 5, November 1988, and ACM SIGPLANNotices, Vol 24, No 2, February 1989.

[18] Thomas M. Morgan, “Configuration Management and Ver-sion Control in the Rational Programming Environment,” inAda in Industry - Proceedings of the Ada-Europe Interna-

tional Conference, pp. 17-28, June, 1988.[19] Leon Osterweil, “Software Process are Software Too? in

Proceedings of the 9th International Conj2rence on Soft-

ware Engineen”ng, pp. 2-13, Monterey CA, March-April1987.

[20] Marc J. Rochkind, “The Source Code Control System; inIEEE Transactions on SofMare Engineering, pp. 364-370,Vol 1 No 4, December 1975.

[21] James Rurnbaugh, “Controlling Propagation of Operationsusing Attributes on Relations,” ACM OOIDSLA ’88 Pro-ceedings, pp. 285-296, September 1988.

[22] SunPro, “SPARCworksffearnWare Solution Guide, Sun

Microsystems, Inc., California, USA, 1994.[23] “ Warren Teitelman, “A tour through Cedarj’ in IEEE Soft-

waw, pp. 44-73, Apr 1984.

[24] Walter F. Tichy, “RCS - A System for Version Control; inSoftiare - Practice& Experience, Vol. 15, No. 7, pp. 637-654, July 1985.

[25] Stanley B. Zdonik, “Version Management in an Object-Ori-ented database,” in Advanced Programming Environments,(R. Conradi, T. M. Didriksen, and D. H. Wmwik, cd.), pp.405-422.1986.

307

configuration management with logical structures

Documents