Introduction to the Special Issue on Software Engineering
Software engineering, like economics and psychology, is a field of conjecture. In all three fields, the researcher makes a conjecture and then attempts to substantiate it through an analysis of observed data.
For example, an economist may make a conjecture about the impact of deregulation of airlines on competition. Whether this conjecture is true can be determined only through observation of the state of the industry after deregulation and comparison with the state of the industry before deregulation. Likewise, a psychologist may form a hypothesis that a person’s short-term memory consists of a stack-type data structure. This theory can be validated only by having a subject try to remember words, and determining which of the words are forgotten first as the limit of short-term memory is approached.
Clearly, economics and psychology share a similar paradigm (conjecture, then proof through observation); it is significant, however, that their specific methodologies differ. The economist examines the workings of the “real world” via empirical data, never quite sure if the activity observed is due to the factor under examination or is a side effect of some other unanticipated occurrence. Conversely, the psychologist, by using finely controlled experimentation, can isolate events and examine them in sharp focus. However, the researcher can never be sure that what is observed in the laboratory setting will generalize to the real world, where many other variables interact with the one of interest.
Software engineering is rooted quite strongly in the conjecture-proof paradigm, and in fact uses both the empirical and experimental approaches to testing a particular hypothesis. For example, a researcher may conjecture that deep control nesting results in programs that are more likely to have errors than programs that avoid nesting through the use of multiple compound conditionals (a necessary trade-off). This conjecture may be validated in one of two ways:
1. The researcher may split a group of programmers into two groups, one of which is to write a program using shallow nesting and many compound conditionals. The second group will write the same program with a great deal of nesting but few compound conditionals. After having the subjects develop the programs, the experimenter can then determine which group’s code contains more errors. A statistical difference in error rates for one group over the other might lead the experimenter to conclude that the hypothesis is either true or false. Unfortunately, in such cases the program to be written is usually quite trivial, the development environment artificial, and the amount of time devoted to carrying out the experiment limited to an hour or less. Thus, it is not at all clear that a manager in industry can generalize these results from the laboratory to a given project.

2. A second approach to validating the hypothesis would be for the researcher to acquire a large number of industrial software systems, along with their error histories. The degree of nesting in each system could be determined, and the level of nesting and error histories among the sample could be statistically compared. However, because the software systems would no doubt come from a wide variety of environments, application domains, and programmers, the impacts of these differences might overwhelm that of the difference due to the level of nesting. Thus, the manager would not be able to trust these validation efforts any more than those done in the experimental setting.

The Journal of Systems and Software 8, 1-2 (1988)
© 1988 Elsevier Science Publishing Co., Inc.
The purpose of this special issue is to attempt to improve the current situation in empirical software engineering studies. The issue will present tools and techniques that make collecting, sharing, and using empirical software engineering data as painless as possible. The papers in the issue can be divided into three categories.
First, the papers by Sallie Henry (“A Technique for Hiding Proprietary Details While Providing Sufficient Information for Researchers”) and Jim Bieman, Al Baker, Paul Clites, David Gustafson, and Austin Melton (“A Standard Representation of Imperative Language Programs for Data Collection and Software Measures Specification”) discuss an extremely important problem that is often encountered in the attempt to obtain empirical data: the issue of proprietary software.
The typical project manager would be reluctant to give a copy of a $100,000 software system to a professor to “study.” This reluctance should not be surprising: vendors seldom even provide source code to their customers. The papers just cited describe some
novel methods of producing special program formats
that let us get at the characteristics we might be
interested in, while barring reproduction of the original source code.
The second set of papers is by T. J. Yu, B. A. Nejmeh, H. E. Dunsmore, and V. Y. Shen (“SMDC:
An Interactive Software Metrics Data Collection and Analysis System”), William Farr and Oliver Smith (“A Tool for Statistical Modeling and Estimation of Reliability Functions for Software: SMERFS”), and Warren Harrison (“MAE: A Syntactic Metric Analysis Environment”). Quite often, even if the data are available, we might have a hard time summarizing and interpreting them.
These papers report on tools and environments used to analyze and maintain collections of software engineering
data.

Finally, the paper by Burt Swanson and Cynthia Beath
(“The Use of Case Study Data in Software Management
Research”) explores the role of case studies in software engineering research. The content of this paper differs radically from that of the other papers, since these
authors look at what might be termed the human side of software, as opposed to the automatic generation and analysis of code characteristics addressed in the other
papers.

If not for the help and encouragement of everyone
involved, this special issue would not have been possi-
ble. In particular, I thank Connie Helm for her help in contacting potential authors and reviewers. I also thank Bob Glass for his enthusiasm and encouragement, as
well as for arranging for the review of my paper so the process of anonymous review could be maintained.
Most important, I thank the referees for their hard
work and willingness to participate in this special issue. Some of the referees reviewed as many as three papers. A partial list of the referees follows.
Bahram Adrangi
University of Portland
James M. Bieman
Iowa State University
Curt Cook
Oregon State University
Stewart G. Crawford
AT&T Bell Laboratories
Nancy J. Currans
Hewlett-Packard Corporation
Sallie Henry
Virginia Tech
Rocco F. Iuorno
IIT Research Institute, Data & Analysis Center for Software (DACS)
Kenneth Magel
North Dakota State University
Thomas G. Moher
University of Illinois at Chicago
Jai Navlakha
Florida International University
Brian A. Nejmeh
AT&T Bell Laboratories
Linda M. Ott
Michigan Technological University
T. M. Steinbock
SET Laboratories, Inc.
Several others participated in the review of papers; however, they failed to return a release in time to allow
their names to be included in this list. To all these people I extend a heartfelt thank you. I
hope they enjoy reading this special issue as much as I enjoyed organizing it.
Warren Harrison
Portland, Oregon
April 21, 1987