
Empirical studies of assumptions that underlie software cost-estimation models

B A Kitchenham

NCC Consultancy, National Computing Centre Ltd, Oxford Road, Manchester M1 7ED, UK

The paper reviews some of the assumptions built into conventional cost models and identifies whether or not there is empirical evidence to support these assumptions. The results indicate that the assumption that there is a nonlinear relationship between size and effort is not supported, but the assumption of a nonlinear relationship between effort and duration is. Second, the assumption that a large number of subjective productivity adjustment factors is necessary is not supported. In addition, it also appears that a large number of size adjustment factors are unnecessary. Third, the assumption that staff experience and/or staff capability are the most significant cost drivers (after allowing for the effect of size) is not supported by the data available to the MERMAID project, but neither can it be confirmed from analysis of the COCOMO data set. Finally, the assumption that compression of schedule decreases productivity was not supported. In fact, none of the models of schedule compression currently included in existing cost models was supported by the data.

software cost estimation, cost estimation, cost-estimation models, productivity

This paper reports work undertaken by the MERMAID project to investigate some of the assumptions that underlie software cost estimation. MERMAID is a joint collaborative project part-funded by the European Commission's Esprit programme. Its aims are to develop and automate improved methods of cost estimation. MERMAID started in October 1988 and is due to finish in November 1992.

During the first two years of the project, the MERMAID team has developed an approach to cost estimation based on the collection of local data and the generation of local cost-estimation models from those data¹. This approach has been incorporated into a prototype cost-estimation tool (called MERMAID Mark 1). The MERMAID approach to cost estimation was based on a review of the assumptions in current cost-estimation models and the formulation of an approach that the team believes overcomes the deficiencies of many of the current models. Therefore, in parallel with the specification of tools to support the approach, a data collection and analysis exercise has been undertaken with the aim of evaluating the assumptions underlying the MERMAID approach and the assumptions underlying current cost-estimation models.

This paper reports the results of analysing data currently available to the MERMAID project, including public domain data and data that cooperating companies have made available to the project.

The methods of analysis used in this paper are regression and correlation², analysis of variance³, and principal component analysis⁴. Pearson's product moment correlation coefficient is a number in the range -1 to 1, which indicates the degree of association between two variables; 1 (or -1) is a maximum association, while 0 corresponds to no association (i.e., independence). Regression analysis allows the functional form of a relationship between two variables to be determined. Analysis of variance investigates whether values of a variable differ among different groups (e.g., whether projects of a certain type were produced more productively than projects of other types). Principal component analysis investigates the underlying dimensionality of a set of variables to determine whether a large number of variables can be represented by a smaller set of variables. For example, if four different measures (lines of code, number of conditional statements, number of lexical tokens, and number of bytes of object code) were used to describe programs it would not be surprising if a principal component analysis indicated that the underlying dimensionality of the data was 1, not 4, because all the measures relate to the same concept, i.e., program size.
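To make the dimensionality idea concrete, the following sketch (illustrative Python, not MERMAID project code; the simulated measures and their scalings are invented) generates four strongly correlated size measures and inspects the eigenvalues of their correlation matrix, which is one standard way of carrying out a principal component analysis:

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate 30 programs: four size measures that all track an
    # underlying 'true size', so they are strongly correlated.
    # All values are invented for illustration.
    true_size = rng.uniform(100, 5000, size=30)
    measures = np.column_stack([
        true_size * 1.0 + rng.normal(0, 50, 30),   # lines of code
        true_size * 0.2 + rng.normal(0, 20, 30),   # conditional statements
        true_size * 5.0 + rng.normal(0, 250, 30),  # lexical tokens
        true_size * 8.0 + rng.normal(0, 400, 30),  # bytes of object code
    ])

    # Eigenvalues of the correlation matrix: each eigenvalue's share of
    # the total is the proportion of variability explained by that
    # principal component.
    corr = np.corrcoef(measures, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # largest first
    explained = eigenvalues / eigenvalues.sum()
    print('proportion of variability per component:', explained.round(3))

Because all four measures track the same underlying quantity, the first component should account for nearly all the variability, i.e., the underlying dimensionality is 1, not 4.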

EFFORT-SIZE AND EFFORT-DURATION RELATIONSHIPS

Most cost-estimation models assume that there is a nonlinear relationship between size and effort. The COCOMO model⁵, for example, assumes that the relationship between effort and size takes the form:

effort = a * (size)^b    (1)

where effort is measured in man-months, size is measured in thousands of delivered source instructions, and a and b take on specified values, depending on the mode of development and which of the three COCOMO models is being used.

For 'Basic' COCOMO, b is assumed to be 1.05 for 'organic' mode, 1.12 for 'semi-detached' mode, and 1.20 for 'embedded' mode. In fact, for all modes and all three versions of COCOMO, the value of b is assumed to be greater than 1. This implies that large projects are less productive than small projects, i.e., there is a diseconomy of scale.

Table 1. Exponential term in effort-size models

Data set                   b      se(b)  Sig. diff.  Sig. diff.  Projects
                                         from 0      from 1
Bailey and Basili⁶         0.951  0.068  yes         no          19
Belady and Lehman⁷         1.062  0.101  yes         no          33
DeMarco⁸                   0.716  0.230  yes         no          17
Wingfield⁹                 1.059  0.294  yes         no          15
Kemerer¹¹                  0.856  0.177  yes         no          15
Boehm⁵:
  All                      1.108  0.085  yes         no          63
  Organic                  0.833  0.184  yes         no          23
  Semi-detached            0.976  0.133  yes         no          12
  Embedded                 1.070  0.104  yes         no          28
Kitchenham and Taylor¹²:
  All                      0.814  0.166  yes         no          33
  ICL                      0.472  0.323  no          -           10
  BT SX                    1.202  0.300  yes         no          11
  BT SW                    0.495  0.185  yes         yes         12
MERMAID-1:
  All                      0.944  0.123  yes         no          81
  Environment 1            1.049  0.125  yes         no          46
  Environment 2            1.078  0.105  yes         no          25
  Environment 3            1.086  0.289  yes         no          10
MERMAID-2:
  All                      0.824  0.135  yes         no          30
  New                      0.178  0.134  no          -           8
  Ext                      1.025  0.158  yes         no          20
MERMAID-3                  1.141  0.077  yes         no          15

Table 2. Exponential term in effort-duration models

Data set                   b      se(b)  Sig. diff.  Sig. diff.  Projects
                                         from 0      from 0.333
Bailey and Basili⁶         0.167  0.076  yes         yes         19
Belady and Lehman⁷         0.42   0.046  yes         no          33
DeMarco⁸                   0.252  0.080  yes         no          17
Wingfield⁹                 1.016  0.625  no          -           15
Kemerer¹¹                  0.315  0.158  no          -           15
Boehm⁵:
  All                      0.375  0.031  yes         no          63
  Organic                  0.438  0.074  yes         no          23
  Semi-detached            0.458  0.054  yes         yes         12
  Embedded                 0.400  0.057  yes         no          28
Kitchenham and Taylor¹²:
  All                      0.262  0.067  yes         no          33
  ICL                      0.260  0.113  yes         no          10
  BT SX                    0.266  0.097  yes         no          11
  BT SW                    0.361  0.235  no          -           12
MERMAID-1:
  All                      0.466  0.068  yes         no          81
  Environment 1            0.692  0.105  yes         yes         46
  Environment 2            0.414  0.098  yes         no          25
  Environment 3            0.557  0.238  yes         no          10
MERMAID-2:
  All                      0.396  0.068  yes         no          30
  New                      0.496  0.232  no          -           8
  Ext                      0.381  0.077  yes         no          20
MERMAID-3                  0.240  0.045  yes         no          15

MERMAID, however, takes the view that if models are based on data from a single environment, a linear model (i.e., one in which b is assumed to be equal to 1) is likely to be sufficient. To test this assumption, not only were project data obtained from three companies, but also existing data sets were reviewed. Table 1 summarizes the empirical results from seven different published data sets: Bailey and Basili⁶, Belady and Lehman⁷, DeMarco⁸, and Wingfield⁹, all reported by Conte et al.¹⁰; Kemerer¹¹; Boehm⁵; Kitchenham and Taylor¹²; and from three data sets available to MERMAID.

Table 1 shows that most of the data sets had an exponential term b that could not be shown to be significantly different from 1. Therefore, they exhibited a linear relationship between effort and size (whether the size unit was function points¹³ or lines of code). Interestingly enough, this is also true for the COCOMO data set. Thus it appears that a linear model is usually sufficient to describe the relationship between effort and size and, in addition, much of the detailed investigation of the reasons for 'economies' or 'diseconomies' of scale (e.g., Banker and Kemerer¹⁴) would seem to be misguided.
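The test summarized in Table 1 can be reproduced with ordinary least squares on the log-log scale. The sketch below is illustrative only (the project data are invented and this is not the MERMAID tool): it fits effort = a * (size)^b and asks whether the exponent b differs significantly from 0 and from 1.

    import numpy as np
    from scipy import stats

    # Hypothetical project data: size (e.g., KDSI or function points)
    # and effort (e.g., hours). Values are invented for illustration.
    size = np.array([12.0, 25.0, 40.0, 8.0, 60.0, 33.0, 18.0, 90.0])
    effort = np.array([400.0, 900.0, 1500.0, 250.0,
                       2600.0, 1100.0, 700.0, 3800.0])

    # Fit effort = a * size^b by least squares on the log-log scale:
    # log(effort) = log(a) + b * log(size)
    fit = stats.linregress(np.log(size), np.log(effort))
    b, se_b, n = fit.slope, fit.stderr, len(size)

    # t-tests of the exponent against 0 (any relationship at all?) and
    # against 1 (is the relationship nonlinear?), as in Table 1.
    p_vs_0 = 2 * stats.t.sf(abs(b / se_b), df=n - 2)
    p_vs_1 = 2 * stats.t.sf(abs((b - 1.0) / se_b), df=n - 2)
    print(f'b = {b:.3f}, se(b) = {se_b:.3f}')
    print(f'difference from 0: p = {p_vs_0:.3f}; from 1: p = {p_vs_1:.3f}')

If p_vs_1 exceeds 0.05, b cannot be shown to differ from 1 and a linear model is adequate, which is the situation for most rows of Table 1.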

Another assumption built into many cost models is that a nonlinear model is needed to predict duration from effort, i.e., the relationship between effort and duration is of the form:

duration = a * (effort)^b    (2)

It is usually assumed that the value of b is between 0.3 and 0.4. The COCOMO model suggests three values (0.32, 0.34, or 0.38), depending on the mode of development. An implication of the Rayleigh curve model¹⁵ is that b = 1/3. A value of b significantly less than 1 reflects the fact that a number of people can work together on a software project.

Table 2 shows that most of the data sets exhibited a significant nonlinear relationship between effort and duration, and in most cases the exponential term was not significantly different from 0.333 (using p = 0.05 as the level of significance).

COST AND SIZE DRIVERS

Many cost-estimation models have a large number of adjustment factors built into them. These are called 'cost drivers'. They are means of assessing the importance of various factors that are believed to affect the amount of effort required to produce a product of a given size (i.e., in fact they are productivity adjustment factors). They include factors such as 'product complexity', 'analyst ability', 'programming-language experience', 'reliability requirements', etc. These factors are usually judged on a scale of the sort 'very high', 'high', 'normal', 'low', 'very low'. For each factor included in the model, there is a numerical adjustment that must be applied to the effort estimate, for each value of the scale.

In COCOMO, the adjustments are multiplicative, i.e., very high reliability corresponds to a value of 1.4, so the effort estimate is multiplied by 1.4 if the product has very high reliability requirements.
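A minimal sketch of how such multiplicative adjustments combine is given below; the driver names and multiplier values are invented for illustration and are not Boehm's published tables:

    # Nominal effort estimate in man-months, e.g., from effort = a * (size)^b.
    nominal_effort = 50.0

    # Hypothetical cost-driver multipliers: each rating ('very high',
    # 'high', ...) maps to a number, with 1.0 meaning no adjustment.
    multipliers = {
        'required reliability (very high)': 1.40,
        'product complexity (high)': 1.15,
        'analyst ability (high)': 0.85,  # able staff reduce effort
    }

    adjusted_effort = nominal_effort
    for driver, m in multipliers.items():
        adjusted_effort *= m

    print(f'adjusted estimate: {adjusted_effort:.1f} man-months')
    # 50.0 * 1.40 * 1.15 * 0.85 = 68.4 man-months

With, say, 15 such factors each able to move the estimate by 15-40%, the combined multiplier can dominate the nominal estimate, which is why the number and independence of the drivers matter.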

The MERMAID project has a number of criticisms of this approach:

• In models such as COCOMO many adjustment factors are treated as if they are independent of one another, but there is evidence that they are not¹⁶.
• The models assume that the factors they include are applicable in all organizations.
• The factors require a subjective evaluation, but it is difficult to ensure that different estimators make subjective assessments in the way that the model builder intended.

MERMAID assumes that within a single environment only a few factors will be important. This would lead to the ability to build local models that are much simpler than the general-purpose models.

To investigate whether the MERMAID approach was viable, data from two companies were studied. In addition, the original COCOMO database was analysed. The two MERMAID data sets are described in more detail in the Appendix.

PRODUCTIVITY ADJUSTMENT FACTORS

For the MERMAID-1 data set, the relationship between productivity (i.e., size/effort) and the following factors was investigated using analysis of variance:

• team experience
• project leader experience
• year of completion
• development environment. This factor took on three different values: 1 corresponding to a traditional Cobol development; 2 corresponding to a traditional development environment upgraded to include code and report generators; and 3 corresponding to a 4GL (fourth-generation language) environment.

The only factor that showed a significant relationship with productivity was development environment. The average productivity (in raw function points per hour) for each development type is as follows:

• development environment 1: productivity = 0.063 (46 projects)
• development environment 2: productivity = 0.069 (25 projects)
• development environment 3: productivity = 0.262 (10 projects)

Development environment 3 is significantly more productive than either of the other two development environments (p < 0.001). There is no significant difference in productivity between development environments 1 and 2.


For the MERMAID-2 data set, 27 projects had raw function-point data, effort data, and information about the levels of 21 'productivity' factors. The relationship between the productivity factors and productivity (raw function points per hour) was investigated using analysis of variance (based on three groups, a high factor level group with values 4 or 5, a medium factor level group with value 3, and a low factor level group with values 1 or 2).
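The grouped analysis of variance described above can be sketched as follows; the productivity figures are invented for illustration, and scipy's one-way ANOVA stands in for whatever statistical package was actually used:

    from scipy import stats

    # Hypothetical productivity values (raw function points per hour)
    # for projects grouped by the level of one factor: low (rating 1
    # or 2), medium (rating 3), and high (rating 4 or 5).
    low = [0.04, 0.05, 0.06]
    medium = [0.05, 0.04, 0.06, 0.05, 0.07]
    high = [0.14, 0.18, 0.12, 0.16]

    # One-way analysis of variance: does mean productivity differ
    # between the groups by more than the within-group scatter
    # would suggest?
    f_stat, p_value = stats.f_oneway(low, medium, high)
    print(f'F = {f_stat:.2f}, p = {p_value:.4f}')

A small p-value (e.g., below 0.05) indicates that the factor level is related to productivity; repeating the test for each of the 21 factors reproduces the style of analysis reported here.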

The analysis of variance indicated that only two factors were related to productivity: programming-language level (p < 0.001) and working environment (p < 0.05).

Programming-language level depends on the language type, with assembly code as the lowest level, third-generation languages as the medium level, and 4GLs as the highest level. The average productivity for projects with programming-language levels of 1 or 2 was 0.05 function points per hour (three projects), for projects with programming-language levels of 3 it was 0.05 (19 projects), and for projects with programming-language levels of 4 or 5 it was 0.16 (five projects).

Working environment depends on the amount of office space available. Projects with a poor working environment had an average productivity of 0.05 function points per hour (nine projects), projects with a normal working environment also had an average productivity of 0.05 (nine projects), and projects with a good working environment had an average productivity of 0.13 (six projects).

These results confirm the view of the MERMAID project team that few adjustment factors have a significant effect on productivity within one environment. There are several points of interest. First, both data sets confirm that significant productivity improvements are associated with the use of 4GLs, but do not show support for productivity improvements due to general improvements in software engineering methods and tools. The MERMAID-1 data set revealed no difference between development environments 1 and 2. The MERMAID-2 data set showed no improvement as a result of high levels of tool use or extensive use of structured methods.

In addition, they cast doubt on the significance given to staff factors in cost-estimation models. Both data sets included information about personnel experience, but neither showed any evidence that more experienced personnel improved project productivity.

Importance of staff

The importance placed on staff characteristics in the COCOMO model is a major source of the belief that they are significant productivity factors in cost-estimation models. Therefore, in the light of the MERMAID results, the author undertook an analysis of the COCOMO data set. This involved performing an analysis of variance of productivity (i.e., KDSI (thousands of delivered source instructions) per month) for each of the following factors:


Table 3. Productivity (KDSI per month) for a variety of staff factors (COCOMO data set)

Level of  Analyst    Application  Programmer  Virtual machine  Language
factor    ability    experience   ability     experience       experience
High      0.27 (40)  0.28 (28)    0.30 (34)   0.36 (23)        0.40 (22)
Normal    0.38 (16)  0.32 (29)    0.30 (20)   0.33 (18)        0.27 (27)
Low       0.17 (7)   0.18 (6)     0.18 (9)    0.17 (22)        0.15 (14)

(Figures in parentheses are numbers of projects.)

• analyst ability
• applications experience
• programmer ability
• programming-language experience
• virtual machine experience

The results are shown in Table 3. Analysis of variance confirmed that there is no statistically significant effect on productivity of the level of analyst ability, applications experience, or programmer ability.

There did appear to be a significant improvement in productivity between projects with staff of high virtual machine experience and those with staff of low virtual machine experience (p < 0.05). The same effect was observed with programming-language experience. However, projects with staff of normal experience had similar productivity levels to projects with highly experienced staff. In addition, projects with teams that had high virtual machine experience were usually the projects with high programming-language experience, so the effects are not independent. Also, it must be pointed out that projects with high values of programming-language experience and virtual machine experience were projects that scored high on 'use of tools'. The data suggest that staff with experience of the development environment were able to make good use of tools and, thus, achieved higher productivity than other staff.

In general, however, it must be concluded that empirical support for the belief that there are strong, predictable productivity effects due to team ability and experience is rather weak. It is important to point out, however, that these results do not imply that there are no differences in individual capability. It is more likely that software managers organize software development teams in such a way that different staff capabilities are balanced rather than accentuated (i.e., normal management processes ensure that differences are 'averaged out'). From the viewpoint of software cost estimation, it does not appear that team differences can be measured in such a way that they can be used to improve the precision of effort estimates.

Independence of productivity factors

Using the MERMAID-2 data set, it was also possible to check whether the selected cost drivers were independent. Principal component analysis of the 21 productivity factor levels available for 28 of the projects indicated that seven principal components accounted for 76.2% of the variability in the data and none of the smaller principal components accounted for more than 5% of the variability. This suggests that the productivity factors are not independent and supports the conclusion that Subramanian and Breslawski drew from a similar analysis of the COCOMO cost drivers¹⁶.

FUNCTION-POINT ADJUSTMENT FACTORS

Cost drivers are used to adjust effort predictions in cost models, so it is not surprising that size models such as function points include 'size' drivers. For function points¹³, Albrecht and Gaffney recommended using 14 adjustment factors rated in terms of their 'degree of influence', which varied from 0, meaning no influence (or not relevant), to 5, meaning very influential. (The 14 factors are listed in the Appendix.)

Using Albrecht's approach, raw function points are the sum of the weighted counts of the number of:

• simple, average, and complex external outputs
• simple, average, and complex external inputs
• simple, average, and complex external enquiries
• simple, average, and complex external interfaces to other systems
• simple, average, and complex internal logical files

The weights are based on the complexity of the feature being counted. The assessment of complexity is based on the number of logical file accesses and/or data items affected by each feature.

The raw function-point count (RFP) is then adjusted on the basis of the degree of influence of each size driver as follows:

Adjusted function points = RFP*TCF (3)

where TCF (the technology adjustment factor) is calculated as follows:

TCF = 0.65 + 0.01*Sum(DIi) (4)

Sum(DIi) is the sum of the degree of influence rating for each of the 14 factors.

The effect of using TCF as a multiplicative adjustment factor is to allow the raw function-point count to increase or decrease by a maximum of 35%.
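Equations (3) and (4) are straightforward to work through; a short sketch with invented degree-of-influence ratings:

    # Hypothetical degree-of-influence ratings (0-5) for the 14
    # adjustment factors of one project; values invented for
    # illustration.
    degrees_of_influence = [3, 0, 2, 1, 4, 5, 2, 3, 1, 0, 0, 2, 1, 3]
    assert len(degrees_of_influence) == 14

    raw_function_points = 250.0

    # Equation (4): TCF = 0.65 + 0.01 * Sum(DIi). With all ratings 0
    # the TCF is 0.65; with all ratings 5 it is 1.35, hence the
    # maximum adjustment of +/-35% noted above.
    tcf = 0.65 + 0.01 * sum(degrees_of_influence)

    # Equation (3): adjusted function points = RFP * TCF
    adjusted_fp = raw_function_points * tcf

    print(f'TCF = {tcf:.2f}')                  # 0.65 + 0.27 = 0.92
    print(f'adjusted FP = {adjusted_fp:.1f}')  # 250 * 0.92 = 230.0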

Similar criticisms apply to size drivers as apply to cost drivers, with the additional problem that some of the 'size' drivers look very similar to 'cost' drivers. So if adjusted function points are used to assess product size and are then put into a cost model that applies cost drivers to its effort prediction there is a real danger of applying an adjustment for certain factors twice.

Using the MERMAID-1 data set, it was possible to investigate whether the use of the adjustment factor actually improved the observed relationship between size and effort. The Pearson correlation coefficient between raw function points and effort was 0.706 and between the adjusted function points and effort was 0.738. Thus the adjustment factors made a small (but not statistically significant) improvement to the relationship between size and effort. The relationship between effort and raw function points is shown in Figure 1 and the relationship between effort and adjusted function points in Figure 2.

[Figure 1. Effort and raw function points for MERMAID-1 data set. Scatterplot: raw function points (0-1200) against effort (0-25000 hours).]

[Figure 2. Effort versus adjusted function points for MERMAID-1 data set. Scatterplot: adjusted function points (0-1200) against effort (0-25000 hours).]

The MERMAID-2 data set comprised new projects and enhancement projects, and significant relationships between size and effort were only observed for the enhancement projects. For the enhancement projects, the correlation coefficient between raw function points and effort was 0.93 (for 16 projects; note that the original data for two projects are not available). The correlation coefficient between adjusted function points and effort was 0.93 (for 18 projects). Thus the function-point adjustment factors had no effect on the relationship between size and effort.

These results support those of an earlier study by Kemerer¹¹. Thus analysis of three independent data sets suggests that for a single environment the use of the adjustment factors is not necessary.

Using the MERMAID-2 data set, it was possible to investigate whether any specific technology factor was important.

The adjustment factors are intended to increase the assessment of product size to cater for the additional development effort needed for projects with additional requirements. Thus the adjustment factors are similar to productivity factors, such that the productivity values based on raw function points should be high for products with low levels of the adjustment factors and low for products with high levels of the adjustment factors.

The importance of the individual adjustment factors was assessed by investigating the relationship between the technology factor level of a project and its productivity (i.e., raw function points per hour). This was done using analysis of variance. The analysis of variance was based on dividing the projects into three groups: those with low values of the adjustment factor (i.e., a value of 0 or 1), those with moderate values (i.e., 2 or 3), and those with high values (i.e., 4 or 5).

The analysis of variance indicated that only three of the adjustment factors were related to productivity:

• factor 1, 'data/control information sent/received over communication lines' (p < 0.05)
• factor 6, 'online data entry and control functions' (p < 0.05)
• factor 8, 'online update for internal files' (p < 0.01)

Independence of function-point adjustment factors

One of the problems with cost drivers is that they are treated as independent when they are correlated, but as far as the author knows no one has checked whether the same problem affects size drivers. However, the MERMAID-2 data set included the values of the 14 function-point adjustment factors for 28 projects. It was therefore possible to investigate the underlying dimensionality of the adjustment factors using principal component analysis. The six largest principal components accounted for 85.5% of the variability of the data, and none of the remaining components accounted for more than 5% of the variability of the data. This indicates that the 14 technology factors could be represented by six new factors and confirms that the original factors were not independent.

Thus the results of these studies suggest that size drivers are not very effective and confirm the view of the MERMAID project team that simple models are likely to be sufficient within a single environment.

TIMESCALE COMPRESSION

There are two schools of thought about what happens if there is an attempt to reduce product schedules:

• The COCOMO model⁵ suggests that the decreasing of timescales increases effort, and the increasing of timescales also increases effort.
• The implications of Putnam's Rayleigh curve model¹⁵ are that the decreasing of timescales increases effort, whereas the increasing of timescales decreases effort.

In a recent paper, Jeffery¹⁷ suggested that both views were incorrect. He investigated the relationship between schedule compression and effort reduction by calculating the extent of schedule compression and effort compression for a number of projects and then constructing a scatterplot of effort compression against schedule compression. He measured effort compression as:

eff_comp = (actual effort) / (estimated effort) (5)

and schedule compression as:

sch_comp = (actual duration) / (estimated duration) (6)

Thus a value of eff_comp less than 1 identified a project that was completed with less than the expected effort and so was 'effort compressed', whereas a value greater than 1 identified a project that was completed with more than the expected amount of effort and so was 'effort extended'. Similarly, projects with sch_comp less than 1 were 'schedule compressed' and projects with sch_comp greater than 1 were 'schedule relaxed'.
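Because equations (5) and (6) are simple ratios, the style of analysis Jeffery used (and that is applied below to the MERMAID and COCOMO data sets) can be sketched in a few lines; the project figures are invented for illustration:

    import numpy as np
    from scipy import stats

    # Hypothetical estimates and actuals for five projects
    # (effort in hours, duration in months); values are invented.
    actual_effort = np.array([900.0, 2100.0, 400.0, 1500.0, 800.0])
    estimated_effort = np.array([1200.0, 1800.0, 500.0, 1400.0, 1000.0])
    actual_duration = np.array([6.0, 14.0, 3.0, 10.0, 5.0])
    estimated_duration = np.array([8.0, 12.0, 4.0, 9.0, 7.0])

    # Equations (5) and (6): values below 1 mean the project needed
    # less effort (or less time) than estimated.
    eff_comp = actual_effort / estimated_effort
    sch_comp = actual_duration / estimated_duration

    # Projects that are both schedule compressed and effort compressed
    # contradict both the COCOMO and the Rayleigh curve models.
    both = (sch_comp < 1) & (eff_comp < 1)
    print('schedule- and effort-compressed projects:', int(both.sum()))

    # The correlations reported below (e.g., between schedule
    # compression and productivity) are ordinary Pearson coefficients:
    r, p = stats.pearsonr(sch_comp, eff_comp)
    print(f'r = {r:.2f}, p = {p:.3f}')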

Jeffery's scatterplot showed that there were schedule-compressed projects that were also effort compressed, which contradicted both Putnam's and Boehm's models.

When a similar analysis was performed on the MERMAID-2 data set the scatterplot shown in Figure 3 was produced. This scatterplot shows a similar pattern to the one Jeffery found in his data set. In addition, Figure 4 shows schedule compression plotted against productivity. This scatterplot has a small negative correlation (-0.27, p < 0.05), which suggests mild support for the view that the compressing of timescales increases productivity.

[Figure 3. Duration and effort compression for MERMAID-1 data set. Scatterplot of effort compression against actual duration/estimated duration.]

[Figure 4. Duration compression and productivity for MERMAID-1 data set. Scatterplot of duration compression against productivity (raw function points/hour).]

It is even more interesting to perform a similar analysis on the COCOMO data set. Figure 5 shows the scatterplot of effort compression against schedule compression, and Figure 6 shows schedule compression plotted against productivity. In this case, the increase in productivity as schedule compression increases is even more marked, with a correlation of -0.54 (p < 0.001).

[Figure 5. Effort and duration compression for COCOMO data set. Scatterplot of effort compression against actual duration/estimated duration.]

[Figure 6. Duration compression and productivity for COCOMO data set. Scatterplot of duration compression against productivity (KDSI per month).]

Boehm based his assessment of the effects of schedule compression on a completely different analysis. Each of the projects in his data set was ranked on a five-point schedule compression scale (SCED). The average value for each scale point was as in Table 4. Boehm did not have any examples of very slack timescale projects in his data set. It can be seen that the average productivity levels for each group follow the pattern he suggests. However, the differences between the group means are not statistically significant.

Table 4. Boehm's five-point schedule compression scale for project data set

Scale point          Productivity (KDSI per month)  Number of projects
Severe compression   0.147                           8
Compressed           0.205                          10
Normal               0.340                          34
Slack                0.299                          11

There is clearly a difference between the impact of schedule compression based on the SCED productivity factor and the results presented in Figure 6. This is because the projects that Boehm classified as under schedule constraints were not the same set of projects that completed in an unexpectedly short time (although there was some limited overlap). This tends to indicate that it is possible to reduce schedule, but not if you want to!

This interpretation is supported by a study performed by Jeffery and Lawrence¹⁸ (reported by DeMarco and Lister¹⁹). Jeffery and Lawrence related productivity levels to the person/group responsible for setting project targets. They found that projects in which the supervisor set the targets were the least productive. When the programmers set the targets, projects were more productive. When a third party set the targets, projects were even more productive. However, the most productive projects were those that had no official targets set at all!

DISCUSSION

The results of these analyses provide general support for the MERMAID project approach to software cost estimation. In particular, it seems that fairly simple models can be used within a single environment.

The results also suggest that the software engineering community needs to review a large number of its basic assumptions about project costs. For example:

• It has been generally assumed that team experience (or team capability) is a major influence on productivity. However, the results from the analysis of the MERMAID-1 and MERMAID-2 data sets did not confirm this assumption. Both data sets included information about project leader experience and team experience, and MERMAID-2 also included information about team experience with tools. None of these factors had a statistically significant effect on productivity. This does not imply that there are no differences between individual members of staff; it does imply that assessments of team experience/capability are not likely to be useful adjustment factors in software cost models.

• The only consistency among productivity factors was that both the MERMAID-1 and MERMAID-2 data sets confirmed that 4GLs were associated with greater productivity than lower-level languages. However, the improvements were in the order of 300-400%, which are much greater than the adjustments for tool use usually included in cost-estimation models. In addition, other tool and method improvements were not associated with improvements in productivity.

• All the models of the effects of schedule compression on effort and productivity appear to be invalid. The results to date seem to suggest that it is possible to compress duration and achieve high productivity, but not if you really want to! (Most readers will recognise this as confirmation that the only truly universal law of human-intensive systems is Murphy's Law.)

The MERMAID project intends to address the issue of resource modelling (i.e., the interaction between effort, duration, and staffing levels over time) in its next phase. The evidence to date suggests that the MERMAID project team has given itself a substantial problem to tackle!

ACKNOWLEDGEMENTS

The MERMAID project is part-funded by the European Commission as project P2046 of the Esprit programme. The MERMAID project partners are Volmac Software Groep N.V., City University, Data Management Spa, NCC Ltd, University College Cork, and Valtion Teknillinen Tutkimuskeskus.

The following organizations (listed in alphabetical order) are currently working with the MERMAID project to collect/provide software project data: Anglian Water, DMR, GPT, Lloyds Bank, NASA and University of Maryland Software Engineering Laboratory, National Westminster Bank, and SD-Scicon.

REFERENCES

1 Kok, P, Kitchenham, B A and Kirakowski, J 'The MERMAID approach to software cost estimation' in Proc. Esprit Technical Week (1990)

2 Draper, N R and Smith, H Applied regression analysis (2nd ed) John Wiley (1981)

3 Cochran, W G and Cox, G M Experimental designs (2nd ed) John Wiley (1952)

4 Morrison, D F Multivariate statistical methods McGraw-Hill (1967)

5 Boehm, B W Software engineering economics Prentice Hall (1981)

6 Bailey, J W and Basili, V R 'A meta-model for software development resource expenditure' in Proc. Fifth Int. Conf. Software Engineering IEEE Computer Society Press (1981)

7 Belady, L A and Lehman, M M 'The characteristics of large systems' in Wegner, P (ed) Research directions in software technology MIT Press (1979)

8 DeMarco, T Yourdon 78-80 project survey Yourdon, Inc., New York, NY, USA

9 Wingfield, C G 'USAACSC experience with SLIM' Report 1,4 WAR 360-5 US Army Institute for Research in Management Information and Computer Science, Atlanta, GA, USA (1982)

10 Conte, S D, Dunsmore, H E and Shen, V Y Software engineering metrics and models Benjamin/Cummings (1987)

11 Kemerer, C F 'An empirical validation of software cost estimation models' Commun. ACM Vol 30 No 5 (1987) pp 416-429

12 Kitchenham, B A and Taylor, N R 'Software project development cost estimation' J. Syst. Soft. Vol 5 No 4 (1985) pp 267-278

13 Albrecht, A J and Gaffney, J 'Software function, source lines of code and development effort prediction' IEEE Trans. Soft. Eng. Vol 9 No 6 (1983) pp 639-648


14 Banker, R D and Kemerer, C F 'Scale economies in new software development' IEEE Trans. Soft. Eng. Vol 15 No 10 (1989)

15 Putnam, L H 'A general empirical solution to the macro software sizing and estimation problem' IEEE Trans. Soft. Eng. Vol 4 No 4 (1978)

16 Subramanian, G H and Breslawski, S 'A case for dimensionality reduction in software development effort estimates' TR-89-02 Dept of Computer and Information Science, Temple University, Philadelphia, PA, USA (1989)

17 Jeffery, D R 'Time-sensitive cost models in commercial MIS environments' IEEE Trans. Soft. Eng. Vol 13 No 7 (1987) pp 852-859

18 Jeffery, D R and Lawrence, M J 'Managing programming productivity' J. Syst. Soft. Vol 5 No 1 (1985)

19 DeMarco, T and Lister, T Peopleware Dorset House (1987)

APPENDIX: DESCRIPTION OF TWO MERMAID DATA SETS

The MERMAID-1 data set comprises data on 81 projects and includes the following data:

• the experience of the project team (years)
• the experience of the project manager (years)
• year of project completion
• project duration (months)
• project effort (hours)
• number of transactions
• number of entities
• raw function points (measured as the sum of the number of transactions and the number of entities, which is not the conventional Albrecht approach)
• adjusted function points (i.e., function points adjusted using a technical complexity factor)
• type of development environment used by project. This factor took on three different values, 1 corresponding to a traditional Cobol development, 2 corresponding to a traditional development environment upgraded to include code and report generators, and 3 corresponding to a 4GL environment.

The MERMAID-2 data set comprised 30 projects, for which a number of different data items were collected, including:

• size measured in raw and adjusted function points (using Albrecht's counting methods)

• effort (hours) for the project as a whole
• duration (months) for the project as a whole
• the value of the following 14 function-point adjustment factors (for 28 of the projects) measured on a six-point scale of 'degree of influence' from 0 (not applicable) to 5 (very influential):

(1) data/control information sent/received over communications lines
(2) distributed data or processing functions are characteristic
(3) application performance objectives stated/approved by user
(4) heavily used operational configuration involving specification design
(5) high transaction rates influenced design
(6) online data entry and control functions
(7) online functions designed for user efficiency
(8) online update for internal files
(9) complex processing is characteristic
(10) application specifically designed to be usable in other applications
(11) conversion and installation ease is characteristic
(12) operational ease characteristic (start-up, back-up, recovery)
(13) designed to be installed at multiple sites
(14) designed to facilitate change

• the value of 21 'productivity' factors. These were measured on a five-point scale, where 1 corresponds to a very strong negative influence, 3 corresponds to no special influence, and 5 corresponds to a very strong positive influence. The set of factors comprised:

(1) user involvement
(2) user commitment
(3) user experience with application
(4) staff turnover
(5) computer resource availability
(6) system response time
(7) development time constraints
(8) staff constraints
(9) experience of team
(10) requirements stability
(11) system familiarity
(12) problem complexity
(13) complexity of user interface
(14) structured methods use
(15) familiarity with structured methods
(16) tools/software use
(17) team experience with tools
(18) programming-language level
(19) familiarity with programming language
(20) project management experience of project leader
(21) working environment
