Are Deeply Nested Conditionals Less Readable?

There is some debate over the effect of deeply nested control structures on programmer comprehension. To test the effect of deeply nested IF-THEN-ELSE statements, we split 148 computer science students of varying backgrounds into two groups. One group received a listing of a program that made excessive use of deeply nested control structures. The other group received the listing of a functionally equivalent program that did not make use of deeply nested IF-THEN-ELSEs. Both groups answered the same list of questions about the program they were assigned. The results indicate no significant difference in the average performance on the questions between the two groups.

INTRODUCTION

There has been a considerable amount of interest in conditional statement nesting. The consensus seems to be that the more deeply nested the IF-THEN-ELSEs, the less readable the program.

Kernighan and Plauger [1] recommend never using an IF inside a THEN (they call such a structure a bushy decision tree) and suggest using an IF ... ELSEIF ... ELSEIF ... ELSE structure (which they call a skinny tree) for multiway branches. They reason that understanding a sequence of nested decisions requires the use of a short-term memory pushdown stack to keep track of the conditions. Thus, excessive nesting of decisions would seem to overload the finite capacity of the stack. The stack does not appear to play such an important role in using nonnested decisions, however.
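To make the two shapes concrete, here is a minimal sketch in Python rather than in the languages Kernighan and Plauger discuss. The function names, the shipping categories, and the weight thresholds are hypothetical and chosen only for illustration. The first function places an IF inside a THEN (bushy); the second expresses the same decisions as an IF ... ELSEIF ... ELSE chain (skinny).

```python
def shipping_bushy(weight_kg: float, express: bool) -> str:
    """Bushy tree: each inner test is read with the outer tests still in mind."""
    if weight_kg <= 20:
        if weight_kg <= 5:
            if express:
                return "small-express"
            else:
                return "small-standard"
        else:
            return "medium"
    else:
        return "freight"


def shipping_skinny(weight_kg: float, express: bool) -> str:
    """Skinny tree: one multiway branch, each case settled before the next."""
    if weight_kg > 20:
        return "freight"
    elif weight_kg > 5:
        return "medium"
    elif express:
        return "small-express"
    else:
        return "small-standard"
```

Both functions return the same category for every input; only the arrangement of the tests differs, which is the property the comprehension argument turns on.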

Based on similar reasoning, several computer program complexity metrics [2,3] include levels of nesting of conditional statements as an important factor. The more deeply nested a predicate, the more it contributes to the complexity of a program.

Address correspondence to Warren Harrison, School of Business, University of Portland, Portland, OR 97203.
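As a rough illustration of what a nesting-based measure looks at, the sketch below reports the deepest level of `if` nesting in a piece of Python source. It is not the metric defined in [2] or [3]; the function name `max_if_depth` is hypothetical, and it relies only on Python's standard `ast` module.

```python
import ast


def max_if_depth(source: str) -> int:
    """Return the deepest level of `if` nesting found in Python source text."""

    def depth(node: ast.AST, current: int) -> int:
        # Entering an `if` statement adds one level of nesting.
        if isinstance(node, ast.If):
            current += 1
        deepest = current
        for child in ast.iter_child_nodes(node):
            deepest = max(deepest, depth(child, current))
        return deepest

    return depth(ast.parse(source), 0)


NESTED = "if a:\n    if b:\n        x = 1\nelse:\n    y = 2\n"
FLAT = "if a:\n    x = 1\nif b:\n    y = 2\n"
print(max_if_depth(NESTED), max_if_depth(FLAT))
```

Note that Python's parser represents an `elif` as an `if` inside the preceding `else`, so this sketch counts each `elif` as one level deeper. A measure that treats ELSEIF chains as flat would need to special-case that.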

Many articles have suggested schemes for reducing the IF-THEN-ELSE nesting depth in programs. Some have suggested using a statement such as a CASE with an iterative DO to reduce nesting depth [4]. Others have argued for an extension to current programming languages such as the SELECT statement [5,6]. They argue that indentation is little help for deeply nested IF-THEN-ELSEs because it leaves little room for the statements and there are many statements between a THEN and its matching ELSE.

However, an experiment by Richards, Green, and Manton [7] showed very little difference between nested bushy decision trees and skinny nested decision trees. Their implementation of the skinny decision tree omitted the ELSEs and used a sequence of IF statements, which made it close to a CASE statement. Their subjects were programmers with 1 or more years of programming experience. The subjects performed slightly better on forward reasoning questions (this task requires identifying the output of the program given the input) for a nested bushy tree version of a program and slightly worse on backward reasoning questions (where the task requires specifying the input to the program given its output).

It seems that avoiding deeply nested conditional structures and bushy decision trees are two widely accepted rules of programming style. However, at least one experiment has shown no difference in comprehension between deeply nested and nonnested versions of the same program [8].

In this paper, we report on the results of a series of experiments undertaken to further explain the nested IF-THEN-ELSE paradox. In the experiment of Richards et al. [7] the IF-THEN-ELSEs in the bushy programs were only nested three levels deep. We thought that if a person does use a mental pushdown stack in comprehending programs with nested conditionals, then a greater nesting depth would be needed to overload the mental stack, and thus to indicate a difference in understandability. In several preliminary experiments we used a program that computed the cost of a long distance phone call. The bushy version of the program had a nesting depth of six, and the shallow version was a sequence of IF statements. There was no significant difference in the performance of the subjects on either forward or backward reasoning questions.

W. Harrison and C. Cook. The Journal of Systems and Software 6, 335-341 (1986). © Elsevier Science Publishing Co., Inc., 1986. 0164-1212/86/$3.50

From these preliminary experiments, we concluded that to test the pushdown overload hypothesis an even greater nesting depth was needed, and hence a larger program. We felt that a nesting depth of 10 should be sufficient to cause a short-term memory stack overload. The bushy tree version of the program used in our experiment was nested 10 levels deep and consisted of 130 lines of code. The shallow nested version consisted of several occurrences of a sequence of IF statements inside an IF statement (nesting depth of two) and involved 100 lines of code.

Our results suggest that the mental pushdown hypothesis requires additional study. It also appears that programming style rules based on this hypothesis should be re-examined.

DESCRIPTION OF THE EXPERIMENT

The experiment had the goal of comparing the effect of deeply nested and shallow nested conditional statements on program comprehension.

Table 1. Average Characteristics of Subjects Participating in the Study

Course     Programming Courses    Programs Written    GPA             N
CS212      2.75 (.775)            29.75 (31.01)       3.05 (.434)     16
CIS315     4.84 (1.44)            34.36 (24.84)       3.26 (.319)     31
CS317      5.54 (1.72)            34.80 (22.82)       3.17 (.314)     35
CS319      9.27 (2.74)            43.55 (20.68)       3.35 (.366)     33
CS420      9.89 (4.43)            58.70 (39.03)       3.21 (.430)     27
CS517      9.00 (4.10)            50.50 (40.31)       3.62 (.075)      6
Overall    6.86 (3.63)            41.11 (29.28)       3.23 (.370)    148

Standard deviations are shown in parentheses.

1. How many syllables will this function calculate are in the word teaches?

2. Given the word chick, list one way that a single vowel (a, e, i, o, u) may be added to the end that will cause the syllable count as calculated by the function to remain at 1 (note that the result need not be a valid word).

3. How many syllables will this function calculate are in the word knife?

4. Given the word crouton list one way that the vowels (a, e, i, o, u) may be used to replace the letter t and still retain the same syllable count as calculated by the function (note that the result need not be a valid word).

5. How many syllables will this function calculate are in the word mayonaise?

6. Show one way that the letters in the word laughter can be rearranged so that the function will calculate a syllable count of 3 instead of 2 (note that the result need not be a valid word).

7. Give one combination of any four letters (a-z) for which the function will return a syllable count of zero.

Figure 1. Questions asked of the subjects about the syllable counting function.

Subjects

The subjects were 148 Oregon State University and University of Oregon students enrolled in six different computer science courses, ranging from a compiler writing course to a second quarter Pascal programming course.

Materials

The first page of the test booklets was a background questionnaire and instruction sheet. The second page was a list of seven questions. This was followed by a two-page Pascal program listing. The background questionnaire was filled out by the subjects before the experimental task was performed. It asked them to estimate the number of previous programming courses taken, the number of programs written, their GPA, and class standing. Table 1 describes some of the most important characteristics [9] of each group, based on the questionnaire. As can be seen, this group of subjects provided a broad spectrum of experience and ability.

Of the seven questions, three were forward reasoning and four were backward reasoning. We defined a forward reasoning question as one that requires the subject to work forward from some given input and derive the program output [11]. A backward reasoning question is defined as one that requires the subject to work backwards from some given program output and derive the corresponding input [11]. The questions are shown in Figure 1. We classified questions 1, 3, and 5 as forward reasoning and questions 2, 4, 6, and 7 as backward reasoning questions.
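The distinction between the two question types can be sketched with a toy program. Everything here is hypothetical and unrelated to the experimental materials: `fare` stands in for the program under study, a forward question amounts to evaluating it on a given input, and a backward question amounts to searching for inputs that produce a given output.

```python
from itertools import product


def fare(zone: int, peak: bool) -> int:
    """Hypothetical program under study: a fare in cents."""
    if zone <= 1:
        base = 200
    elif zone == 2:
        base = 350
    else:
        base = 500
    if peak:
        return base + 100
    return base


def forward(zone: int, peak: bool) -> int:
    """Forward reasoning: given the input, derive the output."""
    return fare(zone, peak)


def backward(target: int) -> list:
    """Backward reasoning: given the output, derive inputs that produce it."""
    candidates = product((1, 2, 3), (False, True))
    return [(zone, peak) for zone, peak in candidates if fare(zone, peak) == target]
```

A backward question may have several valid answers or none, which is why the questions in Figure 1 ask for "one way" rather than a unique input.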

Each subject received one version of a listing of a Pascal function that calculated the number of syllables in a word. The two versions of the program (see Figures 2 and 3) were identical implementations of the syllable counting algorithm described in [10]. In one version, the conditional statements were nested 10 levels deep, while in the other version, which used a sequence of IF statements, the conditionals were only nested two levels deep. Both were functionally equivalent, as the shallow nested version was created from the deeply nested version by substituting compound predicates for levels of nesting. The deeply nested version was 130 lines long with 28 IF statements and 28 simple predicates. The shallow nested version was 100 lines long with 14 IF statements and 36 predicates.

    (****************************************)
    (*function sylcnt will count the number*)
    (*of syllables in a word of length len *)
    (****************************************)
    function sylcnt(word:string;len:integer):integer;
    var first:char;          (*"current" symbol*)
        second:char;         (*following symbol*)
        third:char;          (*symbol following second*)
        prev:char;           (*symbol before current*)
        scount:integer;      (*number of syllables*)
        i:integer;           (*utility variable*)
        vowels:set of char;  (*the vowels*)
        iuy,aouy,ey,mnt,rtlc,lstc,lstcn,wmbl,hr:set of char;
    begin
      scount:=0;
      vowels:=['a','e','i','o','u','y'];
      iuy:=['i','u','y'];
      aouy:=['a','o','u','y'];
      ey:=['e','y'];
      mnt:=['m','n','t'];
      rtlc:=['r','t','l','c'];
      lstc:=['l','s','t','c'];
      lstcn:=['l','s','t','c','n'];
      wmbl:=['w','m','b','l'];
      hr:=['h','r'];
      for i:=1 to len do
      begin
        first:=word[i];
        second:=word[i+1];
        third:=word[i+2];
        if i>1 then
          prev:=word[i-1]
        else
          prev:=' ';
        if (first in vowels)and(second in vowels) then
        begin
          if(first='a')and
            not(second in iuy)and
            ((second<>'e')or(i<>1))then
          begin
            scount:=scount+1;
          end;
          if(first='o')and
            not(second in aouy)then
          begin
            scount:=scount+1;
          end;
          if(first='u')and
            not(second in ey)then
          begin
            scount:=scount+1;
          end;
          if(first='i')and
            (second='e')and
            ((third in mnt)or
            ((third='r')and(prev in rtlc)))then
          begin
            scount:=scount+1;
          end;
          if(first='i')and
            (second='a')and
            not(prev in lstc)then
          begin
            scount:=scount+1;
          end;
          if(first='i')and
            (second='o')and
            not(prev in lstcn)then
          begin
            scount:=scount+1;
          end;
          if first in ey then
          begin
            scount:=scount+1;
          end;
        end;
        if(first in vowels)and
          not(second in vowels)then
        begin
          scount:=scount+1;
        end;
      end; (*for*)
      if len >2 then
      begin
        if(word[len]='e')and
          not(word[len-1] in hr)and
          (not(word[len-1] in wmbl)or
          (word[len-2] in vowels))then
        begin
          scount:=scount-1;
        end;
      end;
      if len > 3 then
      begin
        if(word[len]='s')and
          (word[len-1]='e')and
          not(word[len-2] in hr)and
          (not(word[len-2] in wmbl)or
          (word[len-3] in vowels))then
        begin
          scount:=scount-1;
        end;
      end;
      sylcnt:=scount;
    end; (*sylcnt*)

Figure 2. The nonnested version of the program used in the experiment.

Procedure

Test booklets were distributed randomly in each class. Approximately half the subjects received the deeply nested version and half the subjects received the shallow nested version. The characteristics of each group are shown in Table 2. After completing the background questionnaire, the subjects were given 30 minutes to answer the questions. In one class (CS420), one question (#2) was thrown out due to an error discovered during administration. While this error was corrected for subsequent administrations, the subjects from this class were scored on six questions instead of seven.

As a measure of subject performance, we used the percentage of correct answers for

1. Forward reasoning questions
2. Backward reasoning questions
3. All questions

This provided us with three different measures of subject performance. In evaluating the subjects' answers, no partial credit was given; each question was graded as correct or incorrect. The performance of the deeply nested and shallow nested groups is shown in Tables 3, 4, and 5, along with the associated t-score [12] used to test the average percentage correct in each group for statistically significant differences. As can be seen from the tables, there is no statistically significant difference between the two aggregate groups. However, several subgroups seem to exhibit a significant difference in average scores (viz., CIS315 and CS317 for backward reasoning).

Table 2. Average Characteristics of the Subjects in the Two Groups

Group        Programming Courses    Programs Written    GPA           N
Nested       6.55 (3.45)            36.6 (25.4)         3.24 (.38)    69
Nonnested    7.13 (3.77)            45.0 (32.0)         3.23 (.37)    79
T-Score      -.96                   -1.76               .15

Standard deviations are shown in parentheses.

Table 4. Average Percent Correct for Subjects in Each Group Using Backward Reasoning Questions

Course     Nested          N    Nonnested       N    T-Score
CS212      25.0 (35.4)     8    40.6 (18.6)     8    -1.11
CIS315     42.3 (21.4)    13    63.9 (32.3)    18    -2.09
CS317      59.7 (32.2)    18    38.2 (29.5)    17     2.05
CS319      [illegible]    14    67.1 (30.1)    19    -0.59
CS420      66.7 (27.2)    13    66.7 (29.2)    14     0.00
CS517      66.7 (57.7)     3    41.7 (38.2)     3     0.63
Overall    54.0 (33.0)    69    56.0 (32.0)    79    -0.42

Standard deviations are shown in parentheses.
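The t-scores reported in the tables can be approximately reproduced from the printed means, standard deviations, and group sizes. The paper cites the SPSSx guide [12] but does not state which formula was used, so the pooled-variance two-sample t statistic below is an assumption, and the function name `pooled_t` is hypothetical.

```python
from math import sqrt


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Two-sample t statistic with a pooled variance estimate (assumed formula)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Summary statistics from Table 2: nested group (N = 69) vs. nonnested (N = 79).
t_courses = pooled_t(6.55, 3.45, 69, 7.13, 3.77, 79)
t_programs = pooled_t(36.6, 25.4, 69, 45.0, 32.0, 79)
t_gpa = pooled_t(3.24, 0.38, 69, 3.23, 0.37, 79)
print(round(t_courses, 2), round(t_programs, 2), round(t_gpa, 2))
```

Under this assumption the three values land within a few hundredths of the T-Scores printed in Table 2; the small gaps are consistent with the rounding of the published means and standard deviations. For the aggregate comparison (146 degrees of freedom), a two-tailed test at the 0.05 level needs an absolute t of roughly 1.98.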

Table 3. Average Percent Correct for Subjects in Each Group Using Forward Reasoning Questions

Course     Nested                 N    Nonnested       N    T-Score
CS212      16.7 (25.2)            8    37.5 (21.4)     8    -1.78
CIS315     48.7 ([illegible])    13    46.3 (32.6)    18      .21
CS317      [illegible] (25.6)    18    39.2 (35.8)    17     1.56
CS319      42.9 (20.4)           14    52.6 (32.0)    19    -1.00
CS420      69.2 (31.8)           13    47.6 (28.4)    14     1.87
CS517      55.6 (38.3)            3    66.7 (33.3)     3    -0.38
Overall    50.0 (30.0)           69    46.0 (31.0)    79     0.66

Standard deviations are shown in parentheses.

Table 5. Average Percent Correct for Subjects in Each Group Using All Questions

Course     Nested          N    Nonnested       N    T-Score
CS212      21.4 (24.1)     8    39.3 (16.6)     8    -1.72
CIS315     45.1 (20.1)    13    56.4 (27.5)    18    -1.24
CS317      57.9 (26.1)    18    38.7 (23.6)    17     2.29
CS319      53.1 (22.7)    14    60.9 (26.4)    19    -0.89
CS420      68.0 (24.0)    13    57.1 (22.4)    14     1.21
CS517      61.9 (43.6)     3    52.4 (36.0)     3     0.29
Overall    52.0 (27.0)    69    52.0 (26.0)    79     0.10

Standard deviations are shown in parentheses.

To control for the possibility that the classification by course did not sufficiently distinguish the students' level of experience, we broke the subjects into three groups, based on the number of previous programming courses. The associated statistics are shown in Table 6. As can be seen, no statistically significant difference was found between nested and nonnested decisions using this approach, either.

Table 6. Average Percent Correct for Subjects in Each Group, by Number of Programming Courses Taken, for Forward, Backward, and All Questions

                                  Nested          N    Nonnested       N    T-Score
Fewer than five programming courses
  Forward reasoning questions     36.1 (34.0)    20    36.8 (27.0)    19    -.020
  Backward reasoning questions    43.7 (34.0)    20    53.9 (26.7)    19     .303
  All questions                   40.6 (30.0)    20    46.6 (20.7)    19    -.730
More than four but fewer than nine programming courses
  Forward reasoning questions     54.0 (25.8)    34    45.0 (32.4)    35    1.300
  Backward reasoning questions    54.0 (31.7)    34    54.0 (32.7)    35     .020
  All questions                   55.0 (24.1)    34    50.0 (26.9)    35     .720
More than eight programming courses
  Forward reasoning questions     55.0 (30.0)    15    54.0 (31.7)    25     .470
  Backward reasoning questions    66.0 (31.0)    15    60.0 (33.9)    25     .570
  All questions                   62.0 (26.9)    15    57.0 (27.5)    25     .470

Standard deviations are shown in parentheses.

DISCUSSION OF THE RESULTS

The average scores between groups for forward and backward reasoning questions, as well as all questions, were not significantly different at the 0.05 level. This suggests that the degree of understandability of each version of the program is not significantly different.

There are several possible explanations for this result.

1. It is possible that there is some difference in understandability between the two versions, but that this difference exists only between subjects of some given level of sophistication. It is possible that until this “threshold” level of expertise is reached, the additional complexity contributed by the control structure arrangement is not significant. If this were the case, however, then a difference in performance would be noticeable in Table 6, where the subjects were grouped by number of previous programming courses. This assumes, of course, that the number of programming courses taken is a suitable measure of “sophistication.”

2. It is possible that there really is a difference but that a more complex example is needed to illustrate it. This would imply that the “mental stack” is larger than previously thought. However, the materials used in class seem to be close to the limits of maximum complexity for a typical program module in a high-level program which can fit on three or fewer pages. Thus, while the lack of differing performance may be due to an insufficiently complex example, it would be extremely difficult to test in an experimental setting. In fact, such an effect may not even be pertinent to most actual programs written.

3. As indicated by the results, there may be no difference in comprehension between the versions. This is the same conclusion reached by Green [8]. The same set of predicates has to be stacked for each version. Green believes the only difference is that for the shallow nested version they arrive all together (or in groups), while in the deeply nested version they arrive one (or a few) at a time. Thus, equal amounts of stacking occur, regardless of what format (viz., nested or compound) the predicates are presented in. If this is the case, then no differences between the two versions would be apparent. This seems to be the most reasonable explanation.

CONCLUSIONS

Based on the results of our experiments, we must conclude that there is no significant difference in comprehension between using deeply nested control structures and using a series of compound conditionals to avoid control structure nesting. This would suggest that much of the “folklore” surrounding programming style should be reexamined.

Before such programming rules can be dismissed, however, more experimentation using a larger variety of materials must be performed.

ACKNOWLEDGMENTS

We would like to express our appreciation to those individuals who allowed us to use their students for our experiments. They include Teresa Harrison, Eissa Ibrim, and David Sandberg at Oregon State University and Jean Rogers at the University of Oregon. For their assistance, we are grateful.

REFERENCES

1. B. W. Kernighan and P. J. Plauger, Programming Style: Examples and Counterexamples, ACM Computing Surveys pp. 303-319 (Dec 1974).

2. W. Harrison and K. Magel, A Complexity Measure Based on Nesting Level, ACM SIGPLAN Notices pp. 63-74 (March 1981).

3. P. Piwowarski, A Nesting Level Complexity Measure, ACM SIGPLAN Notices pp. 44-50 (Sept 1982).

4. G. Hill, An Improvement Over Deeply Nested IF-THEN-ELSE Control Structures, ACM SIGPLAN Notices pp. 52-56 (Aug 1982).

5. G. Weinberg, D. Geller, and T. Plum, IF-THEN-ELSE Considered Harmful, ACM SIGPLAN Notices pp. 34-43 (Aug 1975).

6. P. Newman, IF-THEN-ELSE Again, ACM SIGPLAN Notices pp. 106-111 (Dec 1983).

7. V. Richards, T. Green, and J. Manton, What Does Problem Representation Affect: Chunk Size, Memory Load or Mental Process? Memo #319, MRC Social and Applied Psychology Unit, University of Sheffield, 1979.

8. T. Green, IFS and THENs: Is Nesting Just for the Birds?, Software-Practice and Experience 10, 373-381 (1980).

9. T. Moher and G. Schneider, Methods for Improving Controlled Experimentation in Software Engineering, Proceedings of the Fifth International Conference on Software Engineering, March 1981, pp. 224-233.

10. I. E. Fang, By Computer: Flesch’s Reading Ease Score and a Syllable Counter, Behavioral Science 13, 249-251 (1968).

11. M. Sime, T. Green, and D. Guest, Scope Marking in Computer Conditionals-A Psychological Evaluation, International Journal of Man-Machine Studies 9, 107-118 (1977).

12. SPSS, Inc. SPSSx User’s Guide, McGraw-Hill, New York, 1983.

    (****************************************)
    (*function sylcnt will count the number*)
    (*of syllables in a word of length len *)
    (****************************************)
    function sylcnt(word:string;len:integer):integer;
    var first:char;          (*"current" symbol*)
        second:char;         (*following symbol*)
        third:char;          (*symbol following second*)
        prev:char;           (*symbol before "current"*)
        scount:integer;      (*number of syllables*)
        i:integer;           (*utility variable*)
        vowels:set of char;  (*the vowels*)
        iuy,aouy,ey,mnt,rtlc,lstc,lstcn,wmbl,hr:set of char;
    begin
      scount:=0;
      vowels:=['a','e','i','o','u','y'];
      iuy:=['i','u','y'];
      aouy:=['a','o','u','y'];
      ey:=['e','y'];
      mnt:=['m','n','t'];
      rtlc:=['r','t','l','c'];
      lstc:=['l','s','t','c'];
      lstcn:=['l','s','t','c','n'];
      wmbl:=['w','m','b','l'];
      hr:=['h','r'];
      for i:=1 to len do
      begin
        first:=word[i];
        second:=word[i+1];
        third:=word[i+2];
        if i>1 then
          prev:=word[i-1]
        else
          prev:=' ';
        if first in vowels then
        begin
          if second in vowels then
          begin
            if first='a' then
            begin
              if not(second in iuy)then
              begin
                if(second<>'e')or(i<>1)then
                begin
                  scount:=scount+1;
                end;
              end;
            end
            else if first='o' then
            begin
              if not(second in aouy)then
              begin
                scount:=scount+1;
              end;
            end
            else if first='u' then
            begin
              if not(second in ey) then
              begin
                scount:=scount+1;
              end;
            end
            else if first='i' then
            begin
              if second='e' then
              begin
                if(third in mnt)or
                  ((third='r')and(prev in rtlc))then
                begin
                  scount:=scount+1;
                end;
              end
              else if second='a' then
              begin
                if not(prev in lstc)then
                begin
                  scount:=scount+1;
                end;
              end
              else if second='o' then
              begin
                if not(prev in lstcn)then
                begin
                  scount:=scount+1;
                end;
              end
            end
            else if first in ey then
            begin
              scount:=scount+1;
            end
          end
          else if not(second in vowels) then
          begin
            scount:=scount+1;
          end;
        end; (*first in vowels*)
      end; (*for*)
      if len >2 then
      begin
        if word[len]='e' then
        begin
          if not(word[len-1] in hr)then
          begin
            if not(word[len-1] in wmbl)or
              (word[len-2] in vowels)then
            begin
              scount:=scount-1;
            end
          end
        end
      end;
      if len > 3 then
      begin
        if word[len]='s' then
        begin
          if word[len-1]='e' then
          begin
            if not(word[len-2] in hr)then
            begin
              if not(word[len-2] in wmbl)or
                (word[len-3] in vowels)then
              begin
                scount:=scount-1;
              end
            end
          end
        end
      end;
      sylcnt:=scount;
    end; (*sylcnt*)

Figure 3. Nested version of the program used in the experiment.
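The paper states that the listings in Figures 2 and 3 are functionally equivalent. One way to spot-check that claim is to port both to Python and compare their results over many generated words. The sketch below rests on several assumptions: the ports follow the listings as read from the printed figures; characters beyond the end of the word are treated as blanks (the Pascal code reads word[i+1] and word[i+2] without a bounds check, and the paper does not say what those positions hold); and the alphabet and maximum word length used for the comparison are arbitrary choices.

```python
from itertools import product

VOWELS = set("aeiouy")
IUY, AOUY, EY = set("iuy"), set("aouy"), set("ey")
MNT, RTLC = set("mnt"), set("rtlc")
LSTC, LSTCN = set("lstc"), set("lstcn")
WMBL, HR = set("wmbl"), set("hr")

def sylcnt_shallow(word):
    """Compound-predicate style, ported from Figure 2."""
    n, count = len(word), 0
    w = " " + word + "  "  # 1-based indexing; ASSUMPTION: blanks beyond the word
    for i in range(1, n + 1):
        prev, first, second, third = w[i - 1], w[i], w[i + 1], w[i + 2]
        if first in VOWELS and second in VOWELS:
            if first == "a" and second not in IUY and (second != "e" or i != 1):
                count += 1
            if first == "o" and second not in AOUY:
                count += 1
            if first == "u" and second not in EY:
                count += 1
            if (first == "i" and second == "e"
                    and (third in MNT or (third == "r" and prev in RTLC))):
                count += 1
            if first == "i" and second == "a" and prev not in LSTC:
                count += 1
            if first == "i" and second == "o" and prev not in LSTCN:
                count += 1
            if first in EY:
                count += 1
        if first in VOWELS and second not in VOWELS:
            count += 1
    if n > 2:
        if (w[n] == "e" and w[n - 1] not in HR
                and (w[n - 1] not in WMBL or w[n - 2] in VOWELS)):
            count -= 1
    if n > 3:
        if (w[n] == "s" and w[n - 1] == "e" and w[n - 2] not in HR
                and (w[n - 2] not in WMBL or w[n - 3] in VOWELS)):
            count -= 1
    return count

def sylcnt_nested(word):
    """Deeply nested style, ported from Figure 3."""
    n, count = len(word), 0
    w = " " + word + "  "  # same padding assumption as above
    for i in range(1, n + 1):
        prev, first, second, third = w[i - 1], w[i], w[i + 1], w[i + 2]
        if first in VOWELS:
            if second in VOWELS:
                if first == "a":
                    if second not in IUY:
                        if second != "e" or i != 1:
                            count += 1
                elif first == "o":
                    if second not in AOUY:
                        count += 1
                elif first == "u":
                    if second not in EY:
                        count += 1
                elif first == "i":
                    if second == "e":
                        if third in MNT or (third == "r" and prev in RTLC):
                            count += 1
                    elif second == "a":
                        if prev not in LSTC:
                            count += 1
                    elif second == "o":
                        if prev not in LSTCN:
                            count += 1
                elif first in EY:
                    count += 1
            elif second not in VOWELS:
                count += 1
    if n > 2:
        if w[n] == "e":
            if w[n - 1] not in HR:
                if w[n - 1] not in WMBL or w[n - 2] in VOWELS:
                    count -= 1
    if n > 3:
        if w[n] == "s":
            if w[n - 1] == "e":
                if w[n - 2] not in HR:
                    if w[n - 2] not in WMBL or w[n - 3] in VOWELS:
                        count -= 1
    return count

def versions_agree(alphabet="aeiouymntrlcswbhz", max_len=4):
    """Compare both ports on every word over `alphabet` up to `max_len` letters."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if sylcnt_shallow(word) != sylcnt_nested(word):
                return False
    return True
```

Calling `versions_agree()` returns True only if the two ports give the same count for every generated word. Agreement between the ports supports, but does not prove, the equivalence of the Pascal originals. Either port can also be used to explore the forward reasoning questions in Figure 1, for example with `sylcnt_shallow("knife")`.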
