An Improved Preference Data Collection Method: Balanced Incomplete Block Designs

David R. Rink Northern Illinois University

Numerous preference data collection methodologies exist. However, one of the relatively efficient methods has received far less attention than deserved. It is the balanced incomplete block (BIB) design. In this article, the author provides an overview of BIB designs, discusses corresponding advantages and limitations, presents appropriate statistical procedures for analyzing preference data, and illustrates the methodology underlying the development and application of BIB designs with a real-world marketing problem.

INTRODUCTION

Numerous scaling methods exist for the purpose of collecting subjective information from respondents. These methods can be classified according to whether the emphasis is to be placed on stimuli, subjects, or responses. This paper focuses on the stimulus-centered approach, where the researcher examines systematic variation across a set of stimuli presented to a group of homogeneous subjects. Although many stimulus-centered approaches exist for collecting judgmental information (Green and Tull 1975), only single stimulus methods, paired comparisons, and ranking techniques are extensively used (Wiley 1978). Since this paper is concerned with preference data, only paired comparison and ranking techniques will be discussed. 1

In paired comparison methods, n(n - 1)/2 subsets of size 2 are individually ordered. These methods require a large number of observations. When the number of stimuli becomes large, deriving all pairs of stimuli can become tedious and time consuming. Not only does the number of pairs increase much faster than the number of stimuli, but so does "the amount of overdetermination ... and, hence, the number of degrees of freedom left after fitting the data" (Torgerson 1958, 191). Also, the respondent may become tired answering the large number of paired comparisons that are necessary in multidimensional scaling and conjoint measurement methods (Green 1974; Green and Carmone 1970).
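The growth in the number of pairs can be made concrete with a short calculation. The following Python sketch is illustrative only; the function name `n_pairs` and the chosen stimulus counts are assumptions introduced here, not part of the original study.

```python
from math import comb

def n_pairs(n: int) -> int:
    """Number of distinct stimulus pairs, n(n - 1)/2."""
    return comb(n, 2)

# Pairs a single respondent would judge under complete paired comparison.
pair_counts = {n: n_pairs(n) for n in (5, 10, 15, 20)}
for n, pairs in pair_counts.items():
    print(f"{n:2d} stimuli -> {pairs:3d} pairs")
```

With 15 stimuli, as in the application presented later, a complete paired comparison task would require 105 judgments from each respondent.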

With ranking procedures, the entire set of n stimuli is ordered. 2 The respondent may become frustrated if asked to rank more than eight items, and he/she may skip the question or select the most and least preferred items, ignoring the rest. Furthermore, the ability of respondents "to rank objects effectively and reliably may be a function of the number of comparative judgments to be made. For example, after 10 different brands of bourbon have been tasted, the discriminatory powers of the observers may legitimately be questioned" (Gibbons 1971, 257).

The problems of respondent fatigue and frustration can be substantially reduced by using one of the balanced incomplete block (BIB) designs, such as lattice designs and Youden squares. BIB designs "reduce considerably the number of independent pairwise comparisons which the subject is forced to make" (Green and Carmone 1970, 85), thereby improving response reliability and increasing data collection effectiveness.

© 1987, Academy of Marketing Science. Journal of the Academy of Marketing Science, Spring 1987, Vol. 15, No. 1, 054-061. 0092-0703/87/1501-0054 $2.00

PURPOSE

The purpose of this paper is to explain and show how BIB designs can circumvent the problems inherent in research designs where the respondent must rank many objects. First, the general nature of BIB designs is discussed, including associated advantages and limitations. Next, appropriate statistical analysis techniques are presented. Finally, the methodology underlying the development and application of BIB designs is explained and illustrated with a real-world marketing problem.

GENERAL NATURE OF BIB DESIGNS

In the typical randomized complete block design, every treatment (or object) is applied to every block (or respondent). However, when the number of treatments is large and the block size is limited, it may be impractical for all treatments to be applied to each block for comparison at the same time. In this case, an incomplete block design may be used (Conover 1971). "A reasonable thing is to divide the objects into blocks in accordance with a (BIB) design and to present the objects in one block for ranking at one time. In the whole experiment, each object will be compared with each other object the same number of times in all" (Cox 1958, 232).

If the design is balanced so that

1. every block contains k experimental units (k < t),

2. every treatment appears in r blocks (r < b), and

3. every treatment appears with every other treatment an equal number of times, 3

the design is then called a balanced incomplete block (BIB) design (Conover 1971, 275-276).

Every BIB design must satisfy both of these defining relations (Federer 1955):

$$tr = kb \qquad (1)$$

$$(t - 1)\lambda = r(k - 1) \qquad (2)$$

where

t = number of treatments to be examined,

k = number of experimental units per block (k < t),

b = total number of blocks,

r = number of times each treatment appears (r < b), and

λ = number of blocks in which the ith treatment and the jth treatment appear together (λ is the same for all pairs of treatments).
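The two defining relations lend themselves to a quick screening check before a design table is consulted. The sketch below is a minimal illustration in Python; the function name and the argument name `lam` (standing for λ) are assumptions made here. The relations are necessary conditions only, so a parameter set that passes the check is not thereby guaranteed to correspond to a constructible design.

```python
def satisfies_bib_relations(t: int, b: int, r: int, k: int, lam: int) -> bool:
    """Check relations (1) and (2): tr = kb and (t - 1) * lam = r(k - 1), with k < t."""
    return k < t and t * r == k * b and (t - 1) * lam == r * (k - 1)

# Parameter set used later in the marketing application.
ok = satisfies_bib_relations(t=15, b=15, r=7, k=7, lam=3)
# Same set with a deliberately wrong value of lambda.
bad = satisfies_bib_relations(t=15, b=15, r=7, k=7, lam=2)
print(ok, bad)
```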

Three of the earliest contributors in the construction of BIB designs were Bose (1939), Yates (1936a,b), and Youden (1937). The extension of BIB methods from their origin in agricultural research to the behavioral sciences seems to have arisen through the methodology for paired comparisons (Bradley and Terry 1952; David 1963). BIB methods are suitable in cases involving "subjective ranking by a small panel of judges for the detection of differences" (Bradley and Terry 1952, 335). Therefore, these methods are suitable in situations where "individuals are asked to make a comparative rating of different objects that are presented to them" (Cochran and Cox 1957, 440).

ADVANTAGES AND LIMITATIONS OF BIB DESIGNS

The following advantages accrue from the use of BIB designs:

1. Although the number of comparisons per respondent is reduced, the phenomenon of balancing across stimulus pairs and subjects enables respondents as a group to rank many objects (Green and Tull 1975).

2. The researcher is no longer bound unequivocally to the restriction that the number of treatments must equal the number of experimental units in a block. BIB methods allow the number of treatments to exceed the number of experimental units per block (Yates 1936a).

3. The procedure by which the rankings of several judges are combined "permits an overall test of significance without the usual assumption that members of a panel agree upon the nature of the differences to be detected" (Bradley and Terry 1952, 325).

4. BIB designs increase the precision of the study because balancing and replication reduce standard deviation or variability (Cox 1958).

5. Because BIB procedures specify that every two treatments must occur together in a block the same number of times, estimates of treatment effects, experimental error, and block differences can be obtained (Yates 1936a).

6. Since each respondent is ranking a subset of the total number of objects, BIB designs will save the respondent time, making possible a higher completion rate, cut interview costs, and improve the reliability of the results.

7. Nonparametric statistical techniques may be used in BIB designs. These methods are easier to learn and use than their parametric counterparts.

8. As long as the observations (even if nonnumeric) are amenable to ranking according to some criterion of interest, the researcher does not have to invent a scale of measurement (Conover 1971).

The following limitations apply to BIB designs:

1. When the number of treatments is large, the number of required replications also tends to be large. This disadvantage disappears if the notion of balance is dispensed with. However, this will cause some loss in efficiency and will create "the inconvenience of having slight variations in accuracy for different sets of treatment comparisons" (Yates 1936a, 123).

2. "When some treatment comparisons are required to have higher precision than others" (Cox 1958, 244), other incomplete block designs, e.g., partially balanced incomplete blocks, 4 should be used in lieu of BIB designs.

3. Once the number of blocks (or respondents) is determined in BIB designs, the researcher must acquire this exact number, no more and no less. The researcher therefore may be compelled to adopt some form of personal interview procedure to collect information, which can be expensive and time consuming.

4. Unlike conjoint analysis and nonmetric multidimensional scaling, BIB methods are not suited to play a role in more general schemes that can resolve overall measures of favorableness or part-worths (Green and Wind 1973).

5. BIB designs are applicable only to situations in which respondents' preferences (or perceptions) are homogeneous. This assumption may never prevail in the strictest sense.

STATISTICAL ANALYSIS TECHNIQUES FOR BIB DESIGN DATA

Several approaches have been suggested for analyzing BIB design data. The traditional technique utilized for evaluating quantitative variables in BIB designs is the analysis of variance. However, the specific form of the analysis of variance for BIB designs "differs according to the nature of the design, the number of replications, and the restrictions" (Banks 1974, 493). The analyses of variance for various BIB designs may be found in several sources (Banks 1974; Cochran and Cox 1957), and will not be discussed in this paper. An analytical procedure that is computationally simpler than traditional analysis of variance techniques incorporates the Durbin test (Durbin 1951), coefficient of concordance (Gibbons 1971; Kendall 1955), and Guttman scale (Guttman 1946).

Durbin Test

Because the observations in BIB designs may consist merely of ranks and would not meet the normality assumptions required for applying parametric techniques, the Durbin test is appropriate. Two key assumptions underlie the use of the Durbin test in BIB designs:

1. the blocks (judges) are mutually independent of each other, and

2. within each block the observations may be arranged in increasing order according to some criterion of interest (Conover 1971).

The null and alternative hypotheses commensurate with the Durbin test are (Conover 1971, 277):

H0: Each ranking of the random variables within each block is equally likely. The treatments have identical effects.

H1: At least one treatment tends to yield larger observed values than at least one other treatment.

The Durbin test statistic utilized in BIB designs is defined in convenient computing form as (Conover 1971):

$$T = \frac{12(t-1)}{rt(k-1)(k+1)} \sum_{j=1}^{t} R_j^2 - \frac{3r(t-1)(k+1)}{k-1} \qquad (3)$$

where $R_j = \sum_{i=1}^{b} R(X_{ij})$ = sum of ranks assigned to the r observed values under the jth treatment.

In terms of a decision rule, the null hypothesis should be rejected at the alpha level of significance if the Durbin test statistic T exceeds the (1 - α)th quantile of a chi-square random variable with t - 1 degrees of freedom (Conover 1971).

The theoretical development of the Durbin test "is very similar to that of the Friedman test. That is, the exact distribution of the Durbin test statistic is found under the assumption that each arrangement of the k ranks within a block is equally likely, because of no differences between treatments. There are k! equally likely ways of arranging the ranks within each block, and there are b blocks. Therefore, each arrangement of ranks over the entire array of b blocks is equally likely, and has probability 1/(k!)^b associated with it, because there are (k!)^b different arrays possible. The Durbin test statistic is calculated for each array, and then the distribution function is determined, just as it (is) for the Friedman test statistic . . . . The exact distribution is not practical to find in most cases, and so the distribution of the Durbin test statistic is approximated by the chi-square distribution with t - 1 degrees of freedom, if the number of repetitions r of each treatment is large" (Conover 1971, 278).
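The enumeration argument in the quotation can be illustrated on a design small enough to list every array. The Python sketch below uses a hypothetical toy design (t = 3, b = 3, k = 2, r = 2, λ = 1) that does not appear in this article; it enumerates all (k!)^b = 8 equally likely arrays and tabulates the exact null distribution of the Durbin statistic in the computing form of equation (3). All names in the code are assumptions introduced for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

# Hypothetical toy design (not from the article): every pair of 3 treatments
# appears together in exactly one block of size 2.
blocks = [(1, 2), (1, 3), (2, 3)]
t, b, k, r = 3, 3, 2, 2

def durbin_T(rank_sums):
    """Durbin statistic, computing form of equation (3), kept as an exact fraction."""
    lead = Fraction(12 * (t - 1), r * t * (k - 1) * (k + 1))
    const = Fraction(3 * r * (t - 1) * (k + 1), k - 1)
    return lead * sum(R * R for R in rank_sums) - const

# Enumerate all (k!)^b equally likely arrays of within-block rankings.
counts = Counter()
for array in product(permutations(range(1, k + 1)), repeat=b):
    sums = dict.fromkeys(range(1, t + 1), 0)
    for block, ranks in zip(blocks, array):
        for treatment, rank in zip(block, ranks):
            sums[treatment] += rank
    counts[durbin_T(sums.values())] += 1

total = sum(counts.values())  # number of arrays enumerated
exact_dist = {T: Fraction(n, total) for T, n in sorted(counts.items())}
for T, p in exact_dist.items():
    print(f"T = {float(T):.3f} with probability {p}")
```

Even for this tiny design the statistic takes only two distinct values, which suggests why the chi-square approximation is reserved for cases in which r is reasonably large.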

Guttman Scale

The Durbin statistic would be computed in order to test the null hypothesis of no differences among treatments. Assuming the rejection of the null hypothesis, a Guttman scale (see Appendix for more information) can be derived by means of a computer program named SCAORD, which is on file in the Statistics Laboratory at the University of Arkansas. 5 After plotting the results of these calculations on a linear scale, the researcher can determine those objects (treatments) that respondents (blocks) considered most important as well as ascertain the intensity with which respondents ranked the objects. 6



TABLE 1
SELECTED PORTIONS OF RAGHAVARAO'S TABLE 5.10.1 (1971)

Series    t     b     r     k     λ    Efficiency
  42     15    35     7     3     1       .71
  43     15    15     7     7     3       .92
  44     15    15     8     8     4       .94
  45     15    35    14     6     5       .89

Coefficient of Concordance

The formula used in the BIB design to measure agreement among observers regarding their rankings of objects is defined in convenient computing form as 7 (Gibbons 1971; Kendall 1955):

$$W = \frac{12 \sum_{j=1}^{t} R_j^2 - 3r^2 t (k+1)^2}{\lambda^2 t (t^2 - 1)} \qquad (4)$$
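Equations (3) and (4) are both linear functions of the sum of squared rank totals, so one statistic can be expressed in terms of the other. The short derivation below is a sketch worked out here rather than taken from the cited sources; the intermediate quantity S is introduced only for convenience, and λ denotes the number of blocks in which any two treatments appear together.

```latex
% Each block contributes k(k+1)/2 to the grand total of ranks, and bk = rt by (1):
\sum_{j=1}^{t} R_j = \frac{rt(k+1)}{2}
% Sum of squared deviations of the rank totals from their common mean r(k+1)/2:
S = \sum_{j=1}^{t}\Bigl(R_j - \frac{r(k+1)}{2}\Bigr)^{2}
  = \sum_{j=1}^{t} R_j^{2} - \frac{r^{2}t(k+1)^{2}}{4}
% Equations (3) and (4) rewritten in terms of S:
T = \frac{12(t-1)}{rt(k-1)(k+1)}\,S ,
\qquad
W = \frac{12}{\lambda^{2}t(t^{2}-1)}\,S
% Dividing, then applying (t-1)\lambda = r(k-1) from (2):
W = \frac{r(k-1)(k+1)}{\lambda^{2}(t-1)(t^{2}-1)}\,T
  = \frac{k+1}{\lambda\,(t^{2}-1)}\,T
```

Under these relations W is a fixed positive multiple of T for a given design, so a test based on one statistic leads to the same decision as a test based on the other.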

A MARKETING APPLICATION

The president of a multinational corporation that manufactured industrial adhesives wanted his vice-presidents to rank 15 potential new sales territories from most preferred to least preferred. Because the vice-presidents would probably find it difficult to effectively rank this many items, the president decided to adopt one of several BIB designs. He examined a BIB design table, which summarizes the required parameters for each of several series of research designs. Although several such tables exist (Cochran and Cox 1957; Cox 1958; Fisher and Yates 1963; Kempthorne 1952; Yates 1936b), the president consulted Table 5.10.1 from Raghavarao's Constructions and Combinatorial Problems in Design of Experiments (1971).

TABLE 2
GENERAL BIB DESIGN FOR RANKING OF SALES TERRITORIES

Blocks               Treatments
(Vice-Presidents)    (Sales Territories)
        1             1  10  11  12  13  14  15
        2             2   7   8   9  13  14  15
        3             3   6   8   9  11  12  15
        4             4   6   7   9  10  12  14
        5             5   6   7   8  10  11  13
        6             3   4   5   6  13  14  15
        7             2   4   5   7  11  12  15
        8             2   3   5   8  10  12  14
        9             2   3   4   9  10  11  13
       10             1   4   5   8   9  10  15
       11             1   3   5   7   9  11  14
       12             1   3   4   7   8  12  13
       13             1   2   5   6   9  12  13
       14             1   2   4   6   8  11  14
       15             1   2   3   6   7  10  15

By knowing the number of sales territories (or treatments) to be rated, the president was able to focus on a subset of the 91 listed series. Given t = 15, four series qualified: 42, 43, 44, and 45 (Table 1). Further elimination depended upon qualitative considerations, 8 such as the number of objects (sales territories) the president believed each vice-president could reliably rank. The president felt the vice-presidents could rank seven sales territories more accurately than eight. As there were 15 vice-presidents, the president was constrained to a design where b = 15. This resulted in Series 43 being selected as the optimal BIB design to adopt. Operationally, Series 43 means (Table 1):

1. there will be 15 sales territories (treatments) evaluated (t = 15),

2. there will be 15 vice-presidents (blocks) sampled (b = 15),

3. each sales territory (treatment) will be repeated seven times (r = 7) in the BIB design,

4. each vice-president will rank seven sales territories (k = 7), and

5. each sales territory will be compared with every other territory by three vice-presidents (λ = 3). 9
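The selection of a series can be expressed as a simple filter over the rows of Table 1. The Python sketch below is illustrative; the function names and the argument names are assumptions made here, `lam` stands for λ, and the efficiency factor uses the expression quoted in footnote 8.

```python
# Rows of Table 1: (series, t, b, r, k, lam)
table_1 = [
    (42, 15, 35, 7, 3, 1),
    (43, 15, 15, 7, 7, 3),
    (44, 15, 15, 8, 8, 4),
    (45, 15, 35, 14, 6, 5),
]

def efficiency(t, r, k, lam):
    """Efficiency factor t * lam / (r * k), as quoted in footnote 8."""
    return t * lam / (r * k)

def candidate_series(n_respondents, max_block_size):
    """Series whose block count equals the panel size and whose blocks are small enough."""
    return [series for series, t, b, r, k, lam in table_1
            if b == n_respondents and k <= max_block_size]

efficiencies = {series: round(efficiency(t, r, k, lam), 2)
                for series, t, b, r, k, lam in table_1}
chosen = candidate_series(n_respondents=15, max_block_size=7)
print(efficiencies)
print(chosen)
```

The filter reproduces the qualitative choice of Series 43, whereas ranking by efficiency alone would favor Series 44, as footnote 8 observes.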

With the establishment of the parameters for the BIB design, the general BIB layout could be developed. Table 2 summarizes the general experimental design. In English, the first row of Table 2 means that Vice-President Number 1 would rank Sales Territory Numbers 1, 10, 11, 12, 13, 14, and 15 in order of preference from the most to the least preferred. Vice-President Number 2 would do the same thing, but in terms of Sales Territory Numbers 2, 7, 8, 9, 13, 14, and 15. And so on, until Vice-President Number 15 ranked Sales Territory Numbers 1, 2, 3, 6, 7, 10, and 15.
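Whether a layout such as Table 2 actually possesses the three balance properties can be confirmed mechanically. The Python sketch below transcribes the rows of Table 2 and counts block sizes, replications, and pairwise co-occurrences; the variable names are assumptions introduced for illustration.

```python
from collections import Counter
from itertools import combinations

# Rows of Table 2: sales territories assigned to each vice-president (block).
table_2 = [
    (1, 10, 11, 12, 13, 14, 15), (2, 7, 8, 9, 13, 14, 15),
    (3, 6, 8, 9, 11, 12, 15),    (4, 6, 7, 9, 10, 12, 14),
    (5, 6, 7, 8, 10, 11, 13),    (3, 4, 5, 6, 13, 14, 15),
    (2, 4, 5, 7, 11, 12, 15),    (2, 3, 5, 8, 10, 12, 14),
    (2, 3, 4, 9, 10, 11, 13),    (1, 4, 5, 8, 9, 10, 15),
    (1, 3, 5, 7, 9, 11, 14),     (1, 3, 4, 7, 8, 12, 13),
    (1, 2, 5, 6, 9, 12, 13),     (1, 2, 4, 6, 8, 11, 14),
    (1, 2, 3, 6, 7, 10, 15),
]

block_sizes = {len(block) for block in table_2}                      # expect {k}
replications = Counter(terr for block in table_2 for terr in block)  # expect r each
pair_counts = Counter(pair for block in table_2
                      for pair in combinations(sorted(block), 2))    # expect lambda each

is_bib = (block_sizes == {7}
          and set(replications.values()) == {7}
          and len(pair_counts) == 15 * 14 // 2
          and set(pair_counts.values()) == {3})
print("Table 2 is balanced:", is_bib)
```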

Within the general BIB design, it was necessary to randomize these three elements:

1. order of sales territory numbers presented to each vice-president to be ranked,

2. assignment of respondents to blocks, and

3. assignment of identification numbers to sales territories.
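The three randomizations listed above can be sketched in a few lines of Python. This is an assumed implementation, not the procedure actually used in the study; the function name, the toy three-block design, and the respondent and territory labels are all hypothetical.

```python
import random

def randomize_design(blocks, respondents, territory_names, seed=None):
    """Return {respondent: territories in randomized presentation order}."""
    rng = random.Random(seed)
    names = list(territory_names)
    rng.shuffle(names)                      # 3. random identification numbers
    name_of = {number: name for number, name in enumerate(names, start=1)}
    people = list(respondents)
    rng.shuffle(people)                     # 2. random assignment of respondents to blocks
    plan = {}
    for person, block in zip(people, blocks):
        items = [name_of[number] for number in block]
        rng.shuffle(items)                  # 1. random presentation order within the block
        plan[person] = items
    return plan

# Hypothetical toy inputs (not from the article).
toy_blocks = [(1, 2), (1, 3), (2, 3)]
toy_respondents = ["VP-A", "VP-B", "VP-C"]
toy_territories = ["North", "South", "West"]
plan = randomize_design(toy_blocks, toy_respondents, toy_territories, seed=1)
for person in sorted(plan):
    print(person, plan[person])
```

Because the relabeling is one-to-one, the randomized plan retains the balance of the underlying design; only the labels, the respondent assignment, and the presentation order change.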

After conducting the survey, the president obtained the results summarized in Table 3. In English, the first row of Table 3 means that Vice-President Number 1 ranked his set of seven sales territories (see Table 2) from most preferred to least preferred as follows: Sales Territory Numbers 15, 10, 12, 1, 13, 11, and 14. Vice-President Number 2's preferred ranking of his group of seven territories was: Sales Territory Numbers 7, 13, 15, 8, 14, 9, and 2. And so on, until Vice-President Number 15, whose ranking of his set of seven territories was: Sales Territory Numbers 10, 15, 6, 7, 3, 1, and 2.
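The correspondence between a stated preference order and a row of Table 3 can be sketched as follows. The Python function name and variable names are assumptions introduced for illustration; the data are those given above for Vice-President Number 1.

```python
def ranks_from_order(preference_order):
    """Map each territory to its rank, where 1 denotes the most preferred."""
    return {territory: rank
            for rank, territory in enumerate(preference_order, start=1)}

vp1_block = [1, 10, 11, 12, 13, 14, 15]   # row 1 of Table 2
vp1_order = [15, 10, 12, 1, 13, 11, 14]   # most preferred to least preferred
vp1_ranks = ranks_from_order(vp1_order)

# Ranks listed in territory-number order, i.e., the entries of row 1 of Table 3.
vp1_row = [vp1_ranks[territory] for territory in vp1_block]
print(vp1_row)
```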



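TABLE 3
DATA RESULTS OF BALANCED INCOMPLETE BLOCK DESIGN FOR SALES TERRITORY PREFERENCE STUDY

(A dot marks a sales territory that was not in that vice-president's block.)

Vice-President                 Sales Territory Number
Number          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
   1            4   .   .   .   .   .   .   .   .   2   6   3   5   7   1
   2            .   7   .   .   .   .   1   4   6   .   .   .   2   5   3
   3            .   .   2   .   .   6   .   7   3   .   4   1   .   .   5
   4            .   .   .   2   .   1   4   .   7   6   .   5   .   3   .
   5            .   .   .   .   3   5   2   4   .   1   6   .   7   .   .
   6            .   .   1   3   5   7   .   .   .   .   .   .   4   6   2
   7            .   1   .   6   2   .   3   .   .   .   7   5   .   .   4
   8            .   1   4   .   7   .   .   6   .   3   .   2   .   5   .
   9            .   7   4   1   .   .   .   .   2   3   5   .   6   .   .
  10            7   .   .   5   6   .   .   4   3   1   .   .   .   .   2
  11            6   .   1   .   3   .   7   .   4   .   5   .   .   2   .
  12            5   .   4   3   .   .   2   6   .   .   .   1   7   .   .
  13            6   5   .   .   4   1   .   .   2   .   .   3   7   .   .
  14            5   7   .   1   .   2   .   6   .   .   3   .   .   4   .
  15            6   7   5   .   .   3   4   .   .   1   .   .   .   .   2

Rj             39  35  21  21  30  25  23  37  27  17  36  20  38  32  19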

After summing the ranks for each sales territory (or column), the Durbin test statistic was computed:

$$T = \frac{12(15-1)}{7(15)(7-1)(7+1)}\left[(39)^2 + (34)^2 + \cdots + (19)^2\right] - \frac{3(7)(15-1)(7+1)}{(7-1)} = 25.5$$

For t - 1 = 15 - 1 = 14 degrees of freedom, this Durbin T value was statistically significant at the .02994 level. Therefore, given an alpha of .05, the null hypothesis that treatments (or sales territories) were identically preferred was rejected. At least one sales territory tended to yield larger observed values than at least one other territory. Hence, there appeared to exist at least a partial ordering of sales territories among the vice-presidents.
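The Durbin computation can be retraced from the column sums of Table 3. The Python sketch below is an independent recomputation rather than the author's program; the function names are assumptions, and the chi-square tail probability uses a closed form that holds only for even degrees of freedom. One point deserves care: the printed expression uses 34 as the second rank sum, whereas the Table 3 column sum for Sales Territory Number 2 is 35, so both versions are evaluated.

```python
from math import exp, factorial

# Column sums Rj from Table 3 (Sales Territory Numbers 1 through 15).
rank_sums = [39, 35, 21, 21, 30, 25, 23, 37, 27, 17, 36, 20, 38, 32, 19]
t, r, k = 15, 7, 7

def durbin_T(R, t, r, k):
    """Durbin statistic in the computing form of equation (3)."""
    lead = 12 * (t - 1) / (r * t * (k - 1) * (k + 1))
    return lead * sum(x * x for x in R) - 3 * r * (t - 1) * (k + 1) / (k - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

# Using the Table 3 column sums as they stand.
T_table = durbin_T(rank_sums, t, r, k)
p_table = chi2_upper_tail_even_df(T_table, t - 1)

# Using 34 for the second rank sum, as in the printed expression.
T_printed = durbin_T([39, 34] + rank_sums[2:], t, r, k)
p_printed = chi2_upper_tail_even_df(T_printed, t - 1)

print(f"Table 3 sums:       T = {T_table:.1f}, p = {p_table:.5f}")
print(f"Printed expression: T = {T_printed:.1f}, p = {p_printed:.5f}")
```

The second version reproduces the reported T = 25.5 and significance level of .02994, while the Table 3 sums give a T of about 27.8. Under either version the statistic exceeds the .05 critical value, so the decision to reject the null hypothesis is unaffected.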

Next, the coefficient of concordance was calculated:

$$W = \frac{12\left[(39)^2 + (34)^2 + \cdots + (19)^2\right] - 3(7)^2(15)(7+1)^2}{(3)^2(15)\left[(15)^2 - 1\right]} = .30357$$

This value of W was statistically significant at the same level as the Durbin test statistic, because the coefficient of concordance is simply a linear transformation of the Durbin test statistic. The result of a significant W was interpreted to mean that the rankings of 15 sales territories across the 15 vice-presidents exhibited some degree of consistency. Furthermore, these results indicated the vice-presidents were using the same criterion in evaluating sales territories. The correctness of the rankings, however, was undeterminable, because an external, objective yardstick of some type did not exist.
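The coefficient of concordance can likewise be recomputed from the rank sums. In the Python sketch below, the function name and `lam` (for λ) are assumptions; the scaling factor (k + 1)/(λ(t² - 1)) linking W to the Durbin statistic is obtained here by dividing equation (4) by equation (3) and applying relation (2), and is not stated in this form in the article. As with the Durbin computation, the printed expression uses 34 where the Table 3 column sum is 35, so both are evaluated.

```python
# Column sums Rj from Table 3 (Sales Territory Numbers 1 through 15).
rank_sums = [39, 35, 21, 21, 30, 25, 23, 37, 27, 17, 36, 20, 38, 32, 19]
t, r, k, lam = 15, 7, 7, 3

def concordance_W(R, t, r, k, lam):
    """Coefficient of concordance in the computing form of equation (4)."""
    numerator = 12 * sum(x * x for x in R) - 3 * r ** 2 * t * (k + 1) ** 2
    return numerator / (lam ** 2 * t * (t ** 2 - 1))

W_table = concordance_W(rank_sums, t, r, k, lam)                  # Table 3 sums as they stand
W_printed = concordance_W([39, 34] + rank_sums[2:], t, r, k, lam) # 34 as in the printed expression

# Assumed scaling factor: W = scale * T for a balanced design.
scale = (k + 1) / (lam * (t ** 2 - 1))

print(f"W from Table 3 sums:       {W_table:.5f}")
print(f"W from printed expression: {W_printed:.5f}")
print(f"Implied Durbin T:          {W_printed / scale:.1f}")
```

The printed-expression version returns the reported .30357 and implies the reported T of 25.5, which is consistent with the statement that W is a linear transformation of the Durbin statistic.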

Finally, to more clearly ascertain the exact order and spread among the sales territories, thereby indicating the intensity with which vice-presidents ranked territories, a Guttman scale was derived. The methodology for deriving the Guttman scale for a set of ordinal data is presented by Guttman (1946), and will not be discussed in this paper. Table 4 summarizes the Guttman scale scores for the 15 sales territories. If the most preferred sales territory received a rank of "1" while the least preferred sales territory was awarded a rank of "7" by each vice-president, then those sales territories possessing the most negative Guttman scale scores would represent the most preferred sales territories across all respondents. 10 In this example, these sales territories were (in order): Sales Territory Numbers 10, 15, 12, 9, 4, 7, 6, and 3 (Table 4).

The magnitude of the Guttman scale scores represented the spread among sales territories as well as the intensity with which vice-presidents ranked the territories. These results could be readily observed when the data were plotted on a linear scale, as in Figure 1. Because of the reasons embodied within footnote six and the Appendix, the president focused on only the results forthcoming from the Guttman scale. After deriving the linear scales for both mean ranks and Guttman scale scores (Table 4), it was observed that while some sales territories had the same average rank, e.g., Sales Territory Numbers 3 and 4, they did not have the same Guttman scale scores. On the basis of mean ranks, Sales Territory Number 7 was preferred to Sales Territory Number 9, but on the basis of Guttman scale scores, Sales Territory Number 9 was preferred to Sales Territory Number 7. Other similar examples are listed in Table 4. This phenomenon occurred because the mean rank is an average, whereas Guttman scale scores reflect the configuration of ranks within each sales territory. Guttman scaling, therefore, presented a more detailed picture of the actual ranks and spread among the sales territories than the average ranks statistic.

TABLE 4
GUTTMAN SCALE SCORES AND MEAN RANKS BY SALES TERRITORY NUMBER

Sales Territory    Mean     Guttman
Number             Rank     Scale Score
       1           5.571      .3263
       2           5.0        .3582
       3           3.0       -.0205
       4           3.0       -.1927
       5           4.286      .1204
       6           3.571     -.0391
       7           3.286     -.1480
       8           5.286      .1907
       9           3.857     -.2261
      10           2.429     -.4898
      11           5.143      .0989
      12           2.857     -.2756
      13           5.429      .4063
      14           4.571      .1754
      15           2.714     -.2842
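The contrast between the two orderings can be checked directly from the published figures. The Python sketch below computes mean ranks as Rj/r (the formula given in footnote 6) and compares the resulting order with the order implied by the Guttman scale scores, which are simply transcribed and not recomputed. Variable names are assumptions; the score for Sales Territory Number 1 is entered as .3263, the value given in the text and in Figure 1.

```python
r = 7  # replications per sales territory

# Column sums Rj from Table 3, keyed by Sales Territory Number.
rank_sums = {1: 39, 2: 35, 3: 21, 4: 21, 5: 30, 6: 25, 7: 23, 8: 37,
             9: 27, 10: 17, 11: 36, 12: 20, 13: 38, 14: 32, 15: 19}

# Guttman scale scores transcribed from Table 4 (not recomputed here).
guttman = {1: .3263, 2: .3582, 3: -.0205, 4: -.1927, 5: .1204, 6: -.0391,
           7: -.1480, 8: .1907, 9: -.2261, 10: -.4898, 11: .0989,
           12: -.2756, 13: .4063, 14: .1754, 15: -.2842}

mean_rank = {terr: total / r for terr, total in rank_sums.items()}

# Most preferred first; ties in mean rank are broken by territory number.
by_mean = sorted(mean_rank, key=lambda terr: (mean_rank[terr], terr))
by_guttman = sorted(guttman, key=guttman.get)

print("Order by mean rank:    ", by_mean)
print("Order by Guttman score:", by_guttman)
```

The two orderings agree on the three most preferred territories but differ in the middle of the scale, for example in the relative positions of Sales Territory Numbers 7 and 9.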

Focusing first on the negative Guttman scale scores, Sales Territory Number 10 was by far the most preferred sales territory (-.4898). Although its two closest rivals, Sales Territory Numbers 15 (-.2842) and 12 (-.2756), were perceived as somewhat similar, they were quite a distance from Sales Territory Number 10 (at least .2056). The next three sales territories, Sales Territory Numbers 9 (-.2261), 4 (-.1927), and 7 (-.148), were about equidistant from one another, and Sales Territory Number 9 was about the same distance from Sales Territory Numbers 15 and 12. However, there was a large gap (.1089, to be exact) between Sales Territory Number 7 (-.148) and the next group of similarly evaluated sales territories, Sales Territory Numbers 6 (-.0391) and 3 (-.0205).

Before turning to the positive Guttman scale scores, it is important to note the wide spread between the negative score closest to zero (-.0205) and the smallest positive score (.0989), which is .1194. As with the two previously observed gaps, the vice-presidents indicated a significantly different preference (or intensity) for one set of sales territories versus another. In the positive half of the linear scale, four distinct sales territory preference clusters emerged:

1. Sales Territory Numbers 11 (.0989) and 5 (.1204),

2. Sales Territory Numbers 14 (.1754) and 8 (.1907),

3. Sales Territory Numbers 1 (.3263) and 2 (.3582), and

4. Sales Territory Number 13 (.4063).

While the distances between clusters one and two and between clusters three and four were approximately the same (.055 and .0481, respectively), the distance between clusters two and three was not (.1356). This latter distance compared closely with the last two differences noted in the negative half of the linear scale. However, in the aggregate, the magnitude of these positive values means vice-presidents preferred Sales Territory Numbers 11, 5, 14, and 8 over Sales Territory Numbers 1, 2, and 13.
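The gaps discussed above are differences between adjacent scores on the linear scale and can be listed in a few lines. The Python sketch below is illustrative only; variable names are assumptions, and the scores are those reported in the text and in Figure 1.

```python
# Guttman scale scores reported for the 15 sales territories.
scores = sorted([.3263, .3582, -.0205, -.1927, .1204, -.0391, -.1480, .1907,
                 -.2261, -.4898, .0989, -.2756, .4063, .1754, -.2842])

# Distance between each pair of neighboring scores on the linear scale.
gaps = [round(upper - lower, 4) for lower, upper in zip(scores, scores[1:])]

# The four widest gaps separate the main preference groupings.
widest = sorted(gaps, reverse=True)[:4]
print(widest)
```

The four widest gaps are the ones singled out in the discussion: the distance between Sales Territory Number 10 and its nearest rival, the distance between clusters two and three, the distance across zero, and the distance between Sales Territory Numbers 7 and 6.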

FIGURE 1
LINEAR SCALE DEPICTING THE GUTTMAN SCALE SCORES FOR THE SALES TERRITORY PREFERENCE STUDY

[Figure: vertical linear scale of Guttman scale scores, running from .4063 (Sales Territory #13) at the positive end to -.4898 (Sales Territory #10) at the negative end, with each of the 15 sales territories marked at its score.]



In terms of interpreting these results, the president drew several conclusions:

1. Sales Territory Number 10 was overwhelmingly preferred to the other fourteen territories;

2. Excluding Sales Territory Number 10, the next set of preferred territories, Sales Territory Numbers 15 and 12, were too close to distinguish without additional research;

3. The next sales territories, Sales Territory Numbers 9, 4, and 7, represented the fourth, fifth, and sixth most preferred territories;

4. The next eight sales territories, Sales Territory Numbers 6, 3, 11, 5, 14, 8, 1, and 2, formed four two-sales territory clusters. While it was possible to differentiate among these four clusters, it was difficult to unequivocally distinguish between sales territories within any one cluster, except perhaps between Sales Territory Numbers 1 and 2; and

5. Sales Territory Number 13 was by far the least preferred sales territory among the fifteen territories.

CONCLUSIONS

When respondents are asked to rank many objects, and/or respondents' ranking reliability may be in doubt, the researcher should consider using BIB designs. As long as the observations (even if nonnumeric) are amenable to ranking according to some criterion of interest, the investigator may use BIB approaches. Because balancing and replication reduce variability, BIB procedures tend to increase the study's precision. With a sample taken from a large, heterogeneous population, there is a high probability of much variation across subjects' rankings. This is unlikely with a small sample drawn from a homogeneous population. Hence, a prerequisite for applying BIB methods is the existence of a high degree of similarity among respondents. The real-world example presented in this paper satisfied this requirement by using vice-presidents from the same firm. Also, BIB methods are apt to reduce data collection costs and result in increased completion rates. Since nonparametric statistical techniques have recently been associated with BIB procedures, the marketing researcher will not have to be a quantitative specialist to utilize them. Finally, with the ranking procedure, the researcher does not have to invent a scale of measurement. This represents the major advantage and, therefore, the major significance of BIB methods.

FOOTNOTES

1These techniques are generally classified as variability methods, because the basic data are only ordinal-scaled.

2Graded category sorting and sequential sorting represent less tedious alternatives to complete ranking. However, they encounter the same general problems as complete ranking methods when the number of stimuli becomes large.

3This particular property ensures that the same standard error may be used for comparing every pair of treatments. Since any treatment total is a single operation for all the blocks in which the treatment appears, it facilitates the statistical analysis (Cochran and Cox 1957).

4By relaxing the restriction that each pair of treatment levels must appear the same number of times (λ), more general types of incomplete block designs emerge. Clatworthy (1955) has been instrumental in developing this class of designs, especially partially balanced incomplete block (PBIB) designs. PBIB designs with two associate classes (an associate class is a subset of paired treatments which are balanced) are characterized by these three conditions:

1. every treatment appears at most once in every block,

2. each of t treatment levels appears in exactly r replications in b blocks of k items each, and

3. each pair of treatment levels occurs either:

a. exactly λ1 times (first associate) or

b. exactly λ2 times (second associate).

The purpose for using PBIB designs is similar to that for BIB designs: to reduce the number of stimuli presented at any one time while maintaining some type of balance across presentations (Green 1974). Most textbooks on experimental design provide PBIB design plans and appropriate analytical procedures. PBIB designs are fairly popular in collecting paired comparisons data, among other things.

5Information on this computer program can be obtained from Professor James E. Dunn, Department of Mathematics, University of Arkansas, Fayetteville, Arkansas.

6Assuming the rejection of the null hypothesis, an alternative method to computing a Guttman scale is to rank the Rj's in order of increasing magnitude. The result would be an indication of those objects which respondents adjudged most pertinent. Another approach that could be useful in lieu of Guttman scaling is average ranks, which relies heavily upon the Rj's. The formula for deriving an average rank for the jth object (treatment) is: Mean Rank = Rj/r. The method for determining which objects subjects consider most vital remains basically the same as with ranking the Rj's. Relative to the Guttman scale approach, both of these alternative procedures are ad hoc in nature. Furthermore, while the ranking of treatment columnar sums and average ranks has strong theoretical support in the literature on complete rankings (Kendall 1955; Siegel 1956), it is not clear how well these techniques also apply to blocks of incomplete rankings. Of these three procedures, therefore, the Guttman scale reflects more accurately the intensity of subjects' incomplete rankings of treatments. This will simplify the task of differentiating which objects respondents judged of greatest import.

7In terms of interpreting the value of W, Siegel (1956, 237-238) offers this advice:

A high or significant value of W may be interpreted as meaning that the observers or judges are applying essentially the same standard in ranking the N objects under study . . . . It should be emphasized that a high or significant value of W does not mean that the orderings observed are correct. In fact, they may all be incorrect with respect to some external criterion.

8Most of the previous uses of BIB designs did not focus on ranking experiments. Researchers were normally concerned with response variables that were quantitative in nature. Given this situation, researchers could objectively determine the optimal BIB design to select by first computing each series' efficiency factor and then selecting the series with the largest value. The efficiency factor is defined as "the fraction of total information contained in intrablock comparisons where interblock and intrablock contrasts are of equal accuracy. The efficiency factor is equal to tλ/rk" (Federer 1955, 416). The last column in Table 1 summarizes the efficiency value for each series where t = 15. Following the procedure outlined above, Series 44 would have been selected for establishing the optimal BIB design, if quantitative measurements were involved. However, if qualitative considerations prevailed, an entirely different series may have been preferred.

9Other things being equal, it is suggested that λ be greater than or equal to two, because each treatment is then compared at least twice with every other treatment rather than once if λ = 1. Also, the efficiency factor will be lower in the latter case than in the former instance.

10If the rank of "7" symbolized the most preferred sales territory, whereas the rank of "1" represented the least preferred sales territory, the investigator would concentrate his/her attention on the highest positive Guttman scale scores.


REFERENCES

Banks, Seymour. 1974. "Experimental Design and Control." In Handbook of Marketing Research. Ed. Robert Ferber. New York: McGraw-Hill Book Company.

Bose, R. C. 1939. "On the Construction of Balanced Incomplete Block Designs." Annals of Eugenics 9 (June): 353-399.

Bradley, Ralph and Milton Terry. 1952. "Rank Analysis of Incomplete Block Designs." Biometrika 39 (January): 324-345.

Clatworthy, Willard. 1955. "Partially Balanced Incomplete Block Designs with Two Associate Classes and Two Treatments Per Block." Journal of Research of the National Bureau of Standards 54 (April): 177-190.

Cochran, William and Gertrude Cox. 1957. Experimental Designs. New York: John Wiley & Sons, Inc. Second Edition.

Conover, W. 1971. Practical Nonparametric Statistics. New York: John Wiley & Sons, Inc.

Cox, D. 1958. Planning of Experiments. New York: John Wiley & Sons, Inc.

David, H. 1963. The Method of Paired Comparisons. London: Charles Griffin & Co., Ltd.

Drake, Jerry and Frank Millar. 1969. Marketing Research. Scranton, PA: International Textbook Company.

Durbin, J. 1951. "Incomplete Blocks in Ranking Experiments." British Journal of Psychology, Statistical Section 4 (November): 85-90.

Federer, Walter. 1955. Experimental Design. New York: The Macmillan Company.

Fisher, R. A. and F. Yates. 1963. Statistical Tables. New York: Hafner Publishing Company. Sixth Edition.

Gibbons, Jean. 1971. Nonparametric Statistical Inference. New York: McGraw-Hill Book Company.

Green, Paul E. 1974. "On the Design of Choice Experiments Involving Multifactor Alternatives." Journal of Consumer Research 1 (September): 61-68.

Green, Paul E. and Frank J. Carmone. 1970. Multidimensional Scaling and Related Techniques in Marketing Analysis. Boston: Allyn and Bacon, Inc.

Green, Paul E. and Donald Tull. 1975. Research for Marketing Decisions. Englewood Cliffs, NJ: Prentice-Hall, Inc. Third Edition.

Green, Paul E. and Yoram Wind. 1973. Multiattribute Decisions in Marketing: A Measurement Approach. Hinsdale, IL: The Dryden Press.

Guttman, Louis. 1946. "An Approach for Quantifying Paired Comparisons and Rank." Annals of Mathematical Statistics 17 (August): 144-163.

Kempthorne, Oscar. 1952. The Design and Analysis of Experiments. New York: John Wiley & Sons, Inc.

Kendall, Maurice. 1955. Rank Correlation Methods. New York: Hafner Publishing Company. Second Edition.

Raghavarao, Damaraju. 1971. Construction and Combinatorial Problems in Design of Experiments. New York: John Wiley & Sons, Inc.

Siegel, Sidney. 1956. Nonparametric Statistics. New York: McGraw-Hill Book Company.

Torgerson, Warren S. 1958. Theory and Methods of Scaling. New York: John Wiley & Sons, Inc.

Uhl, Kenneth and Bertram Schoner. 1969. Marketing Research. New York: John Wiley & Sons, Inc.

Wiley, James B. 1978. "BIBD: A Data Management Program for "BIBD" Choice Data." Journal of Marketing Research 15 (August): 472-474.

Yates, F. 1936a. "Incomplete Randomized Blocks." Annals of Eugenics 7 (March): 121-140.

Yates, F. 1936b. "A New Method of Arranging Variety Trials Involving a Large Number of Varieties." Journal of Agricultural Science 26 (June): 424-455.

Youden, W. J. 1937. "Use of Incomplete Block Replications in Estimating Tobacco-Mosaic Virus." Contributions from Boyce Thompson Institute 9 (November): 41-48.

APPENDIX

Guttman Scaling

The primary purpose of any scaling technique is "to measure the intensity of the respondent's feelings about his/her answer . . . . Intensity refers to the strength of the respondent's feelings regarding the answer to a question" (Drake and Millar 1969, 428).

A Guttman or unidimensional scale is appropriate whenever an ordered set of statements exists and agreement with one statement implies agreement with all statements that are less positive. This notion of agreement is best handled quantitatively, by using scalogram analysis developed by Guttman. In addition to agreeing to a certain statement, the respondent may indicate how strongly he/she feels about his/her response along a five-point scale from "feel very strongly" to "do not feel very strongly at all" (Uhl and Schoner 1969). Conceptually, the following categories exist in the intensity continuum:

Feel Very Strongly | Feel Strongly | Undecided | Do Not Feel Strongly | Do Not Feel Very Strongly At All

Once items are rank-ordered, metric scaling can be used. In determining the intensity with which respondents ranked objects, a Guttman scale is derived directly from the rankings (Guttman 1946). Hence, the Guttman scale is applicable to preference testing where the primary interest is in the objects under comparison (Cochran and Cox 1957; David 1963).

ABOUT THE AUTHOR

DAVID R. RINK received his B.S. degree in Management and his M.B.A. degree from Indiana University in Bloomington, Indiana. He received his Ph.D. degree from the University of Arkansas in Fayetteville, Arkansas. Currently, Dr. Rink is Professor of Marketing at Northern Illinois University, DeKalb, Illinois. He has authored numerous articles in domestic and international business and professional journals and publications on such topics as strategic market planning, organizational buying, sales management, consumer behavior, pricing, marketing research, and product planning. Dr. Rink has also consulted with major corporations in these areas. He has co-authored two textbooks: Marketing Research and Applied Marketing Problems.

