
OMEGA The Int. Jl of Mgmt Sci. Vol. 11, No. 1, pp. 49-55, 1983. Printed in Great Britain. All rights reserved. Copyright © 1983 Pergamon Press Ltd

A Comparison of Encoding Techniques

DANIEL G BROOKS

TIMOTHY J O'LEARY

Arizona State University, USA

(Received May 1981; in revised form March 1982)

A limiting constraint of many management science techniques is that inputs from the decision maker based upon his experiences, opinions and intuition are not considered. For those models that do allow this type of input, it is assumed that these inputs can be accurately and precisely expressed as a subjective probability distribution. Little attention, however, has been directed towards evaluating the techniques used to define these distributions in a management setting. This study investigates the relative merits of four of the most commonly used techniques for the quantification of subjective assessments. When these techniques were used with professionals whose jobs entail evaluation of uncertainty, a clear preference was shown. Additionally, some concluding observations concerning the selection and the application of assessment techniques are presented.

1. INTRODUCTION

MUCH ATTENTION has been directed toward the problem of accurately eliciting and quantifying an individual's judgments about uncertain quantities. While this process is of interest in its own right, it is also important for its role in providing the basic inputs for statistical inference and decision making. As statistical techniques become more commonly used tools in many disciplines, there is more interest in practicable techniques for obtaining subjective probability assessments to be used in decision making. The techniques vary in many ways, including the amount of statistical sophistication assumed of the decision maker, the method of elicitation and the mode of response. Researchers or decision analysts, particularly in non-quantitative disciplines, desire a technique which does not assume extensive prior exposure to statistics on the part of the client (the one whose judgments are being assessed), one with which the client is comfortable and, of course, one which accurately represents the client's opinions.

This study investigates the relative merits of four of the most commonly used techniques for the quantification of subjective assessments of uncertainty. In particular, the bisection, fractile, cumulative distribution and probability wheel techniques are compared in an analyst-directed elicitation to ascertain whether or not certain techniques are relatively more accurate in representing the client's subjective judgments.

Section 2 gives a brief review of some of the literature upon which this study is based. Section 3 describes the general structure of the study, and Section 4 gives a detailed description of the study design, the procedure followed in eliciting distributions and the results. Section 5 discusses these results and the implications that might be derived from them, and Section 6 gives some concluding remarks about problems encountered in the process and what might be done about them.

2. A BRIEF REVIEW

The literature in the area of encoding subjective probability assessments is so extensive that it is impossible to summarize all the work briefly. A paper by Spetzler & Stael von Holstein [10] gives a lengthy bibliography of research in this area. Most of this work, however, was procedural in nature and, as noted by Hogarth [4], there was little in the way of systematic empirical investigation. This gap has since been narrowed by the work of some consulting organizations.

The focus of the empirical work has been on three related, but differing, problems:

(1) Testing the "expertness" of an individual by measuring the accuracy with which the assessor is able to describe uncertain future behavior. The typical approach uses some sort of "scoring rule" to evaluate an expert's ability to describe some "true" distribution accurately.

(2) Testing the ability of assessors to evaluate new information appropriately, by comparing subjective re-evaluation based on new data with revision using a mechanism such as Bayes' theorem.

(3) Testing the ability of an elicitation technique to quantify a client's subjective feelings accurately, independently of how the client uses available information or how expert the client is in his opinions.

It is the third area with which this study is concerned. A major methodological paper by Winkler [11] set much of the direction for empirical work in encoding. Much work has since been done on describing and improving assessment techniques ([3, 5, 1, 8, 12, 6] and others) and on comparing existing techniques in practical situations (including [4] and [7]).

3. THE PRESENT STUDY

Most systematic experimental work has focused on the assessor's ability to correctly forecast the future rather than on the reliability of the technique eliciting these forecasts. What work has been done on this reliability suffers from several weaknesses:

(1) it has relied primarily on questionnaires rather than analyst-directed techniques;

(2) it has been conducted primarily with students acting in artificial scenarios (such as story problems or almanac questions, in which the subject is asked to estimate the value of some past but, to the subject, unknown variable, for example the amount of rainfall in Chicago in 1970 or the total points scored in the Super Bowl in 1974);

(3) it has dealt primarily with group assessment. For example, Ludke et al. [7] compare five different assessment techniques, but the testing was done with students using questionnaires asking almanac-type questions.

In this study the techniques are used in an analyst-directed elicitation procedure with pro- fessionals whose jobs include working with uncertainty. In particular, financial analysts were asked their opinions on the future value of a financial index. Their opinions on this non-arbitrary continuous random variable were elicited by the analyst using different encoding techniques.

The important article by Spetzler & Stael von Holstein [10] describes the various elicitation techniques used by the Decision Analysis Group at Stanford Research Institute, including the probability wheel. It does not, however, present comparative results for the different techniques. In fact, Slovic et al. [9] note that they have been unable to find any research on the use of the probability wheel.

This paper uses the general structure employed by Spetzler & Stael von Holstein and reports comparative results. In particular, the techniques used in the study were categorized according to the method of encoding and the response mode. The method can either ask the assessor to assign a probability to some fixed value, or ask him to associate a value with some fixed probability. The response mode can be direct (the assessor responds directly by giving the desired value) or indirect (the response is in terms of a bet preference or some other intermediate value). Each of the four techniques used represents one of the four possible combinations of method and response mode, as shown in Fig. 1.

4. PROCEDURE

Based on pre-tests with undergraduate and graduate college students, four encoding techniques were selected as the best representatives of the desired combinations of method and mode characteristics, as shown in Fig. 1. To compare these techniques, eleven bond brokers from two major banks were used as subjects. All the subjects had college degrees and some exposure to statistics; only one, however, had taken a statistics course beyond the introductory level. The analyst used the four techniques with each subject to elicit the subject's probabilistic opinions of the value of the Dow-Jones industrial average (D-J average) at closing time on a specified day four months in the future. The four techniques are described here.

Method of encoding | Response mode: direct | Response mode: indirect
Fixed value (direct) | Cumulative | Probability wheel
Fixed probability (indirect) | Bisection | Fractile

FIG. 1. Methods of encoding and modes of response.

(1) Bisection. The subject specified the lowest and highest possible values the D-J average could take. Different values of the D-J average were then suggested by the analyst until one was found which divided this low-high range into two equally-likely intervals of values. This same procedure was then used to divide each of these intervals into equally-likely intervals of possible values, continuing until sufficient points had been found.

(2) Fractile. A midvalue was specified by the subject, this value dividing the possible range of values into two equally-likely intervals. For each of these intervals the subject then specified a midpoint, and this procedure continued until sufficient points were found.

(3) Cumulative. The subject was asked to assess the probability that the D-J average would fall either above or below specified values. Requestioning was used until inconsistencies were resolved and sufficient points were found.

(4) Probability wheel. The subject was asked whether he would prefer participating in a lottery in which he won if the D-J average exceeded a specified value or a lottery in which he won if the pointer of the wheel fell within an orange-colored wedge of the wheel. The size of the wedge was then varied up and down until the subject was indifferent between the two lotteries. This procedure continued until sufficient points were determined on the distribution and inconsistencies resolved.
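The probability-wheel step amounts to searching for the wedge size at which the subject is indifferent between the two lotteries; at that point the wedge fraction approximates the subject's probability that the D-J average exceeds the fixed value. The sketch below assumes the wedge is adjusted by repeated halving of an interval, which is only one way of "varying it up and down"; the function `wheel_indifference`, the callback `prefers_wheel` and the simulated subject are hypothetical stand-ins for the analyst's questions and the subject's answers.

```python
from typing import Callable


def wheel_indifference(prefers_wheel: Callable[[float], bool], tol: float = 1e-3) -> float:
    """Estimate the wedge fraction p at which the subject is indifferent.

    prefers_wheel(p) is hypothetical: True if the subject prefers the wheel
    lottery (win if the pointer lands in a wedge covering fraction p of the
    wheel) to the index lottery (win if the D-J average exceeds the fixed
    value x). At indifference, p approximates P(D-J average > x).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_wheel(p):
            hi = p  # wheel lottery preferred: wedge too large, so shrink it
        else:
            lo = p  # index lottery preferred: wedge too small, so enlarge it
    return (lo + hi) / 2


# Simulated subject (invented) who believes P(D-J average > x) = 0.3.
p_exceed = wheel_indifference(lambda p: p > 0.3)
print(f"P(D-J > x) ~ {p_exceed:.3f}, so cumulative point F(x) ~ {1 - p_exceed:.3f}")
```

Because the lottery is framed as "exceeds x", the point on the cumulative distribution is one minus the indifference wedge.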

It should be noted that the order in which the points of the distribution are specified is predetermined for both the bisection and fractile techniques. The bisection method begins by specifying the end points of the distribution and then dividing the range into equally-likely intervals. The fractile method starts by specifying the median and then works outward, both above and below this value, by dividing the remaining ranges into equally-likely intervals. Because changing the order in which the points are specified would, in these two cases, essentially change the technique being tested, no check for hysteresis was done.
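The fixed orderings described above can be made concrete by listing, round by round, the cumulative probabilities each method targets. Both methods aim at the same probabilities (median, then quartiles, then eighths); they differ in where they start and, per the descriptions above, in who proposes the values. This is a sketch under two assumptions not stated in the text: the subject's lowest and highest possible values are treated as the 0 and 1 fractiles, and points within a round are taken in ascending order. `target_probabilities` is a hypothetical helper.

```python
def target_probabilities(method: str, rounds: int) -> list[float]:
    """Cumulative probabilities targeted, in order, by bisection or fractile.

    rounds = number of halving rounds (1 -> median only, 2 -> adds quartiles, ...).
    Assumptions: bisection end points count as the 0 and 1 fractiles, and
    points within a round are listed in ascending order.
    """
    if method not in ("bisection", "fractile"):
        raise ValueError(f"unknown method: {method!r}")
    # Bisection first fixes the lowest and highest possible values;
    # the fractile method starts directly at the median.
    order = [0.0, 1.0] if method == "bisection" else []
    for level in range(1, rounds + 1):
        step = 2.0 ** -level
        # Probability midpoints of the equally likely intervals left by the previous round.
        order.extend(step * k for k in range(1, 2 ** level, 2))
    return order


print(target_probabilities("bisection", 3))
print(target_probabilities("fractile", 3))
```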

The order of specification for the other two techniques is not pre-set by the technique, although certain patterns are more generally used. In this study a high point and a low point (with respect to the cumulative distribution) were first determined, then points successively closer to the median, alternating between the high end and the low end. This procedure was continued past the median for both the ascent and the descent, so the distribution was actually elicited twice, and elicitation continued until inconsistencies were resolved. No pattern of differences was detected when comparing points specified for increasing probabilities with those specified for decreasing probabilities. In each case, any further encoding needed to resolve differences was performed.
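One reading of the alternating schedule used for the cumulative and probability wheel techniques is sketched below: a descending pass from the high end is interleaved with an ascending pass from the low end, and both passes run past the median, so every value is assessed twice. The two passes can then be compared for the increasing-versus-decreasing differences mentioned above, and adjacent points checked for the ordering inconsistencies that prompted requestioning. The grid of values, the function names and the invented subject are all assumptions for illustration.

```python
def alternating_schedule(values):
    """Interleave a descending pass (from the high end) with an ascending pass
    (from the low end). Both passes continue past the median, so each value
    appears twice: once per pass."""
    ascending = sorted(values)
    descending = ascending[::-1]
    schedule = []
    for high, low in zip(descending, ascending):
        schedule.append(("descent", high))
        schedule.append(("ascent", low))
    return schedule


def pass_differences(schedule, answers):
    """answers[i] is the cumulative probability F(v) given at step i.
    Returns {value: ascent answer - descent answer}."""
    ascent = {v: a for (direction, v), a in zip(schedule, answers) if direction == "ascent"}
    descent = {v: a for (direction, v), a in zip(schedule, answers) if direction == "descent"}
    return {v: ascent[v] - descent[v] for v in ascent}


def order_inconsistencies(points):
    """points: {value: F(value)}. A CDF cannot decrease, so return adjacent
    value pairs whose assessed probabilities fall as the value rises."""
    vals = sorted(points)
    return [(a, b) for a, b in zip(vals, vals[1:]) if points[b] < points[a]]


schedule = alternating_schedule([700, 800, 900])
beliefs = {700: 0.1, 800: 0.5, 900: 0.9}  # invented, perfectly consistent subject
answers = [beliefs[v] for _, v in schedule]
print(schedule)
print(pass_differences(schedule, answers))
print(order_inconsistencies({700: 0.2, 800: 0.15, 900: 0.9}))
```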

Before encoding, the analyst followed a three-step procedure of introduction/motivation, structuring and conditioning for each subject. In the first step the concept of encoding and its capability to summarize personal opinions was introduced. The formal concept of uncertainty was then discussed, and two example situations were used to illustrate this formal definition. The second example was a realistic problem of estimating the future return from a specific stock. The importance and applicability of this research to financial decision making was then discussed. In the second step, the specific problem of expressing opinions about the D-J average four months in the future was introduced and explained. To avoid any possible misunderstandings, the D-J average was defined, and its recent behavior was briefly discussed. In the conditioning step, each subject was asked to list several of the most important factors that could influence the D-J average over the coming four months, and some discussion of the extent of the effects of these factors followed. If the subject hesitated to identify factors, some possible factors were suggested. At the time of this experiment, the list of possible factors included upcoming election activity, gold prices, the winter wheat harvest, the Shah of Iran's health, and the Panama Canal Treaty. Some factors would have little or no influence on the behavior of the D-J average, while others would influence it directly; this distinction was pointed out to the subject. The subject was then asked to select, without the analyst's direction, two or three of what he considered to be the major influences on the behavior of the D-J average. Finally, the subject was asked to create two scenarios: the worst combination of major influences and the best combination of major influences.

At this point the encoding process was begun. The order in which the four encoding techniques were used was randomized, except that the fractile and bisection techniques were never used back-to-back owing to their obvious similarity. Each technique was used by first describing the basic procedure employed, allowing the subject to practice with the technique on a sample problem and finally using the technique to encode the subject's opinions on the future value of the D-J average.
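The ordering rule (random, except that fractile and bisection are never adjacent) can be sketched with rejection sampling: draw a uniformly random permutation and redraw until the two techniques are not back-to-back, which leaves every admissible order equally likely. The text does not say how the randomization was actually carried out, so the rejection-sampling approach and the name `session_order` are assumptions.

```python
import random

TECHNIQUES = ["bisection", "fractile", "cumulative", "probability wheel"]


def session_order(rng: random.Random) -> list[str]:
    """Random order of the four techniques in which fractile and bisection
    are never used back-to-back (redraw until the constraint holds)."""
    while True:
        order = TECHNIQUES[:]
        rng.shuffle(order)
        gap = abs(order.index("fractile") - order.index("bisection"))
        if gap > 1:  # not adjacent
            return order


rng = random.Random(42)  # seeded only so the example is repeatable
for subject in range(3):
    print(session_order(rng))
```

Of the 24 possible orders, 12 place fractile and bisection next to each other, so on average about two draws are needed per subject.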

After each of the four techniques had been used to encode each subject's assessments, pairwise comparisons of the resulting cumulative distributions were made using overlays. (The encoding technique associated with each distribution was not identified.) The use of density versus cumulative functions for comparing distributions was considered. Both approaches have some drawbacks, as pointed out by Winkler [11]. Initially, density functions are less confusing to the subject; however, subjects usually try to make them look "normal", i.e. like a normal distribution. In making comparisons, it was feared that the subjects would tend to favor the density function that appeared most normal, rather than objectively considering the opinions each represented. For this reason, cumulative distributions were carefully explained and used for all comparisons.
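The blinded pairwise comparison can be organized by relabelling the four encoded distributions with neutral letters before generating every pair to overlay. A minimal sketch; the letter labels, the seeded shuffle and the function name `blinded_pairs` are assumptions, not details reported for the study.

```python
import itertools
import random


def blinded_pairs(techniques, seed=None):
    """Hide technique names behind neutral labels and list every pair to overlay.

    Returns (key, pairs): key maps label -> technique (kept by the analyst only);
    pairs lists the label pairs shown to the subject.
    """
    order = list(techniques)
    random.Random(seed).shuffle(order)  # so the label order reveals nothing
    labels = [chr(ord("A") + i) for i in range(len(order))]
    key = dict(zip(labels, order))
    pairs = list(itertools.combinations(labels, 2))
    return key, pairs


key, pairs = blinded_pairs(["bisection", "fractile", "cumulative", "probability wheel"], seed=1)
print(pairs)  # all pairwise overlays for four distributions
```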

Each subject rank-ordered the distributions according to how accurately each distribution reflected his opinions. The average rankings for all subjects are shown in Table 1 under the heading 'average accuracy ranking'. The techniques themselves were then discussed, and each subject rank-ordered them by how comfortable he felt in using each technique. Again, the encoding technique associated with each distribution was not identified. These results are shown in Table 1 under the heading 'average preference ranking'.

Example distributions elicited from two subjects are shown in Figs 2 and 3. These are from the subjects with the greatest and the least consistency.

5. RESULTS AND ANALYSIS

Because the responses were rankings, the Friedman test was used to determine whether there was a statistically significant difference among the average rankings of the techniques (see Table 1). (For a description of this test, see [2].) There was no statistically significant difference in average accuracy ranks at the 5% significance level, but there was a significant difference among preference ranks. The probability wheel was the least preferred.

TABLE 1. AVERAGE ACCURACY AND PREFERENCE RANKINGS

Technique | Average accuracy ranking | Average preference ranking
Bisection | 2.3 | 2.2
Fractile | 2.5 | 2.3
Probability wheel | 3.0 | 3.4
Cumulative | 2.2 | 2.2
T statistic (Friedman) | 2.72 ~ χ²(3) | 6.6 ~ χ²(3)

A lower average rank indicates a more favorable ranking.

[Figure: elicited cumulative distributions for one subject; legible curve labels: Cumulative, Fractile; vertical axis: cumulative probability, 0 to 1.0; horizontal axis: Dow-Jones Industrial Index, 660 to 960.]

FIG. 2. Example distribution from the subject with the greatest consistency.
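For readers who want to see the mechanics of the test reported in Table 1, the Friedman statistic can be computed directly from a subjects-by-techniques matrix of ranks. This is a sketch of the textbook form and assumes no tied ranks (a tie-corrected variant exists); under the null hypothesis of no difference among techniques, T is approximately chi-square with k - 1 degrees of freedom, i.e. 3 for four techniques. The function name and the rank matrix in the usage lines are invented, not the study's data.

```python
def friedman(ranks):
    """ranks[i][j]: rank subject i gave technique j (1 = best; no ties assumed).

    Returns (average rank per technique, Friedman statistic T), where
    T = 12 / (b k (k + 1)) * sum_j R_j**2 - 3 b (k + 1),
    b = number of subjects, k = number of techniques, R_j = rank sum of technique j.
    """
    b, k = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    averages = [r / b for r in rank_sums]
    t = 12 * sum(r * r for r in rank_sums) / (b * k * (k + 1)) - 3 * b * (k + 1)
    return averages, t


# Invented example: three subjects who all rank three techniques identically.
averages, t = friedman([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(averages, t)
```

Perfect agreement among subjects gives the largest possible T, b(k - 1); completely opposed rankings drive T toward zero.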

The consensus (average) distribution was found for each subject, and the Kolmogorov-Smirnov statistic was then used to compare the techniques. This test compares two sample distributions and gives a measure of the difference between them. That measure indicates the extent to which the two distributions differ, and it is used here to see whether there were "outlier" distributions, that is, distributions which seemed to differ consistently from the others. It is used as a distance measure only. (See [2] for more details on this test and [11] for an application comparing encoded distributions.)

[Figure: elicited cumulative distributions for one subject; legible curve labels: Bisection, Fractile; vertical axis: cumulative probability, 0 to 1.0; horizontal axis: Dow-Jones Industrial Index, 660 to 960.]

FIG. 3. Example distribution from the subject with the least consistency.

The cumulative technique gave results which were most frequently nearest the other distributions (for five of the eleven subjects), and the bisection technique gave results which were most frequently the farthest from the others (for seven of the eleven). This latter result is thought to be due to requiring the subject, in the bisection technique, to start by specifying the high and low values of the variable. Although subjects felt comfortable with the technique, its results exhibited the most variability.
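The nearest/farthest comparison can be sketched as follows, assuming each technique's encoded CDF has already been evaluated on a common grid of index values, that the consensus is the pointwise average of the four CDFs, and that each technique is scored by its Kolmogorov-Smirnov distance (the largest vertical gap between two CDFs) to that consensus. The text does not state exactly which pairs of distributions were measured, so the consensus-based scoring, the helper names and the toy data are all assumptions.

```python
from collections import Counter


def ks_distance(f, g):
    """Kolmogorov-Smirnov distance: largest absolute gap between two CDFs
    evaluated on the same grid."""
    return max(abs(a - b) for a, b in zip(f, g))


def nearest_and_farthest(cdfs):
    """cdfs: {technique: [F(x) for x in grid]} for one subject.
    Returns (nearest, farthest) technique relative to the consensus CDF."""
    n_points = len(next(iter(cdfs.values())))
    consensus = [sum(c[i] for c in cdfs.values()) / len(cdfs) for i in range(n_points)]
    distance = {t: ks_distance(c, consensus) for t, c in cdfs.items()}
    return min(distance, key=distance.get), max(distance, key=distance.get)


def tally(subjects):
    """Count, across subjects, how often each technique is nearest and farthest."""
    results = [nearest_and_farthest(cdfs) for cdfs in subjects]
    return Counter(r[0] for r in results), Counter(r[1] for r in results)


# Invented toy subject on a three-point grid (e.g. index values 700, 800, 900).
subject = {
    "bisection": [0.30, 0.70, 1.00],
    "fractile": [0.10, 0.50, 0.90],
    "cumulative": [0.12, 0.52, 0.90],
    "probability wheel": [0.08, 0.45, 0.88],
}
print(nearest_and_farthest(subject))
```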

6. SUMMARY AND CONCLUSIONS

This study compares four different elicitation techniques using bond brokers as subjects to quantify their judgments about future values of the Dow-Jones industrial average. These techniques were compared on the basis of perceived accuracy and of preference based on overall comfort with the technique. Certain observations can be offered, if not as conclusions, at least as strong impressions.

It seems that people who are familiar with probability, whether through formal instruction in school or through exposure at work, prefer direct responses to direct assessment questions (fixed value). It further appears that, at least with this group, assessments obtained this way were also judged most representative of the subjects' true opinions. Because of some discomfort with the indirect procedures, the subjects sometimes tried to make a direct assessment mentally and then translate it into the indirect measure requested. This may have affected the accuracy of these techniques.

As reported in most other articles, the presence of the analyst was important to the results. Two of the more important ingredients an analyst injects into the procedure are time and service as a sounding board. The subjects often wanted to hear statements repeated several times to help them frame the questions properly in their minds.

Another important factor in this study was the strength of the opinions held by the assessor at the outset. For those with strong opinions, time spent in discussion had a settling effect and responses became more consistent. For those with vague states of knowledge about the variable, the exact opposite seemed true: the more time and discussion that passed, the more confusion there seemed to be. This is in keeping with other findings that persons with extensive experience in quantifying uncertainty in one area, such as weather forecasters, are no better than any other assessor when they are outside their area of expertise.

The probability wheel seemed to be the technique which most polarized preferences. The subjects were either very positive or very negative about the technique. This may have been influenced by the manner in which it was used by the analysts, although the procedure was checked prior to the study with people familiar with its use in an attempt to control for any confounding influence. The degree of comfort the subject felt with a technique seemed to be directly linked with his ability to understand the questions intuitively. Discomfort seemed to lead to confusion, which led to inconsistent results.

To summarize, preference for particular techniques and the accuracy with which those techniques encode the subject's subjective opinions appear to be at least partially correlated.

In particular, subjects who have experience in quantifying uncertainty prefer encoding techniques which are direct rather than indirect (i.e. which request a probability value rather than some intermediate response), and they prefer to respond to these techniques directly by specifying the probability value rather than through an indirect technique such as the probability wheel or stated preference. Direct responses to a direct assessment technique seemed to lead to the most representative responses for subjects in our experiment.

Future work in this area should consider using only one or two encoding techniques with each subject. In this work, no calibration was done owing to time constraints with the subjects. Because calibration could influence the perceived accuracy of the assessment, it might also be included in future comparisons.


REFERENCES

1. ANDERSON J & NARASIMHAN R (1979) Assessing project implementation risk: a methodological approach. Mgmt Sci. 25, 512-521.

2. CONOVER WJ (1980) Practical Nonparametric Statistics. John Wiley, New York.

3. GONEDES N & IJIRI Y (1974) Improving subjective probability assessment for planning and control in team-like organizations. Jl Account. Res. 12, 251-269.

4. HOGARTH RM (1975) Cognitive processes and assessment of subjective probability distributions. Jl Am. Stat. Assoc. 70, 271-289.

5. HUBER G (1974) Methods for quantifying subjective probabilities and multi-attribute utilities. Decis. Sci. 5, 430-458.

6. LINDLEY DV, TVERSKY A & BROWN RV (1979) On the reconciliation of probability assessments. Jl R. Stat. Soc. B.

7. LUDKE RL, STRAUSS F & GUSTAFSON D (1977) Comparison of five methods for estimating subjective probability distributions. Jl Org. Behav. Hum. Perf. 19, 162-179.

8. SCHAEFER R & BORCHERDING K (1973) Assessment of subjective probability distributions. Acta Psychologica, 117-129.

9. SLOVIC P, FISCHHOFF B & LICHTENSTEIN S (1977) Behavioral decision theory. Ann. Rev. Psychol. 28, 1-39.

10. SPETZLER CS & STAEL VON HOLSTEIN C (1975) Probability encoding and decision analysis. Mgmt Sci. 22, 340-358.

11. WINKLER RL (1967) The quantification of judgment: some methodological suggestions. Jl Am. Stat. Assoc. 62, 1105-1120.

12. WISE J & MOCKOVAK W (1973) Descriptive modeling of subjective probabilities. Jl Org. Behav. Hum. Perf. 15, 292-306.

ADDRESS FOR CORRESPONDENCE: Professor DG Brooks, Department of Quantitative Systems, College of Business Administration, Arizona State University, Tempe, AZ 85281, USA.