
Academy of Management Review 1980, Vol. 5, No. 4, 561-566

The Measurement and Control of Beta Change¹

ARTHUR G. BEDEIAN

ACHILLES A. ARMENAKIS

ROBERT W. GIBSON

Auburn University

Applied researchers have long been confronted with the complex statistical problems associated with the measurement of change. A method frequently used in such analyses is the pretest-posttest research design. This approach incorporates the often overlooked assumption that the pretest and posttest scores are comparable. However, for the scores to be comparable, a common metric must exist between them. Distinguishing between three types of change in test scores (alpha, beta, and gamma), we present an original statistical procedure for measuring and controlling the confounding influence of beta change, that is, the problem of scale recalibration in the minds of respondents.

¹We are grateful for the helpful comments of William B. Hudson, James F. Cox, and Amitava Mitra. © 1980 by the Academy of Management 0363-7425

Within the constraints imposed by the limitations of scientific method, the primary intent of applied research is to identify causes and to evaluate their effects [Mahoney, 1978]. One method frequently employed in such efforts is to compare dependent variable values across time to determine if a behavioral change has taken place as a result of some form of planned treatment or intervention. Typically, a researcher then proceeds to select and perform those statistical tests judged appropriate for assessing the degree and extent of change achieved, if any. This approach is based on the assumption that the pre-intervention and post-intervention test scores are comparable. However, for the scores to be comparable, a common metric must exist between them, and this requirement is particularly likely to be violated in the use of self-report measures. In using self-report measures, as Howard and Dailey note, "researchers assume that a subject's standard for measurement of the dimension being assessed will not change from one testing to the next (pretest to posttest)." They go on to warn that:

If the standard of measurement were to change, the posttest ratings would reflect the shift in addition to actual changes in the subject's level of functioning. Consequently, comparison of pretest with posttest ratings would be confounded by this distortion of the internalized scale, yielding an invalid interpretation of the effectiveness of the intervention [1979, p. 144].

Our purpose here is to present a statistical technique for the measurement and control of such confounding. An obvious and immediate advantage of such a technique is that it will not only be of value in differentiating and identifying the types of change observed, but will also allow researchers to rely more on the validity of their findings.

Types of Change

Of particular pertinence to our discussion is Golembiewski, Billingsley, and Yeager's [1976] identification of three types of change: alpha, beta, and gamma. Gamma change refers to the reconceptualization or redefinition of a referent variable. It occurs when subjects change their basic understanding, from one testing period to another, of the criterion being measured. Thus "peer leadership" may mean something quite different at Time 1 as compared to Time 2, especially if a planned treatment or intervention was directed at enhancing subjects' understanding of this or other related concepts. Such a redefinition, of course, would make a comparison of pretest and posttest responses virtually meaningless, unless the accomplishment of gamma change was, in fact, the intended purpose of an intervention. For example, if as part of an organization's performance-evaluation program, managers are required to evaluate subordinates on their "peer leadership," it would be important for this construct to be interpreted similarly by all raters. To this end, an intervention might be planned to induce a common understanding among all managers concerned.

Beta change occurs when the standard of measurement used by a subject to assess an item changes from one testing period to another. Such change indicates a recalibration of a subject's internalized scale of measurement. It is this change in the standard of measurement that is our focus in this paper. Beta change is defined as having occurred when, discounting for the occurrence of gamma change, a subject rates a certain behavior as a 2 (on a Likert-type scale) at Time 1 and the identical behavior as a 3 at Time 2.

Finally, alpha change is defined as a rating change for which both gamma and beta change have been ruled out. That is to say, neither the subjects' understanding of the criterion being measured nor the measurement scale has changed. As an example, assume that a group of managers is exposed to a planned intervention and that as a result their behavior is changed. Assume further that ratings of their behavior by subordinates have been collected according to a standard pretest-posttest research design. If, after determining that neither gamma change nor beta change has occurred, the researcher observes a difference in subject responses from Time 1 to Time 2, alpha, or real, change can be said to have occurred.

Previous research on the identification and detection of the different types of change has been concerned with change as a group occurrence. Golembiewski, Billingsley, and Yeager discussed types of change as they related to an entire subject sample. Building on this foundation, Zmud and Armenakis [1978] developed a procedure for detecting as well as distinguishing beta from alpha change as a group phenomenon. However, the ability to gauge the confounding influences of beta and gamma change requires the analysis of individual subject responses. To meet this requirement, we are presenting an original statistical procedure for measuring and controlling the confounding influence of beta change, or what might be more descriptively referred to as the problem of scale recalibration by respondents. Although developed independently, our procedure is conceptually similar to Thompson's [1963] approach to calibrating observer bias in time-study pace estimation.

Actual versus Ideal Scores

The statistical procedure to be described has two basic requirements: (1) the collection of subject responses on at least two occasions, and (2) the use of a research instrument incorporating two measurement subsets: one to gauge respondent perceptions of actual conditions and one to gauge respondent perceptions of ideal conditions. Figure 1 shows how these requirements can be met, using a sample question taken from the widely used Survey of Organizations (SOO) questionnaire [Taylor & Bowers, 1972]. Our method utilizes both sets of scores to obtain evidence of scale recalibration. We have also developed a recalibration function to be used when scale recalibration has occurred, whereby pretest and posttest scores can be transformed so as to be comparable. The intent is to convert (where necessary) scales for Time 1 and Time 2 to the same metric, thus making it feasible, once gamma change has been ruled out, to obtain a measurement of alpha (real) change.

Figure 1. Identifying Beta Change
[Sample item: "To what extent do persons in your work group offer each other new ideas for solving job-related problems?" The item is rated twice at each testing: Actual ("This is how it is now.") and Ideal ("This is how I'd like it to be."), each on a 1-5 scale, at Time 1 and Time 2. Source: Taylor, J., & Bowers, D. G. Survey of organizations. Ann Arbor: CRUSK, Institute for Social Research, University of Michigan, 1972.]

Basic differences in the underlying nature of the conditions that actual and ideal scores are intended to measure should be noted, particularly in their use across time. Over time, respondent ratings of actual conditions may reflect not only alpha change, but also the confounding influences of beta and gamma change. In marked contrast, however, respondent ratings of ideal conditions, although susceptible (over time) to beta and gamma change, by definition cannot exhibit alpha change. Any alteration in a respondent's idealized notion of a referent variable would represent a reconceptualization of the particular construct in question and thus (by definition) be reflected as gamma change.
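For concreteness, the minimal sketch below (in Python, with hypothetical ratings and dictionary keys of our own choosing, not drawn from the original) shows how the four sets of responses called for by these requirements might be stored for a single respondent.

```python
# A minimal sketch of the data layout the procedure assumes, using
# hypothetical item ratings.  For each respondent, four vectors of 1-5
# Likert ratings are retained: actual and ideal conditions at Time 1
# and Time 2 (cf. Figure 1).
respondent_1 = {
    "actual_t1": [3, 4, 2, 5, 3],  # "This is how it is now."        (pretest)
    "ideal_t1":  [5, 5, 4, 5, 4],  # "This is how I'd like it to be." (pretest)
    "actual_t2": [4, 4, 3, 5, 4],  # actual conditions                (posttest)
    "ideal_t2":  [5, 4, 4, 5, 5],  # ideal conditions                 (posttest)
}
```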

A Numerical Adjustment

For purposes of brevity, the following illustration will be developed with reference to a simple pretest-posttest research design. Once understood, the basic procedure to be presented can be readily generalized to two or more treatments and to two or more independent variables. The procedure is explicitly based on the assumption that although beta change can be expected to occur over time (i.e., from one testing to another), it is not expected to occur between responses collected at the same time (i.e., during one testing). That is, we are assuming that a monotonic relationship exists between actual and ideal ratings collected at the same time. This assumption builds on the more fundamental assumption that individuals have an internalized standard for judging the psychological distances between scale intervals. This internalized standard can be expected to differ from one testing to the next (pretest to posttest), but it is not expected to differ between responses collected on one occasion [Howard, Schmeck, & Bray, 1979].

Step One

Building on a simple linear regression model, where Y' = a + bX, Step 1 in the development of the proposed recalibration function is to array the Time 1 and Time 2 ideal responses for each subject according to the following format, where Xi represents the Time 1 and Yi the Time 2 raw scores for each questionnaire item.

Respondent No. 1

Item    Time 1 ideal (raw) score (Xi)    Time 2 ideal (raw) score (Yi)
1
2
3
...
n

Step Two

Step 2 calls for the fitting of a linear regression equation to the above data such that Y'i = a + bXi, where Yi = Time 2 ideal (raw) scores, a = the intercept constant, b = the regression coefficient or weight, and Xi = Time 1 ideal (raw) scores. This procedure results in an estimate of the a and b coefficients using the least squares deviation criterion. The answer to the question of how good an estimate these terms provide is found by computing the resulting equation's standard error of estimate, the formulation and explanation of which can be found in any text dealing with the basics of linear regression. Of importance here is the fact that, as an index of the variation or dispersion around the line of regression, the standard error of estimate enables one to state with a specific degree of certainty how good an estimate of the regression equation has been developed.
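The sketch below illustrates Steps 1 and 2 for a single respondent, assuming an ordinary least-squares fit as described. The function name and the example scores are our own illustrative choices; the standard errors of a and b are also returned because they are useful for the tests in Step 3.

```python
import numpy as np

def fit_ideal_regression(ideal_t1, ideal_t2):
    """Fit Y'i = a + b*Xi by least squares to one respondent's ideal scores,
    where Xi are the Time 1 and Yi the Time 2 ideal (raw) item scores.
    Returns the intercept a, slope b, the standard error of estimate, and
    the standard errors of a and b."""
    x = np.asarray(ideal_t1, dtype=float)
    y = np.asarray(ideal_t2, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)            # assumes some variance in Xi
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    see = np.sqrt(np.sum(resid ** 2) / (n - 2))  # standard error of estimate
    se_b = see / np.sqrt(sxx)                    # standard error of the slope
    se_a = see * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)  # standard error of intercept
    return a, b, see, se_a, se_b

# Hypothetical ideal scores for one respondent across n = 8 questionnaire items.
a, b, see, se_a, se_b = fit_ideal_regression([5, 5, 4, 5, 4, 3, 4, 2],
                                             [5, 4, 4, 5, 5, 3, 4, 3])
```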

Step Three

Having now computed the regression line, or the line of best fit, describing the relationship between the Time 1 and Time 2 ideal scores for each subject, we are in a position to decide (Step 3) whether significant beta change has occurred, and if so, the nature of recalibration necessary. This determination can be divided into several parts and centers on inspection of the linear equation of regression derived above. Thus, in reference to this equation:

1. If b is not significantly different from 1 and a is not significantly different from 0, such that Y' = 0 + 1X (i.e., Time 2 ideal scores = Time 1 ideal scores), no beta change has occurred. (Some computerized statistical packages provide the necessary statistics to determine if a is significantly different from 0 and if b is significantly different from 1 [e.g., Barr, Goodnight, Sall, & Helwig, 1976].) Respondent scale calibration has thus been constant.

2. If b is not significantly different from 1 but a is significantly different from 0, such that Y' = a + 1X, simple scale displacement has occurred. This would be a case of scale displacement of a constant magnitude (i.e., equal to the value of a). Thus, between Time 1 and Time 2, all 1's may have become 2's, 2's may have become 3's (or vice versa), and so forth. Graphically, this would be shown as a shift (upward or downward) in the intercept constant of the regression line plotted against the ideal scale.

Note that in this case (hereafter referred to as Type I beta change), the slope (b) of the regression line (representing the change in Y per unit change in X) remains the same.

3. If b ≠ 1, regardless of whether a = 0, respondent calibration has not been constant and rescaling is necessary so that Time 2 scores can be accurately compared to Time 1 scores. Situations in which b ≠ 1 will hereafter be referred to as Type II beta change.

4. Type II beta change may take at least two forms: scale interval stretching and scale interval sliding. The former occurs when respondents lengthen the psychological distance between scale intervals, so that their width is not constant. Thus, as opposed to having five equal intervals as shown for Time 1 on line (i) below, at Time 2 the respondent's judgment scale may be recalibrated in any number of ways, as in lines (ii) and (iii).

[Illustration: line (i) shows the Time 1 scale with five equally spaced points, 1 through 5; lines (ii) and (iii) show two possible Time 2 recalibrations (Time 2a and Time 2b) in which the interval widths are no longer equal.]

As a result of scale interval stretching, a behavior judged a 4 at Time 1 may receive a value of 3 at Time 2. The argument advanced by Golembiewski, Billingsley, and Yeager [1976] to explain this occurrence is that the "elasticity of distance" is subject to expansion or contraction as the personal standards of respondents change.

To determine if scale interval widths have remained fairly constant, it is necessary to compute whether there is a significant difference between the Time 1 ideal and Time 2 ideal rating frequencies. Among the tests available for this purpose, the Kolmogorov-Smirnov goodness-of-fit test (K-S) is particularly appropriate [Siegel, 1956]. Concerned with the degree of agreement between two distributions, the K-S treats individual observations separately and, unlike the chi-square test, need not lose information through the combining of categories. Additionally, the K-S is applicable to very small samples. If the Time 1 and Time 2 ideal rating frequencies are not significantly different, there is support for the interpretation that no overall interval stretching (or contracting) has occurred from Time 1 to Time 2. If, on the contrary, they are found to be significantly different, there is then reason to believe that respondent scale interval widths have stretched (or contracted).

5. While scale displacement is a simple displacement of equal magnitude across all items, scale interval sliding is the shifting of some, but not all, responses to a higher or lower interval category. A primary cause of such sliding or shifting in responses is categorization, or discrete variable representation, that is, the representation of ordinal data as point or interval data. As Bohrnstedt and Carter [1971, p. 130] have correctly observed, researchers may assign the interval values 1, 2, 3, and 4 to the ordinal categories definitely disagree, probably disagree, probably agree, and definitely agree, but in doing so, they are assuming that the interval values being employed are monotonically related to the original underlying true (ordinal) scale. This, of course, may not be the case. In particular, such scale transformations imply that all respondent selections are of equal value (i.e., all 1's are of equal value, all 2's are of equal value, and so on), whereas in reality, some choices may be closer than others to the next lower or higher interval category. This is depicted on the scale below. Although it is generally assumed that, for example, a 3 is simply a 3, respondent choices actually fall within a range of 3.00 to 3.99.

[Illustration: a scale with the interval values 1 (Definitely Disagree), 2 (Probably Disagree), 3 (Probably Agree), and 4 (Definitely Agree); respondent positions such as 0.5, 1.75, 3.60, and 4.5 fall along the underlying continuum rather than exactly on the interval values.]

A slight response ambivalence could easily result in a sliding or shifting of adjacent answers (from a 3.99 to a 4.00, for example) and a whole category value change.

To determine if overall scale interval sliding has occurred, it is recommended that the means of the Time 1 and Time 2 ideal scores be compared. If they are not significantly different, there is support for the interpretation that no overall consistent sliding of calibration has occurred from Time 1 to Time 2; and vice versa.

6. To more completely verify the occurrence of either of the two forms of beta change, it is necessary to perform a final calculation. This is because, even though the means of the Time 1 and Time 2 ideal scores may not be significantly different, it is possible for item-to-item shifts or sliding to average out to produce no change in overall mean. For instance, at Time 1 a respondent may score one item a 2 and another a 4. At Time 2, the same respondent may mark the first item a 4 and the second a 2. As a consequence, the values offset one another (i.e., average out). Moreover, slight shifts to adjacent choices, for instance, from a three to a four, may not show up in overall mean calculations.

To determine if either of the above has occurred, the correlation between the Time 1 and Time 2 ideal scores should be computed as a measure of item response consistency. If the resulting correlation is weak, several explanations might be offered: (1) the questionnaire was improperly administered at Time 1, Time 2, or both; (2) the questionnaire was mis-scored; or (3) the questionnaire items were interpreted differently at Time 1 and Time 2, suggesting the occurrence of gamma change.

At this point, we have simply determined if there is a need to recalibrate as a result of beta change. If the Time 1 and Time 2 ideal scores are acceptably correlated, and if their rating frequencies and means are not significantly different, there is evidence to suggest that beta change has not occurred. Conversely, if the correlation between the Time 1 and Time 2 ideal scores is acceptable, and their rating frequencies and means are significantly different, beta change has most likely occurred. When it has, the questionnaire results can yet be salvaged by the transformation to be explained in Step 4.

Finally, if the Time 1 and Time 2 ideal scores are not acceptably correlated, regardless of whether or not their rating frequencies and means are significantly different, the value of the questionnaire for detecting alpha change is at best questionable, for the previously mentioned reasons.
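Pulling the Step 3 checks together, the following is a minimal sketch of how they might be coded for a single respondent. The function name, the 0.05 significance level, and the use of a paired t-test for comparing the ideal-score means are our own illustrative choices, not part of the original procedure; the slope/intercept tests, the Kolmogorov-Smirnov comparison of rating frequencies, and the correlation follow the description above. It assumes the intercept, slope, and their standard errors have already been obtained from the Step 2 regression (e.g., via the earlier sketch).

```python
import numpy as np
from scipy import stats

def assess_beta_change(ideal_t1, ideal_t2, a, b, se_a, se_b, n, alpha=0.05):
    """Apply the Step 3 checks to one respondent: t-tests of a = 0 and b = 1
    on the ideal-score regression, a Kolmogorov-Smirnov comparison of the
    Time 1 and Time 2 ideal rating frequencies (interval stretching), a
    comparison of the ideal-score means (interval sliding, here via a paired
    t-test), and the Time 1-Time 2 correlation (item response consistency)."""
    x = np.asarray(ideal_t1, dtype=float)
    y = np.asarray(ideal_t2, dtype=float)
    df = n - 2

    p_intercept = 2 * stats.t.sf(abs(a / se_a), df)          # H0: a = 0
    p_slope     = 2 * stats.t.sf(abs((b - 1.0) / se_b), df)  # H0: b = 1
    p_ks        = stats.ks_2samp(x, y).pvalue                # rating frequencies
    p_means     = stats.ttest_rel(x, y).pvalue               # ideal-score means
    r           = np.corrcoef(x, y)[0, 1]                    # response consistency

    return {
        "no_beta_change":     p_slope >= alpha and p_intercept >= alpha,
        "type_I_beta":        p_slope >= alpha and p_intercept < alpha,  # displacement
        "type_II_beta":       p_slope < alpha,                           # rescaling needed
        "frequencies_differ": p_ks < alpha,
        "means_differ":       p_means < alpha,
        "consistency_r":      r,
    }
```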

Step Four

Presuming the detection of beta change, Step 4 involves the transformation of data so that the Time 1 and Time 2 actual scores will be comparable. Returning to the linear equation of regression derived earlier, we now simply insert, as the independent variable X, the Time 1 actual scale scores into the equation itself to compute an adjusted, or recalibrated, Time 1 actual score (Y'). Then it is a simple matter to test the adjusted Time 1 actual score against the Time 2 actual score to determine the degree of real, or alpha, change.
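A minimal sketch of this Step 4 transformation follows; the coefficient values and ratings are hypothetical, and the choice of a paired comparison for the final test is ours.

```python
import numpy as np

# Coefficients from the regression of Time 2 ideal on Time 1 ideal scores
# (hypothetical values), plus one respondent's actual-condition ratings.
a, b = -0.4, 1.2
actual_t1 = np.array([3, 4, 2, 5, 3], dtype=float)
actual_t2 = np.array([4, 4, 3, 5, 4], dtype=float)

# Recalibrate the Time 1 actual scores onto the Time 2 metric ...
adjusted_t1 = a + b * actual_t1

# ... and take the difference as the per-item estimate of alpha (real) change.
# A paired comparison of adjusted_t1 and actual_t2 (e.g., a t-test) could
# then be used to gauge its significance.
alpha_change = actual_t2 - adjusted_t1

# Because the scale anchors are not absolutes, adjusted scores below 1 or
# above 5 can legitimately occur.
```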

Two final methodological points deserve mention. First, in the process described above, we have chosen to adjust the Time 1 actual scores to be comparable to their Time 2 counterpart scores. Given the theoretical base, the opposite could just as easily have been performed. Second, in instances, such as that below, where scale descriptors are not stated as absolutes, it is entirely possible to derive an adjusted score of less than 1 or more than 5.

To a very small extent (1)    To a small extent (2)    To some extent (3)    To a great extent (4)    To a very great extent (5)

That is to say, since the scale above does not run from 0 = "absolutely none" to ∞ = "ultimately, extremely all," we are consequently dealing with unbounded data. Adjusted actual scores that fall above five or below one are thus conceivable.


Conclusion

Our purpose has been to present a statistical technique for the detection and measurement of beta change. To the extent that the procedure corrects for such confounding, certain immediate and obvious advantages can be expected. As suggested earlier, the technique developed should be of value in differentiating and identifying the types of planned change observed in organizational settings. Furthermore, it should allow researchers to rely much more on the validity of their findings. To ignore the question of the accurate measurement of change is potentially too costly, for both organizational clients and the behavioral sciences in general.

REFERENCES

Barr, A. J.; Goodnight, J. H.; Sall, J.; & Helwig, J. T. A user's guide to SAS 76. Raleigh, N.C.: SAS Institute, 1976.

Bohrnstedt, G. W.; & Carter, T. M. Robustness in regression analysis. In H. L. Costner (Ed.), Sociological methodology 1971. San Francisco: Jossey-Bass, 1971.

Golembiewski, R. T.; Billingsley, K.; & Yeager, S. Measuring change and persistence in human affairs: Types of change generated by OD designs. Journal of Applied Behavioral Science, 1976, 12, 133-157.

Howard, G. S.; & Dailey, P. R. Response-shift bias: A source of contamination of self-report measures. Journal of Applied Psychology, 1979, 64, 144-150.

Howard, G. S.; Schmeck, R. R.; & Bray, J. H. Internal invalidity in studies employing self-report instruments: A suggested remedy. Journal of Educational Measurement, 1979, 16, 129-135.

Mahoney, M. J. Experimental methods and outcome evaluation. Journal of Consulting and Clinical Psychology, 1978, 46, 660-672.

Siegel, S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.

Taylor, J.; & Bowers, D. G. Survey of organizations. Ann Arbor: CRUSK, Institute for Social Research, University of Michigan, 1972.

Thompson, D. A. Calibrating observer bias in time study pace estimation. Journal of Industrial Engineering, 1963, 14, 345-346.

Zmud, R. W.; & Armenakis, A. A. Understanding the measurement of change. Academy of Management Review, 1978, 3, 661-669.

Arthur G. Bedeian is Associate Professor of Management at Auburn University, Auburn, Alabama.

Achilles A. Armenakis is Associate Professor of Management and Director of the Auburn Technical Assistance Center, Auburn University, Auburn, Alabama.

Robert W. Gibson is Adjunct Associate Professor of Civil Engineering at Auburn University, Auburn, Alabama.

Received 10/9/79

