
Universität Potsdam

Gene V. Glass, Reinhold M. Kliegl

An apology for research integration in the study of psychotherapy

First published in: Journal of Consulting and Clinical Psychology 51 (1983) 1, S. 28-41, ISSN 1939-2117, DOI 10.1037/0022-006X.51.1.28

Postprint published at the Institutional Repository of the Potsdam University:
In: Postprints der Universität Potsdam, Humanwissenschaftliche Reihe ; 145
http://opus.kobv.de/ubp/volltexte/2009/4023/
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-40233

An Apology for Research Integration in the Study of Psychotherapy

Gene V Glass and Reinhold M. Kliegl
University of Colorado

Criticisms of the integration of psychotherapy-outcome research performed by Smith, Glass, and Miller (1980) are reviewed and answered. An attempt is made to account for the conflicting points of view in this disagreement in terms of certain issues that have engaged philosophers of science in the 20th century. It is hoped that, in passing, something useful is learned about research of many types on psychotherapy.

The integration of psychotherapy-outcome studies that eventually led Smith and Glass (1977) to publication with Miller of The Benefits of Psychotherapy (1980) was born of dissatisfactions, enumerated below, with how outcome research was being pursued and used.

1. The chief occupation of the leading psychotherapy-outcome researchers seemed, at the time, to be quibbling over proper methodology; writers who commanded the most print were those most adept at writing about experimental design and statistics; and method became dogma and overshadowed substance.

2. The outcome literature had splintered and disintegrated; hundreds of unrelated efforts seemed to defy integration into anything that might address the question of the efficacy of competing therapies; indeed, it seemed to be believed that such questions could not or ought not be addressed and that incommensurability applied not only to theories of psychotherapy but also to its practice.

3. The methodology of outcome research reflected the worst features of positivist operationism: trivial quantification of outcomes (Behavioral Avoidance Tests and fear thermometers) devoid of technological importance; reliance on statistical hypothesis testing as the scientific method; disregard of the search for "function forms" (Meehl, 1978, p. 825) of practical and theoretical significance; belief on the part of psychotherapy researchers that they were engaged in the construction of grand theory about human behavior (instead of mapping a few "context-dependent stochastologicals," Meehl, 1978, pp. 812-813); and the tendency of researchers to ignore gross inconsistencies in findings from one laboratory to the next and to fail to draw the proper implications of such inconsistency for their field and its methodology.

We wish to acknowledge the helpful criticisms of Robert Cummins, Department of Philosophy, University of Colorado, who assisted in the early stages of thinking about this article.

Requests for reprints should be addressed to Gene V Glass, Campus Box 249, University of Colorado, Boulder, Colorado 80309.

4. The synthesis of psychotherapy-outcome research findings labored under the limitations of box-score (Light & Smith, 1971) counts of "statistically significant" results; it fell far short of extracting from the literature all that could be learned.

With this sense of the shortcomings of the field more vaguely felt than explicitly known (it was not until Meehl's 1978 brilliant paper on "slow progress in soft psychology" that we began to see more clearly what seemed so futile about the course much outcome research was taking), Smith and Glass began their attempt to synthesize the huge research literature on psychotherapy outcomes. The integration they sought (a) would be based on the widest census of the outcome literature that could be accumulated; (b) would treat methodological rules as empirical generalizations whose value must be verified rather than as a priori dogma; and (c) would pit all major schools of psychotherapy against one another in a pragmatic contest where economic value and the concerns of public policy would decide whether there was a winner.



The first attempts (Glass, 1976; Smith & Glass, 1977) at accomplishing this task were incomplete and sketchily reported. In 1980, Smith, Glass, and Miller published a larger and more detailed analysis of the data under the title The Benefits of Psychotherapy. A few of the findings of that analysis are summarized here.

A total of 475 controlled evaluations of psychotherapy were found in the literature of journals, books, and dissertations. For each study, the average difference on the outcome measure between the therapy and control groups was divided by the within-control-group standard deviation among persons to form a measure of the magnitude of the effect (effect size) of the psychotherapy:

Δ = (X̄_therapy − X̄_control) / s_control
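In code, each study's Δ is computed exactly as the formula reads; the following is a minimal sketch (the score vectors are hypothetical, and the article does not describe the authors' actual software):

```python
import statistics

def effect_size(therapy_scores, control_scores):
    """Glass's effect size: mean group difference scaled by the
    control group's standard deviation."""
    mean_diff = statistics.mean(therapy_scores) - statistics.mean(control_scores)
    return mean_diff / statistics.stdev(control_scores)

# Hypothetical outcome scores for one study.
print(round(effect_size([14, 16, 18, 15], [10, 12, 14, 12]), 2))
```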

Measures on more than one outcome were frequently reported in a single study, or the same outcome might be measured immediately after therapy and at a follow-up time months later. Thus, there were many more effect-size measures than there were studies: in fact, there were about 1,760 Δs from 475 experiments.

Most reviewers pursuing some synthesis using the methods of a bookkeeper or novelist are soon overwhelmed by the flood of information emanating from even a few studies. We respect the methods of statistics for their power to organize and extract information from numbers, and we feel they are not used enough in synthesizing research. A detailed coding of dozens of other characteristics of a study was also performed. A few of these characteristics appear in Table 1, an illustration of a study by Reardon and Tosi (Note 1). Each study, thus, was described by a multivariate data set amenable to any statistical analysis that might cast a little light on psychotherapy outcomes (their magnitude, their covariation with characteristics of therapies, therapists, clients, methods of research, and the like). The findings of these analyses are too numerous and complex to present here in any detail; indeed, they were too voluminous even for a 270-page book. But a few results will illustrate the approach.
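To make the coding concrete, one coded study might be represented as a record like the sketch below; the field names are illustrative only (the actual coding scheme ran to dozens of characteristics), and the values follow the Table 1 example:

```python
# Schematic record for one coded study (field names illustrative only;
# values follow the Table 1 example of Reardon and Tosi, Note 1).
study = {
    "publication_year": 1976,
    "treatment": "rational-stage directed imagery",
    "modality": "group",
    "duration_hours": 6,
    "internal_validity": "high",
    "outcome_reactivity": 4,
    "effect_sizes": [1.59, 1.59],  # at 0 and 2 weeks post therapy
}
print(sum(study["effect_sizes"]) / len(study["effect_sizes"]))
```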

The average of the effect-size measures was .85 (with a standard error of .03); that is, the difference in the means between groups receiving psychotherapy of any unspecified type for about 16 hours (the average duration of therapy) and untreated control groups was .85 standard deviation units. A .85 standard deviation effect can be understood more clearly by referring it to percentages of populations of persons. Assume that on a general measure of mental health, persons are distributed according to the normal distribution. If two separate distributions are drawn for those who receive psychotherapy and those who do not, our data lead us to expect (other considerations aside for the moment) that the two distributions will be separated by .85 standard deviations at their means. The average or median of the psychotherapy curve is located above 80% of the area under the control-group curve. This relationship indicates that the average person receiving psychotherapy is better off at the end of it than 80% of persons who do not. The average person, who would score at the 50th percentile of the untreated control population, would be expected to be at the 80th percentile of the control population after psychotherapy. Little evidence was found for the existence of negative effects of psychotherapy. Only 9% of the effect-size measures were negative.
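The percentile translation is simply the area under the standard normal curve below Δ; a quick check of the reported figure, assuming normality and equal spreads as in the text:

```python
from statistics import NormalDist

delta = 0.85  # mean effect size reported above
# Expected percentile of the control distribution reached by the
# average treated person, assuming normality and equal spreads.
print(round(NormalDist().cdf(delta) * 100))  # -> 80
```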

Table 2 reports the average effect sizes for each of 17 therapy types and placebo treatment, with the standard deviation of the effects, the standard errors of these averages, and the number of effect sizes. There exist many large differences in the size of effect produced by the therapies studied.

The highest average effect size, 2.38, was produced by cognitive therapies that go by such labels as systematic rational restructuring, rational stage-directed therapy, cognitive rehearsal, and fixed-role therapy. Techniques used in these therapies include active persuasion and confrontation of dysfunctional ideas and beliefs. The second highest average effect size was 1.82 standard deviation units, produced by hypnotherapy. Cognitive-behavioral therapies such as modeling, self-reinforcement, covert sensitization, self-control desensitization, and behavioral rehearsal were third highest on the ranking of therapeutic effectiveness (Δ = 1.13).


Table 1
Classification of Reardon and Tosi (1976)

Publication date: 1976
Publication form: Journal (although the paper was available in unpublished form, both as a dissertation and a paper read at a professional meeting, it had been accepted for journal publication)
Training of experimenter: Psychology (inferred from department affiliation)
Blinding: Experimenter was therapist (judged from report)
Diagnosis: High-stress delinquents (experimenter's description), delinquent or felon
Hospitalization: None
Intelligence: Average (estimated, in the absence of other information)
Client-therapist similarity: Moderately dissimilar (because of students' identification as delinquent, age and sex differences)
Age: 16 (stated)
Percent male: 0% (female population)
Solicitation: Self-presentation in response to advertised services (stated)
Assignment of clients: Random (stated)
Assignment of therapists: Random (stated)
Experimental mortality: 0% from all groups
Internal validity: High
Simultaneous comparison: Treatment and placebo compared against control
Type of treatment: (1) "Rational-stage directed imagery" (subclass: cognitive; hypnosis or intensive muscle relaxation used as aids to induce rational, cognitive restructuring); (2) Placebo: relaxation and suggestion to feel better
Confidence of classification: Rated 4 (many key concepts associated with Ellis, Kelley, and Raimy, plus personal communication with the authors)
Allegiance: Definite allegiance toward the therapy (inferred from tone of report)
Modality: Group (stated)
Location: Residential facility (stated)
Duration: 6 hours over 6 weeks (stated)
Therapist experience: 3 years (inferred from status as doctoral candidates)
Outcome: Two outcomes were measured by the experimenter: the Tennessee Self-Concept Scale (TSCS) and the Multiple Affect Adjective Checklist (MAACL; not included because of insufficient data reporting). The TSCS was rated 4 in reactivity (self-report on a measure similar to the treatment). The measure was taken immediately after therapy and 2 weeks later.
Effect size: Means from the TSCS (total) were obtained from a figure. Estimates of standard deviation were made using probability values. For the rational-stage directed imagery treatment, effect sizes were 1.59 at 0 weeks post and 1.59 at 2 weeks post; for the placebo treatment, effect sizes were .74 at 0 weeks post and -.20 at 2 weeks post.

Systematic desensitization, primarily used to alleviate phobias, was next highest (Δ = 1.05). An average effect size of .89 standard deviation units was achieved by dynamic-eclectic and eclectic-behavioral therapies.


Table 2
Average, Standard Deviation, Standard Error, and Number of Effects for Each Therapy Type

Type of therapy                          Average effect size, Δ    SD     SE    No. of effects (n)
1. Other cognitive therapies                     2.38             2.05   .27          57
2. Hypnotherapy                                  1.82             1.15   .26          19
3. Cognitive-behavioral therapy                  1.13              .83   .07         127
4. Systematic desensitization                    1.05             1.58   .08         373
5. Dynamic-eclectic therapy                       .89              .86   .08         103
6. Eclectic-behavioral therapy                    .89              .75   .12          37
7. Behavior modification                          .73              .67   .05         201
8. Psychodynamic therapy                          .69              .50   .05         108
9. Rational-emotive therapy                       .68              .54   .08          50
10. Implosion                                     .68              .70   .09          60
11. Transactional analysis                        .67              .91   .17          28
12. Vocational-personal development               .65              .58   .08          59
13. Gestalt therapy                               .64              .91   .11          68
14. Client-centered therapy                       .62              .87   .07         150
15. Adlerian therapy                              .62              .68   .18          15
16. Placebo treatment                             .56              .77   .05         200
17. Undifferentiated counseling                   .28              .55   .06          97
18. Reality therapy                               .14              .38   .13           9
Total                                             .85             1.25   .03       1,761

Note. n is the number of effects, not the number of studies; 475 studies produced 1,761 effects, or about 3.7 effects per study.

Several therapy types yielded average effect sizes clustering around two thirds of a standard deviation; none was significantly different from the others: psychodynamic therapy, client-centered therapy, Gestalt therapy, rational-emotive therapy, transactional analysis, implosive therapy, behavior modification, and vocational-personal development counseling.

Placebo treatments (e.g., relaxation training, pseudodesensitization therapy, minimum-contact attention control groups) produced an average effect size in comparisons with untreated control groups of .56 standard deviation units. Any unadjusted comparison of this effect with the effects of the therapies would be invidious. Placebos were often used in studies of the treatment of monosymptomatic anxieties. In such studies, the effect of the psychotherapy was about twice as great as the effect of the placebo.

Two types of therapy gave noticeably small effects. Undifferentiated counseling (defined as therapy reported without descriptive information or references to a theory) had an average effect size of .28. The smallest effect size in the table is for reality therapy (Δ = .14). However, only one controlled evaluation of this therapy was found. All nine effect sizes were produced by this one study.

The therapy effects of Table 2 are controlled in the sense that therapies were compared with control groups within studies. The comparison of average effect sizes among types of therapies in Table 2 does not control for the interaction of therapy effectiveness with other variables. Persons seeking psychotherapeutic help do not randomly assign themselves to the different types. Some types of therapy are specifically designed for a narrow range of psychological problems. Certain therapies appeal to less seriously disturbed clients. Hence, the differences in therapeutic effect reported in Table 2 reflect variation in therapeutic effectiveness plus its interaction with client characteristics, diagnostic types, therapist experience, choice of outcome criteria, and the like. This is not to say that the comparisons in Table 2 are without value to the questions raised about psychotherapy by laymen, policy makers, and even professionals. However, more questions (often those raised by researchers) can be answered through control of some of the variation represented in Table 2.

About 50 of the studies identified for the meta-analysis involved comparison of two or more therapies directly. These studies were isolated for closer examination in the "same experiment" analyses.


    experiment" analyses. Because of the smallnumber of studies involved, the therapieswere aggregated into classes. In 56 experi-ments, the outcome of verbal psychothera-pies (psychodynamic, client-centered, ra-tional-emotive, etc.) were directly comparedwith those of behavioral psychotherapies (sys-tematic desensitization, behavioral modifi-cation, etc.). The experiments yielded 365effect-size measures. Unlike the confoundedcomparisons one can make in Table 2, the365 "same experiment" comparisons areequated in all relevant respects on the verbaland behavioral sides of the ledger. The find-ings at the most general level appear in Table3. This difference (.19 sigma units) betweenverbal and behavioral therapies struck us asbeing quite small, and our saying so appearsto have offended those who seem to believethat the psychotherapy Olympics were longsince over and the laurels were theirs (e.g.,Rachman & Wilson, 1980), even as it pleasedthose (Freudians and Rogerians) who oncebelieved the race had been lost.

The longevity of psychotherapy effects was studied by comparing the sizes of effects measured at various follow-up dates. Two thirds of the 1,700 therapy effects were measured immediately after therapy, that is, at "0 weeks" follow-up time. Other common follow-up dates were 4 weeks (11% of the Δs), 8 weeks (4%), and 52 weeks (2.5%) after therapy. Four effects were measured more than 10 years after therapy. Effects were averaged in 19 follow-up-date categories extending from immediately after therapy to more than 300 weeks after. The average effect was regressed onto a quadratic function of time (Weeks plus Weeks Squared). The resulting least squares solution produced a multiple correlation of .78. The graph in Figure 1 depicts the relationship.

Table 3
Findings of the "Same Experiment" Analysis

[The printed table compared the verbal and behavioral classes of psychotherapy on average effect size (Δ) and the standard deviation of effects; the tabled values are not legible in this copy.]

Figure 1. Relationship between measured effect of psychotherapy and the number of weeks after therapy at which the effect was measured. (Figure not reproduced; axes: effect size vs. weeks post therapy.)

The estimated average effect of psychotherapy is slightly above .90 standard deviation units (therapy vs. control) immediately after therapy, and it falls to around .50 at about 2 years (104 weeks) after therapy.
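The quadratic fit can be reproduced in outline as follows. The sketch assumes hypothetical follow-up means, since the article reports only the fitted multiple correlation of .78, not the 19 category averages:

```python
import numpy as np

# Hypothetical (weeks, mean effect) pairs standing in for the 19
# follow-up-date categories; the article reports only R = .78.
weeks = np.array([0.0, 4, 8, 26, 52, 104, 200, 300])
mean_effect = np.array([0.92, 0.88, 0.84, 0.72, 0.62, 0.50, 0.47, 0.45])

# Least squares regression of mean effect on Weeks and Weeks squared.
X = np.column_stack([np.ones_like(weeks), weeks, weeks**2])
coef, *_ = np.linalg.lstsq(X, mean_effect, rcond=None)

# Multiple correlation: correlation between observed and fitted values.
fitted = X @ coef
R = np.corrcoef(mean_effect, fitted)[0, 1]
print(coef.round(4), round(R, 2))
```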

The duration of therapy was recorded for each study as the number of hours of psychotherapy received by the clients. Durations of therapy ranged from 1 hour to over 300 hours. Over two thirds of all effects were measured in studies involving 12 or fewer hours of treatment; the mean duration of therapy was 15.7 hours. The effect of therapy bore a complex relationship to its duration. Figure 2 is a graph of the curve relating effect size to therapy duration. The curve has been smoothed by a fifth-order moving average. The curvilinear correlation between effect size and hours of therapy equals .29.


Figure 2. Relationship of effect size and duration of therapy. (Curve is derived from a fifth-order moving average of 1,760 effect-size measures; figure not reproduced. Axes: effect size vs. duration of therapy in hours, 10 to 60.)

The remarkable thing about the curve in Figure 2 is that in its most peculiar feature (viz., the dip in effectiveness for therapies of duration between 10 and 20 hours) it resembles the relationship between duration and gain that Cartwright (1955) observed for 78 subjects in client-centered therapy and which Taylor (1956) later documented in a separate study.
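Assuming that "fifth-order" denotes a centered five-point window, the smoothing can be sketched as follows; the effect sizes here are placeholders:

```python
import numpy as np

def moving_average(y, order=5):
    """Centered moving average with a window of the given order."""
    kernel = np.ones(order) / order
    return np.convolve(y, kernel, mode="valid")

# Placeholder effect sizes, ordered by therapy duration in hours.
effects_by_duration = np.array([1.1, 0.9, 0.7, 0.6, 0.8, 1.0, 0.9, 0.8])
print(moving_average(effects_by_duration).round(2))
```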

Some empirical findings about how psychotherapy-outcome studies are done and the relationship between methods and study results were as interesting as the findings concerning therapy efficacy themselves. For example, outcome studies differ markedly in the methods of measuring therapeutic gains. At one end of the continuum lie measurement techniques patently subject to bias and distortion through personal influence, such as unblinded therapist ratings or clients' behavior (e.g., touching rats) enacted in the presence of the experimenter. At the other end lie outcome measures little subject to self-serving pressure by the experimenter or the clients' desire to reward the therapist by appearing to have benefited from treatment (e.g., grade-point average, galvanic skin response). When the 1,760 effect sizes were sorted into five categories on this "reactivity continuum," the average effect sizes in Table 4 resulted.

The findings of Table 4 were surprising and sobering. Reactivity of method of measurement proved to be the highest single correlate of effect size that Smith et al. observed. Are the effects of psychotherapy in the published literature puffed up by a good bit of self-congratulation and self-deception? Smith et al. (1980) were provoked to the observation that

The measurement of outcomes seems to have been abandoned at a primitive stage in its development. Rating scales are thrown together with little concern expressed for their psychometric properties. Venerable paper-and-pencil tests invented for diagnosis and with roots planted vaguely in no particular theory of pathology or treatment are used to hunt for effects of short-term and highly specialized brands of psychotherapy. A superfluity of instruments exists, and too little is known about them to prefer one to another. (p. 187)

One worries that some outcome experiments were designed purposely to make one type of psychotherapy look good and a second type look bad. One way of getting at such a motive is to classify each therapy effect size by whether the experimenter felt allegiance (expressed or clearly inferred) to the therapy or felt allegiance to a different therapy competing with it in a comparative study. The average effect sizes under these two circumstances, plus a third category of unknown allegiance, are reported in Table 5.

The difference in average effect size between therapies favored by the experimenter and those not favored is 50% (.66 vs. .95).

Table 4
Average Effect Size Classified by Each Value of Reactivity of Measurement

Reactivity scale value   Examples of "instruments"                                       Average effect size   SEM   No. of effects
1 (Low)                  Galvanic skin response, grade-point average                            .55            .06        222
2 (Low average)          Blind ratings of adjustment                                            .55            .04        219
3 (Average)              Minnesota Multiphasic Personality Inventory                            .60            .04        213
4 (High average)         Client self-report to therapist, E-constructed questionnaire           .92            .03        704
5 (High)                 Therapist ratings, behavior in presence of therapist                  1.19            .06        397

Note. Average reactivity for all cases: 3.46. Correlation of reactivity and effect size: linear, .18; curvilinear, η = .28.


Table 5
Average, Standard Deviation, Standard Error, and Number of Effect Sizes for Experimenter Allegiance

Experimenter allegiance           Average effect size, Δ    SD     SE    No. of effects
Allegiance to the therapy                 .95              1.46   .04       1,071
Allegiance against therapy                .66               .77   .04         479
Unknown or balanced allegiance            .78               .86   .06         213

It might be argued that the efficacy of particular therapies won them their followers; it could be counterargued that therapies of all types have their adherents.

Finally, studies were classified as high, medium, or low in "internal validity" (a combination of random assignment, differential mortality patterns, regression, and other principles of the Campbell-Stanley type). The variation in study findings related to experimental internal validity appears in Table 6.

No reliable differences in magnitude of effect can be accounted for by differences in design validity. There was a slight positive relationship between effect size and internal validity, showing that the better designed studies produced larger effects. This trend is opposite of that implied by some critics who maintain that only poorly designed studies show psychotherapy to be effective.

The analyses of the data set are continuing. Researchers with interests in special topics have acquired it and are performing more refined analyses. At some point in the future, a new generation of studies may be added and the analysis may continue.

    Responses to the Meta-Analysis

The publication of the meta-analysis of psychotherapy-outcome research attracted praise (Abeles, 1981; Frances & Clarkin, 1981; Simon, 1981) and condemnation (Crown, 1981; Eysenck, 1978; Kazrin, Durac, & Agteros, 1979; Presby, 1978; Rachman & Wilson, 1980).

Table 6
Average, Standard Deviation, Standard Error of the Mean, and Number of Effect Sizes Classified by the Internal Validity of the Study

Rated internal validity    Average effect size, Δ    SD     SEM   No. of effects
1 (Low)                            .78               .80    .05        224
2 (Medium)                         .78               .83    .04        378
3 (High)                           .88              1.42    .04      1,157

That this one study could have prompted such different evaluations (Eysenck: "an exercise in mega-silliness"; Simon: "Classics in science are rare, but I predict this volume owns a distinguished destiny.") suggests not that one reviewer is perspicacious while the other is unaccountably blind, but that the work can be viewed simultaneously as more than one simple thing (unlike, say, an experiment on the effects of glyoxylate on CO2 fixation in photosynthesis). Because the Smith et al. study resembled things that some psychotherapy researchers are traditionally interested in, they naturally found it difficult to evaluate in other than traditional terms. They might usefully have pointed out that the Smith et al. study failed to solve a problem in which they were interested, but that would have been quite a different matter from saying that it failed to solve any problem. In a bad-tempered sort of way, biologists may feel that sociology is a waste of good grant money, or sociologists may scorn history as fraudulent gossip; but such provincial grumbling is not serious criticism, especially if those who speak it have failed to recognize that different inquiries have been deliberately shaped by different purposes. Most misapprehensions of the purposes of the Smith et al. study are due, perhaps, to the tendency of some critics to assume, in the absence of a clear statement of rationale, that the purpose of the study was similar to their own purposes or those of psychotherapy-outcome studies. We hope to correct these misunderstandings here and provide the belated rationale.

In The Benefits of Psychotherapy, Smith et al. took as object field and explananda the literature (i.e., printed documentation) of psychotherapy-outcome research, the methods of study employed by researchers, and the use of this literature and methodology by professionals, researchers, laymen, and policy makers.


The choices of concern are pragmatic judgments, value judgments; and the authors took some pains to distinguish the desired end products of their inquiry from those that a psychotherapy researcher might falsely assume them to be (Smith et al., 1980, pp. 24-27). The literature on psychotherapy outcomes is distinct from psychotherapy outcomes themselves by a sequence of translations too obvious to enumerate. The importance of this literature is attested to by the effort that psychotherapy researchers invest in writing it, refereeing it, reading it, and reviewing it. The inquiry that takes as primary objects the documented outcomes of psychotherapy and the methods of psychotherapy-outcome researchers and that takes as explananda the policy decisions that can be justified by the literature of outcome research will necessarily use taxonomies and methods different from the inquiry that takes as object field the interaction of psychotherapist and client and as explananda the outcomes of this interaction. The distinction can be made clearer by example: It would have been meaningless for Smith et al. to have spoken of a published study as being "anxious" or "depressed," whereas the attribution would be perfectly appropriate made by a psychotherapy researcher about a client in an outcome experiment. The construction that we imposed on the psychotherapy-outcome research literature was chosen for a specific purpose, that is, to determine how and in what ways the judgments, decisions, and inclinations of persons (scholars, citizens, officials, administrators, policy makers) ought to be influenced by the literature of empirical research on the benefits of psychotherapy. This purpose was chosen in part as a remedy for an habitual inability of psychotherapy researchers to rise above partisan squabbles and theoretical hot-dogging when attempting to inform policy makers. We join Meehl in the opinion that "most so-called 'theories' in the soft areas of psychology (clinical, counseling, social, personality, community, and school psychology) are scientifically unimpressive and technologically worthless" (Meehl, 1978, p. 806).

Hence, we could not have shared many of the concerns that motivated the authors of the primary studies we integrated, their concerns being primarily the construction of big theory about changing behavior. A naive form of rationalism believes that the best policy is synonymous with the best science. One would have thought that Oakeshott (1962) would have put this belief forever to rest. Too many psychotherapy researchers seem to believe that "rational" policy is impossible until encompassing theories are discovered and confirmed.

    An Apologia for Smith et al.

Empirical inquiry of all sorts shares a basic structure. One can distinguish (a) the selection and definition of an object field (the events or things that are to be explained, understood, predicted, or whatever), (b) the construction of a taxonomy (the definition of constructs, words, and symbols, "slicing up the raw behavioral flux," as Meehl, 1978, p. 808, put it), and (c) the development of a methodology (techniques of measurement, analysis of information, collection of evidence, and the like). The articulation and refinement of these three elements defines a particular science. It is a regrettable egocentric failing of many scientists that they are unable to reflect self-consciously on the historical choices that have bequeathed to them their particular "science," but instead believe that logic demands they pursue their inquiries precisely as they are pursuing them. Scientists readily acknowledge that the selection of an object field is a choice that is part arbitrary and part historical circumstance. There are fewer scientists, however, who believe the same for the choice of a methodology. A substantial part of meta-analysis is concerned with the investigation of methodological assumptions (which are genuinely refutable conjectures, Meehl, 1978, p. 810, and not axioms at all). Thus, meta-analysis shakes up a scientist's faith in his method. Habermas (1971) argued convincingly that the knowledge-constitutive interests that determine, in part, the selection of a certain methodology for science can be derived from the structure and pragmatic needs of the society in which the science exists.


For example, experimental design of the Fisherian sort, the specification of independent and dependent variables, the identification of cause-effect relationships, and the criterion of replicability are principles growing out of the wish for technological control, whether it be control of pi-mesons, doorstops, machines, or human beings. Within a technological society, these methodologies are seen as paying greater dividends than, for example, passive observation, thought experiments, and Verstehen (empathic understanding). The notion that "logic" itself has led a particular group of scientists to a "proper" methodology has been severely and justly criticized (in particular Feyerabend, 1978; Kuhn, 1962). It cannot be logically proved that biology is a science and archaeology is not. Voodoo may have as much potential for enriching our knowledge of disease as does the paradigm of Western medicine we have inherited. The selection of a particular methodology cannot guarantee the success of a research program.

The misapprehensions certain critics have held concerning the Smith et al. study and many of their specific criticisms stem, we believe, from the critics' failure to discern the rationale we used in adopting object field, taxonomy, and methodology or from their insistence that Smith et al.'s choices in regard to these three elements were illegitimate. Clearly, a scientist's choice of a taxonomy or a methodology is subject to criticism and evaluation (e.g., the taxonomy of astrology is bloated, inconsistent, and contradictory, or the playing-card-anticipation technique of parapsychology is sometimes poorly controlled against visual clues). Even one's choice of object field and explananda are never above reproach, as anyone proposing to the National Institute of Mental Health a study of the influence of distant planets on marital accord would quickly learn. The belief persists among some psychologists that the proper pursuit of psychological science demands attention to a particular object field (e.g., experimental intervention and not naturalistic observation). That belief is arguable.

These remarks on the roots of methodology (viz., that they lie in pragmatic human interests, not in logic) were made to help clarify why meta-analysis takes methodologies as part of its object field. Meta-analysis tests and casts doubt on unwarranted methodological principles. It tests the rationality of holding beliefs about various methodological principles, while subscribing to some of the same methodological principles it tests. In this sense, it breaks no new ground. What prevents the endeavor from being uselessly circular or self-refuting is that methodological principles never exist in abstraction from an object field. Consequently, it is possible to select as an object field the usefulness of methodological principles applied to a different object field (e.g., psychotherapy). Indeed, it is done all the time; it is merely disconcerting to some to see it done empirically rather than as a matter of mathematics or logic prior to empiricism.

From the many complaints and crotchets that critics have registered against the Smith et al. study, three general concerns emerge as serious and troubling. Each, we maintain, ceases to be a forceful criticism of the work when viewed in the context elaborated upon in the above remarks. The three criticisms are the quality-of-study problem, the uniformity problem, and the incommensurability problem. Glass, with Smith and McGaw, discussed each of these at some length in Meta-Analysis in Social Research (1981, pp. 217-226) and with specific reference to psychotherapy-outcome research in The Benefits of Psychotherapy (Smith et al., 1980, pp. 27-35). The frequency with which independent critics raise these issues and the rigidity with which the positions are held in the face of counterarguments suggest that the misconceptions are deeply rooted in the critics' conceptual systems, which include the nature of science itself.

    The Quality-of-Study Problem

The dogged rejection of the psychotherapy-outcome meta-analysis appears to be rooted in part in the guiding principles that a particular scientific community has adopted. The unifying principles of this community are primarily methodological as opposed to substantive. Uniformity or consistency, the hallmark of this view, is achieved primarily by insuring potential intersubjective testability of reported results. The acceptance of this principle is a priori to any research.


The belief or the conviction is held above question that there is a way of doing "correct" and "good" scientific work. To be sure, there are and always will be arguments about flawed designs, inappropriate controls, invalid measurement, and the like. Disagreements about findings are settled (if indeed any attempt at all is made to settle them) by replications of the object studies with alternative designs and measures. Disagreements about the adequacy of a methodological assumption are settled by giving an alternative assumption a chance in replication. Meta-analysis treats methodological assumptions of object studies as part of an object field in itself, that is, as a posterioris. This different point of view permits one to suspend judgment about studies that others might regard as "a priori bad." Rachman and Wilson (1980, p. 253), for example, impute to others doubts about the entire dissertation literature, while professing broad-mindedness on their own behalf. Indeed, some reviewers of psychotherapy research ignore all dissertation reports on the grounds that they are undependable, a judgment that is surely a matter of empirical test rather than a priori conjecture. If design "flaws" are crucial, they will show a correlation with study findings expressed as effect sizes. The weaknesses of method need not be judged a priori.
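The empirical test proposed here amounts to nothing more exotic than a correlation between a coded design feature and study effect sizes; a toy sketch with invented codes:

```python
import numpy as np

# Invented codes: internal-validity rating (1 = low, 3 = high) and the
# effect size each hypothetical study produced.
validity = np.array([1, 1, 2, 2, 2, 3, 3, 3])
effect = np.array([0.7, 0.9, 0.8, 0.7, 0.9, 0.9, 0.8, 1.0])

# If a design "flaw" is crucial, it should correlate with the findings.
r = np.corrcoef(validity, effect)[0, 1]
print(round(r, 2))
```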

Viewing the Smith et al. meta-analysis this way, the argument "garbage in-garbage out" (Eysenck, 1978) is not only trite but beside the point. Such a judgment reveals that its author cannot conceive of treating design properties as something that can be empirically studied rather than merely debated. In this respect, Eysenck's (1978) claim that Smith and Glass advocated "low standards" for research quality and "abandoned scholarship" can be understood as the opinion of one to whom methodology is dogma.

I would suggest that there is no single study in existence which does not show serious weaknesses, and until these are overcome I must regretfully restate my conclusion of 1952, namely that there still is no acceptable evidence for the efficacy of psychotherapy. (Eysenck, 1978, p. 517)

It is relevant that Eysenck (1981) continues to dispute evidence on the causal link between smoking and lung cancer and stakes his argument on narrow methodological grounds long after reasonable people have acknowledged that a less-than-perfect study (U.S. Public Health Service, 1963) nonetheless produced an eminently credible conclusion.

The empirical status of methodological principles can only be studied where there are sufficient studies under various methodological circumstances to permit estimation of the relationship between the principle and study findings. Such principles cannot be studied in a single object study (focused on psychotherapeutic processes) for the same reason that it would be impossible to draw strong theoretical conclusions about human behavior on the basis of experience with one person. Hence, methods remain a priori for object studies, and one must respect them. However, at the level of meta-analysis, the necessity and justification of the methodological principles of the object study become the point of concern. One can be grateful for some less-than-perfect designs, since the relationship of the methodological principles with the study findings cannot be studied unless the principles are satisfied to varying degrees. Rather than "garbage in-garbage out," meta-analysis examines that which is garbage when judged by a priori standards. "Garbage in-information out" might be nearer the truth.

The a priori considerations of a meta-analysis are different from those entering the design of a therapeutic-processes object study. However, meta-analysis subscribes to the same methodological dogmas as does psychotherapy research at the object level (measurement validity, control of extraneous influences either by statistical correction or experimental arrangements, and the like). A number of criticisms have been directed at Smith et al.'s work on this level. For example, an integrative analysis ought to be representative; relevant studies should not be overlooked. Smith et al. were said to have missed some studies that Rachman and Wilson (1980, pp. 251-252) regarded as "major, well-controlled investigations." Moreover, they argued that these oversights disadvantaged studies of behavioral therapy. Such a claim is fully appropriate and bears checking. Since meta-analysis takes as object field a static and tangible body of documents, it is often simple to include or exclude certain studies or characteristics of studies and perform new analyses, somewhat in the manner of astronomy, where the same data are subjected to many analyses.


The data base constructed by Smith et al. has already been borrowed, expanded, trimmed, or recoded by a dozen different researchers seeking to answer new questions or old questions with better methods (for examples, see Andrews & Harvey, 1981; Shapiro & Shapiro, 1982).

    The Uniformity Problem

The assertion was repeatedly made that Smith et al.'s work was invalidated by their mistaken belief in some sort of "myth of uniformity," that is, that they believed persons, therapists, therapies, and pathologies to be somehow all the same and not worth distinguishing among.

Smith and Glass subscribe to what Kiesler (1966) called the "uniformity assumption myths" of psychotherapy research and evaluation. Nowhere is this more damaging than with respect to measures of therapy outcome. . . .

Other objections to the Smith and Glass meta-analysis involve the confusing mixture of patients and problems. No attempt is made to distinguish between the effect of this medley of treatments on schizophrenics, or alcoholics, or adolescent offenders, or under-achieving college students, or phobic psychiatric patients, or subnormal patients, or patients suffering from migraine, or from asthma.1 . . . Evidently we have travelled a great distance from Paul's (1967) recommendation that we evaluate the effectiveness of the particular technique for the particular problem of the particular person, all the way to a spreading sludge of diverse problems. (Rachman & Wilson, 1980, pp. 253-254)

Or consider Bandura's (1978) criticisms (delivered with nearly the same dip in the inkwell as his condemnation of Smith and Glass, 1977, for having mixed apples and oranges) of what we regard to be the finest single evaluation of psychotherapy outcomes in the literature:

A widely publicized study by Sloane, Staples, Cristol, Yorkston, and Whipple (1975) comparing the relative efficacy of behavioral therapy and psychotherapy similarly contains the usual share of confounded variables, unmatched mixtures of dysfunctions, and inadequately measured outcomes relying on amorphous clinical ratings rather than on direct assessment of behavioral functioning. As is now predictable for studies of this type, the different forms of treatment appear comparable and better than nothing on some of the global ratings but not on others. With such quasi-outcome measures even the controls, who receive no therapeutic ministrations, achieve impressive improvement. Based on this level of research, weak modes of treatment are given a new lease on life for those who continue to stand steadfastly by them. (p. 87)

Criticisms of these types reflect a fundamental misunderstanding of scientific inquiry. In a scientific inquiry, things that will be distinguished and things whose distinctness will be ignored are embodied in the choice of object field and taxonomy. To a physicist, a lamina of which the center of gravity is sought may perfectly well be assumed uniform in density; an engineer testing materials for breaking strength would surely not make the same assumption. Organisms assumed to be uniform in some sense by a physiological psychologist would not necessarily be so regarded by a histologist. The elaboration of all nonuniformity is an endless task that no sensible scientist ever attempts; knowledge demands simplification. The criticism that a particular inquiry fails to draw a set of critical distinctions can only be defended in the context of the object field and explananda being investigated.

Smith et al. have not the slightest belief in uniformity; if they had, they would never have employed statistics, that prima facie proof of disbelief in uniformity (likewise used, incidentally, by everyone who criticized Smith et al. for believing in uniformity). The irony of the criticism is twofold. We would have liked to have drawn many more distinctions among therapies, instances of a particular therapy, clients, therapists, and the like than we did. Unfortunately, such distinctions are largely impossible in an investigation of the documented literature of psychotherapy outcomes because the distinctions are not recorded by psychotherapy researchers in the primary studies. And they are not drawn there, we claim, because of the naive beliefs of logical empiricists that understanding of psychotherapeutic processes can be communicated in the operationist shorthand that defines the contemporary culture of research journals. Furthermore, Smith et al.'s belief in nonuniformity of human thought and action far surpasses that of their critics (Glass, 1979).

1 Results are reported separately for categories such as these in Smith et al. (1980).


Many psychotherapy researchers appear to believe that the intransigent variability and unpredictable quality of human behavior will be tamed by a multifactor analysis-of-variance design leading to simple generalizations of the form this type of therapy with this type of client will yield this type of outcome (Kiesler, 1966; Paul, 1967). Or they seem to believe that consistency and uniformity will emerge once we undertake the "direct assessment of behavioral functioning." As regards all contemporary theories of human behavior, they strike us as naively simple and grossly uniform. In our defense we cite Cronbach (1975), Bowers (1973), and Gergen (1973).

    The Incommensurability Problem

Research integration is faced with the task of comparing studies that might have differed in their authors' intentions in respect to object field, taxonomy, and methodology. Dissimilar intentions would seem to imply a fundamental incomparability, but it does not. This aspect of meta-analysis probably gives rise to the angriest criticisms. Rachman and Wilson pounced on Cooper's (1979) stipulation of the conditions that must be met for a meta-analysis to make sense: The research studies to be integrated must "(a) share a common conceptual hypothesis or (b) . . . share operations for the realization of the independent or dependent variables, regardless of conceptual focus" (p. 133). Rachman and Wilson took Cooper's dicta as proof that Smith and Glass's (1977) integration of outcome studies was illogical and useless—that it failed even to meet meta-analysis's own requirements. Rachman and Wilson's use of Cooper against Glass and Smith is ironic and mischievous. Having coined the term "meta-analysis" (Glass, 1976) and first applied it to the integration of psychotherapy-outcome research (Smith & Glass, 1977), Glass and Smith might be justified in claiming some proprietary rights for deciding what it means when they use it. Of course, there are no proprietary rights to ideas in science; but observers will see inconsistencies where they do not truly exist if they fail to distinguish what one person judges proper research synthesis to be from what someone else thinks.

In the framework of science identified in this article, Cooper would set as a condition of comparability the identity of object field, taxonomy, and methodology. Thus, Cooper's stipulation is clearly inappropriate for the purposes of meta-analysis; indeed, if taken seriously, it would virtually preclude any meta-analysis. The more common criticism of meta-analysis (viz., that it errs in mixing "apples and oranges") is best understood by distinguishing theoretical from practical commensurability. The former is a long-standing point of debate in the philosophy of science, and the best that can be said of progress toward the solution of the problem is that there has been little. It occupied Kuhn (1962) in The Structure of Scientific Revolutions, who despaired of simple answers; and it will undoubtedly be some time before a popular position emerges. Practical commensurability, on the other hand, is trivial by comparison. Conceptually, it poses problems in the philosophy of values, where some of the most productive thinking in philosophy has been pursued in recent decades. Fortunately, practical commensurability depends in no way on theoretical commensurability.

No matter what Smith et al. might have found, their conclusions would not have provided anything but the most dubious evidence for or against the validity of any theory of human behavior, whether biological, behaviorist, cognitivist, or what have you. The equality of benefits—if that were observed—surely would say nothing about the equality of theories in respect to heuristic power or capacity to grow. Nor, one must add quickly, would the superiority of effects for A versus B imply the theoretical superiority of A. The meta-analysis addressed a set of practical policy questions. Smith et al. (1980) erred when their rhetoric slipped momentarily off these questions and into the domain of traditional psychological theory:

We did not expect that the demonstrable benefits of quite different types of psychotherapy would be so little different. It is the most startling and intriguing finding we came across. All psychotherapy researchers should be prompted to ask how it can be so. If it is truly so that major differences in technique count for little in terms of benefits, then what is to be made of the volumes devoted to the careful drawing of distinctions among styles of psychotherapy? And what is to be made of the deep divisions and animosities among different psychotherapy schools?


These are the kinds of sweeping questions that too often evoke trite and thoughtless answers. Perhaps we can avoid both. We regard it as clearly possible that all psychotherapies are equally effective, or nearly so; and that the lines drawn among psychotherapy schools are small distinctions, cherished by those who draw them, but all the same distinctions that make no important differences. Those elements that unite different types of psychotherapy may prove to be far more influential than those specific elements that distinguish them. (pp. 185-186)

This statement, drawn from Smith et al.'s chapter of conclusions, crosses over from the inquiry on what claims can be rationally supported from the literature of psychotherapy-outcome research to the inquiry into human behavior, its antecedents and consequences. Add to this the fact that some toes were trod upon, and the angry reactions of some critics become more understandable. Perhaps Smith et al. should forgive their critics a few misapprehensions about object fields and explananda when they occasionally lost track themselves of the precise questions they could and could not answer. They did, however, properly address questions of the form, "Does a person who professes to administer Gestalt psychotherapy achieve benefits importantly and demonstrably superior to those produced by a therapist who calls his or her approach 'cognitive-behavioral'?" The validity of answers to this question is supported by the answers to several auxiliary questions, for example: How does psychotherapy in practice compare with psychotherapy in print? How does "publication bias" distort one's perception of the efficacy of psychotherapy? These questions are clearly different from the question, presumably of most concern to several critics who were upset by the Smith et al. study, whether theory of human behavior A, B, or C is more valid. To underline how separate are the questions addressed in the psychotherapy meta-analysis from those addressed by various theories of psychotherapy, the first author confesses that in regard to theories of human behavior his predilections agree with Meehl's (1978): "I do have a soft spot in my heart . . . for psychoanalysis" (p. 829). What makes this personal revelation more than gratuitous is that there is nothing in the least inconsistent about believing, as the first author does, that psychoanalysis is by far the best theory of human behavior and acknowledging that there exists in the Smith et al. data base not a single experimental study that would qualify by even the shoddiest standards as an outcome evaluation of orthodox psychoanalysis. (For more on the distinction between evaluating pragmatic claims and testing theories in the context of psychoanalysis, see Scriven, 1959.)

    Conclusion

This article is an apology to those scientists who have worked at psychotherapy research over the years and who saw things done to their efforts that they may not have approved of, for reasons that were unclear. In our defense, we insist that our purposes were legitimate and violated no canons of science, even if we were derelict in stating the purposes clearly. In an attempt to correct any misapprehensions our work may have fostered, we have offered the remarks in this article to our critics in the spirit of, as the French say, "to understand all is to forgive all."

    Reference Note

1. Reardon, J. P., & Tosi, D. J. The effects of rational stage directed imagery on self concept and reduction of psychological stress in adolescent delinquent females. Unpublished manuscript, Ohio State University, 1976.

    References

Abeles, N. Psychotherapy works! Contemporary Psychology, 1981, 26, 821-823.

Andrews, J. G., & Harvey, R. Does psychotherapy benefit neurotic patients? A reanalysis of the Smith, Glass, and Miller data. Archives of General Psychiatry, 1981, 38, 951-962.

Bandura, A. On paradigms and recycled ideologies. Cognitive Therapy and Research, 1978, 2, 79-103.

Bowers, K. S. Situationism in psychology: An analysis and critique. Psychological Review, 1973, 80, 307-336.

Cartwright, D. S. Success in psychotherapy as a function of certain actuarial variables. Journal of Consulting Psychology, 1955, 19, 357-363.

Cooper, H. M. Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 1979, 37, 131-146.

Cronbach, L. J. Beyond the two disciplines of scientific psychology. American Psychologist, 1975, 30, 116-127.

Crown, S. Review of The Benefits of Psychotherapy. The British Journal of Psychiatry, 1981, 139, 199.

Eysenck, H. J. An exercise in mega-silliness. American Psychologist, 1978, 33, 517.

Eysenck, H. J. Causes and effects of smoking. Beverly Hills, Calif.: Sage, 1981.

Feyerabend, P. Against method. London: Verso, 1978.

Frances, A., & Clarkin, J. Review of The Benefits of Psychotherapy. Hospital and Community Psychiatry, 1981, 32, 807-808.

Gergen, K. J. Social psychology as history. Journal of Personality and Social Psychology, 1973, 26, 309-320.

Glass, G. V. Primary, secondary, and meta-analysis of research. Educational Researcher, 1976, 5, 3-8.

Glass, G. V. Policy for the unpredictable: Uncertainty research and policy. Educational Researcher, 1979, 8, 12-14.

Glass, G. V., McGaw, B., & Smith, M. L. Meta-analysis in social research. Beverly Hills, Calif.: Sage Publications, 1981.

Glass, G. V., & Smith, M. L. Meta-analysis of research on the relationship of class size and achievement. Educational Evaluation and Policy Analysis, 1979, 1, 2-16.

Habermas, J. Knowledge and human interests. Boston: Beacon Press, 1971.

Kazrin, A., Durac, J., & Agteros, T. Meta-meta analysis: A new method for evaluating therapy outcome. Behavior Research and Therapy, 1979, 17, 397-399.

Kiesler, D. J. Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 1966, 65, 110-136.

Kuhn, T. S. The structure of scientific revolutions. Chicago: University of Chicago Press, 1962.

Light, R. J., & Smith, P. V. Accumulating evidence: Procedures for resolving contradictions among different research studies. Harvard Educational Review, 1971, 41, 429-471.

Meehl, P. E. Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 1978, 46, 806-834.

Oakeshott, M. J. Rationalism in politics, and other essays. London: Methuen, 1962.

Paul, G. L. Strategy of outcome research in psychotherapy. Journal of Consulting Psychology, 1967, 31, 109-118.

Presby, S. Overly broad categories obscure important differences between therapies. American Psychologist, 1978, 33, 514-515.

Rachman, S. J., & Wilson, G. T. The effects of psychological therapy (2nd enl. ed.). Oxford: Pergamon Press, 1980.

Scriven, M. The experimental investigation of psychoanalysis. In S. Hook (Ed.), Philosophy, scientific method and psychoanalysis. New York: New York University Press, 1959.

Shapiro, D. A., & Shapiro, D. Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 1982, 92, 581-604.

Simon, J. Review of The Benefits of Psychotherapy. American Journal of Psychiatry, 1981, 138, 1399-1400.

Sloane, R. B., Staples, F. R., Cristol, A. H., Yorkston, N. J., & Whipple, K. Psychotherapy versus behavior therapy. Cambridge, Mass.: Harvard University Press, 1975.

Smith, M. L., & Glass, G. V. Meta-analysis of psychotherapy outcome studies. American Psychologist, 1977, 32, 752-760.

Smith, M. L., Glass, G. V., & Miller, T. I. The benefits of psychotherapy. Baltimore, Md.: Johns Hopkins University Press, 1980.

Taylor, J. W. Relationship of success and length in psychotherapy. Journal of Consulting Psychology, 1956, 20, 332.

U.S. Public Health Service. Smoking and health: Report of the Advisory Committee to the Surgeon General. Washington, D.C.: U.S. Government Printing Office, 1963.

Received March 26, 1982
Revision received April 19, 1982
