bias in labeling effect

14
Journal of Personality and Social Psychology 1983, Vol. 44, No. I, 20-33 Copyright 1983 by the American Psychological Association, Inc. 0022-3514/83/4401-0020S00.75 A Hypothesis-Confirming Bias in Labeling Effects John M. Darley and Paget H. Gross Princeton University The present study examines the process leading to the confirmation of a per- ceiver's expectancies about another when the social label that created the expec- tancy provides poor or tentative evidence about another's true dispositions or capabilities. One group of subjects was led to believe that a child came from a high socioeconomic background; the other group, that the child came from a low socioeconomic background. Nothing in the socioeconomic data conveyed infor- mation directly relevant to the child's ability level, and when asked, both groups of subjects reluctantly rated the child's ability level to be approximately at her grade level. Two other groups received the social-class information and then wit- nessed a videotape of the child taking an academic test. Although the videotaped performance series was identical for all subjects, those who had information that the child came from a high socioeconomic background rated her abilities well above grade level, whereas those for whom the child was identified as coming from a lower class background rated her abilites as below grade level. Both groups cited evidence from the ability test to support their conflicting conclusions. We interpret these findings as suggesting that some "stereotype" information (e.g., socioeconomic class information) creates not certainties but hypotheses about the stereotyped individual. However, these hypotheses are often tested in a biased fashion that leads to their false confirmation. The expectancy-confirmation process is an important link in the chain leading from so- cial perception to social action (Darley & Fazio, 1980; Rosenthal & Jacobson, 1968; Snyder & Swann, 1978a). As research has demonstrated, two processes leading to the confirmation of a perceiver's beliefs about another can be identified. The first, called a "behavioral confirmation effect" (Snyder & Swann, 1978b), is consistent with Merlon's (1948) description of the "self-fulfilling prophecy." In this process, perceiver's behav- iors toward the individual for whom they hold an expectancy channel the course of the interaction such that expectancy-confirming behaviors are elicited from the other individ- ual (Rosenthal, 1974; Snyder, Tanke, & Ber- scheid, 1977). The second process leads to The authors are grateful for the insightful comments of Nancy Cantor, Ron Comer, Joel Cooper, E. E. Jones, Charles Lord, Mark Zanna, and the members of the Princeton Social Psychology Research Seminar. Robin Akert, Kristin Boggiano, Paul Bree, Kay Ferdinandsen, Hannah McChesney, and Frederick Rhodewalt ably as- sisted in creating the stimulus materials. Requests for reprints should be sent to John M. Dar- ley, Department of Psychology, Princeton University, Princeton, New Jersey 08544. what we may call a "cognitive confirmation effect." We use this term to refer to expec- tancy-confirmation effects that occur in the absence of any interaction between the per- ceiver and the target person. In these cases, perceivers simply selectively interpret, attrib- ute, or recall aspects of the target person's actions in ways that are consistent with their expectations (Duncan, 1976; Kelley, 1950; Langer & Abelson, 1974). Thus, perceivers with different expectancies about another may witness an identical action sequence and still emerge with their divergent expectancies "confirmed." The focus of the present article is on the mediation of cognitive confirmation effects. We suggest that there are at least two different processes that bring about the cognitive con- firmation of expectancies. The key to sepa- rating these processes lies in recognizing that people distinguish between the kinds of in- formation that create conceptions of other people. Perceivers may define a continuum, one end of which involves information that is seen as a valid and sufficient basis for judg- ments about another; at the other end is ev- idence that is seen as a weak or invalid basis for those judgments. 20

Upload: raluca-cocut

Post on 12-Nov-2014

37 views

Category:

Documents


1 download

DESCRIPTION

About labeling

TRANSCRIPT

Page 1: Bias in Labeling Effect

Journal of Personality and Social Psychology1983, Vol. 44, No. I, 20-33

Copyright 1983 by the American Psychological Association, Inc.0022-3514/83/4401-0020S00.75

A Hypothesis-Confirming Bias in Labeling Effects

John M. Darley and Paget H. GrossPrinceton University

The present study examines the process leading to the confirmation of a per-ceiver's expectancies about another when the social label that created the expec-tancy provides poor or tentative evidence about another's true dispositions orcapabilities. One group of subjects was led to believe that a child came from ahigh socioeconomic background; the other group, that the child came from a lowsocioeconomic background. Nothing in the socioeconomic data conveyed infor-mation directly relevant to the child's ability level, and when asked, both groupsof subjects reluctantly rated the child's ability level to be approximately at hergrade level. Two other groups received the social-class information and then wit-nessed a videotape of the child taking an academic test. Although the videotapedperformance series was identical for all subjects, those who had information thatthe child came from a high socioeconomic background rated her abilities wellabove grade level, whereas those for whom the child was identified as comingfrom a lower class background rated her abilites as below grade level. Both groupscited evidence from the ability test to support their conflicting conclusions. Weinterpret these findings as suggesting that some "stereotype" information (e.g.,socioeconomic class information) creates not certainties but hypotheses aboutthe stereotyped individual. However, these hypotheses are often tested in a biasedfashion that leads to their false confirmation.

The expectancy-confirmation process is animportant link in the chain leading from so-cial perception to social action (Darley &Fazio, 1980; Rosenthal & Jacobson, 1968;Snyder & Swann, 1978a). As research hasdemonstrated, two processes leading to theconfirmation of a perceiver's beliefs aboutanother can be identified. The first, called a"behavioral confirmation effect" (Snyder &Swann, 1978b), is consistent with Merlon's(1948) description of the "self-fulfillingprophecy." In this process, perceiver's behav-iors toward the individual for whom theyhold an expectancy channel the course of theinteraction such that expectancy-confirmingbehaviors are elicited from the other individ-ual (Rosenthal, 1974; Snyder, Tanke, & Ber-scheid, 1977). The second process leads to

The authors are grateful for the insightful commentsof Nancy Cantor, Ron Comer, Joel Cooper, E. E. Jones,Charles Lord, Mark Zanna, and the members of thePrinceton Social Psychology Research Seminar. RobinAkert, Kristin Boggiano, Paul Bree, Kay Ferdinandsen,Hannah McChesney, and Frederick Rhodewalt ably as-sisted in creating the stimulus materials.

Requests for reprints should be sent to John M. Dar-ley, Department of Psychology, Princeton University,Princeton, New Jersey 08544.

what we may call a "cognitive confirmationeffect." We use this term to refer to expec-tancy-confirmation effects that occur in theabsence of any interaction between the per-ceiver and the target person. In these cases,perceivers simply selectively interpret, attrib-ute, or recall aspects of the target person'sactions in ways that are consistent with theirexpectations (Duncan, 1976; Kelley, 1950;Langer & Abelson, 1974). Thus, perceiverswith different expectancies about anothermay witness an identical action sequence andstill emerge with their divergent expectancies"confirmed."

The focus of the present article is on themediation of cognitive confirmation effects.We suggest that there are at least two differentprocesses that bring about the cognitive con-firmation of expectancies. The key to sepa-rating these processes lies in recognizing thatpeople distinguish between the kinds of in-formation that create conceptions of otherpeople. Perceivers may define a continuum,one end of which involves information thatis seen as a valid and sufficient basis for judg-ments about another; at the other end is ev-idence that is seen as a weak or invalid basisfor those judgments.

20

Page 2: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 21

As an example of valid information, con-sider a teacher who receives the results of astandardized test indicating that a particularpupil has high ability. The expectancies thisinformation creates about the child are as-sumed to reflect the child's actual capabilitiesand are probably quite automatically ap-plied. At the other end of the continuum, andof primary interest to this article, is expec-tancy-creating information that most per-ceivers would regard as incomplete with re-spect to an individual's abilities or disposi-tions. Many of our social stereotypes fall intothis category. For example, racial or social-class categories are regarded by most of usas an insufficient evidential basis for conclu-sive judgments of another's dispositions orcapabilities. In this case, we suspect that per-ceivers are highly resistant to automaticallyapplying their expectancies to a target person.A teacher, for example, would be extremelyhesitant to conclude that a black child hadlow ability unless that child supplied directbehavioral evidence validating the applica-tion of the label.

The end of the continuum defining infor-mation that is seen as insufficient evidencefor social judgments is of interest because wefind what appears to be a paradox in the lit-erature dealing with social stereotypes. Somerecent investigations of the influence of ste-reotypes on social judgments have demon-strated a "fading" of stereotypic attributions(e.g., Karlins, Coffman, & Walters, 1969;Locksley, Borgida, Brekke, & Hepburn, 1980).For example, investigators have noted par-ticipants' increasing unwillingness to makestereotypic trait ascriptions (Brigham, 1971).Moreover, Quattrone and Jones (1980) dem-onstrate that although people may make ste-reotype-based judgments about a social group,they are unwilling to use category-based in-formation to predict the behavior of any onemember of that group.

Given this resistance to the utilization ofexpectancies when the social labels establish-ing them are not seen as valid guides for judg-ments, one might expect an elimination ofthe expectancy-confirmation bias. That is,perceivers would not unjustly assume thetruth of a stereotype; they would instead re-quire that evidence substantiating the accu-racy of that stereotype be provided. This

leads to the prediction that, ultimately, judg-ments about the target person will reflect theactual evidence produced by his or her be-havior, unbiased by the perceivers' initial ex-pectancies. Unfortunately, this conclusionstands in contradiction to the bulk of the self-fulfilling prophecy literature in which onefinds that confirmation effects are often pro-duced when racial, ethnic, or other negativesocial labels are implicated—exactly thosecases in which one expects perceivers to re-frain from using category-based information(e.g., Foster, Schmidt, & Sabatino, 1976; Rist,1970; Rosenhan, 1973; Word, Zanna, &Cooper, 1974). We suggest that this apparentcontradiction can be resolved if the followingtwo-stage expectancy-confirmation process isassumed: Initially, when perceivers have rea-son to suspect that the information that es-tablishes an expectancy is not diagnosticallyvalid for determining certain of the targetperson's dispositions or capabilities, they willrefrain from using that information to cometo diagnostic conclusions. The expectanciesfunction not as truths about the target personbut rather as hypotheses about the likely dis-positions of that person. If perceivers wereasked for judgments at this point in the pro-cess, without any behavioral evidence to con-firm their predictions, they would not reportevaluations based on their expectancies. Theywould instead report that either they did nothave sufficient information or they wouldmake judgments consistent with normativeexpectations about the general population.

The second stage occurs when perceiversare given the opportunity to observe the ac-tions of the labeled other. They then can testtheir hypotheses against relevant behavioralevidence. The initiation of a hypothesis-test-ing process would seem to be an unbiasedapproach for deriving a valid basis for judg-ments about another. If, however, individualstest their hypothesis using a "confirmingstrategy"—as has often been demonstrated—a tendency to find evidence supporting thehypothesis being tested would be expected(Snyder & Cantor, 1979; Snyder & Swann,1978b). A number of mechanisms operatingin the service of a hypothesis-confirmingstrategy may contribute to this result. First,the search for evidence may involve selectiveattention to information that is consistent

Page 3: Bias in Labeling Effect

22 JOHN M. DARLEY AND PAGET H. GROSS

with expectations and a consequent tendencyto recall expectancy-consistent informationwhen making final evaluations (Zadny &Gerard, 1974). Second, a hypothesis-con-firming strategy may affect how informationattended to during a performance will beweighted. Typically, expectancy-consistentinformation has inferential impact, whereasinconsistent information has insufficient in-fluence in social-decision tasks (Nisbett &Ross, 1980). In fact, a recent study by Lord,Ross, and Lepper (1979) indicates that evenwhen expectancy-inconsistent information isbrought to the attention of the perceiver, itmay be regarded as flawed evidence andtherefore given minimal weight in the eval-uation process. Third, it is also possible forinconsistent actions to be attributed to situ-ational factors and thereby be attributionallydiscounted (Regan, Strauss, & Fazio, 1974).Finally, apparently inconsistent behavior maybe reinterpreted as a manifestation of dis-positions that are consistent with the initialexpectancy (Hayden & Mischel, 1976).

Given the operation of all of these biasingmechanisms, an expectancy-confirmation ef-fect could arise even when the target person'sbehavior does not objectively confirm theperceiver's expectancies. Nonetheless, the op-portunity to observe the diagnostically rele-vant information is critical to the processbecause it provides what perceivers considerto be valid evidence, and thus, they can feelthat they have made an "unbiased" judg-ment.

In the present study, we attempted to findevidence of this two-stage expectancy-confir-mation process. To do this, perceivers weregiven information that would induce themto categorize an elementary school child asbelonging to a high- or low-socioeconomic-status (SES) class (cf. Cooper, Baron, & Lowe,1975). Consistent with the two-stage model,we predicted that perceivers given only thisdemographic information about the childwould be reluctant to provide label-consistentability evaluations. Another group of eval-uators were given the identical demographicinformation about the child (high or low SES)and were then shown a performance se-quence that provided ability-relevant infor-mation about the child. Owing to the hy-pothesis-confirming bias, it was predicted

that these individuals would find evidence inthe identical performance sequence to sup-port their opposing hypotheses and wouldthus report widely different judgments of thechild's ability. Moreover, we expected theseperceivers to mislocate the source of theirevidence from their own expectancies to the"objective" evidence provided by the perfor-mance sequence.

Method

As part of a study on "teacher evaluation methods,"students viewed a videotape of a fourth-grade femalechild and were asked to evaluate her academic capabil-ities. Variation in the videotape determined the four ex-perimental conditions. The first segment provided de-mographic information about the child and was used toestablish either positive or negative expectations for thechild's academic potential. Half of the participantsviewed a sequence that depicted the child in an urban,low-income area (negative expectancy); the other halfwere shown the same child in a middle-class, suburbansetting (positive expectancy).

Orthogonal to this manipulation was the performancevariable. Half of the participants from each expectancycondition were shown a second tape segment in whichthe child responded to achievement-test problems (per-formance). The tape was constructed to be inconsistentand relatively uninformative about the child's abilities.The remaining participants were not shown this segment(no performance).

The design was thus a 2 X 2 factorial one, with twolevels of expectancy (positive and negative) and two levelsof performance (performance and no performance). Inaddition, a fifth group of participants viewed the per-formance tape but were not given prior informationabout the child's background (performance only). Theirevaluations were used to determine if the performancetape was, as intended, an ambiguous display of the child'sacademic capabilities.

All viewers then completed an evaluation form onwhich they rated the child's overall achievement andacademic skill level.' Additional questions about thechild's performance and manipulation checks were in-cluded. After completing their evaluation form, partic-ipants were given a questionnaire designed to probe theirsuspiciousness about the experiment. Finally, partici-pants were debriefed, thanked, and paid.

Subjects

Seventy (30 male and 40 female) Princeton Universityundergraduates volunteered for a study on "teacher eval-uation and referral" for which they were paid $2.50 fora 1-hour session. Participants were randomly assignedto one of five (four experimental and one control) con-ditions, with an attempt made to have an equal numberof men and women in each condition. None of the stu-dents in the study reported having any formal teachertraining; two students had informal teaching experience,both at the high school level. Only three of the original

Page 4: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 23

subjects were eliminated from the study because of sus-piciousness about the experimental procedures.

Instructions

The experimenter introduced herself as a research as-sistant for a federal agency interested in testing new ed-ucational procedures. Students were told that their par-ticipation would be useful for determining the reliabilityof a new evaluation form teachers would use when re-ferring pupils to special programs (these included re-medial classes and programs for gifted students). To testthe completeness and scorability of the evaluation form,subjects, acting as teachers, were asked to provide anacademic evaluation of a selected child on this speciallydesigned form. The experimenter emphasized that allevaluations would be anonymous and confidential andasked participants not to place their names anywhere onthe form. She also requested that they replace the formin its envelope and seal it when they were finished. Eachparticipant was further admonished to be as accurateand objective as possible when evaluating their selectedpupil.

The reseach assistant then went on to explain that avideotape file of elementary school children had beenprepared for a previous study (numerous videotape reelswere on shelves in front of the subject). Participantswould be selecting one child from this sample to observeand evaluate. It was made clear that this "randomly se-lected sample of children includes some who performwell above their grade level, some who would benefitfrom remedial programs, and some at all levels betweenthese extremes." To select a child from this file, partic-ipants drew a number corresponding to a videotape reel.The experimenter, who had been blind to condition untilthis point, placed the tape on a television monitor andgave the participant a fact sheet appropriate to the childthey selected.

The participant actually selected one of five preparedtapes (corresponding to the four experimental and onecontrol conditions). In all conditions, the child observedwas a nine-year-old female Caucasian named Hannah,who was a fourth grader attending a public elementaryschool. The information about the child's name, grade,and so forth appeared on the fact sheet and was reiteratedin the narration of the tape.

Demographic Expectancy Manipulation

To establish either positive or negative expectanciesabout Hannah's ability, participants viewed a tape ofHannah that contained environmental cues indicatingeither a high or low socioeconomic background. Eachtape included 4 minutes devoted to scenes of Hannahplaying in a playground (filmed at a distance to preventclear perception of her physical attractiveness) and 2minutes devoted to scenes of her neighborhood andschool. The tapes were filmed at two different locations.

In the negative-expectancy condition, subjects viewedHannah playing in a stark fenced-in school yard. Thescences from her neighborhood showed an urban settingof run-down two-family homes. The school she attendedwas depicted as a three-story brick structure set close tothe street, with an adjacent asphalt school yard. The fact

sheet given to participants included the following infor-mation about Hannah's parents: Both parents had onlya high school education; her father was employed as ameat packer; her mother was a seamstress who workedat home.

In the positive-expectancy condition, Hannah wasseen playing in a tree-lined park. The scenes from herneighborhood showed a suburban setting of five- and six-bedroom homes set on landscaped grounds. Her schoolwas depicted as a sprawling modern structure, with ad-jacent playing fields and a shaded playground. Further,Hannah's fact sheet indicated that both her parents werecollege graduates. Her father's occupation was listed asan attorney, her mother's as a free-lance writer.

The Performance Manipulation

Two groups were asked to evaluate Hannah's academicability immediately after viewing one or the other ex-pectancy tape (no performance); two other groups weregiven the opportunity to observe Hannah in a test sit-uation (performance).

Subjects in the performance conditions observed asecond 12-minute tape sequence in which Hannah re-sponded to 25 achievement-test problems. This portionof the tape was identical for both performance groups.The problems were modified versions of items selectedfrom an achievement-test battery and included problemsfrom the mathematics computation, mathematics con-cepts, reading, science, and social studies subtests. Thegrade level for the problems ranged from the second tothe sixth grade. Participants were told that the test in-cluded "easy, moderate, and difficult problems." Theproblems were given orally to Hannah by a male testerwho held up the possible solutions on cards. The se-quence was filmed from behind the child so the viewerwas able to see the cards held by the tester but not Han-nah's face.

Hannah's performance was prearranged to present aninconsistent picture of her abilities. She answered botheasy and difficult questions correctly as well as incor-rectly. She appeared to be fairly verbal, motivated, andattentive on some portions of the tape and unresponsiveand distracted on other portions of the tape. The testerprovided little feedback about Hannah's performance.After each problem, he recorded Hannah's response andwent on to the next problem.

To determine what information the tape providedabout Hannah's ability in the absence of a priori expec-tancies, a group of participants, given the same coverstory as subjects in the other conditions, were shown onlythe performance tape. These subjects were given no in-formation about the child other than her name, age,grade, address, and the school she attended.

Dependent MeasuresAfter reviewing the tape, participants were given an

evaluation form to complete. The form contained thefollowing sections:

Ability measures. Nine curriculum areas formingthree broad categories were listed. Included in this sec-tion were reading (reading comprehension, reading abil-ity, writing, language ability), mathematics (mathemat-

Page 5: Bias in Labeling Effect

24 JOHN M. DARLEY AND PAGET H. GROSS

ical concepts, mathematical computation), and liberalarts (science, general knowledge, social studies). Eachcurriculum area was followed by a scale extending fromkindergarten to the sixth-grade, ninth-month grade level,with points labeled at 3-month intervals. Subjects wereinstructed to indicate the grade level that represented thechild's ability in each of these areas. For subsequent anal-yses, mean ratings of items within these three categorieswere used, and grade levels were converted to a scalewith months represented as fractions of a year (i.e., thirdgrade, sixth month would equal 3.5).

Performance measures. Participants in the perfor-mance conditions were asked to estimate the number ofeasy, moderate, and difficult problems the child answeredcorrectly and to report the overall grade level of the testadministered to the child. In an open-ended question,participants were asked to report the "information theyfound most useful in determining the child's capabili-ties."

Supplementary academic measures. Twenty traits orskills, followed by exemplars of classroom behaviorscharacterizing both the positive and negative ends ofeach of these traits, were listed. Subjects were asked tocheck the point on a 9-point scale that would best char-acterize the child on the dimension. Next to each scale,a box labeled "insufficient information" was also pro-vided. Subjects were instructed to check this box ratherthan a scale value if they felt they had not been givensufficient information to rate the child on a given di-mension.

These 20 items were selected to form five clusters:work habits (organization, task orientation, dependabil-ity, attention, thoroughness), motivation (involvement,motivation, achievement orientation), sociability (pop-ularity, verbal behavior, cooperation), emotional matu-rity (confidence, maturity, mood, disposition), and cog-nitive skill (articulation, creativity, learning capability,logical reasoning). Mean ratings of items within thesefive categories were used in subsequent analysis.

Manipulation checks. In the last part of the booklet,subjects were asked to rate the child's "attractiveness"and the "usefulness of socioeconomic information as anindicator of a child's academic ability." The final open-ended question asked subjects to report the child's so-cioeconomic level.

Suspiciousness probe. Finally, participants filled outa questionnaire assessing, for the agency, "how they hadbeen treated during the experimental session." This wasdesigned to probe their suspiciousness about the exper-imental procedures and purpose of the study. Followingthis, participants were thoroughly debriefed and paid.

Results

Ability-Level Ratings

Our primary hypothesis was that expec-tancy-confirmation effects occur only whenperceivers feel they have definitive evidencerelevant to their expectations. Specifically, wepredicted that subjects who viewed only thepositive- or negative-expectancy tape seg-ment (no performance) would show little, if

any, signs of expectancy confirmation in theirratings of the child's ability level, whereassubjects who viewed both the expectancy seg-ment and the test segment (performance)would show considerable signs of expectancyconfirmation. As a test of this hypothesis, a2 (positive vs. negative) X 2 (performancevs. no performance) analysis of variance(ANOVA) was performed on ability-level rat-ings.

As shown in Figure 1, the results supportour predictions. The ANOVA interaction termwas significant for each index: liberal arts,F(l, 56) = 6.67, p < .02; reading F(\, 56) =5.73, p < .03; and mathematics F(l, 56) =9.87, p < .01. Although a main effect for ex-pectancy emerged for each of the three in-dexes—liberal arts, F( 1,56) = 19.24, p < .01;

1 Positive Expectancy

— — — Negative Expectancy

5-

I

I 4

§O

Liberal Arts(4*3)

Mathematics

(4.03) (4.10)

(3.85) ̂ .̂

(3.04)

No Performance Performance

Figure 1. Mean grade-level placements on the liberal arts,reading, and mathematics indexes for experimental con-ditions.

Page 6: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 25

reading, F(\, 56) = 32.98, p< .001; andmathematics, F(l, 56) = 19.78, p < .001—Newman-Keuls tests revealed that the sub-jects in the no-performance conditions didnot rate the child's ability level as differingmuch in either direction from her knownschool grade. On only one of the indexes (lib-eral arts) did the no-performance-positive-expectancy subjects rate the child signifi-cantly higher than the negative-expectancysubjects (p < .05). In the two performanceconditions, however, positive-expectancysubjects made reliably higher ratings on allthree indexes (p < .05 in all cases). The fan-shaped interaction of Figure 1 is consistentwith the hypothesized two-stage confirma-tion process in which subjects first reservejudgment—if that judgment is based on onlydemographic indicators—but then allow theirjudgment about an ability to be biased in thedirection of hypothesis confirmation.1

Manipulation Checks

The manipulation checks indicate that theabove results were not artifactually produced.First, the expectancy manipulation was assuccessful for subjects who viewed the child'stest performance as for those who did not.Without exception, positive-expectancy sub-jects reported the child's socioeconomic sta-tus as upper middle or upper class, and neg-ative-expectancy subjects reported the child'ssocioeconomic status as lower middle orlower class. Second, analyses of ratings of thechild's attractiveness and the usefulness ofsocioeconomic information for predictingability yielded no differences across groups.The latter result is especially important inindicating that those who had seen the child'stest performance did not regard the demo-graphic information as any more diagnosticthan those who had not seen it. Thus, thegreater impact of induced expectancies in theperformance conditions was not attributableto greater confidence in an implicit theory ofthe social-class-ability relation. Moreover,mean ratings of the usefulness of socioeco-nomic information (for all groups) were justbelow the midpoint toward the "not useful"end of the scale.

Finally, as can be seen in Table 1, abilityratings of the performance-only group indi-

Table 1Mean Grade-Level Placements on CurriculumAreas by Performance Control Group

Index M Grade level SD

Liberal artsReadingMathematics

4.03.83.5

3rd grade, 9th month4th grade3rd grade, 6th month

.505

.581

.238

Note, n = 10.

cate that the performance segment was, asintended, an ambiguous display of Hannah'scapabilities.

We hoped that evaluations of the child'sability would tend to be variable, reflectingthe inconsistencies in the child's perfor-mance; however, mean estimates would beexpected to fall close to the child's givengrade level. As the data in Table 1 indicate,ratings on the three curriculum indexes doshow considerable variability, and the per-ceivers do use the child's grade level as ananchor for their judgments. The mathematicsratings were somewhat lower and less variablethan the others, indicating that her perfor-mance in this area may have been poorer andmore consistent.2 (This will be discussed ata later point.)

Judgments of the Performance

If performance subjects were no less awareof, or impressed by, the relevance of the ex-pectancy information, it follows that theyfound support for their divergent hypothesesin the child's performance. Measures fromthe academic evaluation form indicate sev-eral ways in which perceivers obtained sup-port for their diverse hypotheses. (All of these

1 We also analyzed this data by pooling across the threeability measures. As one would expect, because this in-creases the number of observations, the significance lev-els are improved, although the basic interactional pattern(p = .002) remains the same. Again, post hoc compar-isons reveal that the two performance conditions are re-liably different (p<.01), whereas the two no-perfor-mance groups do not differ reliably.

2 An Fr,,m test of the difference between several vari-ances indicates no difference between the variances ofthe liberal arts and reading indexes. The variance of themathematics index is, however, significantly differentfrom that of the liberal arts index, Fmax(9, 9) = 5.92, p <.05, and shows a marginally significant difference fromthat of the reading index.

Page 7: Bias in Labeling Effect

26 JOHN M. DARLEY AND PAGET H. GROSS

were measures of the subjects' perceptions ofthe performance and therefore were takenonly from the groups that witnessed the testsequence. Recall that all of these subjectswitnessed the identical performance tape.)

Test difficulty. Performance on a test isa joint function of the test taker's ability (andother personal factors) and the difficulty ofthe test. Therefore, one way of justifying ahigh-ability inference from an inconsistenttest performance is to perceive the test asbeing very difficult. Conversely, one way ofrationalizing a low-ability inference from thesame performance would be to perceive thetest as easy. This happened: Subjects in thepositive-expectancy condition rated the testas significantly more difficult (Mgrade = 4.8)than did those in the negative-expectancycondition (M grade = 3.9), f(28) = 2.69,p < .02.

Problems correct. Subjects also estimatedthe number of problems the child answeredcorrectly within each of the problem cate-gories: very difficult, moderately difficult, andeasy.

A repeated measures analysis of variancerevealed a marginally reliable tendency forsubjects with positive expectancies to esti-mate that the child correctly answered ahigher percentage of problems, F(l, 28) =3.94, p < .06. Follow-up analyses revealedthat subjects with positive expectancies esti-mated that the child correctly answered moreof the easy (M = 94% vs. 79%) and moder-ately difficult (M = 69% vs. 53%) problemsthan did subjects with negative expectancies,t(28) = 2.55 and 2.21, respectively, ps < .05.Expectancy did not affect estimates of an-swers to difficult problems (M = 37% vs.36%). The overall pattern suggests a bias toreport more instances of expectancy-consis-tent than expectancy-inconsistent test re-sponses.

Reporting relevant behaviors. Subjectshad been asked to report, in an open-endedformat, the performance information "mostrelevant for determining the child's capabil-ities." We expected that subjects anticipatinga good performance would report more in-stances of positive behaviors than those ex-pecting a poor performance. For each sub-ject, we computed a positivity index by sub-tracting the number of negative instances

from the number of positive instances. Con-sistent with predictions, positive-expectancysubjects reported a significantly greater num-ber of positive behaviors relative to negativeones as being relevant in their judgments thandid negative-expectancy subjects, /(28) =34.65, p<. 001.

To summarize the performance judg-ments, positive- and negative-expectancysubjects, although agreeing that the perfor-mance provided information that was suffi-cient to estimate the child's capabilities, dis-agreed on how difficult the test was, howmany problems the child answered correctly,and how many of her test behaviors reflectedeither positively or negatively on her achieve-ment level. On every measure, positive-ex-pectancy subjects made interpretations morefavorable to the child than did negative-ex-pectancy subjects.

Supplementary Academic Measures

Information sufficiency. Recall that sub-jects reporting on these measures were al-lowed to check a scale value or a box labeledinsufficient information. We believed that theno-performance subjects who had only de-mographic-based expectancies to rely onwould display a greater reluctance to evaluatethe child and that this reluctance would leadto more frequent use of the insufficient in-formation answer. A 2 X 2 analysis of vari-ance on these data yielded only a main effectfor performance, F(l, 56) = 12.86, p < .001,such that no-performance subjects chose thisoption more often (M = 43% of the items)than subjects who viewed the test sequence(M = 22% of the items). A one-way analysisof variance, comparing the no-performance,performance, and performance-only condi-tions yielded a reliable effect, F(2, 67) =12.41, p < .001. Moreover, comparing thesemeans (via Duncan's test), we find that theperformance-witnessing groups were not sig-nificantly different from each other, whereasboth were significantly different from the no-performance groups (p < .01). Thus, the dif-ference found between performance and no-performance groups on the use of the insuf-ficient information option did not seem todepend on the fact that the mere quantity ofevidence provided to performance subjects

Page 8: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 27

Table 2Mean Ratings on Trait Measures for Experimental Groups

Dependent measure

Condition

Positive, performancePositive, no-performanceNegative, no-performanceNegative, performance

Work habits

5.21.4.92a,b5.13.4.36b

Motivation

5.16a

5.31,4.80ab

4.1 lb

Sociability

5.25a4.82a,b4.38b4.58a,b

Maturity

5.33a,b5.65b4.67a4.77a,b

Cognitiveskills

4.73a5.55b4.83a

4.12,

Note, n = 15 per condition. The higher the number, the more positive the evaluation. Letter subscripts indicatevertical comparisons of cell means by Duncan's multiple-range test. Means that do not share a common subscriptare significantly different from each other at the .05 level.

was greater (two tape segments) than thatgiven to no-performance subjects (one tapesegment). The performance-only subjects,who also saw only one tape segment, did notdiffer from performance subjects on this mea-sure. The difference is better attributed to thegreater perceived diagnostic utility of the per-formance segment. Performance subjects ap-parently felt that the child's test performanceprovided sufficient diagnostic information onwhich to base their evaluations.

Trait measures. A 2 (expectancy) X 2(performance) analysis of variance was per-formed on ratings for each of the five traitdimensions. Because participants were giventhe option of not checking a scale value onthese measures, missing values were given ascore of 5, which, on a 9-point scale, repre-sents the neutral point.3 These data are pre-sented in Table 2.

Consistent with our findings for the cur-riculum indexes, a significant interactionemerged for the work habits index such thatindividuals expecting the child to performwell rated her more positively after viewingthe performance tape whereas those expect-ing her to perform poorly rated her morenegatively after viewing the performancetape, F(l, 56) = 5.15, p < .03. The predictedinteraction effect was not obtained for themotivation, sociability, emotional maturity,or cognitive skill measures. For each of thesemeasures, we found a main effect for expec-tancy, with the positive-expectancy groupsrating the child significantly higher than thenegative-expectancy groups, F(l, 56) = 6.99,4.57, 5.76, and 5.84, respectively, ps < .05.In addition, there was a significant effect for

performance on the cognitive skill index,with the performance groups showing lowerratings than the no-performance groups, F(l,56) = 7.73, p < .05. These data indicate thatcertain expectancy-consistent judgments maynot require a two-stage process. Although itmay be necessary to provide performance in-formation to obtain judgments of a child'sability level, judgments about other disposi-tional characteristics may be made withoutthis information.

Discussion

Unlike many previous studies demonstrat-ing expectancy-confirmation effects, the ex-pectancies in the present study were not cre-ated by information that most people wouldregard as definitively establishing their valid-ity. They were not created by objective testresults, expert judgments, or other authori-tative information. Instead, the expectancieswere conveyed by such cues as the child'sclothes, the bleakness of the playground onwhich she played, or the high- or low-statuscharacter of her parents' occupations.

We suggested that perceivers would realize

3 Analyses of these data require a decision about howto treat the responses of subjects who checked the "notenough information to rate" alternative. The means pre-sented in Table 2 are calculated by assigning a score of5 to missing scale values. This assumes that the nonre-sponding subjects would have checked the scale midpointif forced to respond. Another way of dealing with thesame issue is to insert the cell-mean score for each suchsubject. Using this procedure, the,pattern of results isessentially unchanged. The same effects emerged as sig-nificant.

Page 9: Bias in Labeling Effect

28 JOHN M. DARLEY AND PAGET H. GROSS

that expectancies created by this informationdo not form a completely valid basis for someof the evaluations they were asked to make.The results indicate that this is so: Perceiverswho were given only demographic informa-tion about the child demonstrated a resis-tance to making expectancy-consistent attri-butions on the ability indexes. Their esti-mations of the child's ability level tended tocluster closely around the one concrete factthey had at their disposal: the child's gradein school. When given the opportunity toavoid making dispositional attributions al-together, nearly half of the time these per-ceivers chose that option.

In contrast, a marked expectancy-confir-mation effect was evident for those perceiverswho evaluated the child after witnessing anability-relevant performance. Those who be-lieved the child came from a high socioeco-nomic class reported that her performanceindicated a high ability level, whereas thosewho believed the child came from a low so-cioeconomic class reported that the identicalperformance indicated a substantially lowerlevel of ability.

This pattern of results suggests that whenthe diagnostic validity of a perceiver's expec-tations is suspect, expectancies function ashypotheses, and the task of evaluating an in-dividual for whom one has an expectancy isa hypothesis-testing process. Expectancy con-firmation, then, does not always result froman automatic inference process. Instead, itoccurs as the end product of an active processin which perceivers examine the labeled in-dividual's behavior for evidence relevant totheir hypothesis.4

As is apparent from our data, the hypoth-esis-testing strategy that perceivers use has abias (as Snyder & Cantor, 1979, have sug-gested) toward confirmation of the hypothesisbeing tested. The literature suggests a numberof related mechanisms that can contribute tothis effect (see Nisbett & Ross, 1980, for areview). We do have evidence to suggest whatsome of these mechanisms may have been inour study. First, there seems to be a selectiverecall of evidence: Perceivers who expectedthe child to do well reported the child as hav-ing answered more easy and more moder-ately difficult problems correctly than those

expecting the child to do poorly. Second,there seems to be a selective weighting of theevidence such that hypothesis-consistent be-haviors are regarded as more "typical" of thechild's true capabilities. When people wereasked to report what evidence they foundmost useful in determining their evaluations,they reported only those test items on whichthe child's performance was consistent withtheir initial expectations. Third, perceiversappeared to develop auxilliary hypothesesthat would render apparently inconsistentbehavior consistent with their hypotheses.These auxilliary hypotheses did not seem tobe revised assessments of the actor but ratherassessments of situational factors that couldaccount for discrepancies in the actor's be-havior, For instance, we found that personswho expected a good performance decidedthat the test given to the child was very dif-ficult, a conclusion that would account forinstances of otherwise inconsistent poor per-formance; whereas persons who expected apoor performance reported that the test waseasy, which would account for inconsistentgood performance. Finally, we found evi-dence in the open-ended reports of some par-ticipants to suggest that the meaning givento the child's behaviors was often consistentwith the perceivers' initial hypotheses. Forexample, a low-SES Hannah was reported tohave "difficulty accepting new information,"whereas a high-SES Hannah was reported tohave the "ability to apply what she knows tounfamiliar problems."

Implicit in this data is the conclusion thatperceivers seem to be aware that witnessinga particular test performance does not give

4 In the experimental paradigm in which expectancyeffects are typically demonstrated, perceivers are alwaysprovided with the opportunity to observe or interact withthe labeled target person. By using this research design,one cannot conclusively determine whether the resultingexpectancy effect was due to differential perceptions ofthe target person, as most researchers suggest, or if sub-jects had simply based their evaluations on the infor-mation provided by the label and had ignored the per-formance. By including conditions in the present studyin which some perceivers are not provided with perfor-mance information, it becomes possible to distinguishbetween expectancy effects arising from a nonobserva-tionally based inference process and those arising fromexpectancy-guided search processes.

Page 10: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 29

them automatic access to an individual's un-derlying ability. Many other factors, such asluck, task difficulty, or lack of motivation,may intervene (Darley & Goethals, 1980;Weiner et al., 1971). Therefore, the meaningof a person's performance is susceptible tomultiple interpretations that can be consis-tent with, and even supportive of, opposinghypotheses about that person's ability.

Thus far, we have treated information ascreating expectancies that are either valid andautomatically applied to others or weak andonly hypothesis generating. It is more likelythat any item of information about a persongenerates some certainties and some hy-potheses, depending on the domain to whichit is applied. In the present study, the de-mographic information seems to have thischaracter of creating both certainties and hy-potheses. On the supplemental measures re-lated to school achievement—specifically, onmeasures of motivation, sociability, andemotional maturity—a simple main effectwas obtained such that people who saw thechild as coming from a high socioeconomicbackground judged her more positively, andthose who did not see the performance hadas extreme ratings as those who did. (Butkeep in mind t)iat individuals had the op-portunity not to rate the child on these mea-sures and that, overall, many more peoplefrom the no-performance conditions chosenot to rate.) Apparently, some individuals feltthat demographic data alone was sufficientevidence on which to base an evaluation of,for example, a child's likely achievement ori-entation. Thus, the addition of performanceinformation was not necessary for a conclu-sive judgment in this area. In general, oursocial categories do trigger expectancies fora constellation of dispositions and behaviors,and for some of these, it may not be necessaryto rely on performance evidence to feel cer-tain that one's expectations are accurate.

The Validity of Demographic Evidence

From another perspective, one could askwhether demographic information does notwarrant correspondent inferences of ability.Certainly, numerous studies show correla-tions between social class and school perfor-

mance (Dreger & Miller, 1960; Kennedy,VanDeReit, & White, 1963; Lesser, Fifer, &Clark, 1965). From this perspective, the dif-ferential judgments of people who witnessedthe same test with different demographicallyproduced expectancies was less evidence ofbias than it was of an understanding of thetrue workings of the world. Two things canbe said about this: First, part of the generalargument of those concerned with self-ful-filling prophecies is that the present processis exactly how the link between social classand academic performance comes about.Second, the data from our no-performanceperceivers indicate that people regard thequestion of what exists in the world as a sep-arate question from that addressed in thepresent study. Base-rate information (i.e., es-timates of the frequencies with which an at-tribute or capability level occurs in a socialgroup) represents probabilistic statementsabout a class of individuals, which may notbe applicable to every member of the class.Thus, regardless of what an individual per-ceives the actual base rates to be, rating anyone member of the class requires a higherstandard of evidence. When one child's abil-ity is being considered, demographic infor-mation does not appear to meet the per-ceiver's criteria for a valid predictor; perfor-mance information, on the other hand, clearlydoes.

There is yet another way to pose the valid-ity question, and that is to consider the useof demographic evidence when perceiversformulate a working hypothesis. From an in-formation-processing perspective, hypothesisformulation serves a useful function: It allowsone to make better use of subsequent evi-dence. The rub, of course, is that once a hy-pothesis is formulated, regardless of our judg-ments of the validity of the evidence on whichit is based, our cognitive mechanisms arebiased toward its eventual confirmation. Thus,when asking whether the final judgments ofthe perceiver accurately reflect what exists inthe world, we should not obscure an impor-tant point: how those judgments come about.To clarify further, the "judgmental bias" inthe present study does not refer to the indis-criminate use of category-based information,or to the (in)accuracy of final judgments, but

Page 11: Bias in Labeling Effect

30 JOHN M. DARLEY AND PAGET H. GROSS

to the processes that determine what thosejudgments will be.

An Alternative Explanation

An alternative explanation for the generalpattern of results reported here is possible.The individuals who witnessed only the de-mographic information may have actuallymade ability inferences but chose not to re-port them. Their failure to report their eval-uations may have been due to fears that theexperimenter would regard the inferences asunjustified. However, in the experiment weminimized the possible cause for this concernby demonstrating to the participants thattheir responses would be anonymous. Theexperimenter was not present while the par-ticipant filled out the dependent-measuresform and did not return until he or she de-posited the questionnaire in an anonymity-guaranteeing location. Furthermore, on thefinal questionnaire, participants were askedif they were sufficiently assured of the an-onymity of their responses, and all of themreplied affirmatively.

It is, of course, still possible to make a gen-eralized version of the same point: The per-ceiver's resistance to using the demographicinformation could, at least in part, be mo-tivated by the awareness that their behaviorwas under scrutiny by others. This does notnecessarily diminish the interest in the phe-nomenon. In the real world, people whomake judgments frequently know their judg-ments may be public. Teachers classifyingstudents, clinicians diagnosing clients, andemployers selecting new personnel are allaware that their actions may be scrutinizedby others. Thus, whether this awareness isbased on personal knowledge, social pres-sures, or internalized social desirability con-cerns, both the processes that bring aboutthose judgments and the consequences interms of judgmental bias are likely to be thesame.

The present study finds results that at firstglance seem contradictory to results of someother studies. One thinks particularly of thework of Locksley et al. (1980) and that ofKahneman and Tversky (1973). In the Lock-sley et al. study, the direction of the inter-

action appears to be the reverse of that ob-tained here. A strong stereotype effect in traitratings is found with category-membershipinformation (gender labels) or when nondi-agnostic information accompanies the cate-gory label. This stereotype effect disappearswith diagnostic information. However, con-sider the differences in the type of informa-tion given to perceivers in the present studyand that given to perceivers in the corre-spondent conditions of the Locksley et al.study. The perceivers in the present studywho were reluctant to apply stereotypes (theno-performance conditions) received nondi-agnostic case information, as do some in theLocksley et al. study. However, perceivers inthe present study observe the child they willrate and are given a fair amount of familydata—information that would certainly dis-tinguish the child from others in her socialgroup. The diagnostic information used byLocksley et al. consists of information thatcould be applied to almost any person andmay not have created an individuatedimpression of the person to be rated. The twoconditions, then, are not identical, and com-paring them leads us to the following possi-bility: Stereotype effects persist with infor-mation that does not distinguish the targetfrom the target's social category, whereas di-lution effects (nonstereotypic judgments) ap-pear when case information successfully cre-ates an individuated impression of the target.Recent studies by Quattrone and Jones (1980)and Locksley, Hepburn, and Oritz (in press)support this conclusion.

In comparing other conditions of theLocksley et al. (1980) and the present study,differences in the type of information givento perceivers produces discrepant results. Thediagnostic information given to perceivers inthe Locksley et al. study consists of a singlebehavioral exemplar that confirms or discon-firms a gender-based trait expectancy. A di-lution effect is found only with a disconfirm-ing behavior sample. In contrast, the diag-nostic test sequence in the present studycontains both confirming and discontinuingbehavioral evidence. Furthermore, we knowfrom supplementary measures that perceiv-ers found the expectancy-consistent portionsmore diagnostically informative than the in-

Page 12: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 31

consistent portions. Therefore, with a sourceof multiple information—with many infor-mation elements that serve to confirm ex-pectancies—a confirmation effect is not sur-prising. Had the performance tape in thepresent study provided only compelling dis-confirming evidence, we suspect a dilutioneffect might have been found here as well.

Discrepancies between this work and thatof Kahneman and Tversky (1973) can beaddressed as well. In Kahneman and Tver-sky's studies, individuals are asked to predicta target person's occupational-categorymembership from a brief personality descrip-tion. Predictions are overwhelmingly basedon the degree to which the personality infor-mation "fits" with an occupational stereo-type (i.e., a representativeness effect). Thisappears inconsistent with the stereotype-re-sisting judgments of the perceivers in ourstudy who received no performance infor-mation. However, the demographic infor-mation given to our no-performance per-ceivers, although it does allow for a judgmentof fit to a social category, does not provideinformation for a judgment on an ability di-mension. The condition, then, is similar toKahneman and Tversky's Experiment 3 inwhich the personality description is uninfor-mative with regard to the target person'sprofession (i.e., it contains no occupation-rel-evant personality traits). In their study, oc-cupational-category predictions were essen-tially random. That is, they were based nei-ther on prior probabilities nor on similarity.This is essentially the same effect we find forno-performance perceivers in the presentstudy. Apparently, the representativeness ef-fect (or an expectancy effect) depends on theprovision of information that allows for asimilarity match to the categories perceiversare asked to judge.

Further, the no-performance conditions inthe present study are not identical to Kahne-man and Tversky's (1973) null-descriptioncondition. In that condition, subjects aregiven no information whatsoever about thetarget—neither individuating information norcategory-relevant information. Here a strongbase-rate effect emerges. Although this mightcause one to predict a strong stereotype effectin the present no-performance conditions,

our earlier point about individuating infor-mation may explain why it is not obtained.No-performance perceivers may lack rele-vant case information, but they do have asignificant amount of individuating target in-formation; apparently, this significantly alters.the framework for prediction.

We might summarize as follows: Repre-sentativeness and expectancy effects are foundwhen relevant case data are provided so thatindividuals can determine the target person'sfit to a category. Base-rate effects (and non-observationally based stereotypic judgments)are found when neither case data nor indi-viduating information is given. Finally, as-sume that three conditions are met: indivi-duating information is given, informationabout base rates is withheld, and a priori ex-pectancies are not relied on because they maynot be applicable to a particular target. Then,without relevant case data, a judgment of fitis precluded, attenuating a representativenessor a biased confirmation effect. In these cir-cumstances, judgments are made at the scalemidpoint or the chance level. We find thislatter effect in both Kahneman and Tversky's(1973) uninformed condition and in the no-performance conditions of the present study.

A final point is relevant to both of the stud-ies reviewed above. Predicting ability fromsocial-class information may not be equiva-lent to predicting personality traits from gen-der labels or occupational membership. Thenature of the prediction required (abilityrather than personality characteristics) maycause individuals to regard social-class in-formation as at the invalid end of the con-tinuum we have defined. But an individual'sgender or occupation, on the other hand, maybe regarded as valid information on whichto base an inference about personality. Re-lated to this point is that the standards ofevidence required for different stereotype-confirming judgments may be different. Au-tomatic assumptions about personality maybe made from occupation or gender label,and thus, stereotype effects are obtained withthis information alone, or with minimal ad-ditional information. To make judgments ofa low-SES child's ability, perceivers requiremore information and, specifically, criterion-relevant information. Thus, stereotypic judg-

Page 13: Bias in Labeling Effect

32 JOHN M. DARLEY AND PAOET H. GROSS

ments are not found with only category ornondiagnostic information but are foundonly when a sufficient amount of apparentlyconfirming diagnostic information is pro-vided.

Limits to the Confirmation Process

The present study finds results consistentwith those of many other studies. For in-stance, Swann and Snyder (1980) found thattarget individuals labeled as dull wilted werestill seen as dull witted even after the per-ceivers had witnessed a sequence in whichthese target individuals outperformed thoselabeled as bright, a situation in which a cog-nitive confirmation effect triumphed overapparently strongly disconfirming evidence.Nonetheless, we suspect that there are limitsto the cognitive confirmation process.

We can suggest several variables, some ofwhich we have mentioned, that may deter-mine whether a confirmation or a disconfir-mation effect is found. First, there is the clar-ity of the disconfirming evidence. In the do-main of abilities, in spite of the aboveexample, a sustained high-level performanceis compelling evidence for high ability. I mayperceive another as a slow runner, but if I seehim or her do several successive 4-minutemiles, my expectancy must change. Whenthis occurs, it is possible that a contrast effectwill take place in which the significance ofthe disconfirming behavior will be exagger-ated and the initial expectancy reversed. In-tuitively, no such unambiguous evidence ex-ists in the personality realm, where even com-pelling positive behavior can be attributed tonegative underlying motives or dispositions.Second, the strength with which the initialexpectancy or hypothesis is held may produceconflicting effects. "Strength of expectancy"is an ambiguous, phrase. It may refer to one'sdegree of commitment to an expectancy ofa fixed level, or it may refer to the extremityof the expectancy. In the first instance, thestronger the commitment to the expectancy,the more resistant it would be to disconfir-mation. However, the more extreme the ex-pectancy, the more evidence there is that po-tentially disconfirms it. Finally, the per-ceiver's motivation may play a role. Under

certain circumstances, an individual mayprefer to see his or her expectancy confirmed;in other situations he or she may have a pref-erence for the disconfirmation of the sameexpectancy. All of these suggestions, of course,require empirical testing.

A Final Comment

The self-fulfilling prophecy and the expec-tancy-confirmation effect have been of inter-est to psychologists partially because of thesocial policy implications of the research.However, in many of the research studies thatdocument the effect, the specific and limitedcharacter of the material that creates the ex-pectancies is lost, and we talk as if any ma-terial that creates expectancies is automati-cally accepted as valid by the perceiver. Theimage of the perceiver that emerges is one ofan individual who takes his or her stereotypesand prejudices for granted and indiscrimi-nantly applies them to members of the classhe or she has stereotyped without any con-sideration of the unjustness of such a pro-ceeding. The present study suggests that thisis an oversimplification that in turn doessome injustice to the perceiver. There aretimes when perceivers resist regarding theirexpectancies as truths and instead treat themas hypotheses to be confirmed or discon-firmed by relevant evidence. Perceivers in thepresent study did not make the error of re-porting stereotypic judgments without suffi-cient evidence to warrant their conclusions.They engaged in an extremely rational strat-egy of evaluating the behavioral evidencewhen it was available and refraining fromjudgment when it was not. It was the strategyperceivers employed to analyze the evidencethat led them to regard their hypotheses asconfirmed even when the objective evidencedid not warrant that conclusion. The errorthe perceivers make, then, is in assuming thatthe behavioral evidence they have derived isvalid and unbiased. Future research couldprofitably address the question of the con-ditions under which this general confirma-tion strategy can be reversed or eliminated.In the meantime, however, the image of theperceiver as a hypothesis tester is certainlymore appealing than that of a stereotype-ap-

Page 14: Bias in Labeling Effect

BIAS IN LABELING EFFECTS 33

plying bigot, even though the end result ofboth processes, sadly enough, may be quitesimilar.

References

Brigham, J. C. Ethnic stereotypes. Psychological Bulle-tin, 1971, 76, 15-38.

Cooper, H. M., Baron, R. M., & Lowe, C. A. The im-portance of race and social class information in theformation of expectancies about academic perfor-mance. Journal of Educational Psychology, 1975, 67,312-319.

Darley, J. M., & Fazio, R. H. Expectancy confirmationprocesses arising in the social interaction sequence.American Psychologist, 1980, 35, 867-881.

Darley, J. M., & Goethals, G. R. People's analyses of thecauses of ability-linked performances. In L. Berkowitz(Ed.), Advances in experimental social psychology (Vol.13). New York: Academic Press, 1980.

Dreger, R. M., & Miller, S. K. Comparative psychologicalstudies of Negroes and whites in the United States.Psychological Bulletin, 1960, 57, 361-402.

Duncan, B. L. Differential social perception and attri-bution of intergroup violence: Testing the lower limitsof stereotyping of blacks. Journal of Personality andSocial Psychology, 1976, 34, 590-598.

Foster, G., Schmidt, C., & Sabatino, D. Teacher expec-tancies and the label "learning disabilities." Journalof Learning Disabilities, 1976, 9, 111-114.

Hayden, T, & Mischel, W. Maintaining trait consistencyin the resolution of behavioral inconsistency: The wolfin sheep's clothing? Journal of Personality, 1976, 44,109-132.

Kahneman, D., & Tversky, A. On the psychology of pre-diction. Psychological Review, 1973, 80, 237-251.

Karlins, M., Coffman, T. L., & Walters, G. On the fadingof social stereotypes: Studies in three generations ofcollege students. Journal of Personality and SocialPsychology, 1969,13, 1-16.

Kelley, H. H. The warm-cold variable in first impres-sions of persons. Journal of Personality, 1950,78,431-439.

Kennedy, W. A,, VanDeReit, V., & White, J. C. A nor-mative sample of intelligence and achievement ofNegro elementary school children in the southeasternUnited States. Monographs of the Society for Researchin Child Development, 1963, 28, 13-112.

Langer, E. J., & Abelson, R. P. A patient by any othername . . . : Clinician group differences in labelingbias. Journal of Consulting and Clinical Psychology,1974, 42, 4-9.

Lesser, G. S., Fifer, G., & Clark, D. H. Mental abilitiesof children from different social class and culturalgroups. Monographs of the Society for Research inChild Development, 1965, 30, 1-115.

Locksley, A., Borgida, E., Brekke, N., & Hepburn, C.Sex stereotypes and social judgment. Journal of Per-sonality and Social Psychology, 1980, 39, 821-831.

Locksley, A., Hepburn, C., & Ortiz, V. Social stereotypesand judgments of individuals: An instance of the base-

rate fallacy. Journal of Experimental Social Psychol-ogy, in press.

Lord, C., Ross, L., & Lepper, M, E. Biased assimilationand attitude polarization: The effects of prior theorieson subsequently considered evidence. Journal of Per-sonality and Social Psychology, 1979, 37, 2098-2109.

Merton, R. K. The self-fulfilling prophecy. Antioch Re-view, 1948,5, 193-210.

Nisbett, R., & Ross, L. Human inference: Strategies andshortcomings of social judgment. Englewood Cliffs,N.J.: Prentice-Hall, 1980.

Quattrone, G. A., & Jones, E. E. The perception of vari-ability within in-groups and out-groups: Implicationsfor the law of small numbers. Journal of Personalityand Social Psychology, 1980, 38, 141-152.

Regan, D. T., Strauss, E., & Fazio, R. Liking and theattribution process. Journal of Experimental SocialPsychology, 1974, 10, 385-397.

Rist, R. C. Student social class and teacher expectations:The self-fulfilling prophecy in ghetto education. Har-vard Educational Review, 1970, 40, 411-451.

Rosenhan, D. L. On being sane in insane places. Science,1973,779,250-258.

Rosenthal, R. On the social psychology of self-fulfillingprophecy: Further evidence for Pygmalion effects andtheir mediating mechanisms. New York: MSS Mod-ular Publications, Module 53, 1974.

Rosenthal, R., & Jacobson, L. Pygmalion in the class-room. New York: Holt, Rinehart & Winston, 1968.

Snyder, M., & Cantor, N. Testing hypotheses about otherpeople: The use of historical knowledge. Journal ofExperimental Social Psychology, 1979, 75, 330-342.

Snyder, M., & Swann, W. B. Behavioral confirmation insocial interaction: From social perception to socialreality. Journal of Experimental Social Psychology,1978,14, 148-162. (a)

Snyder, M., & Swann, W. B. Hypothesis-testing processesin social interaction. Journal of Personality and SocialPsychology, 1978, 36, 1202-1212. (b)

Snyder, M., Tanke, E. D., & Berscheid, E. Social per-ception and interpersonal behavior: Ori the self-ful-filling nature of social stereotypes. Journal of Person-ality and Social Psychology, 1977, 35, 656-666.

Swann, W. B., & Snyder, M. On translating beliefs intoaction: Theories of ability and their application in aninstructional setting. Journal of Personality and SocialPsychology, 1980, 6, 879-888.

Weiner, B., et al. Perceiving the causes of success andfailure. Morristown, N.J.: General Learning Press,1971.

Word, C. O., Zanna, M. P., & Cooper, J. The nonverbalmediation of self-fulfilling prophecies in interracialinteraction. Journal of Experimental Social Psychol-ogy, 1974, 70, 109-120.

Zadny, J., & Gerard, H. B. Attributed intentions andinformational selectivity. Journal of Experimental So-cial Psychology, 1974, 10, 34-52.

Received August 31, 1981Revision received April 2, 1982