false belief, watson

Upload: ruxandra-pop

Post on 03-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 False Belief, Watson

    1/30

    Child Development, May/June 2001, Volume 72, Number 3, Pages 655684

    Meta-Analysis of Theory-of-Mind Development:The Truth about False Belief

    Henry M. Wellman, David Cross, and Julanne Watson

    Research on theory of mind increasingly encompasses apparently contradictory ndings. In particular, in ini-tial studies, older preschoolers consistently passed false-belief tasksa so-called denitive test of mental-state understandingwhereas younger children systematically erred. More recent studies, however, havefound evidence of false-belief understanding in 3-year-olds or have demonstrated conditions that improvechildrens performance. A meta-analysis was conducted ( N 178 separate studies) to address the empirical in-consistencies and theoretical controversies. When organized into a systematic set of factors that vary acrossstudies, false-belief results cluster systematically with the exception of only a few outliers. A combined modelthat included age, country of origin, and four task factors (e.g., whether the task objects were transformed inorder to deceive the protagonist or not) yielded a multiple R of .74 and an R2 of .55; thus, the model accountsfor 55% of the variance in false-belief performance. Moreover, false-belief performance showed a consistent de-velopmental pattern, even across various countries and various task manipulations: preschoolers went from below-chance performance to above-chance performance. The ndings are inconsistent with early competenceproposals that claim that developmental changes are due to tasks artifacts, and thus disappear in simpler, re-vised false-belief tasks; and are, instead, consistent with theoretical accounts that propose that understandingof belief, and, relatedly, understanding of mind, exhibit genuine conceptual change in the preschool years.

    INTRODUCTION

    Theory of mind has become an important theoreti-cal construct and the topic of considerable researcheffort. Theory of mind describes one approach to alarger topic: everyday or folk psychologythe con-strual of persons as psychological beings, interactors,and selves. The phrase, theory of mind, emphasizesthat everyday psychology involves seeing oneself and others in terms of mental statesthe desires,emotions, beliefs, intentions, and other inner experi-ences that result in and are manifested in human ac-tion. Furthermore, everyday understanding of peoplein these terms is thought to have a notable coherence.Because actors have certain desires and relevant be-liefs, they engage in intentional acts, the success andfailure of which result in various emotional reactions.

    Whether or in what sense everyday psychology istheorylike is a matter of contention (see, e.g., Gopnik & Wellman, 1994; Nelson, Plesa, & Henseler, 1998).Regardless, the phrase, theory of mind, highlightstwo essential features of everyday psychology: its co-herence and mentalism.

    How, when, and in what manner does an everydaytheory of mind arise? This question has generatedmuch current research with children. The questionhas been investigated using a variety of tasks andstudies that focus on various conceptions within thechilds developing understanding, for example, con-

    ceptions of desires, emotions, beliefs, beliefdesire

    reasoning, or psychological explanation, among others(see, e.g., Astington, 1993; Flavell & Miller, 1998; Well-man, 1990). From the earliest research, however, acentral focus has been on childrens understanding of belief, especially false belief. Why? Mental-state un-

    derstanding requires realizing that such states mayreect reality and may be manifest in overt behavior, but are nonetheless internal and mental, and thusdistinct from real-world events, situations, or be-haviors. A childs understanding that a person has afalse beliefone whose content contradicts realityprovides compelling evidence for appreciating thisdistinction between mind and world (see, e.g., Den-nett, 1979).

    A now classic false-belief task presents a child withthe following scenario (Wimmer & Perner, 1983):Maxi puts his chocolate in the kitchen cupboard andleaves the room to play. While he is away (and cannotsee) his mother moves the chocolate from the cup- board to a drawer. Maxi returns. Where will he look for his chocolate, in the drawer or in the cupboard?Four- and 5-year-olds often pass such tasks, judgingthat Maxi will search in the cupboard although thechocolate really is in the drawer. These correct an-swers provide evidence that the child knows thatMaxis actions depend on his beliefs rather than sim-ply the real situation itself, because belief and reality

    2001 by the Society for Research in Child Development, Inc.

    All rights reserved. 0009-3920/2001/7203-0001

  • 8/12/2019 False Belief, Watson

    2/30

    656 Child Development

    diverge. Many younger children, typically 3-year-olds, fail such tasks. Instead of answering randomly,younger children often make a specic false-belief errorthey assert that Maxi will look for the choco-late in the drawer to which it was moved.

    These ndings are empirically intriguing; it isstriking that young children would make such a pro-vocative error. They are also theoretically importantin that they support a developmental hypothesis thatchildrens theory of mind undergoes a major concep-tual change in early life. To be clear, the claim is notthat young children know nothing of mental states, but that they fail to understand representational men-tal states. One way of depicting this claim is pre-sented in Figure 1. A simple understanding of a statesuch as desire could involve construing the desirer ashaving an internal, subjective urging for an external

    state of affairs. But an everyday understanding of be-lief requires the notion that the person has a represen-tation of the world, the contents of which could be,and in the case of false beliefs are, quite different fromthe contents of the world itself. Thus, the change in-volved has been characterized as a shift (1) from asituation-based to a representation-based understand-ing of behavior (Perner, 1991), (2) from a connectionsto a representational understanding of mind (Flavell,1988), or (3) from a simple desire to a beliefdesirenaive psychology (Wellman, 1990). In short, as revealedin part by successful false-belief reasoning, the claim

    is that older children understand that people livetheir lives in a mental world as much as in a world of real situations and occurrences.

    False-belief performance has come to serve as amarker for mentalistic understanding of personsmore generally. Thus, in research on individual differ-ences in young childrens social cognition, false-belief performances are used as a major outcome measureto assess the inuence of early family conversations(Dunn, Brown, Slomkowski, Tesla, & Youngblade,1991), engagement in pretend play (Youngblade &Dunn, 1995), or family structure (Hughes & Dunn,1998; Perner, Ruffman, & Leekam, 1994) on develop-ment of mentalistic understandings. False-belief un-derstanding has also become a major tool for researchwith developmentally delayed individuals. The theory-of-mind hypothesis for autism, in particular, claims

    that the severe social disconnectedness evident ineven high-functioning individuals with autism isdue to an impairment in their ability to construe per-sons in terms of their inner mental lives (see, e.g.,Baron-Cohen, 1995). False-belief performances pro-vided an initial empirical test of this claim in thathigh-functioning children with autism who are ableto reason competently about physical phenomenaoften fail false-belief tasks, whereas Down syndromeand other delayed populations of equivalent mentalage often do not (e.g., Baron-Cohen, Leslie, & Frith,1985; for more comprehensive ndings and compa-

    risons, see Happe, 1995; Yirimiya, Erel, Shaked, &Solomonica-Levi, 1998).For various reasons, therefore, a considerable body

    of research has accumulated, which employ an in-creasing variety of false-belief tasks that focus on at-tempting to demonstrate and explain false-belief errors, as well as relate performance on false-belief tasks to other conceptions, tasks, and competencies.Theory-of-mind research goes well beyond this task and these data; nonetheless false-belief tasks have acentral place in current social-cognitive research (seeFlavell & Miller, 1998), much as conservation tasks

    once were focal for understanding cognitive develop-ment and for testing Piagets ndings and theorizing.For the case of false belief, just as in the conservationliterature, the initial accounts, the initial tasks, and es-pecially the claims of conceptual change have all beenvigorously challenged.

    In particular, research on false belief instantiates a basic conundrum in the study of cognitive develop-ment. Performance on any cognitive task reects atleast two factors: conceptual understanding requiredto solve the problem (competence) and other non-focal cognitive skills (e.g., ability to remember the key

    information, focus attention, comprehend, and an-

    Figure 1 Graphic depiction of how the person on the rightconstrues the desire (top) or belief (bottom) of the person on

    the left.

  • 8/12/2019 False Belief, Watson

    3/30

    Wellman, Cross, and Watson 657

    swer various questions) required to access and ex-press understanding (performance). The last 25years of cognitive development research have pro-duced a plethora of early competence studies andaccounts essentially showing that on various tasksyoung children fail not because they lack the concep-tual competence, but rather because the testing situa-tion was too demanding or confusing. This researchhas had several desirable results: undeniable under-estimations of young childrens knowledge have been exposed; information-processing analyses of how children arrive at their answers and responses,and not just what answers or responses they make,have ourished; and domain-general accounts of cognitive competence have yielded to more precisedomain-specic understandings of childrens con-ceptions and skills. At the same time, however, ac-cepted demonstrations of conceptual change havelargely disappeared. This is curious inasmuch as theinterplay between cognitive change and stability isthe cornerstone of all major theories of cognitive de-velopment. Yet each proposed developmental change(e.g., Piagets conservation competence, Careys pro-posed shift from naive psychology to naive biology,false-belief understandings) has seemingly evapo-rated in the mist of task variations showing enhancedperformance in still younger children.

    Conceivably, conceptual change may indeed berare. The contemporary re-emergence of strongly na-tivist perspectives on cognitive development bothcontributes to and derives from this possibility (e.g.,Spelke, 1994). On the other hand, genuine conceptualchanges may be obscured by current emphasis onearly competence and task simplications, making itdifcult to comprehend the bigger picture amidst thehaze of accumulating results from numerous task variations. A comprehensive analysis of the volumi-nous and varied false-belief research provides an im-portant contemporary opportunity to examine this basic issue.

    Empirically, an increasing number of researchers

    now claim that the original false-belief tasks are un-necessarily difcult and that 3-year-olds can evidenceimproved, even above-chance, false-belief reasoningif the tasks are suitably revised (e.g., Siegal & Beattie,1991; Sullivan & Winner, 1993). Not only the correctestimation of 3-year-olds is at issue, but more impor-tantly, basic developmental trends. Some authorsnow claim that 3-year-olds, and much younger chil-dren as well, understand belief and false belief (e.g.,Chandler, Fritz, & Hala, 1989; Fodor, 1992). False- belief competence, assessed correctly, is thus predictedto be high even in young children. Other authors (e.g.,

    Robinson & Mitchell, 1995) claim that many 3-year-

    olds fail false-belief tasks but claim many 4- and 5-year-olds do as well. The most striking thing aboutthe age trends was the lack of them . . . Quite simply,it has become fashionable to claim that there is asharp age trend, but in fact there is not (Mitchell,1996, pp. 137 138).

    These issues, controversies, and theories, alongwith the increasing amount of empirical ndings,mandate a careful review. Indeed, several qualitativereviews have been presented, both several years agowhen the number of focal studies were modest (Fla-vell, 1988; Perner, 1991; Wellman, 1990) and more re-cently as the studies and discrepancies have grown(Leslie, 2000; Mitchell, 1996; Taylor, 1996). But thesereviews have come to different conclusions, and havefailed to provide a compelling synthesis of all thedata. We provide a quantitative review and integra-tion of the ndings, a meta-analysis of the data. Ameta-analysis seems feasible and useful. The numberof studies is now large; certainly sufcient for infor-mative meta-analysis and, indeed, so large that con-sidering all the studies one by one is difcult or im-possible. In addition, although different qualitativereviews depict this database in contradictory fash-ions, many of the contradictions could be addressed by meta-analysis. Consider the essential dependentvariable, false-belief performance as reported acrossstudies. When studies show a mixture of above-, at-,and below-chance responding, as is the case for re-search on false belief, then a comprehensive poolingof the data across studies is needed to clarify the na-ture of childrens performance.

    Of equal theoretical importance are possible inde-pendent variables, such as the ages of the childrentested. In particular, investigations of false belief rea-soning have increasingly differed according to a vari-ety of task variables. The focal false-belief questionhas been asked in terms of action (where will Maxilook for his chocolate), in terms of thoughts (wheredoes Maxi think his chocolate is), and in terms of speech (where will Maxi say his chocolate is). Some-

    times the target object has been moved from its origi-nal location inadvertently, whereas at other times, themovement has been presented, emphatically, as a de-liberate deception designed to fool or trick the protag-onist. At various times the protagonist, Maxi, has been a puppet, a doll, a real person, or a video por-trayal. Moreover, the change-of-locations task dis-cussed earlier represents only one standard false- belief task among several. Another often-used task involves unexpected contents: Children see a crayon box, state it will have crayons inside, then open the box to nd it is lled with candies. Then they are

    asked about someone else, Mary, who has never

  • 8/12/2019 False Belief, Watson

    4/30

    658 Child Development

    looked inside the box, What will Mary think is inhere, crayons or candies? A variation on unexpected-contents tasks are unexpected-identity tasks, in whicha false belief is engendered by an object with a decep-tive identity (e.g., a sponge that looks just like a rock).

    The variety of independent variables contributesto an increasingly confused picture. Again meta-analysis could be especially informative; sufcientvariation exists on a number of independent variablesacross studies to allow for the examination of factorsthat have not been varied, or not comprehensivelyvaried, within studies.

    Among the extended set of variables we will con-sider in this study, ve deserve brief introduction be-cause of their prominence in research and theory. Therst variable is age. According to initial ndings,false-belief performance changes dramatically from 3

    to 5 years of age; this change may or may not remainwhen important task variations are accounted for.The second variable concerns deceptionwhetherthe child views the false-belief situation as happen-stance or a deliberate ploy to trick the protagonist.Several authors claim that framing the task in termsof explicit trickery reduces or eliminates young chil-drens errors (e.g., Chandler et al., 1989), althoughothers claim it does not (e.g., Sodian, Taylor, Harris, &Perner, 1991). The third variable concerns salience.For example, framing the task in terms of trickerymay be unimportant except that it serves to make

    mental state (being fooled or duped) more salient tothe child in comparison with the alternative real stateof affairs (where the chocolate really is). Salience is-sues have two separate aspects. Certain task manipu-lations arguably serve to enhance the salience of theprotagonists mental state; other manipulations serveto reduce the salience of the contrasting real state of affairs.

    The fourth variable of special import concernswhether the children are asked about someone elsesfalse belief or their own. In a false-belief task for self (sometimes called a representational change task), a

    child may be shown the crayon box, and that it con-tains candies, and then asked, Before you looked in-side, did you think the box contained crayons or can-dies? Different theoretical accounts make differentpredictions about the difculty of false-belief judg-ments for self versus others (as detailed later inthe Discussion section). Moreover, an empirically in-triguing question is, do children understand beliefsrst for their own case, for others, or both similarly?

    Finally, the national or community identity of thechildren tested needs to be considered, for example,whether children are from the United States, Austria,

    Japan, or a hunter gatherer traditional African soci-

    ety. Understanding people in terms of mental states ingeneral, and in terms of beliefs, more specically, can be argued to be a universal folk psychological stance(e.g., Wellman & Gelman, 1998) or a specic manifes-tation of a particular Anglo-European individualisticinterpersonal stance (e.g., Lillard, 1998).

    METHOD

    In a meta-analysis, conditions within studies, ratherthan individual participants or entire studies, com-prise the unit of analysis (Glass, McGraw, & Smith,1981). In our analyses, the proportion of children in acondition who demonstrated the target behaviorscorrect false-belief judgments versus errorswasused as the essential dependent variable. (More pre-cisely, we use the proportion of false-belief questions

    answered correctly, because many studies asked eachchild two or more false-belief questions.) A meta-analysis of such data is especially straightforward;many of the statistical problems that arise for othertypes of meta-analyses, which require the transfor-mation and collapsing of various derived inferentialstatistics (e.g., F tests, t tests) (see Glass et al., 1981),can be avoided given this type of comparable descrip-tive data across studies.

    In addition, we have been able to rule out aberra-tions due to sampling by supplementing our ordinaryleast squares estimates with bootstrap estimates of pa-

    rameters and their standard errors. The details of these analytic methods are presented below in the Re-sults section. Here, we present information on thestudies and conditions included in the analyses.

    Studies

    The total potentially relevant literature is large.Therefore, the rst task was to search pertinent jour-nals, review articles, databases, and chapters forstudies on several related topics, including false be-lief, mental representations, theory of mind, under-

    standing mental states, belief-desire reasoning, andfolk or naive psychology. The references of each cita-tion were searched for additional articles. The studiesgenerated in this fashion were supplemented by asearch through several conference abstracts and bysoliciting information from a variety of known re-searchers in this eld (e.g., P. L. Harris, A. Gopnik, S.Baron-Cohen, J. W. Astington, J. H. Flavell, M. Siegal,M. Chandler, K. Bartsch, J. D. Woolley, and C. Lewis).Our efforts to nd relevant studies were concluded in January 1998; no studies published or sent after thattime were included in the analyses. In retrospect, we

    have discovered that several studies published before

  • 8/12/2019 False Belief, Watson

    5/30

    Wellman, Cross, and Watson 659

    1998 were omitted; given the size of the literature andits publication in a great many different journals thiswas inevitable.

    Many studies that were encountered initially werenot considered further because they reported nofalse-belief tasks or conditions. Some studies or con-ditions were not included because they focused onlyon autistic or delayed samples; the focus in our studywas on normally developing children. In total 38 arti-cles were examined and excluded for these reasons.

    Not all studies or conditions purporting to includefalse-belief tasks with normally developing childrencould be included in the analyses. As already noted,different studies encompassed a wide variety of dif-ferent task situations and questions. Most of thesewere reasonably straightforward, at least whensorted in relevant, measurable ways (e.g., deceptionversus none, puppets versus live protagonists). Some,however, were so different as to be irrelevant (e.g.,conditions in which the real state of affairs was com-pletely unknown, and hence, whether the protago-nists belief was false or true was indeterminable).

    Moreover, for our procedures we needed compa-rably reported data on the dependent measure (pro-portion of correct false-belief judgments) and weneeded the data to be reported separately for childrenof various ages (e.g., 3- versus 4-year-olds) and forseparate tasks (e.g., locations versus contents tasks).Several studies collapsed their reported ndings to-gether across age or task features. For a number of these, more detailed data was sought and receivedfrom the authors, but this was not always possible orclarications were not always forthcoming.

    In total, our analyses encompassed 77 reports orarticles including 178 separate studies and 591 condi-tions. A list of the number of studies and conditionsappears in Table 1. These same 77 reports yielded 58conditions that were excluded for the reasons de-scribed earlier (less than 10% of the conditions thatwere included).

    One other exclusion was made. False-belief tasks

    typically ask not only the false-belief question (e.g.,Where does Maxi think his chocolate is?) but also oneor more control questions (e.g., Where is the chocolatereally? Where did Maxi put his chocolate?). Perfor-mance on these control questions varies, but typicallyeven the youngest children do reasonably well, andoften only children who answer the controls correctlyare then included in the nal data. Conditions inwhich fewer than 60% of the children answered thecontrol questions correctly or in which 40% or more of the initial subjects were dropped were excluded fromour analyses. A total of nine conditions were ex-

    cluded for these reasons; we assumed that these con-

    ditions involved atypically difcult tasks or ex-tremely confused children.

    In total, a large majority of false-belief research wasexamined, representative of both published and un-published false-belief studies in the eld as of January1998. The analyses of the data provided by thesestudies proceeded in several stages. In each, differentsubsets of the total data were used, as claried in theResults section.

    Coding

    Each condition included in the analyses was codedfor the dependent variable (proportion of correct re-sponses to the false-belief question) and for a varietyof features constituting the following independentvariables:

    1. Year of publication.2. Mean age and number of participants in a

    condition.3. Percentage of participants passing control ques-

    tions, and percentage dropped from the research.4. Country of participants: for example, the

    United States, United Kingdom, Austria, Japan,and so forth.

    5. Type of task: Three levels of task type, whichdistinguished locations versus contents versusidentity tasks (as described earlier).

    6. Nature of the protagonist: Five levels that de-scribed the protagonist as a puppet or doll; apictured character; a videotaped person; or areal person present in, or absent from, the cur-rent situation. Protagonists who were real,present persons were then also coded as eitherthe self or another person.

    7. Nature of the target object: Four levels that de-scribed the target object as a real object (e.g.,chocolate), a toy, a pictured object, or a video-taped object.

    8. Real presence of the target object: Two levels

    denoting whether, at the time the false-belief question was asked, the target object was realand present (e.g., chocolate was in the drawer,the crayon box contained candies) or not (e.g.,Maxis chocolate was used up and thus nowabsent, the crayon box was empty and thuscontained no real object).

    9. Motive for the transformation: Two levels cap-turing whether the key transformation (e.g.,the change of location or the substitution of un-expected contents) was done to explicitly trick the protagonist (deception), or for some other

    reason including for no explicit reason at all.

  • 8/12/2019 False Belief, Watson

    6/30

    660 Child Development

    10. Participation in the transformation: Threelevels describing whether the child initiallyhelped to set up the task props, engaged in ac-tively making the key transformation, or onlypassively observed the events.

    11. Salience of the protagonists mental state: Fourlevels describing whether the mental state hadto be inferred from the characters simple ab-sence during the key transformation, whetherthe characters absence was emphasized andexplicitly noted, whether the false-belief expe-rience was demonstrated initially on the chil-

    dren themselves (e.g., the child initially discov-ered that the crayon box contained candies), orwhether the characters mental state was ex-plicitly stated (e.g., the child was told Maxithinks it is in the cupboard) or pictured insome fashion (e.g., via a thought bubble).

    12. Type of question: Four levels denoting whetherthe false-belief question was phrased in termsof where the protagonist would look (or someother belief-dependent action the charactermight take), what hed think or believe, whathed say, or what hed know.

    Table 1 Listing of the Studies and Conditions Included in the Meta-Analysis

    Authors YearStudies

    Reported

    StudiesUsed in

    Meta-Analysis

    PrimaryConditionsIncluded in

    Meta-Analysis

    NonprimaryConditionsIncluded in

    Meta-Analysis

    TotalConditionsIncluded in

    Meta-Analysis

    Astington, Gopnik, and ONeill 1989 2 2 8 0 8Avis and Harris 1991 1 1 0 6 6Baron-Cohen 1991 1 1 0 1 1Baron-Cohen, Leslie, and Frith 1985 1 1 0 1 1Bartsch 1996 2 2 3 3 6Bartsch, London, and Knowlton 1997 2 1 1 0 1Bartsch and Wellman 1989 2 1 1 0 1Berguno 1997 1 1 3 0 3Carlson, Moses, and Hix 1998 3 2 3 0 3Carpendale and Chandler 1996 2 2 13 0 13Chandler and Hala 1994 4 4 14 0 14Chen and Lin 1994 1 1 2 2 4Clements and Perner 1994 1 1 4 0 4Custer 1996 1 1 0 2 2

    Dalke 1995 2 2 12 2 14Davis 1997 2 1 2 1 3Flavell, Flavell, Green, and Moses 1990 4 2 2 0 2Flavell, Mumme, Green, and Flavell 1992 4 2 2 2 4Freeman and Lacohee 1995 6 6 31 6 37Freeman, Lewis, and Doherty 1991 5 4 15 7 22Fritz 1992 2 2 10 2 12Frye, Zelazo, and Palfai 1995 3 2 12 0 12Ghim 1997 5 4 7 1 8Gopnik and Astington 1988 2 2 30 12 42Hala and Chandler 1996 1 1 8 0 8Hala, Chandler, and Fritz 1991 3 1 3 0 3Happe 1995 1 1 0 2 2Harris, Johnson, Hutton, Andrews, and Cooke 1989 3 2 0 5 5

    Hickling, Wellman, and Gottfried 1997 2 2 4 0 4Hogrefe, Wimmer, and Perner 1986 6 4 16 0 16 Johnson and Maratsos 1977 1 1 4 0 4Kikuno 1997 1 1 2 0 2Koyasu 1996 1 1 9 0 9Lalonde and Chandler 1995 1 1 6 0 6Leekam and Perner 1991 1 1 2 0 2Leslie and Thaiss 1992 2 2 4 0 4Lewis, Freeman, Hagestadt, and Douglas 1994 5 4 0 6 6Lewis and Osborne 1990 1 1 18 0 18Lillard and Flavell 1992 2 2 3 0 3

    (Continued)

  • 8/12/2019 False Belief, Watson

    7/30

    Wellman, Cross, and Watson 661

    13. Temporal marker: Two levels capturing whetherthe false-belief question explicitly mentionedthe time frame involved (When Maxi comes back, which place will he look in rst?) or not.

    Because the way in which an individual condi-tion was to be coded on the above measures wasnot always clear-cut, two independent raters coded94 conditions representing 20 studies. Agreement(agreements divided by agreements plus disagree-ments) ranged from 92 to 100%. Over all codings,reliability averaged 97%. On some variables (year

    of publication, country, and motive for the transfor-

    mation) agreement was perfect. The two lowest re-liabilities were 92% (for percent passing controlitems and for salience of the protagonists mentalstate). All disagreements were resolved throughdiscussion.

    RESULTS

    Primary Conditions

    As noted, the entire database for the meta-analysiscontained information on 591 false-belief conditions.

    In the rst wave of analyses, however, we included

    Table 1 Continued

    Authors YearStudies

    Reported

    StudiesUsed in

    Meta-Analysis

    PrimaryConditionsIncluded in

    Meta-Analysis

    NonprimaryConditionsIncluded in

    Meta-Analysis

    TotalConditionsIncluded in

    Meta-Analysis

    Mayes, Klin, and Cohen 1994 1 1 6 0 6Mazzoni 1995 1 1 1 1 2Mitchell and Lacohee 1991 3 3 7 0 7Moore, Pure, and Furrow 1990 2 1 2 0 2Moses 1993 2 2 0 10 10Moses and Flavell 1990 2 2 4 0 4Naito, Komatsu, and Fuke 1994 1 1 4 2 6Perner, Leekam, and Wimmer 1987 2 2 9 5 14Perner and Wimmer 1987 2 2 8 0 8Phillips 1994 7 1 2 0 2Riggs, Peterson, Robinson, and Mitchell 1998 4 2 2 0 2Robinson and Mitchell 1992 5 2 3 0 3Robinson and Mitchell 1994 3 3 5 0 5Robinson and Mitchell 1995 6 4 6 2 8

    Robinson, Riggs, and Peterson 1997 4 2 2 0 2Robinson, Riggs, and Samuel 1996 3 3 8 0 8Roth and Leslie 1998 2 2 4 6 10Ruffman, Olson, Ash, and Keenan 1993 3 3 2 5 7Russell and Jarrold 1994 3 3 4 4 8Saltmarsh, Mitchell, and Robinson 1995 5 5 12 0 12Seier 1993 1 1 0 1 1Sheffield, Sosa, and Hudson 1993 1 1 4 0 4Siegal and Beattie 1991 2 2 8 0 8Slaughter and Gopnik 1996 2 2 3 1 4Sullivan and Winner 1991 1 1 18 0 18Sullivan and Winner 1993 1 1 12 0 12Taylor and Carlson 1997 1 1 4 0 4Vinden 1996 1 1 4 8 12Wellman and Bartsch 1988 3 1 3 0 3Wellman, Hollander, and Schult 1996 4 1 3 0 3Wimmer and Hartl 1991 3 3 6 5 11Wimmer and Perner 1983 4 3 16 0 16Wimmer and Weichbold 1994 1 1 8 0 8Winner and Sullivan 1993 1 1 6 0 6Woolley 1995 2 2 7 1 8Yoon and Yoon 1993 2 1 16 0 16Zaitchik 1990 5 1 4 0 4Zaitchik 1991 1 1 12 0 12

    Totals 178 143 479 112 591

  • 8/12/2019 False Belief, Watson

    8/30

    662 Child Development

    only what we termed primary conditions. These wereconditions in which (1) subjects were within 14months of each other in age, (2) less than 20% of theinitially tested subjects were dropped from the re-ported data analyses (due to inattention, experimen-tal error, or failing control tasks), and (3) more than80% of the subjects passed memory and/or realitycontrol questions (e.g., Where did Maxi put thechocolate? or Where is the chocolate now?). Ourreasoning was that age trends are best interpretable if each conditions mean age represents a relatively nar-row band of ages; interpretation of answers to the tar-get false-belief question is unclear if a child cannot re-member key information, does not know where theobject really is, or cannot demonstrate the verbal facil-ity needed to answer parallel control questions. Inmost of the studies, few subjects were dropped, very

    high proportions passed the control questions, andages spanned a year or less, so primary conditions in-cluded 479 (81%) of the total 591 conditions available.The primary conditions are enumerated in Table 1;they were compiled from 68 articles that contained128 separately reported studies. Of the 479 primaryconditions, 362 asked the child to judge someoneelses false belief; we began our analyses by concen-trating on these conditions. On average in the pri-mary conditions, 3% of children were dropped from acondition, children were 98% correct on control ques-tions, and ages ranged 10 months around their mean

    values.In an initial analysis only age was considered as afactor. As shown in Figure 2, false-belief performancedramatically improves with age. Figure 2A showseach primary condition and the curve that best ts thedata. The curve plotted represents the probability of being correct at any age. At 30 months, the youngestage at which data were obtained, children are morethan 80% incorrect. At 44 months, children are 50%correct, and after that, children become increasinglycorrect. Figure 2B shows the same data, but in thiscase the dependent variable, proportion correct, is

    transformed via a logit transformation. The formulafor the logit is:

    ,

    where ln is the natural logarithm, and p is theproportion correct. With this transformation, 0 rep-resents random responding, or even odds of predict-ing the correct answer versus the incorrect answer.(When the odds are even, or 1, the log of 1 is 0, so thelogit is 0.) Use of this transformation has three major benets. First, as is evident in Figure 2B, the curvilin-

    ear relation between age and proportion correct is

    logit l n p1 p ------------ straightened, yielding a linear relation that allowssystematic examination of the data via linear regres-

    sion; second, the restricted range inherent to propor-tion data is eliminated, for logits can range fromnegative innity to positive innity; and third, thetransformation yields a dependent variable and ameasure of effect size that is easily interpretable interms of odds and odds ratios (see, e.g., Hosmer &Lemeshow, 1989).

    The top line of Table 2 summarizes the initial anal-ysis of age alone in relation to correct performance

    Figure 2 Scatterplot of conditions with increasing age show-ing best-t line. (A) raw scatterplot with log t; (B) proportioncorrect versus age with linear t. In (A), each condition is rep-resented by its mean proportion correct. In (B), those scores aretransformed as indicated in the text.

  • 8/12/2019 False Belief, Watson

    9/30

    Wellman, Cross, and Watson 663

    (transformed via the logit transformation). The farright column presents the measure of effect size: theodds of being correct increase 2.94 times for everyyear that age increases. At about 44 months of age,children are about 50% correct, or at 0 on the trans-formed measure; hence, at that age the odds of beingcorrect or incorrect are even or 1.0 (Figure 2B). The ef-fect size measure in Table 2 indicates the increase inthe odds of being correct for a more advantageousvalue of a variable. A 2.94 increase for age means thatfor children who are 1 year older (i.e., 56 rather than44 months of age) percent correct would be 74.6%. Interms of months, the effect size for age is 1.09 permonth, or an increase from 50% to 52% correct from44 to 45 months of age.

    A meta-analysis must be guided by an organizedset of questions and hypotheses. Our analyses were

    originally directed toward evaluating several base-line models of childrens performance in false-belief tasks. The simplest model is a null model predictingrandom performance on false-belief tasks. Randomperformance at some age might suggest that childrenare confused, have few relevant systematic concep-tions, or that the tasks tap little of childrens under-standings. Predictions of random performance contrastclearly with predictions of signicantly below-chanceperformance (systematic false-belief errors) and sig-nicantly above-chance performance (systematic cor-rect understanding of false beliefs). Developmen-

    tally, a baseline model predicts no developmental

    changefor example, all ages evidence essentiallyrandom performance, or all evidence similar above-chance performance.

    A central question for evaluating these baselinemodels is whether and where childrens performancesignicantly differs from chance. Ninety-eight per-cent of all false-belief conditions in the studies sam-pled used two locations (Maxis chocolate might be inthe drawer or the cupboard), or two limited identities(the crayon box usually contains crayons but nowcontains candies), thus chance responding represents50% correct (or a score of 0 in the transformed data).We calculated a 95% condence band around the best-t line shown in Figure 2B and then determinedwhere performances within this band fell below orabove 0. These condence bands are also depicted inFigure 2B.1 At younger agesessentially 41 months

    (3 years, 5 months) and youngerchildren per-formed below chance, making the classic false-belief error. At older agesessentially 48 months (4 years)and olderthey performed above chance, signi-cantly correct. As shown in Figure 2, then, average per-

    Table 2 Summary of Meta-Analytic Results for the Primary Conditions

    Variables Main Effect Interaction with Age Effect Size a

    Age: F(1, 360) 229.15, p .001 2.94 for 1 yearNonsignificant

    Year of publication F(1, 325) 2.65, p .10 F(1, 324) .96, p .32Type of task F(1, 359) .63, p .42 F(1, 358) .64, p .42Type of question F(3, 357) 1.96, p .11 F(3, 354) .81, p .48Nature of the protagonist F(4, 356) 1.66, p .44 F(4, 352) 1.15, p .33Nature of the target object F(3, 357) 2.49, p .06 F(3, 354) 2.35, p .07Self versus other F(1, 230) 1.77, p .18 F(1, 229) 3.10, p .07

    Main effectsMotive F(1, 359) 14.27, p .001 F(1, 358) .30, p .58 1.90Participation F(2, 358) 5.91, p .003 F(2, 356) 1.25, p .28 1.96Real presence F(1, 359) 16.05, p .001 F(1, 358) .63, p .42 2.17Salience F(3, 357) 4.28, p .006 F(3, 354) .98, p .40 1.92Country F(6, 345) 10.42, p .001 F(6, 359) 1.04, p .40 Australia versus United States 2.27

    United States versus Japan 1.48Interaction

    Temporal marker F(1, 358) 5.45, p .02 F(1, 358) 7.57, p .006

    a Effect sizes are presented only for signicant variables. Effect sizes were computed in odds ratios and represent the increased odds of being correct given a facilitating value of a variable, as explained in the text.

    1 The condence bands, here and throughout the analyses,are natural extensions of condence intervals for the populationmean, with the exception that condence limits are formed forthe conditional mean of the dependent variable at different com- binat ions of the independent variab les. These are simultaneouscondence bands that contain the error rate at .05, no matterhow many combinations of independent variables are investi-

    gated (Draper & Smith, 1981).

  • 8/12/2019 False Belief, Watson

    10/30

    664 Child Development

    formance changes rapidly during the period 3 to 4 years from signicantly wrong to signicantly correct.

    This age trend constitutes the foundation for anal-yses of the inuence of various factors on perfor-mance in which we examined the effects of the 12remaining independent variables (e.g., year of publi-cation, type of task, nature of the target object, and soforth, as overviewed in the Method section). Again,consider in advance various patterns of results thatmight emerge. As depicted in Figure 3A, one patternof results is that some additional variable (e.g., type of taska comparison between locations versus con-tents tasks) would have no effect on the foundationalage trajectory. Alternatively, as shown in Figure 3B,that variable may be signicant, but only as a main ef-fect that does not interact with age. That is, differ-ences on that variable affect performance similarly at

    all ages. Finally, as depicted in Figure 3C, a focal vari-able could interact signicantly with age. A variety of patterns might yield signicant interactions with age, but the one shown in Figure 3C is especially relevantfor discriminating early competence accounts fromclaims of conceptual change. To conrm early compe-tence claims, task modications should inuenceyounger childrens performance (and thus accountfor their poor performance relative to older children),and, in particular, some task version(s) should in-crease young childrens responding to above-chancelevels.

    We tested for such patterns as follows. First, linearregression was used to screen for main effects and alltwo-way interactions with age. If interactions withage were absent, the interaction term was droppedfrom the regression and main effects were then tested.For those variables shown to have a signicant effect,condence bands were constructed around the ob-tained regression lines to determine where these linesdiffered from chance. Finally, these primary analyseswere corroborated both by using more assumption-free bootstrapping methods and by examining the en-tire omnibus dataset of 591 conditions. Due to the

    large database, a conservative .01 level for signi-cance was generally adopted. Because interaction ef-fects are of particular theoretical importance, how-ever, any interactions at the .05 level were consideredsignicant as well.

    As shown in Table 2, six variables did not signi-cantly affect performance, neither by themselves norin interaction with age; ve were signicant but failedto interact with age; and one signicantly interactedwith age.

    Nonsignicant variables.Given 362 conditions, afailure to nd signicance with such a powerful test is

    noteworthy. For example, although year of publication

    is of no theoretical interest, it provides a meta-analyticcheck for the possibility that an initial intriguing nd-ing can shrink or disappear as later experiments aredevised with better controls and procedures (Green &Hall, 1984). Some authors contend that early reports

    of young childrens false-belief errors are plaguedwith just such problems. However, the meta-analysisshows that false-belief results from earlier studies arevirtually identical to those from later studies.

    More important, however, is that type of task, typeof question, nature of the protagonist, and nature of the target object were all nonsignicant. To conrmgraphically the absence of some of these variables assignicant factors, Figure 4 shows the data for type of question and type of task. Type of question concernswhether the false-belief question is phrased in termsof what the character will think, know, or say, or

    where he will look. As can be seen in Figure 4A, thesevariations make little difference.As noted, false-belief tasks come in three essential

    forms: change-of-location tasks, unexpected-contentstasks, and unexpected-identity tasks. In fact, unexpected-identity tasks are rare (used in only 44 of 362 condi-tions), and preliminary analysis showed that they didnot differ from unexpected-contents tasks, whichthey closely resemble. Therefore, unexpected-identitytasks were collapsed with unexpected-contents tasksand Figure 4B thus plots the data for these twomain task types, locations versus contents (including

    identity). As shown, these task types are remarkablyequivalent.In addition, the medium in which the false-belief

    task is presented has no signicant effect. That is, itmakes no difference if the protagonist is presented asa real person, a puppet, a doll, a pictured storybook character, or a videotaped person. Similarly, as longas a concrete target object is present at the time thefalse-belief question is asked, it makes no differencewhether the object is a real item (a piece of ediblechocolate), a toy (a small toy car), a picture of an ob- ject (a drawing of a piece of chocolate), and so forth.Most studies maintained concordances across protag-

    Figure 3 Three hypothetical patterns of results: (A) No effect,(B) main effect, and (C) interaction.

  • 8/12/2019 False Belief, Watson

    11/30

    Wellman, Cross, and Watson 665

    onist and object. If a real person was used as the pro-tagonist, then a concrete object or toy object was usedas the object. Likewise, a doll or puppet protagonistwas matched with a toy object and a drawing of a girlwas matched with a drawing of a piece of chocolate.At least within these loose connes, children do notreason better or worse about real people versus dolls,or about real chocolate versus pictured chocolate.

    Self versus other.The analysis for nature of the pro-tagonist, described thus far, included only conditionsin which the protagonist was someone else. Condi-

    tions in which the false-belief question was asked

    about the childs own belief (self-conditions) were ex-cluded for all the analyses summarized at the top of Table 2, with the exception of the line labeled self ver-sus others. To compare whether false-belief judgmentsfor self differ from those for other people, it was notappropriate to simply compare all self-conditions toall other conditions-because self-questions were askedfor only a limited number of tasks. All self-questionsare about a personthe child (rather than, e.g., apuppet or doll); all self-tasks are content or identitytasks, not location tasks. By their nature, all self-tasksrequire the child to experience an initial demonstra-tion (e.g., to say a crayon box will contain crayons andthen see that it contains candies instead). To insurecomparability, therefore, the 117 self-conditions werecompared with the 118 other-conditions in which theprotagonist was a person, the task was a content oridentity task, and the child viewed an initial demon-stration of or participated actively in making the keytransformation. As Figure 5 shows, in this analysis of 235 conditions, childrens correct responses to false- belief questions for self versus other did not differ,and were virtually identical at the younger ages.

    Figure 5 also shows that the essential age trajectoryfor tasks requiring judgments of someone elses false belief is paralleled by an identical age trajectory forchildrens judgment of their own false beliefs. Youngchildren, for example, are just as incorrect at attribut-ing a false belief to themselves as they are at attribut-ing it to others.

    Main effects.Five variables were signicant as maineffects, but did not interact with age. In essence, thesefactors can enhance childrens performance, but in

    Figure 4 (A) Proportion correct versus Age Type of Ques-tion, and (B) proportion correct versus Age Type of Task.

    Figure 5 Proportion correct versus Age Self/Other.

  • 8/12/2019 False Belief, Watson

    12/30

    666 Child Development

    doing so they leave the underlying developmentaltrajectory (from incorrect to correct performance) un-changed. Figure 6 graphically depicts the effects formotive for the transformation (motive), participa-tion in the transformation (participation), salienceof the protagonists mental state (salience), and realpresence of the target object (real presence).

    Motive concerned the motive presented to thechild for the change of location or the unexpectedcontents. Sometimes a deceptive motive was explic-itly stated (the chocolate was moved to trick the pro-tagonist), sometimes it was not (either because nomotive was statedthe crayon box just containedcandies, or, rarely, a nondeceptive motive wasmentionedto put them away). As is clear in Figure6A, a deceptive motive enhances childrens perfor-mance, and does so for children of all ages.

    Table 2 presents F values for all the variables aswell as effect sizes for signicant variables. To reiter-ate, just as for age, this odds ratio measure indicatesthe increase in the odds of being correct for a moreadvantageous value of the variable: for example, formotive, the increase in odds of being correct for taskswith deceptive motives over those without deceptivemotives. As shown in Table 2, this value for motive is1.90, that is, a deceptive motive increases the odds of being correct by 1.90. In percentage terms, if childrenare 50% correct at 44 months without deception, thenthey are 66% correct with deception.

    Participation by the child in transforming the tar-get object is also important. Often children are essen-tially passive onlookers; for example, they watch assomeone transfers Maxis chocolate from one place toanother. However, in some tasks children help set up

    Figure 6 (A) Proportion correct versus Age Motive; (B) proportion correct versus Age Participation; (C) proportion correct

    versus Age Salience; (D) proportion correct versus Age Real Presence.

  • 8/12/2019 False Belief, Watson

    13/30

    Wellman, Cross, and Watson 667

    or manipulate the initial task situation or story props.Further, in some tasks the children themselves makethe essential transformation, for example, movingMaxis chocolate, or taking crayons out of a crayon box and putting in candies instead. As shown inFigure 6B, this last type of engagementactivelymaking or helping make the crucial transformationpositively inuences childrens performance at all theages tested. As shown in Table 2, actively makingthe transformation increases the odds of being correct1.96 over a baseline of being a passive onlooker. Inpercentage terms, if at 44 months children who arepassive onlookers are 50% correct, then children whoare actively involved in transforming the task mate-rials are 66% correct.

    The variables real presence and salience both argu-ably inuence the extent to which the task focuses onmental-state information. Again, consider false-belief tasks as providing information about two realms of content: real-world contents and mental-state con-tents. One might attempt to make mental state morefocal either indirectly by diminishing the salience of the contrasting real-world contents, or directly by en-hancing the salience of mental states. The variablereal presence describes the status of the target objectat the time the false-belief question is askedwhether the true state of affairs is instantiated by areal and present object (e.g., Maxis chocolate in thecupboard, or the candies in the crayon box) or not(e.g., Maxis chocolate was removed from the drawerand eaten, and, thus, is not now real and present). Asshown in Figure 6D, if the object is not real and present,children are more likely to answer correctly. With aneffect size of 2.17, if children at 44 months are 50% cor-rect with real and present objects, then they are 68%correct if the object is not real and present. Becausethis variable does not interact with age, however, both values of real presence leave the basic age trajec-tory unchanged. Moreover, even when there is no realand present object, the younger children do not per-form at above-chance levels; 95% condence bands

    around that line substantially overlap with zero.Thus, the youngest children may move from below-chance performance to at-chance performance, but onlyolder children move to above-chance performance.

    Comparably, salience of the protagonists mentalstate is also signicant. In this case, as shown in Fig-ure 6C, most task variations are equivalent. For ex-ample, it does not matter if the protagonist is merelyabsent when the transformation is made and themental state must be inferred, whether the charac-ters absence is emphasized, or if the false-belief situation and experience is initially demonstrated on

    the children themselves (e.g., they discover the

    crayon box contains candies). Yet, if the protagonists belief itself is clearly stated or pictured, this signi-cantly raises performancefor example, youngerchildren move from below-chance to at-chance per-formance. Even if mental states are stated or pic-tured, however, young children do not achieve above-chance performance and the basic age trajectoryremains the same.

    The 44 conditions included as stated or picturedare worth unpacking further. In one type of statedor pictured condition, the protagonists belief isstated (e.g., Maxi thinks his chocolate is in thedrawer.) and then the false-belief question askswhere he will look (e.g., Where will Maxi look forhis chocolate?). To be correct the child must at leastrecognize the implication of Maxis thought for his behavior in a situation in which behaving accordingto belief means Maxi will search for his chocolatewhere it isnt. In a second type of stated or picturedcondition, the protagonists belief is again stated(e.g., Maxi thinks his chocolate is in the drawer.)and the false-belief question asks what Maxi thinks(e.g., Does Maxi think his chocolate is in the draweror the cupboard?). Arguably, in this type of task children could respond correctly by simply repeating back the earlier belief statement (Maxi thinks hischocolate is in the drawer.). Finally, a third type of stated or pictured condition shows the protagonists belief in pictorial form. For example, at the time Maxiputs his chocolate in the drawer the child is asked tochoose a picture showing where the chocolate is atthis point (and the picture is either kept in view or re-moved from view). Or, Maxi may be shown with athought bubble depicting his belief (e.g., a thought bubble of chocolate in the drawer and not the cup- board). Intriguingly, all these variations producelargely the same results: they all are helpful, but noneprovide evidence for the type of interaction noted inFigure 3C.

    Finally, the country of origin inuences perfor-mance. Figure 7, which includes lines for the seven

    countries in which there were six or more total con-ditions, shows that at any one age, children fromvarious countries can perform better or worse thaneach other. Nonetheless, children in all countriesexhibit the same developmental trajectory. Condi-tions from the United States and the United King-dom represent the largest sample (together con-tributing 48% of all conditions). Thus, using thosecountries as a baseline, children in Korea performsimilarly; children in Australia and Canada per-form somewhat better; and those in Austria and Japan perform somewhat worse. The effect size

    values shown in Table 2 present two extremes. If at

  • 8/12/2019 False Belief, Watson

    14/30

    668 Child Development

    44 months of age children in the United States are50% correct, then children in Australia are 69% cor-rect and children in Japan are 40% correct.

    Interaction. One variable, temporal marking, inter-acts with age. This variable refers to whether thefalse-belief question emphasizes the time frame in-

    volved (e.g., When Maxi comes back, where will helook rst for his chocolate?). As shown in Figure 8,only at older ages does including temporal informa-tion in the target question signicantly increase cor-rect performance. Temporal markers increase thelength and complexity of false-belief questions. Con-ceivably this complexity may hinder, or at least fail toenhance, young childrens performance, althougholder children seem to benet from the clarity this ad-ditional information provides.

    Bootstrap Analyses

    Data for meta-analyses often fail to meet certainstandard statistical assumptions. For example, in ouranalyses some studies contributed 10 or more conditionsto the dataset but others contributed only one or twoconditions, making independence assumptions prob-lematic. Bootstrap analyses are more assumption free.

    The basic idea of the bootstrap is to replace the the-oretical normal distribution with the empirical distri- bution formed by the data themselves, N observa-tions (Freedman & Peters, 1984). One then repeatedlysamples with replacement ( B samples of size N ) fromthis empirical distribution, each time computing the

    coefcients of the regression model and storing theresults. When the procedure is completed, there are Bestimates of each parameter, each computed on a ran-domly selected set of N observations. The mean of these B estimates is the bootstrap estimate of the pa-rameter; the standard deviation of the B estimates isused to estimate the standard error of each parameter.

    The key analyses in this study were conrmed byconducting parallel bootstrap analyses (where N 362 and B 1,000). For example, consider the regres-sion model underlying Figure 2B. If only age is usedto account for false-belief judgments, then R2 .39,and the following coefcients are obtained (with stan-dard errors in parentheses)

    proportion correct (transformed) 3.96 [constant] .09 [age].

    (.305) (.006)Using bootstrap procedures to calculate the parame-ters of this model 1000 times yielded mean values of

    3.98 for the constant and .09 for age, with the stan-dard deviation for these parameters being .303 and.006, respectively. In short, this bootstrap analysisclosely replicates the earlier nding in all respects.

    Bootstrap analyses conrmed all the effects re-ported above. In particular, bootstrap analyses con-rmed the absence of interactions between age andany other signicant factor, except temporal marking.

    Omnibus Analyses

    The ndings from primary conditions were alsocorroborated by conducting similar analyses on the

    Figure 7 Proportion correct versus Age Country for the pri-mary conditions.

    Figure 8 Proportion correct versus Age Temporal Marker.

  • 8/12/2019 False Belief, Watson

    15/30

    Wellman, Cross, and Watson 669

    entire omnibus dataset. For example, the analysis of age, depicted in Figure 2, includes the data from 362primary conditions; the parallel omnibus analysis in-cluded 453 conditions. Age was signicant in the om-nibus analysis, replicating the primary analysis. Theregression equations are closely similar:

    DV (primary) 3.96 [constant] .090 [age]R2 .39;

    DV (omnibus) 3.51 [constant] .090 [age]R2 .28.

    The principal difference is that the omnibus analysisis noisier, yielding a lower R2 and consequently ac-counting for less of the overall variance.

    Similarly, each variable that was signicant in theprimary analyses was re-analyzed, using the omnibusdataset. Results closely replicated the primary analy-ses, although in each case they proved noisier andthus accounted for less overall variance. Specically,results for motive, real presence, participation, sa-lience, and country again evidenced main effects andno interactions. Temporal marking evidenced the sameinteraction that was apparent in the primary analyses(shown in Figure 8). We prefer the primary analy-ses over the omnibus ones because, methodologi-cally, the primary analyses focused on studies with better controls and more complete data, and, analyti-cally, the primary analyses provided a better account-ing of the variance. Nonetheless, the close similarityof the results from the primary and omnibus analysesfurther attests to the robustness of the ndings.

    The omnibus data allow for the most comprehen-sive look at the inuence of childrens country of ori-gin on their false-belief reasoning. It is reasonable thatduring attempts to tailor tasks to different culturalmaterials, nonstandard tasks might result. Figure 9, inconjunction with Figure 7, therefore, presents morecomplete information about country. The line for theUnited States in Figure 9 closely replicates the U.S.line in Figure 7 for the primary analysis, and thus

    serves as a baseline for other comparisons. The otherlines in Figure 7 depict performance by children inWestern and non-Western literate, developed coun-tries. The omnibus data, however, include childrenfrom two nonliterate, more traditional communities:the line in Figure 9 for Africa depicts responses fromhuntergatherer Baka children (Avis & Harris, 1991)and the line for Peru depicts data from Quechua-speaking Peruvian Indians (Vinden, 1996). Just as inthe primary analyses, country is clearly signicant but does not interact with age. For these two culturalcommunities, just as for the others shown in Figure 7,

    childrens false belief performance increases across

    years in equivalent age trajectories, although at anyone age children from different countries and culturescan perform differently.

    Multivariate Accounting

    To summarize, four task variables help the perfor-mance of younger (as well as older) childrenmotive,participation, salience, and real presence. One non-task variable, country, also included values that en-hanced performance. In all the analyses reported thusfar, younger children (3040 months of age) remain below or at-chance level. It remains to assess, how-ever, if a combination of task enhancements might en-able very young children to perform systematicallyabove chance level. In addition, it is important todetermine what combined model best predicts chil-drens false-belief performances. Finally, testing mul-tiple factors in a combined model is important because the signicant inuence of one factor maydisappear if the other factors are controlled for aswell. For example, the effects for country may con-

    ceivably be due to investigators in one country exclu-sively using more helpful task variations; or the ap-parent benets of active participation may disappearif motive is also included in the analysis (if, for exam-ple, almost all active participation tasks also includeddeception).

    We tested combined models on primary conditions because regression predictions were more precise forprimary conditions rather than the omnibus set(which, to reiterate, consistently yielded similar, butstatistically noisier, results). The rst model testedincluded all variables that signicantly enhanced

    young childrens performance in the earlier two-way

    Figure 9 Proportion correct versus Age Country for threedifferent countries within the omnibus conditions.

  • 8/12/2019 False Belief, Watson

    16/30

    670 Child Development

    analysescountry, motive, participation, salience,real presence, and age. With these factors included,the multiple R .74, and R2 .55. Hence these factorsaccounted for 55% of the variance in childrens cor-rect false-belief judgments. Again, bootstrap analysesclosely replicated this regression model.

    Because country is a subject variable not under ex-perimental control, it was excluded in the next modeltested. In this model all factors except participationsignicantly and independently contributed to theoverall prediction, multiple R .68, R2 .47. Thenal model therefore dropped participation and in-cluded motive, salience, real presence, and age, mul-tiple R .68; R2 .46.2

    Using this nal model three predicted sets of ef-fects were then examined, as shown in Figure 10. Thebest-effects alternative shows predicted perfor-

    mance when values of all variables are maximally en-hanced. Therefore, for motive, the best-effects modelwas based on framing the transformation in terms of explicit deception of the protagonist; for salience, thismodel was based on explicitly stating or picturing theprotagonists (false) mental contents; and for realpresence, this model was based on the absence of areal target object at the time of the childs judgment.The worst-effects alternative examined predictedperformances in the presence of detrimental values of these variables. For example, for motive the worst-effects model was based on the absence rather than the

    presence of explicit deception. The no-effects pre-dictions were based on neutral or average values of each variable. As shown in Figure 10A, the no-effectsprediction closely mimics the overall age changes re-vealed in the initial analysis of age alone (shown inFigure 2B). Most important, the best-effects predic-tion in Figure 10A is identical in slope to the no-effectsline; even the best-effects combination of these en-hancing variables does not differentially enhanceyounger childrens performance. Moreover, a 95%condence band that is t around the best-effects linesubstantially overlaps with 0 or chance performance.

    These condence bands are shown in Figure 10B.Specically, the best-effects combination does notyield signicant above-chance performance at theyoungest ages; only at 40 months and older does that

    combination of variables produce signicant above-chance performance.

    DISCUSSION

    The integration of information across the many studiesof childrens understanding of false beliefs has be-come a necessity. Yet, the numerous studies make in-tegration difcult, thereby encouraging reviewers toignore some troublesome results or view the ndingsthrough the lens of prior theoretical commitments. Ameta-analytic, quantitative integration of the researchcan overcome these difculties.

    Meta-analyses, of course, have potential weaknesses.For example, they can be undermined if publication biases work against certain, usually nonsignicant,

    ndings. However, the literature on false beliefs is re-plete with studies of above-chance, chance, and below-chance ndings because the occurrence of false-belief errors has become a contentious issue. Furthermore,in our analyses, we took care to include unpublishedas well as published research.

    More generally, meta-analyses are dependent onthe quality of the accumulated studies to date. Wehave addressed concerns about quality by concentrat-ing on a primary set of conditions where, on aver-age, 98% of subjects passed relevant control tasks andfew subjects were dropped. Moreover, a conrmatory

    analysis with a larger omnibus set of conditions

    Figure 10 (A) Prediction intervals for nal model. (B) Con-dence bands for the best effects predictions.

    2 This combined model arguably provides the most precise es-timates of effect size for these key variables, because the inu-ence of any one variable is controlled for the inuence of othervariables in this model. The effect size values generated from thismodel were: age, 3.32 (for each increase of 1 year); motive, 2.10;real presence, 2.63; and salience, 1.80. Comparing these values tothose in Table 2 shows close agreement as well as some modestadjustments. Note that in this combined model the effect of age is

    not attenuated (as expected by an early competence hypothesis).

  • 8/12/2019 False Belief, Watson

    17/30

    Wellman, Cross, and Watson 671

    produced results almost identical to the original anal-ysis. Reservations about meta-analytic ndings alsoarise because of concerns with the appropriateness,assumptions, and robustness of various techniquesfor pooling indirect measures of effect signicance(e.g., p values). The analyses in this study were morestraightforward; we analyzed proportions of correctanswers, rather than derived measures from statisti-cal treatments that vary across studies. Finally, ouranalyses encompassed a great number of task varia-tions. Yet when these variations were organized into asystematic set of factors, the results clustered consis-tently with the exception of only a few outliers. This istestimony to the robustness of the basic phenomena,the tasks used to assess it, and our meta-analyticprocedures.

    Using these meta-analytic procedures, a variety of substantive results emerged. The combined-effectsmodels made it possible to predict approximately50% of the variance in false-belief judgments. This is astrong accounting of the variance, because in meta-analyses there are a host of factors that typically atten-uate ndings; in this case, there were differencesacross studies in experimenter characteristics and ex-pertise, differences in exact stimuli and proceduresused, linguistic differences across a variety of coun-tries, and differences in child-rearing cultures sam-pled. In what follows, the substantive results areconsidered in a series of ordered sets, and the impli-cations of the results with regard to current proposalsabout theory-of-mind development are addressed.

    Development

    The basic nding in this study is the presence of asubstantial effect for age in every analysis. Correctperformance signicantly increases with increasingage; in some cases, correct performance increasesfrom chance to above chance, but in most cases it in-creases from below chance to above chance. Suchndings clearly support the initial claims of substan-

    tial development during these preschool years, andcontradict recent suspicions that developmentalchange is nonexistent (Mitchell, 1996) or is connedto only a few standard tasks that are unusually de-manding (Chandler et al., 1989).

    Noneffects

    Many potentially relevant variables (e.g., nature of the protagonist, nature of the target object, type of question, and type of task) systematically failed to af-fect childrens age-related performances. These re-

    sults conrm those of several individual studies that

    have failed to nd differences for these variables.However, detecting a lack of differences when com-paring small samples of 12 to 20 children is problem-atic. In contrast, because our meta-analysis encom-passed 350 to 500 different conditions representingmore than 4,000 children, it is certainly powerfulenough to detect differences on these variables, if they existed.

    The null ndings of this study have importantmethodological implications. Knowing that valid andcomparable assessments of false-belief performanceare uninuenced by a variety of task specics, exper-imenters can condently vary their tasks over an ex-tended set of possibilities for ease of presentation andto achieve other experimental contrasts. More cru-cially, this lack of effects is of theoretical importanceas well. That childrens false-belief judgments are sys-tematically unrelated to such task variations in-creases the likelihood that their judgments reect ro- bust, deep-seated conceptions of human action, ratherthan task-specic responses provoked by the specialfeatures of one set of materials or questions. Speci-cally, the irrelevance of these task and proceduralvariations increases the likelihood that childrens per-formance is systematically dependent on the onething that does not vary, namely their conception of belief states.

    For adults, the task variations mentioned above areconceptually equivalent or irrelevant; it is a substan-tive nding that young children treat them as equiva-lent as well. Consider the notion of belief that istargeted by standard false-belief tasks. The beliefsinvolved are determined by informational accesswhether or not Maxi saw the chocolate being moved but should be uninuenced by irrelevant individualdifferences in the target protagonists, such as theirage, gender, and their real life presence versus video orstorybook nature (Miller, in press). The meta-analyticndings show that even young children appropri-ately understand the irrelevance of several protag-onist differences for an understanding of belief.

    Early Competence versus Conceptual Change

    As noted in the Introduction, our ndings speak toan important divide between two general accountsof young childrens poor false-belief performance.One account argues for conceptual change, that is,developmental changes in performance on false- belief tasks reect genuine changes in childrensconceptions of persons. A contrasting account ar-gues for early competence, that is, even young chil-dren have the necessary conception; their poor task

    performances reect instead information-processing

  • 8/12/2019 False Belief, Watson

    18/30

    672 Child Development

    limits, unnecessarily demanding tasks, or confusingquestions. Fodor (1992, p. 284), for example, arguedthat something like belief-desire psychology is in-nate: The experimental data offer no reason to be-lieve that the 3-year-olds theory of mind differs inany fundamental way from adult folk psychology.In particular, there is no reason to suppose that the3-year-olds theory of mind suffers from an absentor defective notion of belief. Chandler et al. (1989,pp. 12631264) outline two competing opinionsabout the understanding of mind, and thus aboutthe understanding of beliefs. There are the boost-ers, who advocate the . . . early onset view, andthe scoffers, who share a delayed-onset perspec-tive. Chandler and colleagues (p. 1266) championthe booster point of view and claim that the stan-dard assessment task . . . conates the . . . capacity to

    entertain beliefs about beliefs with the altogetherdifferent ability to comment on this understanding,and is more tortuous and computationally complexthan is necessary or appropriate.

    Empirical conrmation of early competence ac-counts requires that there be some version of the tar-get task that indeed demonstrates enhanced perfor-mance by young children, when task limitations have been eliminated or reduced. Our ndings show thatseveral task manipulations do increase young chil-drens performance: framing the task in terms of ex-plicit deception or trickery, involving the child in

    actively making the key transformations, and high-lighting the salience of the protagonists mental stateor reducing the salience of the contrasting real-worldstate of affairs, all help young children to perform better.

    Early competence accounts require more than justimproved performance, however. First, such accountsrequire that relevant task manipulations differentiallyenhance young childrens performance. This require-ment derives from the essential claim that such task factors mask early competence, thereby artifactuallyproducing apparent developmental differences on

    the target tasks. Second, such accounts require dem-onstrations of above-chance performance. A task manipulation that raises young 3-year-olds perfor-mance from below-chance to chance performancemay be methodologically important in that the ma-nipulation reduces features that systematically leadchildren to choose an incorrect response option. Ran-dom at-chance performance is, however, neither sys-tematically misled nor systematically informed; itprovides no evidence of correct, conceptual under-standing. Third, demonstrations of above-chance per-formance must extend beyond children at an interme-

    diate, transitional age. If children at an intermediate

    age are aided, but younger children still genuinely failthe task, then the timing of a developmental change, but not its absence versus presence, may be at issue.Thus, for early competence accounts, task variationsshould interact with performance in a specic devel-opmental pattern.

    None of these three pieces of evidence for an earlycompetence view were sustained in the meta-analysis:older children as well as younger children were aided by certain task manipulations; no set of manipu-lations boosted younger childrens performance toabove chance; and only one variable, presence or ab-sence of temporal marking in the test question, inter-acted with age, but not in an early competence pat-tern. Even a combined-effects model, statisticallyassembling the most powerful package of helpfultask manipulations, failed to produce above-chance

    performance in the youngest children and failed tointeractively change the shape of the basic develop-mental pattern of performance across these years. Itis important to note that the nding of an interaction between age and temporal marking demonstratesthat the meta-analysis was able to detect interactionsin the data, if they were present.

    Conceptual change accounts also require specicempirical ndings. To begin with, a task is neededthat plausibly assesses a target conceptual under-standing, and performance on that task must changefrom incorrect to systematically above-chance judg-

    ments with age. Condence in both good and poorperformance on the task must additionally be bol-stered by correct judgments on control tasks or oncontrol questions that demonstrate memory for keyinformation and grasp of the task format. False-belief tasks, especially those included in the primary analy-ses, t these criteria. As shown in Figure 2, for exam-ple, performance on such tasks dramatically changeswith age, and 98% of all children in primary condi-tions pass relevant control tasks.

    Conceptual change accounts require more thansimply age changes on a well-controlled standard

    task, however. They are also subject to a general multi-method approach to construct validity: a variety of tasks, all conceptually similar but varying in theirtask specics, should lead to similar developmentalchanges. That is, the more the same developmentalchange is demonstrated in a variety of conceptuallysimilar tasks (which vary in their specic features anddemands), the less likely it is that the change is due tospecic information-processing strategies tied to spe-cial task formats, or to limitations in understandingor using particular response formats. In this regard it bears repeating that in the meta-analysis a wide vari-

    ety of task materials and formats yielded equivalent

  • 8/12/2019 False Belief, Watson

    19/30

    Wellman, Cross, and Watson 673

    developmental performances. The response require-ments of the tasks included in the meta-analysis alsovaried widely: in some tasks the child could answer by simply pointing, or could answer a question aboutthe characters behavior (Where will he search?) orhis mental state (What will he think?). Indeed, al-though nonverbal tasks were not available at thetime we conducted the meta-analysis, similar age tra- jectories are obtained when the task is completelynonverbal (Call & Tomasello, 1999).

    Conceptual change accounts do not require thattask demands and information-processing limita-tions have no inuence; on the contrary, conceptualunderstandings can only be manifest in specictasks via processes of information utilization andexpression. Conceptual change accounts do requirethat task demands and processing limitations notcompletely account for performance across age sothat control of such factors does not eliminate alarger developmental pattern. The meta-analysisshowed that a number of task manipulations doinuence childrens false-belief performances. Al-though these factors are important in their ownright, they nonetheless leave the basic developmen-tal pattern unchanged, even when considered incombination.

    Meta-analytic results capture signicant trendsacross studies. Of course, not every individual studyaccords with the group trends. Of particular rele-vance in the current case is whether there are any in-dividual studies that reported clear early competencepatterns of data. Most important is whether anystudies reported above-chance performance fromvery young children (e.g., 2-year-olds). Such resultswould appear in the top left quadrant of Figure 2A.Inspection of Figure 2A shows that two data pointsstand out from the rest as indicating highly correctperformance (better than 85% correct) from 30- and36-month-olds. These data come from a study byShefeld, Sosa, and Hudson (1993) using a simpli-ed task in which there was no real presence of the

    target object and the characters mental state was ex-plicitly stated.Why might Shefeld and colleagues conditions

    emerge as outliers (e.g., was the task that they usedespecially sensitive, or artifactually easy)? In Shef-eld et al.s simplied task, children see a red and agreen box and discover that both are empty. ThenCookie Monster enters and explicitly says, I think my cookie is in the red box. Immediately after thisstatement the child is asked the false-belief question:Where does Cookie Monster think his cookie is? To be correct on this task, therefore, the child need only

    repeat back what Cookie Monster just said, poten-

    tially without any real understanding of belief at all.Arguably, therefore, Shefeld et al.s ndings repre-sent artifacts of this particular task format.

    In studies other than Shefeld et al. (1993), correctperformance also could be achieved, potentially, bychildren merely repeating what was told them. In Fla-vell, Flavell, Green, and Moses (1990), the childknows that a cup hidden behind a screen is blue. An-other person, Ellie, who enters the room, cannot seethe cup, but explicitly states, I think the cup iswhite. Then the child is asked, Do you think thecup is white or blue? and nally is asked, Does Elliethink the cup is white or blue? In the Flavell et al.study, 3-year-olds systematically erred by saying Elliethinks the cup is blue. When children err in this fash-ion it is remarkable, because to err they avoid a verysimple response strategyjust repeat Ellies ownwords. Incorrect answers, therefore, may informa-tively indicate problems with conceptualizing false beliefs. Correct responses with such a procedure aremuch less interpretable, however, because, when cor-rect, the child may indeed be simply parroting whatthey just heard.

    A related issue concerns the results of the com- bined models. The best-effects model shows pre-dicted responses if children are presented with tasksthat combine several key enhancing features. Nostudies, however, have included a task with all foursignicantly enhancing factors: that is, a task in whichthe salience is pictured/stated, the motive is decep-tive, the object is not real and present, and the childparticipates actively in the transformation. Shefeldet al. (1993) included only two of these four enhancingfeatures, and only one condition in the meta-analysisincluded three of these factors in a task for young pre-schoolers. Woolley (1995, Experiment 1) tested young3-year-olds ( M 39 months) with a task in which themotive was deceptive, the object was not real andpresent, and the child actively participated in thetransformation. In Woolleys study, young childrenperformed below chance: 14 correct responses out of

    38 false-belief questions, or 37% correct. Thus, thepredictions of the combined-effects model are bothderived from and are largely consistent with ob-served results from individual studiesvery youngchildren typically fail to perform signicantly abovechance, even when tested on tasks that combine task manipulations that enhance performance.

    With regard to a general contrast between concep-tual change and early competence accounts, then, themeta-analysis suggests that an important conceptualchange in childrens understanding of persons is tak-ing place between the ages of 2 and 5 years. Against

    this general background, the detailed results of the

  • 8/12/2019 False Belief, Watson

    20/30

    674 Child Development

    meta-analysis also speak to several more specic the-oretical proposals.

    Chandlers and Leslies Accounts

    Recall that Chandler (1988; Chandler et al., 1989)proposed that early competence at false-belief judg-ments is masked by unnecessarily demanding fea-tures of the standard tasks. In contrast to standardtasks, Chandler argued, tasks framed in terms of explicit deception, which enlist the child in concoct-ing the deceptive circumstances and minimize ver- bal demands by having the child point or nonverballyarrange props, would better reveal young chil-drens early competence. In their research, tasksemploying these features have been found to en-hance young childrens performance (e.g., Chandler

    et al., 1989; Hala, Chandler, & Fritz, 1991), althoughin other studies this has not been the case (e.g.,Sodian et al., 1991). The meta-analysis showed thatthese featuresespecially deception and activeparticipationindeed aid performance. The patternof ndings for these factors, however, failed tot an early competence model more specically. Toreiterate, these task manipulations, although help-ful, do not alter the shape of the general develop-mental trajectory and do not raise the youngest chil-drens performance to systematically above-chanceperformance.

    Leslie also proposed a specic early competenceaccount (e.g., Leslie & Roth, 1993; Roth & Leslie, 1998)in which understanding persons mental attitudes orstates, such as the belief that X is so,is the result of aspecial Theory-of-Mind Mechanism (ToMM) that isactivated early in development. Performance in anytask situation, however, depends not simply onToMM but also on a Selection Processor (SP) that canlimit the application of ToMM in any specic case.According to this account, standard false-belief tasksplace large demands on SP and hence mask, ratherthan reveal, young childrens Theory-of-Mind com-

    petence. In contrast, nonstandard tasks can reducethese demands thereby revealing early ToMM com-petence. Thus, Leslie (Leslie & Roth, 1993, p. 99) notesapprovingly that three-year-olds do succeed on somenon-standard false-belief tasks (see, e.g., Mitchell &Lacohee, 1991; Roth & Leslie, 1998; Wellman &Bartsch, 1988). The arguments above against earlycompetence accounts more generally apply to Lesliesspecic account; manipulations of the type that Leslieendorses do aid performance, but across numerousstudies in the meta-analysis such manipulations didnot systematically produce an early competence pat-

    tern of results.

    Conversational Accounts

    Several authors claim that young children havespecial difculties with the verbal-conversational as-pects of standard false-belief tasks. As conversationalskills improve, children become able to reveal theirpre-existing conceptual competence. For example, be-ginning with Lewis and Osborne (1990), investigatorshave worried that the typical false-belief question,(Where will Maxi look for his chocolate?) is unclear.Perhaps young children interpret the question as,Where will Maxi end up looking, after he rst failsto nd his chocolate?, thereby helpfully resolvingany unclarity in favor of Maxis success. Siegal(Siegal, 1997; Siegal & Beattie, 1991) more comprehen-sively considers several potential conversationalproblems. For example, standard tasks typicallyinclude repeated questioning of the child. In an un-expected contents task, for example, the child isasked, What do you think is in here?; then What isreally in here?; followed by What will George think is in here? With repeated questioning, young chil-dren may alter their answers, producing erroneousresponses albeit conceptually understanding false- belief problems. Switching answers for conversa-tional reasons might be especially problematicwhen children are given multiple false-belief trialsin a row.

    The meta-analysis showed that revising the stan-

    dard tasks to overcome potential conversational dif-culties can inuence childrens performance, asshown for the variable temporal marking. A tem-porally marked question such as Where will Maxirst look for his chocolate? claries that the questionis about Maxis rst search and not a later successfulsearch. Providing temporal clarication of this typedoes aid performance, but, opposite to what an earlycompetence account would require, temporal mark-ing fails to enhance very young childrens perfor-mance, enhancing judgments only for older children.

    What about repeated questioning? Childrens con-

    sistency in responding to multiple false-belief trials isrelevant here. In many studies, children receive a sin-gle false-belief trial. But just as often children receivetwo or more false-belief trials (i.e., they are presentedwith several equivalent false-belief tasks in a row inwhich only the characters and objects are changedacross trials). The dependent variable in the presentanalyses (proportion correct) was summed acrossfalse-belief trials of any equivalent type. Summingacross multiple trials was deemed appropriate fortwo reasons. Practically, research articles often do not break down proportion correct on separate trials,

    only providing information summed across trials.

  • 8/12/2019 False Belief, Watson

    21/30

    Wellman, Cross, and Watson 675

    Empirically, when studies do provide informationabout childrens consistency across trials, they aretypically found to be highly consistent. Within the 178studies included in the present analyses, there were52 reports of consistency across trials. This data took several forms but could be converted across studiesinto the proportion of times that responses on a sec-ond trial agreed with those on the rst trial (propor-tion of agreement, or PA). Across these 52 reports,mean PA was .84 ( SD .14); on average, childrengave identical responses to two or more similar false- belief trials 84% of the time. Consistency was corre-lated with age, r(51) .28, p .05, but even in condi-tions in which childrens mean age was less than 44months (19 of the 52 reports) the mean PA was .81.

    These results speak to conversational concerns thatstandard tasks encourage answer switching and thusmask early competence. Children are quite consistenton multiple false-belief tasks of the same type. Recallalso that childrens performance does not differ if thequestion is asked in terms of look, think, orsay; or if responses require simply pointing at onelocation or another, yesno answers, or longer ver- balizations. In short, the ndings argue againststraightforward early competence conversational ac-counts. Of course, the link between conversation andconception is not merely that the former may mask t