the most dangerous equation -...

9
The Most Dangerous Equation Ignorance of how sample size affects statistical variation has created havoc for nearly a millennium Howard Wainer W liat constitutes a dangerous equation? There are two obvious interpretations: Some equations are dangerous if you know them, and others arc dangerous if you do not. The first category may pose danger because the secrets within its bounds open doors be- hind which lies terrible peril. The obvious win- ner in this is Einstein's iconic equation e = nic~, for it provides a measure of the enormous energy hidden within ordinary matter. Its destructive capability was recognized by Leo Szilard, who then instigated the sequence of !'.• The Goldsmiths't!iiiiipLii f;p fi Figure 1. Trial of the pyx has been performed since 1150 A.D. In the trial, a sample of minted coins, say 100 at a time, is compared with a stan- dard. Limits are set on the amount that the sample can be over- or underweight. In 1150, that amount was set at 1/400. Nearly 600 years later, in 1730, a French mathematician, Abraham de Moivre, showed Ihat the standard deviation does not increase in proportion to the sample. Instead, it is proportional to the square root of the sample size. Ignorance of de Moivre's equation has persisted to the present, as the author relates in five examples. This ignorance has proved costly enough that the author nominates de Moivre's formula as the most dangerous equation. www.americanscientist.org 2007 May-June 249

Upload: others

Post on 05-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

The Most Dangerous Equation

Ignorance of how sample size affects statistical variationhas created havoc for nearly a millennium

Howard Wainer

Wliat constitutes a dangerous equation?There are two obvious interpretations:

Some equations are dangerous if you knowthem, and others arc dangerous if you do not.The first category may pose danger becausethe secrets within its bounds open doors be-

hind which lies terrible peril. The obvious win-ner in this is Einstein's iconic equation e = nic~,for it provides a measure of the enormousenergy hidden within ordinary matter. Itsdestructive capability was recognized by LeoSzilard, who then instigated the sequence of

!'.• The Goldsmiths't!iiiiipLii f ; p fi

Figure 1. Trial of the pyx has been performed since 1150 A.D. In the trial, a sample of minted coins, say 100 at a time, is compared with a stan-dard. Limits are set on the amount that the sample can be over- or underweight. In 1150, that amount was set at 1/400. Nearly 600 years later, in1730, a French mathematician, Abraham de Moivre, showed Ihat the standard deviation does not increase in proportion to the sample. Instead,it is proportional to the square root of the sample size. Ignorance of de Moivre's equation has persisted to the present, as the author relates infive examples. This ignorance has proved costly enough that the author nominates de Moivre's formula as the most dangerous equation.

www.americanscientist.org 2007 May-June 249

Page 2: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

Howard Wainer is Distin-guisliai Ri'seiirch Scientist atthe Nalioiml Botml ofMediailExaminers and nn ndjunctprofessor ofsttitislics at HieWliarton School of the Uni-versity ofPennsylvimia. Hehas published 16 books, mostrecenthi, Testlet ResponseTheory and Its Applications(Cambridge University Press).Heisa Fellow of the AmericanStatistical Association andwas awarded the 2007 Nation-al Council on Measurementin Education Career Acliieve-ment Award for Contributionsto Educational Measurement.Address: National Board ofMedical Examiners, 3750Market St.. Philadelphia, PAJ9105. Internet:hzi>ainer®nbme.org

events that culminated in the construction ofatomic bombs.

Supporting ignorance is not, however, the di-rection I wish to pursue—^indeed it is quite theantithesis of my message. Instead I am inter-ested in equations that unleash their danger notwhen we know about them, but rather whenwe do not. Kept close at hand, these equationsallow us to understand things clearly, but theirabsence leaves us dangerously ignorant.

There are many plausible candidates, andI have identified three prime examples: Kel-ley's equation, which indicates that the truthis estimated best when its observed value isregressed toward the mean of the group thatit came from; the standard linear regressionequation; and the equation that provides uswith the standard deviation of the samplingdistribution of the mean—what might becalled de Moivre's equation:

a ^ = <T / Vn

where O- is the standard error of the mean, ois the standard deviation of the sample andn is the size of the sample. (Note the squareroot symbol, which will be a key to at least oneof the misunderstandings of variation.) DeMoivre's equation was derived by the Frenchmathematician Abraham de Moivre, who de-scribed it in his 1730 exploration of the bino-mial distribution. Miscellanea Anahjtica.

Ignorance of Kelley's equation has provedto be very dangerous indeed, especially toeconomists who have interpreted regressiontoward the mean as having economic causesrather than merely reflecting the uncertaintyof prediction. Horace Secrist's The Triumph ofMediocrity in Business is but one example listedin the bibliography. Other examples of failureto understand Kelley's equation exist in thesports world, where the expression "sopho-more slump" merely describes the likelihoodof an average season following an especiallygood one.

The familiar linear regression equationcontains many pitfalls to trap the unwary.The correlation coefficient that emerges fromregression tells us about the strength of thelinear relation between the dependent andindependent variables. But alas it encouragesfallacious attributions of cause and effect. Iteven encourages fallacious interpretation bythose who think they are being careful. ("1may not be able to believe the exact value ofthe coefficient, but surely I can use its signto tell whether increasing the variable willincrease or decrease the answer") Tlie linearregression equation is also badly non-robust,but its weaknesses are rarely diagnosed ap-propriately, so many models are misleading.When regression is applied to observationaldata (as it almost always is), it is difficult toknow whether an appropriate set of predictors

has been selected—and if we have an inappro-priate set, our interpretations are questionable.It is dangerous, ironically, because it can be themost useful model for the widest variety ofdata when wielded with caution, wisdom andmuch interaction between the analyst and thecomputer program.

Yet, as dangerous as Kelley's equation andthe common regression equations are, I findde Moivre's equation more perilous still. I ar-rived at this conclusion because of the extremelength of time over which ignorance of it hascaused confusion, the variety of fields thathave gone astray and the seriousness of theconsequences that such ignorance has caused.

In the balance of this essay I will describefive very different situations in which igno-rance of de Moivre's equation has led to bil-lions of dollars of loss over centuries yieldinguntold hardship. Tliese are but a small sam-pling; there are many more.

The Trial of the PyxIn 1150, a century after the Battle of Hastings, itwas recognized that the King of England couldnot just mint money and assign it to have anyvalue he chose. Instead the coinage's valueneeded to be intrinsic, based on the amount ofprecious materials in its make-up. And so stan-dards were set for the weight of gold in coins—a guinea, for example, should weigh 128 grains(there are 360 grains m an ounce). In the trial ofthe pyx—tlie pyx is actually the wooden boxthat contains the standard coins—samples aremeasured and compared with the standard.

It was recognized, even then, that coinagemethods were too iniprecise to insist that al!coins be exactly equal in weight, so instead theking and the barons who supplied the LondonMint (an independent organization) with goldinsisted that coins when tested in the aggre-gate (say 100 at a time) conform to the regu-lated size plus or minus some allowance forvariability. They chose 1/400th of the weight,which for one guinea would be 0.28 grainsand so for the aggregate, 28 grains. Obviously,they assumed that variability increased pro-portionally to tlie number of coins and not toits square root, as de Moivre's equation wouldlater indicate. This deeper understanding layalmost 600 years in the future.

The costs of making errors are of two types.If the average of all the coins was too light, thebarons were being cheated, for there would beextra gold left over after minting the agreednumber of coins. This kind of error is easilydetected, and, if found, the director of the mintwould suffer grievous punisliment. But if theallowable variability was larger than neces-sary, there would be an excessive number of tooheavy coins. The mint could thus stay within thebounds specified and still provide the opportu-nity for someone at the mint to collect these

250 American Scientist, Volume 95

Page 3: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

Figure 2. A cursory glance at the distribution of the U.S. counties with the lowest rates of kidney cancer (teal) might lead one to conclude thatsomething about the rural lifestyle reduces the risk of that cancer. After all, the counties with the lowest 10 percent of risk are mainly Midwest-ern, Southern and Westem counties. When one examines the distribution of counties with the highest rates of kidney cancer (red), however,it becomes clear that some other factor is at play Knowledge of de Moivre's equation leads to the conclusion that what the counties with thelowest and highest kidney-cancer rates have in common is low population—and therefore high variation in kidney-cancer rates.

overweight coins, melt them down and recastthem at the correct lou-er weight. This wouldleave the balance of gold as an excess paymentto the mint. The fact that this error continued foralmost 600 years pro\ndes strong support for deMoivre's equation to be considered a candidatefor the title of most dangerous equation.

Life in the Country: Haven or Threat?Figure 2 is a map of the locations of of countieswith unusual kidney-cancer rates. Tlie coun-ties colored teal are those that are in the lowesttenth of the cancer distribution. We note thatthese healthful counties tend to be very rural,Midwestern, Southern or Western. It is botheasy and tempting to infer that this outcome isdirectly due to the clean living of the rural life-style—no air pollution, no water pollution, ac-cess to fresh food without additives and so on.

Tlie counties colored in red, however, beliethat inference. Although they have much thesame distribution as the teal counties—in fact,they're often adjacent—they are those thatare in the highest decile of the cancer distribu-tion. We note that these unhealthful countiestend to be very rural, Midwestern, Southemor Westem. Tt would be easy to infer that thisoutcome might be directly due to the poverty

of the rural lifestyle—no access to good medi-cal care, a high-fat diet, and too much alcoholand tobacco.

What is going on? We are seeing de Moivre'sequation in action. The variation of the meanis inversely proportional to the sample size, sosmall counties display much greater variationthan large counties. A county with, say, 100inhabitants that has no cancer deaths wouldbe in the lowest category. But if it has 1 cancerdeath it would be among the highest. Countieslike Los Angeles, Cook or Miami-Dade withmillions of inliabitants do not bounce aroundlike that.

Wlien we plot the age-adjusted cancer ratesagainst county population, this result becomesclearer still (see Figure 3). We see the typicaltriangle-shaped bivariate distribution: Whenthe population is small (left side of the graph)there is wide variation in cancer rates, from 20per 100,000 to 0; when county populations arelarge (right side of graph) there is very Uttlevariation, with all counties at about 5 cases per100,000 of population.

The Small-Schools MovementTlie urbanization that characterized the 20thcentury led to the abandonment of the rural

www.americansdentist.org 2007 May-June 251

Page 4: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

20 1

15 -

10 -

ra 5

• •• •

a

1 1—

••

• •

—1—'~]

PM ^

—1 1

102 103 10"population

Figure 3. When age-adjusted kidney-cancer rates in U.S. counties areplotted against the log of county population, the reduction of varia-tion with population becomes obvious. This is the typical triangle-shaped bivariate distribution.

lifestyle and, with it, an increase in the sizeof schools. The era of one-room schoolhouseswas replaced by one with large schools—oftenwith more than a thousand students, dozensof teachers of many specialties and facilitiesthat would not have been practical without theenormous increase in scale. Yet during the lastquarter of the 20th century, there were the be-ginnings of dissatisfaction with large schoolsand the suggestion that smaller schools couldprovide a better education, in the late 1990sthe Bill and Melinda Gates Foundation begansupporting small schools on a broad-ranging,intensive, national basis. By 2001, the Foun-dation had given grants to education proj-ects totaling approximately $1.7 billion. Theyhave since been joined in support for smallerschools by the Annenberg Foundation, theCarnegie Corporation, the Center for Col-laborative Education, the Center for SchoolChange, Harvard's Change Leadership Group,the Open Society Institute, Few CharitableTrusts and the U.S. Department of Education'sSmaller Learning Communities Program. Tiieavailability of such large amounts of moneyto implement a smaller-schools policy yieldeda concomitant increase in the pressure to doso, with programs to splinter large schoolsinto smaller ones being proposed and imple-mented broadly {New York City, Los Angeles,Chicago and Seattle are just some examples).

What is the evidence in support of such achange? There are many claims made about theadvantages of smaller schools, but I will focushere on just one—that when schools are smaller,student achievement improves. The supportingevidence for this is that among high-performing

schools, there is an unrepresentatively largeproportion of smaller schools.

In an effort to see the relation between smallschools and achievement, Harris Zwerling andI looked at the performance of students at allof Pennsylvania's public schools, as a functionof school size. As a measure of school perfor-mance we used the Pennsylvania testing pro-gram (PSSA), which is very broad and yieldsscores in a variety of subjects and over the en-tire range of precoUegiate school years. Wlienwe examined the mean scores of the 1,662separate schools that provide 5th-grade-read-ing scores, we found that of the top-scoring50 schools (the top 3 percent) six were amongthe smallest 3 percent of the schools. This isan over-representation by a factor of four. Ifsize of school was unrelated to performance,we would expect 3 percent to be in this selectgroup, yet we found 12 percent. The bivari-ate distribution of enrollment and test score isshown in Figure 4.

We also identified the 50 lowest-scoringschools. Nine of tbese (18 percent) were amongthe 50 smallest schools. This result is complete-ly consonant with what is expected from deMoivre's equation—smaller schools are expect-ed to have higher variance and hence should beover-represented at both extremes. Note that theregression line shown on the left graph in Figure4 is essentially flat, indicating that overall, thereis no apparent relation between schcxil size andperformance. But this is not always true.

The right graph in Figure 4 depicts 11th-grade scores in the PSSA. We find a similarover-representation of small schools on both ex-tremes, but this time the regression line showsa significant positive slope; overall, studentsat bigger schools do better. This too is not un-expected, since very small high schools can-not provide as broad a curriculum or as manyWghly specialized teachers as can large schools.A July 20, 2005, article in the Seattle Weekly de-scribed the conversion of Mountlake TerraceHigh School in Seattle from a large suburbanschool with an enrollment of 1,800 studentsinto five smaller schools, greased with a GatesFoundation grant of almost a million dollars.Although class sizes remained the same, eachof the five schools had fewer teachers. Studentscomplained, "There's just one English teacherand one math teacher ... teachers ended upteaching things they don't really know." Per-haps this anecdote suggests an explanation forthe regression line in Figure 4.

Not long afterward, the small-schools move-ment took notice. On October 26, 2005, The Se-attle Times reported:

[t]he Gates Foundation amtounced lastweek it is moving away from its empha-sis on converting large high schools intosmaller ones and instead giving grants

252 American Scientist, Volume 95

Page 5: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

1,700-

1.500-

1,300-

1,100-

900-

700

1,800-H

1,600-

£ 1.400-E

£ 1.200-

1,000-

800300 600

enroilment900 1,200 500 1,000 1,500

enrollment2,000 2,500 3,000

Figure 4. In the 1990s, it became popular to champion reductions in the size of schools. Numerous philanthropic organizations and govern-ment agencies funded the division of larger school based on the fact that students at small schools are over-represented in groups with hightest scores. Shown here at left are math test scores from 1,662 Pennsylvania 5th-grade schools. The 50 highest-performing schools are shown inblue and the 50 lowest in green. Note how the highest- and lowest-performing schools tend to group at low enrollment—just what de Moivre'sequation predicts. The regression line is nearly flat, though, showing that school size makes no overall difference to 5th-grade mean scores.Math scores for 11 th-grade schools were also calculated (right). Once again, variation was greater at smaller schools. In this case, however, theregression line has a significant positive slope, indicating that the mean score improved with school size. This stands to reason, since largerschools are able to offer a wider range of classes with teachers who can focus on fewer subjects.

to Specially selected school districts witha track record of academic improv^ementand effective leadership. Education lead-ers at the Foundation said they concludedthat improving classroom instruction andmobilizing the resources of an entire dis-trict were more important first steps toimproving high schools than breakingdown the size.

This point of view was amplified in a studypresented at a Brookings Institution Conferenceby Barbara Schneider, Adam E. Wyse and Ve-nessa Keesler of Michigan State University. Anarticle in Education Week that reported on thestudy quoted Schneider as saying, "I'm afraidwe have done a terrible disservice to kids."

Spending more than a billion dollars ona theory based on ignorance of de Moivre'sequation—in effect serving only to increasevariation—suggests just how dangerous thatignorance can be.

The Safest CitiesIn tlie June 18, 2006, issue of the New YorkTimes there was a short article that listed theten safest United States, cities and the ten leastsafe based on an Allstate Insurance Companystatistic, "average number of years betweenaccidents." The cities were drawn from the 200largest dties in the U.S. With an understand-ing of de Moivre's equation, it should come asno surprise that a list of the ten safest cities, the

ten most dangerous cities and the ten largestcities have no overlap (see Figure 5).

Sex Differences in PerformanceFor many years it has been well establishedthat there is an over-abundance of males at thehigh end of academic test-score distributions.About twice as many males as females re-ceived National Merit Scholarships and otherhighly competitive awards, Historically, someobservers used such results to make inferencesabout differences in intelligence between thesexes. Over the past few decades, however,most enlightened investigators have seen thatit is not necessarily a difference in level but adifference in variance that separates the sexes.Public observation of this fact has not alwaysbeen greeted gently, witness the recent outcrywhen Harvard (now ex-) President LawrenceSummers pointed this out. Among other com-ments, he said:

It does appear that on many, many, dif-ferent human attributes—height, weight,propensity for criminality, overall lQ,mathematical ability, scientific ability—there is relatively clear evidence thatwhatever the difference in means—whichcan be debated—there is a difference instandard deviation/variability of a maleand female population. And it is truewith respect to attributes that are and arenot plausibly, culturally determii^ed.

www.americanscientist.org 2007 May-June 253

Page 6: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

city

ten safest

Sioux FallsFort CollinsCedar RapidsHuntsvilleChattanoogaKnoxvilleDes MoinesMilwaukeeColorado SpringsWarren

ten least safe

NewarkWashingtonElizabethAlexandriaArlingtonGlendaieJersey CityPatersonSan FranciscoBaltimore

ten biggest

New YorkLos AngelesChicagoHoustonPhiladelphiaPtioenixSan DiegoSan AntonioDallasDetroit

state

South DakotaColorado

IowaAlabama

TennesseeTennessee

IowaWisconsinColoradoMichigan

New JerseyDC

New JerseyVirginiaVirginia

CaliforniaNew JerseyNew JerseyCaliforniaMaryland

New YorkCalifornia

IllinoisTexas

PennsylvaniaArizona

CaliforniaTexasTexas

Michigan

populationrank

1701821901291381241031948169

642518917411492741481418

12345678910

population

133,834125.740122,542164.237154.887173,278196,093586,941370,448136,016

277,911563,384123,215128.923187,873200,499239,097150,782751,682628,670

8.085,7423,819,9512,869,1212,009,6901,479.3391,388,4161.266,7531,214,7251,208,318911,402

numberof yearsbetween

accidents

14.313.213.212.812.712.612.612.512.312.3

5.05.15.45.76.06.16.26.56.56.5

8.47.07.58.06.69.78.98.07.310.4

Figure 5. Allstate Insurance Company ranked the ten safest and ten least-safe U.S.cities based on the number of years drivers went between accidents. The Neiv YorkTimes reported on this in 2006. By now the reader will not be surprised to find thatnone of the ten largest cities are among either group.

The males' score distributions are almost al-ways characterized by greater variance than thefemales'. Thus while there are more males at thehigh end, there are also more at the low end.

An example, chosen from the National As-sessment of Educational Progress (NAEP), isshown in Figure 7 NAEP is a true survey, soproblems of self-selection (rife in college en-trance exams, licensing exams and so on) aresubstantially reduced. The data summarizedin the table are over 15 years and five sub-jects. In all instances the standard deviationof males is from 3 to 9 percent greater thanfemales. This is true both for subjects in whichmales score higher on average (math, science,geography) and lower (reading).

Both inferences, the incorrect one about dif-ferences in level, and the correct one aboutdifferences in variability, cry out for expla-

nation. The old cry would have been "whydo boys score higher than girls?" The new-er one should be "why do boys show morevariability?" If one did not know about deMoivre's result and only tried to answer thefirst question, it would be a wild goose chase,a search for an explaiitition for a pbenomenonthat does not exist. But if we focus on greatervariability in males, we may find pay dirt.Obviously the answer to the causal question"why?" will have many parts. Surely social-ization and differential expectations must bemajor components^—especially in the past, be-fore the realization grew that a society cannotcompete effectively in a global economy witbonly half of its workforce fully mobilized. Butthere is another component that is key—andespecially related to the topic of this essay.

In discussing Lawrence Summers's remarksabout sex differences in scientific ability, Chris-tiane Nusslein-Volhard, the 1995 Nobel laure-ate in physiology and medicine, said:

He missed the point. In mathematics andscience, there is no difference in the intel-ligence of men and women. The differ-ence in genes between men and womenis simply the Y chromosome, which hasnotliing to do with intelligence.

But perhaps it is Professor Nusslein-Volhardwho missed the point here. The Y chromosomeis not the only genetic difference between thesexes, although it may be the most obvious.Summers's point was that when we look at ei-ther extreme of an ability distribution we willsee more of the group that has greater varia-tion. Mental traits conveyed on the X chro-mosome will have larger variability amongmales thiin females, for females have two Xchnimosomes, whereas males have an X and aY. Thus, from de Moivre's equation we wouldexpect, all other things being equal, about 40percent more variability among males than fe-males. The fact that we see less than 10 percentgreater variation in NAEP scores demandsthe existence of a deeper explanation. First, deMoivre's equation requires independence ofthe two X chromosomes, and with assortativemating this is not going to be true. Addition-ally, both X chromosomes are not expressedin every cell. Moreover, there must be majorcauses of high-level performance that are notcarried on the X chromosome, and still othersthat Indeed are not genetic. But for some skills,perhaps 10 percent of increased variability islikely to have had its genesis on the X chromo-some. This obser\'ation would be invisible tothose, even those with Nobel prizes for workin genetics, who are ignorant of de Moivre'sequation.

It is well established that there is evolution-ary pressure toward greater variation within

254 American Scientist, Volume 95

Page 7: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

species—within the constraints of geneticstability. This is evidenced by the dominanceof sexual over asexual reproduction amongmammals. But this leaves us with a puzzle.Why was our genetic structure built to yieldgreater variation among males than females?And not just among humans, but virtually aUmammals. The pattern of mating suggests ananswer. In most mammalian species that re-produce sexually, essentially all adult femalesreproduce, whereas only a small proportion otmales do (modern humans excepted). Thinkof the alpha-male tion SLirrounded by a prideof females, with lesser males wandering aim-lessly and alone in the forest roaring in frustra-tion. One way to increase the likelihood of off-spring being selected to reproduce is to havelarge variance among them. Thus evolutionarypressure would reward larger variation formales relative to females.

ConclusionIt is no revelation that humans don't fullycomprehend the effect that variation, and es-pecially differential variation, has on whatwe obsen'e. Daniel Kahneman's 2002 Nobelprize in economics was for his studies on in-tuitive judgment (which occupies a middleground "between the automatic operationsof perception and the deliberate operations ofreasoning"), Kaluieman showed that humar^sdon't intviitively "know" that smaller hospitalswould have greater variability in the propor-tion of male to female births. But such inabilityis not limited to humans making judgments inKalineman's psychology experiments.

Routinely, small hospitals are singled outfor special accolades because of their exempla-ry performance, only to slip toward averagein subsequent years. Explanations typicallyabound that discuss how their notoriety hasoverloaded their capacity. Similarly, small mu-tual funds are recommended, after the fact, byWall Street analysts only to have their subse-quent performance disappoint investors. Thelist goes on and on adding evidence and sup-port to my nomination of de Moivre's equa-tion as the most dangerous of them all. Thisessay has been aimed at reducing the peril thataccompanies ignorance of that equation.

BibliographyBaumol, W. J., S. A. B. Blackman and E. N. Wolff. 1989.

Pwtiiictivit}/ mni Aiiienavi leadership: The Um^ Vieiv.CiiiTibridge and London: MIT Press.

Beckner, W- 1983. The Case for the Simller School. Bloom-ington, Indiana: Phi Delta Kappa Educadona! Foun-dation. HD 228 002.

Camevale, A. 1999. Sfrivers. Wall Street fourim!. AugList 31.

de Moivre, A. 1730, Miscellanea Analytica. London: Ton-son and Watts.

Dreifus, C. 2006. A conversation with ChHstianeNiisslein-Viilhard. The New York Times, F2, July 4.

Figure 6. Former Harvard President Lawreiuo Summers received some sharp criti-cism for remarks he made concerning differences in science and math performancebetween the sexes. In particular. Summers noted that variance among the test scoresof males was considerably greater than that of females and that not all of it could beconsidered to be based on differential environments.

Dunn, F. 1977. Choosing sm.illness. In Education in iiumlAnuricn: A Reassestiiiifiit of Conveutioiml V^isdom, ed. J.Shcr. Boulder, Colo.: West\-iew Press.

Fowler, W. J., jr. 1995. School size and student outcomes. InAdvances in Education Pwductivit}/: Vol. 5. OrganizationalInfluences oji Prtxludivitif. ed. H. J. Walberg, B. Levin andW. J. Fowler, Jr. Greenwich, Conn.: Jai Press.

Friedman, M. 1992. Do old fallacies ever die? journal ofEconomic Literature (30)2129-2132.

subject

math

science

reading

geography

U.S. history

year

199019921996200020032005

199620002005

199219941998200220032005

19942001

19942001

mean scale scoresmale

263268271274278280

150153150

254252256260258257

262264

259264

female

262269269272277278

148146147

267267270269269267

258260

259261

Standard deviationsmate

373738393737

363736

363736343635

3534

3333

female

353637373535

333534

353533333434

3432

3131

male:femaleSD ratio

1.061.031.031.051.061.06

1.09 j1.061.06

1.031.061.091.031.061.03

1.031.06

1.06 j1.06 3

Figure 7. Data from the National Assessment of Educational Progress show just theeffect that Lawrence Summers claimed. The standard deviation for males is from 3to 9 percent greater on all tests, whether their mean scores were belter or worse thanthose of females.

www.americanscientist.org 2007 May-June 255

Page 8: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed

Geballe, B. 2005. Bill Gates' Guinea pigs. Seattle Weekly, 1-9.

Gelman, A., and D. Nolan. 2002. Tc-admifi Slatlstics: AB(7j; of Tricks. Oxford: Oxfoi'd University Press.

Hotelling, H. 1933. Review of The Triumph of Medioa-ihiill Business, by H. Secrist. ]ournal of the American Stn-Hsfical Association (28)463-465.

Howley, C. B. 1989. Syntliesis of the effects of school anddistrict size: Wh.it research say^ about achievementin small schcxil.s and school districts, loiirnn! of Ruralami Small Schools

KaiTiicman, D. 2002. Maps of bounded rationality: Aperspfc'Cti\'e on intuitive judgment and choice. Nobe!Prize Lecture, December 8, 2002, Stockholm, Swe-den. httpV/notielprize.org/nobeLprii^es/econom-ics / la urea tes/ 2002 / kahnetnan-lecture.htm 1

Kelley, T. L. 1927. The hiterprciation of Eihicatioml Mea-surements. New York: World Book.

Kelley, T L. 1947. Fundamentals of Statistics. Cambridge:Harvard University Press.

King, W. I. 1934. Review of The Triumph of Mediocrityin Business, by H. Secrist. loiirnal of Political Economif(4)2:398-4^)0. '

Laplace, Pierre Simon. 1810. Memoire sur !es approxi-mations des formulas qui sent functions de tres-grand nombres, et sur leur application aux probabili-ties. Mcmoiref do In citjsse des sciences matliemntiqueset physiques de I'histitut de Trance, annev 1809, pp.3534l5 , Supplement pp. 559-565.

Larson, R. L. 1991. Small is beautiful: Innovation fromthe inside out. Phi Deltn Kappan. March, pp. 550-554.

Maeroff, G. 1. 1992. To Improve schools, reduce theirsize. College Board News 20(3):3.

Mills, F. C. 1924. Statistical Methods Applied to Economicsand Business. New York: Henry Holt.

Mosteller, R, and J. W. Tukcy. 1977. Data Analysis andRegression. Reading, Mass.: Addison-Wesley.

Schneider, B., A. E. Wyse and V. Keesler. 2007. Is smallreally better? In Brookings Papers on Educatio7i Polia/

2006-2007, ed. T. Loveless and F. Hess. Washington:Brookings Institution.

Secrist, H. 1933. The Triumph of Mediocrity in Business.Evanston, 111.: Bureau of Business Research, North-western University.

Sharpe, W. F. 1985. Investments. Englewood Cliffs, N.J.:i'rentice-Hail.

Stigler, S. 1997. Regression toward the mean, historicallyconsidered. Statistical Methods in Medical Research6:103-114.

Stigler, S. M. 1999. Statistics on the Table. Cambridge,Mass.: Harvard University Press.

ViadtTo, D. 2006. Smaller not necessarily better, school-size study concludes. Education Week 25(39):12-13.

Wainer, H., and L. Brown. 2007. Three statisHcal para-doxes in the interpretation of group differences: Illus-trated with medical school admission and licensing.In Handbook of Statistics {Volume 27) Psi/chometrics,ed. C. R. Rao and S. Sinharay. Amsterdam: ElsevierScience.

Wainer, H., and H. Zwerling. 2006. Logical and empiri-cal evidence that smaller schools do not improve stii-dent achievement. Vie Phi Delta Kappnii 87:300-303.

Williamson, J.G. 1991. Productivity and American lead-ership: A review article, jonrml of Economic Literature29(1 ):51-68.

For relevant Web links, consult this issue ofAmerican Scientist Online:

http://www.americanscientist.org/Issue TOC/issue/961

256 American Scientist, Volume 95

Page 9: The Most Dangerous Equation - faculty.cord.edufaculty.cord.edu/andersod/MostDangerousEquation.pdfley's equation, which indicates that the truth is estimated best when its observed