CHANCE. ISSN: 0933-2480 (Print) 1867-2280 (Online). Journal homepage: http://www.tandfonline.com/loi/ucha20

To cite this article: Jack Kaplan, Conor V. Dolan & Arthur R. Jensen (2001) Misuses of Statistics in the Study of Intelligence: The Case of Arthur Jensen, CHANCE, 14:4, 14-26, DOI: 10.1080/09332480.2001.10542294

To link to this article: http://dx.doi.org/10.1080/09332480.2001.10542294

Published online: 20 Sep 2012.



Statistics plays a key role in the nature/nurture debate about intelligence.

Misuses of Statistics in the Study of Intelligence: The Case of Arthur Jensen

Jack Kaplan

Few scientific topics generate as much controversy as the study of intelligence. Practically from the moment the first intelligence test was published in France by Albert Binet in 1905, scholars and members of the general public have debated what intelligence is; whether it is, in fact, reliably measured by IQ tests; the degree to which it is genetically determined; whether some racial, ethnic, and social class groups are on average more intelligent than other groups; the relationship between intelligence and social problems such as crime, poverty, and immorality; and related questions. These debates are, to a great degree, arguments about the proper use and interpretation of statistical methods.

In the United States there have been three periods of particularly intense debate about intelligence. The first such period occurred in the 1920s, when the results of IQ tests given to Army recruits during World War I became ammunition for advocates of immigration restriction who argued that immigrants from Southern and Eastern Europe were lowering the genetic quality of the country's population. These arguments played a role - how much of a role is a matter of dispute - in the passage of the racist Immigration Restriction Act of 1924, which heavily favored immigrants from Western and Northern Europe (members of the so-called Nordic races) over the allegedly inferior Alpine and Mediterranean races from Southern and Eastern Europe who in recent decades had been coming to the country in large numbers.

A second period of intense debate took place in the late 1960s and the 1970s, centered largely on the theories advanced by William Shockley, a Nobel prize-winning physicist who turned to the study of intelligence later in life, and Arthur Jensen, an educational psychologist at the University of California at Berkeley.

The last period occurred just recently following the 1994 publication of The Bell Curve by Richard Herrnstein and Charles Murray (Simon and Schuster, New York).

[Illustration: title page of the special issue of Intelligence: A Multidisciplinary Journal (editor: Douglas K. Detterman), "A King Among Men: Arthur Jensen."]


The Role of Arthur Jensen

Although Shockley's writings are largely forgotten, Jensen has continued to be a central figure in the IQ debate up until the present day. His writings are still widely cited, and he continues (he is now in his late seventies) to be actively engaged in research. Anyone making even a cursory examination of recent books and articles on intelligence will inevitably run across many citations to his five books and literally hundreds of articles about intelligence. Herrnstein and Murray, for example, cite his 1980 book, Bias in Mental Testing (Free Press, New York), as the primary reference for their claim that certain assertions about intelligence - that intelligence is reliably measured by IQ tests, that intelligence is highly heritable, that IQ tests are not culturally biased, and so forth - have been scientifically proven (or, in their words, "are by now beyond significant technical dispute").

One indication of Jensen's continued influence is that a few years ago the journal Intelligence dedicated an issue in Jensen's honor. The issue was largely devoted to essays effusively praising Jensen, whom the journal extolled on its title page as "A King Among Men" (see illustration).

Jensen recently published a major book, The g Factor (Praeger, Westport, CT, 1998), and an article of his, "The Puzzle of Nongenetic Variance," was included in a recently published collection, Intelligence, Heredity, and Environment (Cambridge University Press, Cambridge, NY, 1997) edited by Robert Sternberg and Elena Grigorenko, that also included articles by Sternberg, Howard Gardner, H. J. Eysenck, Sandra Scarr, Robert Plomin, John Loehlin, Thomas Bouchard, and other leading intelligence researchers.

Jensen's prominence dates from the publication in 1969 of a long article about intelligence in the Harvard Educational Review, "How Much Can We Boost IQ and Scholastic Achievement?" The article begins with the provocative sentence, "Compensatory education has been tried and it apparently has failed."

Jensen went on to argue that the reason for its failure is that it assumes, incorrectly, that environmental factors are primarily responsible for the low achievement levels of most children from disadvantaged families in the United States. On the contrary, according to Jensen, a child's potential for educational achievement is largely determined by his or her intelligence - what some psychometricians call g - and that g is largely determined by a person's genes and therefore cannot be substantially raised by an improved environment.

Jensen estimated in his article that the heritability of intelligence is around 80%. He quoted approvingly a comment by the psychologist Edward Thorndike in 1905 that "In the actual race of life ... the chief determining factor is heredity." Jensen wrote that "the preponderance of evidence has proved [Thorndike] right, certainly as concerns those aspects of life in which intelligence plays an important part."

Jensen's article provoked a huge controversy, among both scholars and the general public, not unlike the one that occurred more recently following the 1994 publication of The Bell Curve. The two subsequent issues of the Harvard Educational Review were largely devoted to responses, mostly critical, to Jensen's article, and many other articles appeared in other scholarly and general interest publications. Some of these articles were later collected and published as a book, The IQ Controversy: Critical Readings (Pantheon Books, New York, 1976) edited by N. J. Block and Gerald Dworkin.

Over the next decade or so a number of books were written, at least in part as a critical response to Jensen, including The Mismeasure of Man (Norton, New York, 1981) by Stephen Gould; The Science and Politics of IQ (L. Erlbaum Assoc., Potomac, MD, 1974) by Leon Kamin; Not in Our Genes (Pantheon Books, New York, 1984) by R. Lewontin, S. Rose, and L. Kamin; Race, IQ, and Jensen (Routledge and Kegan Paul, Boston, 1980) by J. R. Flynn; The IQ Game (Rutgers University Press, New Brunswick, NJ, 1980) by H. F. Taylor; and Arthur Jensen, Consensus and Controversy (Falmer Press, Philadelphia, 1987) edited by Sohan and Celia Modgil. Jensen himself elaborated his arguments in three books - Educability and Group Differences (Methuen, London, 1973), Bias in Mental Testing, and Straight Talk About Mental Tests (Free Press, New York, 1981) - and dozens of articles. (A fourth book, Genetics and Education [Methuen, London, 1972] reprinted some of Jensen's journal articles, including the one from the 1969 Harvard Educational Review.) Jensen became such a controversial figure personally that his classes and out-of-town lectures were sometimes disrupted by demonstrators. For a while, two plainclothes policemen were assigned to accompany him whenever he walked to and from his classes.

One indication of Jensen's prominence is that the hereditarian school of thought about intelligence is often referred to as "Jensenism." The term is even included in some dictionaries.

Jensen's Use of Statistics

Herrnstein and Murray's use of statistics has been severely criticized by a number of scholars. In a book review of The Bell Curve that was published in 1995 in the Journal of the American Statistical Association, two statisticians (Stephen Fienberg and Kathryn Roeder), a historian (Daniel Resnick), and a psychiatrist (Bernie Devlin) at Carnegie Mellon University wrote the following:

The Bell Curve is superficially a powerful book, with a clear message for all societies. It offers intellectual reinforcement to many beliefs and prejudices that continue to be voiced both publicly and privately. We believe that it should be read, but with a skeptical eye. Where we have delved with care into its arguments and analyses, we have found these to be deeply flawed and misleading ... The premises of [the authors'] main theme need to be strongly qualified at the very least. What's worse, the conclusions that they draw from their premises are not supported by the empirical evidence marshaled in their support. [Italics added] (pp. 1497-1498)

The same authors also wrote an article, Wringing The Bell Curve, in the Summer 1995 issue of Chance and edited a book, Intelligence, Genes, and Success: Scientists Respond to The Bell Curve (Springer, New York, 1997) that included a number of articles discussing the use and (mostly) misuse of statistics by Herrnstein and Murray. Chance also published another article about Herrnstein and Murray's use of statistics, A Statistical Error in the Bell Curve, by Jack Kaplan, in the Winter 1997 issue.

Although dozens (if not hundreds) of articles and quite a few books have been written criticizing Jensen, his use of statistics in particular has not been subjected to the sort of scrutiny given to Herrnstein and Murray by Fienberg, Roeder, and other statisticians, and as far as I am aware no articles have been published about Jensen intended primarily for an audience of statisticians. This is unfortunate because Jensen's use of statistics is, in my opinion, at least as flawed as Herrnstein and Murray's, and the fact that (unlike Herrnstein and Murray) he has published numerous journal articles and several books aimed at specialists tends to give his opinions some prima facie credibility.

In reading through Jensen's books and some of his articles, I have numerous times run across statistical statements that are, at a minimum, highly questionable. Described here are four such statements. Not all of them concern issues that are important in and of themselves, but they all serve to illustrate what I would argue is Jensen's tendency to draw clear-cut, strongly stated conclusions from woefully inadequate evidence. The examples concern the following questions about intelligence:

• Is the black/white difference in average IQ scores explained, at least in part, by genetic differences between the populations?

• Is the black/white difference in average IQ scores explained, at least in part, by blacks' being at greater risk of having neurological "accidents" during pregnancy and childbirth?

• Do blacks have less variability in intelligence than whites?

• Is intelligence (as distinct from scores on intelligence tests) normally distributed?

Jensen on Genetics

American blacks on average score about one standard deviation lower than American whites on most IQ tests. Whether this difference is at least partly genetic is probably the most controversial question raised by the study of intelligence.

Jensen's Statement

Jensen wrote in Educability and Group Differences that genetics most likely accounts for between 50% and 75% of the average difference between the two populations. As evidence he pointed to studies of sibling regression to the population mean. Such studies, according to Jensen, show that siblings of whites have IQ scores that regress to the mean IQ of the white population, whereas siblings of blacks have IQ scores that regress to the mean of the black population. For example, if one takes a sample of blacks with IQs of 120, the average IQ of their siblings will be around 100; whereas for a sample of whites with IQs of 120, their siblings will have an average IQ of around 110. Jensen concludes the following:

[This result] is entirely expected if one assumes a genetic model of intragroup and intergroup differences .... [but] seems difficult to reconcile with any strictly environmental theory ... that has yet been proposed. (pp. 117-118)

Discussion

Actually the result is entirely expected either way. Regression to the mean applies regardless of whether the populations differ for genetic or environmental reasons. (This point was made by J. R. Flynn in Race, IQ, and Jensen.)

Consider, for example, planting corn on two plots of land, one with fertile soil and the other with poor soil. Assume you use seed from the same source for both plots. Naturally the yield will be higher for corn planted in the fertile soil. This will be entirely due to the environment (the soil) rather than genetics (the seed), since both plots use the same seed. This is a standard illustration (used by Herrnstein and Murray, among others) that differences between populations can be entirely environmental even when differences within populations are largely genetic.

Now consider what happens with regression to the mean. For purposes of discussion, let us assume the average plant from the fertile plot yields 40 ounces of corn compared to 32 ounces for the average plant from the unfertile plot. Now take a sample of plants from the fertile plot that all yielded exactly 40 ounces of corn, and re-plant the same field with seed from those plants. On average we would expect the new plants to again yield 40 ounces of corn.

Now take a sample of plants from the unfertile plot that also yielded exactly 40 ounces of corn and replant that field with seed from those plants. On average we would expect the new plants to yield something less than 40 ounces of corn, how much less depending on the heritability of yield.

Thus a plant with yield 40 ounces from the fertile plot will tend to have higher-yielding offspring than a plant with yield 40 ounces from the unfertile plot, despite the fact that the two populations are genetically identical. Intuitively, the reason for this is that plants from the unfertile plot that yield 40 ounces of corn will mostly be genetically superior plants whose superior genes are offset by a poor environment. Offspring of such plants will tend to be less genetically superior and thus have an average yield below 40 ounces. Plants from the fertile plot that yield 40 ounces of corn, on the other hand, will mostly be genetically average plants. Offspring will also tend to be average and will therefore have the same 40-ounce average yield.

Regression to different population means is not in any way inconsistent with an environmental explanation for population differences.
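The corn argument can be checked with a small simulation. What follows is only a sketch under simplifying assumptions that are not in the text: yield is modeled as plot mean plus a normally distributed genetic value plus independent normal noise, offspring inherit the parent's genetic value unchanged, and the within-plot heritability (`h2 = 0.5`), the spread (`total_sd = 4.0`), and the function names are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def expected_offspring_mean(plot_mean, parent_yield, h2):
    """Regression-to-the-mean prediction within a single plot."""
    return plot_mean + h2 * (parent_yield - plot_mean)


def simulated_offspring_mean(plot_mean, parent_yield=40.0, h2=0.5,
                             total_sd=4.0, n_plants=200_000,
                             window=0.5, seed=1):
    """Grow parents on one plot, keep those yielding about `parent_yield`,
    replant their seed on the same plot, and return the mean offspring yield.

    Assumed model (illustrative only): yield = plot mean + genetic value
    + independent noise; offspring inherit the parent's genetic value.
    """
    rng = random.Random(seed)
    sd_genetic = (h2 ** 0.5) * total_sd
    sd_noise = ((1.0 - h2) ** 0.5) * total_sd
    offspring = []
    for _ in range(n_plants):
        genetic = rng.gauss(0.0, sd_genetic)       # same seed stock on both plots
        parent = plot_mean + genetic + rng.gauss(0.0, sd_noise)
        if abs(parent - parent_yield) <= window:   # parents yielding about 40 oz
            offspring.append(plot_mean + genetic + rng.gauss(0.0, sd_noise))
    return fmean(offspring)


fertile_sim = simulated_offspring_mean(plot_mean=40.0)
poor_sim = simulated_offspring_mean(plot_mean=32.0)
fertile_theory = expected_offspring_mean(40.0, 40.0, 0.5)
poor_theory = expected_offspring_mean(32.0, 40.0, 0.5)

print(f"fertile plot: simulated {fertile_sim:.1f}, predicted {fertile_theory:.1f}")
print(f"poor plot:    simulated {poor_sim:.1f}, predicted {poor_theory:.1f}")
```

Under these assumptions the prediction is 40 ounces for offspring on the fertile plot and 36 ounces on the poor plot, and the simulated means should land close to those values. Both plots draw genetic values from the same distribution, so the gap between the two offspring means is produced entirely by the soil.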

Jensen on Neurological Accidents

One possible explanation for black/white IQ differences is that blacks may be at higher risk for suffering neurological damage from adverse events occurring prior to or during birth. Such events could result in a degree of intellectual impairment without being severe enough to cause a detectable abnormality.

There are a number of plausible reasons why blacks may be more vulnerable than whites to such incidents - poorer maternal health care, a higher rate of teen pregnancy, a higher prevalence of substance abuse, greater exposure to lead and other environmental pollutants, and so on. Moreover, blacks are known to experience a much higher rate than whites of measurable pregnancy complications such as prematurity, low birthweight, and infant mortality. It would seem reasonable to expect that whatever factors cause these measurable complications may, in less severe form, cause undetected harm that can impair intellectual functioning later in life.

Jensen's Statement

To evaluate this possibility, Jensen hypothesized in Educability and Group Differences (p. 346) that fetal damage results from stresses that can be measured by a variable that might be called "organismic viability" or "freedom from impairment." He then assumes that, above a certain threshold, values of this variable can lead to observable harm in an individual fetus, whereas values that are somewhat high but below the threshold can lead to nonobservable consequences, including impaired mental capacity.

Jensen then compared nationwide fetal death rates for blacks and whites - 25.8 deaths per 1,000 live births for blacks, 13.3 per 1,000 for whites. He assumes that these deaths correspond to values on the "organismic viability" variable that are above a certain threshold. Assuming that this variable is normally distributed with equal variances in the two populations, one can calculate that the average black value is .46 standard deviations below the average white value.

But blacks and whites differ on IQ scores by a full standard deviation. Jensen concludes that reproductive casualty can at most explain a small fraction of the black/white IQ difference.

Discussion

Although the basic premise of this argument is plausible - factors that in severe form cause observable harm may, in less severe form, cause unobservable mental impairment - to view these factors as values of a single underlying "organismic viability" quantitative variable is artificial and probably unhelpful. To assume furthermore that this variable is normally distributed with equal variances in both the black and white populations seems to me clearly unjustified.

Jensen on Variability

In most studies comparing the IQs of blacks and whites, IQs of the black sample have a smaller standard deviation than IQs of the white sample, as well as a smaller mean.

Jensen's Statement

Jensen, in Educability and Group Differences, spent six pages discussing in considerable detail possible reasons for this (pp. 211-216). He suggested and evaluated several possibilities - that the black population has less genetic variability than the white population, that blacks engage in assortative mating to a lesser extent than the white population, and that blacks experience less variability in their environment than the white population.

Discussion

The smaller standard deviation of IQ scores for blacks compared to whites is most likely nothing more than a statistical artifact - a consequence of the fact that IQ scores have been normalized on a predominantly white population. Since most blacks score lower than the average white score, their scores are mostly clustered in the lower half of the IQ distribution. If IQ scores were normalized on a predominantly black population, scores for whites would be mostly clustered in the upper half of the distribution, in which case the standard deviation for blacks would probably be greater than the standard deviation for whites - although it's not possible to know this for certain without knowing the underlying distributions of each population.

As a test of how this would work out with real data, I analyzed scores on a vocabulary test given to 2,281 black and 4,333 white high school students as part of a study conducted by the federal government. The scores of blacks on the original scale had a mean of 46.1 and a standard deviation of 9.04. The scores of whites had a mean of 55.0 and a standard deviation of 9.46.

Transforming the data so that the scores of whites were normally distributed with mean 100 and standard deviation 15, scores of blacks had a standard deviation of 13.4. But transforming the data so that the scores of blacks were normally distributed with mean 100 and standard deviation 15, the scores of whites had a standard deviation of 14.7.

Which population had more variability thus depended, as expected, on which population was used for normalizing the scores.
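The re-norming procedure can be sketched in a few lines. The text does not say exactly how the transformation was carried out, so the version below assumes a rank-based (quantile) normalization with mid-rank handling of ties. The data are synthetic placeholders on a hypothetical 0-80 scale, not the vocabulary-test scores, so the output will not reproduce the 13.4 and 14.7 reported above; the names `make_normalizer`, `group_a`, and `group_b` are illustrative.

```python
import bisect
import random
from statistics import NormalDist, fmean, pstdev


def make_normalizer(reference, mean=100.0, sd=15.0):
    """Return a function mapping a raw score to the scale obtained by
    forcing `reference` to be normal with the given mean and SD.

    Assumption: rank-based (quantile) normalization; other methods exist.
    """
    ref = sorted(reference)
    n = len(ref)
    std_normal = NormalDist()

    def transform(raw):
        below = bisect.bisect_left(ref, raw)
        at_or_below = bisect.bisect_right(ref, raw)
        p = (below + at_or_below) / (2 * n)       # mid-rank percentile
        p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)   # keep inv_cdf finite
        return mean + sd * std_normal.inv_cdf(p)

    return transform


# Synthetic placeholder data (NOT the data analyzed in the text).
rng = random.Random(0)
group_a = [80 * rng.betavariate(5, 4) for _ in range(2000)]   # lower raw mean
group_b = [80 * rng.betavariate(6, 3) for _ in range(4000)]   # higher raw mean

norm_on_a = make_normalizer(group_a)
norm_on_b = make_normalizer(group_b)

a_on_a = [norm_on_a(x) for x in group_a]
b_on_a = [norm_on_a(x) for x in group_b]
a_on_b = [norm_on_b(x) for x in group_a]
b_on_b = [norm_on_b(x) for x in group_b]

print(f"normed on A: SD(A) = {pstdev(a_on_a):.1f}, SD(B) = {pstdev(b_on_a):.1f}")
print(f"normed on B: SD(A) = {pstdev(a_on_b):.1f}, SD(B) = {pstdev(b_on_b):.1f}")
```

By construction the reference group always ends up with a standard deviation of about 15. What happens to the other group depends on the shapes of the two raw-score distributions, which is why the comparison has to be run in both directions before reading anything into a difference in spread.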

Jensen on Normality

Even assuming that IQ tests do measure intelligence, the question of whether intelligence is normally distributed is different from the question of whether intelligence test scores are normally distributed. Scores are forced to be normally distributed by a normalizing transformation. This can be done for any continuous variable and says nothing about the underlying trait being measured.

Jensen's Statement

Jensen argued in his Harvard Educational Review article that intelligence itself must be approximately normally distributed because the normalized test scores behave like an interval variable:

If we assume that intelligence is "really" normally distributed in the population, and then measure it in such a way that we obtain a normal distribution of scores, our measurements (IQs) can be regarded as constituting an interval scale. If, then, the scale in fact behaves like an interval scale, there is some justification for saying that intelligence itself (not just IQ) is normally distributed. What evidence is there of the IQs behaving like an interval scale? The most compelling evidence, I believe, comes from studies of the inheritance of intelligence, in which we examine the pattern of intercorrelations among relatives of varying degrees of kinship. [Genetics and Education, p. 91]

Jensen went on to compare the statistical behavior of IQ scores of kinship pairs (e.g., parents and children, siblings, cousins, etc.) with the statistical behavior of heights of kinship pairs. He concluded that they both show regression to the mean, with the degree of regression to the mean varying with the closeness of the biological relationship. From this he concluded that IQ, like height, must be an interval variable and that the normal distribution of IQ scores therefore implies the normal distribution of intelligence.

Discussion

What Jensen appears to be saying here is that regression to the mean applies only to variables that are measured on an interval scale. Since IQ scores of related individuals show regression to the mean, according to this view, they must be measured on an interval scale - implying, for example, that the difference in intelligence between a person with an IQ of 100 and a person with an IQ of 105 is the same as the difference between a person with an IQ of 150 and a person with an IQ of 155. Furthermore, if IQ is an interval variable and IQ scores are normally distributed and IQ is a measure of intelligence, then intelligence must be normally distributed.

Apparently Jensen does not understand that regression to the mean is a property that holds true for any variables which have a bivariate normal distribution, regardless of the type of variable. It makes no difference whether they are interval variables, transformed interval variables, or transformed ordinal variables.
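A short simulation illustrates why regression to the mean in normalized scores cannot establish the distribution of the underlying trait. The setup is hypothetical throughout: the "trait" of each member of a pair is lognormal (strongly skewed) by construction, the pairs are correlated at an assumed 0.5, and scores are produced by a rank-based normalization to mean 100 and SD 15. The function name and every parameter are illustrative choices, not anything taken from Jensen's data.

```python
import math
import random
from statistics import NormalDist, fmean, median


def normal_scores(values, mean=100.0, sd=15.0):
    """Rank-based normalization: force `values` onto a normal IQ-style scale."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order):
        scores[index] = mean + sd * std_normal.inv_cdf((rank + 0.5) / n)
    return scores


rng = random.Random(0)
n_pairs, rho = 20_000, 0.5   # hypothetical sample size and pair correlation

# Correlated latent normals pushed through exp(): the trait is skewed,
# and equal score differences do not correspond to equal trait differences.
trait_1, trait_2 = [], []
for _ in range(n_pairs):
    shared = rng.gauss(0.0, 1.0)
    z1 = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
    z2 = math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
    trait_1.append(math.exp(z1))
    trait_2.append(math.exp(z2))

score_1 = normal_scores(trait_1)
score_2 = normal_scores(trait_2)

# People scoring 120 or higher, and the scores of their relatives.
selected = [i for i in range(n_pairs) if score_1[i] >= 120]
mean_selected = fmean(score_1[i] for i in selected)
mean_relatives = fmean(score_2[i] for i in selected)

print(f"trait mean {fmean(trait_1):.2f} vs median {median(trait_1):.2f} (skewed)")
print(f"selected mean score {mean_selected:.1f}, relatives {mean_relatives:.1f}")
```

The relatives' mean score falls between 100 and the mean of the selected group, which is regression to the mean, even though the trait being "measured" here is not normally distributed. The normalized scores are approximately bivariate normal because of the transformation itself, and that is all the regression pattern reflects.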

The Puzzle of Nongenetic Variance

The four statistical errors described previously are particularly blatant and obvious ones, but Jensen makes many others as well. They are part of a larger pattern of careless thinking and sloppy analysis.

Perhaps Jensen's most bizarre use of statistics is to be found in "The Puzzle of Nongenetic Variance," the article that was published a few years ago in Intelligence, Heredity, and Environment.

The purpose of the article was to study the sources of IQ variance that cannot be explained genetically. Such variance is usually attributed to a person's social and cultural environment - such things as how a person was raised by his or her parents, the quality and length of a person's education, the type of neighborhood lived in, and so on. But IQ can also be affected by the biological environment - such things as nutrition, exposure to lead, prenatal exposure to alcohol or drugs, trauma before or during birth, and so on.

In an effort to understand the nature of this nongenetic variance, Jensen goes through a lengthy (18-page) and confusing statistical analysis of the IQ scores of 180 pairs of identical twins. The analysis focuses on the nonnormality of the distribution, on comparing the variance of the lower-scoring members of the twin pairs (the IQL's) to the variance of the higher-scoring members (the IQH's) in different subsets of the data, and on comparing the correlation between D, the difference in IQ scores between members of a twin pair, and IQL, the IQ of the lower-scoring member, to the correlation between D and IQH, the IQ of the higher-scoring member. (See sidebar for details of Jensen's analysis.)

Following this odd and, in my opinion, not very meaningful analysis, Jensen somehow manages to arrive at the following conclusions:

The neural basis of mental devel­opment is affected in each indi­vidual by a limited number ofphysical events beginning shortlyafter conception, each with a bio­logic effect usually too small to hedetected individually ... Thesesmall biologic events are a ran­dom selection from among allsuch micro-environmental eventsthat may affect development ...Because the net deviations haveresulted from many small, inde­pendent events, they are normallydistributed in the population.

Superimposed on this normal dis­trihution of random environmen­tal effects on IQ is a distributionresulting from a small number ofcomparatively large environmen­tal effects, more often negativethan positive, that "hit"only a frac­tion of the population. They areattributable to (I) a nonrandom,stochastic snowball effect on a fewunlucky individuals ... and (2) theoccurrence of rare events withlarge effects that "hit" only a smallfraction of the population. Thecomposite of these two distribu-

Dow

nloa

ded

by [

Uni

vers

ity o

f W

ashi

ngto

n L

ibra

ries

] at

08:

44 1

2 O

ctob

er 2

017

Page 7: The Case of Arthur Jensen Misuses of Statistics in the Study of ... · New York), as the primary reference for theirclaim that certain assertionsabout intelligence - that intelligence

Jensen on Nongenetic Variance

In an effort to understand the nature of the nongeneticportion of the variabil­ityin IQ scores,Jensen went through the following series of calculationsusingthe IQ scoresof 180 pairsof identical twins:

Foreach pairoftwins,Jensen calculatedthe absolutedifference(D)betweenthe two IQ scores. He then graphically compared the distributionof these dif­ferences to what wouldbe expectedunder normality and found that it deviatessubstantially from the normal.

NextJensen calculated,foreach pairof twins,the deviations of the twoindi­vidualIQ scores from the mean of the two scores. (Forexample, if the IQ's fora pairof twins are 106 and 112, Dis 6, the mean is 109, and the deviations are-3and +3.) He then calculated the kurtosisof the resulting360 deviations. Thekurtosisworksout to 3.93, substantially greaterthanthe valueof 3 fora normaldistribution.

Jensen next lookedat the varianceof IQ scores for only the lower-scoringmembersof the 180 twinpairs (IQLs) and comparedit to the varianceofscoresfor the higher-scoring members (IQHs). He calculated the ratio of these vari­ances, which comes to .99.

Foreach valueof D. Jensen then calculated the varianceof the IQLs for allpairsof twinswith that valueof D or larger. He calledthese the cumulativevari­ances of the IQLs,whichhe labeledcVIQL.Similarly, he calculatedthe cumu­lative variances (cVIQH)for the IQHs. Foreach valueofD, he then calculatedthe ratioof cVIQL to cVIQH. He found that this ratio is greater than I for val­ues of D that are 6 or largerbut less than 1 for most valuesof D that are 5 orsmaller.

Jensen then calculated the cumulativemeans of the IQLs and IQHs, anal­ogousto the cumulativevariances.These he labeledcIQL and cIQH. He thencalculatedthe correlation betweenOandcIQL (.293)andbetween Dand cIQH(-.075). He then recalculated these correlations separatelyfor twin pairs withvaluesof D that are 10 or largerand twin pairs with values of 0 that are 9 orsmall. In the firstgroup,the correlation betweenDand cIQLis -.76 and betweenD and clQH is -.45. In the second group, the corresponding correlations are.035 and .678.

The calculationsof D, cIQL, cIQH, cVIQL and cVIQH for a hypotheticalset of data are illustrated below. Jensen uses the results to argue that the dif­ferent behavior in the largeD pairs indicates largeenvironmental effects (seetext for more details). Does this follow from these calculations?

IQl IQH D c1Ql c1QH cVIQl cVIQH

93 113 20 93.0 113.0108 126 18 100.5 119.5 112.5 84.5116 130 14 105.7 123.0 136.3 79.0109 122 13 106.5 122.8 93.7 52.982 89 7 101.6 116.0 190.3 267.5120 122 2 104.7 117.0 208.7 220.0109 110 1 105.3 116.0 176.6 190.3106 106 0 105.4 114.8 151.4 175.6

Jensen's calculations fromIrJhe Puzzle of Non-GeneticVariance,· illustrated usingartificial data.Thelint two columns represent the· raw data -IQ scores for eightpairsof identical twins. IQL isthe IQ of the loweHCoring twin. IQH isthe IQ ofthe higher-scoring twin.D,clQL, clQH, cVlQL, and cVIQH are described above.

tions of net environmental effects forms a population distribution that is leptokurtic, with excess frequencies in the two tails, especially in the tail on the negative side ...

In the picture we see emerging from behavior-genetic analyses of mental abilities, the psychometric construct called g appears to be a biological phenomenon.

This is vintage Jensen. After going through a complex, lengthy, and superficially impressive statistical analysis he reaches a clear conclusion that, on closer consideration, has little or no justification. Statistics becomes little more than window dressing for Jensen's speculative theorizing.

Intensive, Detailed, Exhaustive

That, at any rate, is how I see it. Others see it differently.

One person who sees it differently is Thomas Bouchard of the University of Minnesota, one of the country's most prominent scholars in the area of intelligence testing. In an article entitled "Intensive, Detailed, Exhaustive" that appeared in the 1998 issue of Intelligence that was dedicated in Jensen's honor, Bouchard had this to say:

These three terms [intensive, detailed, exhaustive] capture much of the flavor of Jensen's writings. I should also add fair-minded, temperate, and courageous ... Jensen's writings are virtual tutorials on how to write science and how to deal with controversy - stick to the available evidence in it's [sic] full context, carefully explain the methods, their rationale and the assumptions, acknowledge the lack of evidence when it does not exist and avoid ad hominem arguments.

Douglas Detterman of Case Western Reserve University, the journal's editor, had this to say in the same issue:

I have never known anybody with fewer prejudices [than Jensen] ... Jensen has no loyalty whatsoever to any theory or hypothesis even if they come from his own ideas. He would gladly know the truth even if it proved him wrong. In fact, he would be excited to know the truth.

Sandra Scarr, another prominent scholar in the field, had this to say:

Art Jensen's contributions to psychological science are enormous, and they continue to mount. His work includes the impeccable tome on test bias, the most thoughtful research on learning and intelligence, and some critical studies on race and environment. The massive body of work will persist for generations.

Scarr dismissed Jensen critics Marcus Feldman, Stephen Jay Gould, and Leon Kamin as being "thugs with pens ... politically driven liars ... [and] despicable." Strong language indeed for a scholarly journal, and an indication of just how emotionally charged the whole issue of intelligence testing has become.

Should We Care?

Should statisticians care about all this?

I think we should. It's bad enough when statistics is misused in a book like The Bell Curve, written by nonspecialists for the general public. But when statistics gets extensively misused by a prominent, widely cited scholar who is admired practically to the point of hero worship by other prominent scholars, then clearly something is seriously amiss.

References and Further Reading

Gottfredson, L. S. (1998), "Jensen, Jensenism, and the Sociology of Intelligence," Intelligence, 26, 291-299.

Hirsch, J. (1975), "Jensenism: The Bankruptcy of 'Science' Without Scholarship," Educational Theory, 25, 3-27.

Jacoby, R., and Glauberman, N. (eds.) (1995), The Bell Curve Debate: History, Documents, Opinions, New York: Random House.

Jensen, A. R. (1998), "Jensen on 'Jensenism'," Intelligence, 26, 181-208.

Kempthorne, O. (1978), "Logical, Epistemological, and Statistical Aspects of Nature-Nurture Data Interpretation," Biometrics, 34, 1-23.

Comment: Unhelpful Criticism in the Study of Intelligence

Conor V. Dolan

According to Jack Kaplan, Jensen misuses statistics in his studies of intelligence. To demonstrate this, Kaplan discusses a number of Jensen's results. Although I welcome critical examination of results presented in the study of intelligence, especially if they have a bearing on the issue of black-white differences in IQ, I find Kaplan's criticism largely unhelpful. First, the serious accusation of misuse is hard to judge, because Kaplan fails to state what he means by misuse of statistics. Second, Kaplan criticizes Jensen but at times does so without cogent argumentation. Finally, although this is somewhat removed from the criticism per se, I find Kaplan's juxtaposition of his critical comments with comments made in a special issue of Intelligence in honor of Jensen facile.

What Is a "Misuse of Statistics"?

As the theme of Kaplan's article is Jensen's misuse of statistics, it is unfortunate that Kaplan does not provide the reader with his definition of this term. Consider the following. Imagine a person whose job it is to sell cigarettes. I would consider it a misuse of statistics if this person presents the positive correlation between number of cigarettes smoked a day and lung capacity obtained in a sample of 13- to 18-year-olds as evidence in favor of smoking (conditioning on age will quickly reveal the actual effect of smoking on lung capacity). Assuming that this person is cognizant of the source of the positive correlation, he is misusing a statistic in pursuit of his hidden agenda. It should be clear to anyone who reads Jensen carefully that Jensen does not engage in this kind of misuse.

An examination of Kaplan's actual criticism does not help to pinpoint his meaning of this term. Consider the section "Jensen on Variability." The standard deviations of IQ test scores in black samples are consistently smaller than those in white samples. Jensen suggested a number of hypotheses to account for these differences, including differences in genetic and environmental variance and differences in the degree of assortative mating. Kaplan offers the explanation that the differences are due to the normalization of scores in the white population. Kaplan presents a result in support of his hypothesis, but the absence of any details concerning his calculations makes his results hard to judge. However, with respect to the problem at hand - namely the definition of misuse - we have this: Jensen offers three hypotheses to account for black/white differences in variance and Kaplan offered one of his own. I fail to see the misuse of statistics that Jensen has perpetrated by positing his hypotheses. Even if Kaplan is completely right about the causes of the difference in variance, what misuse of statistics has Jensen actually committed? And why does Jensen's formulation of hypotheses constitute a "particularly blatant and obvious" error?

Next, consider the section "Jensen on Neurological Accidents." Here Jensen related a dichotomy (survival/death among infants) to a hypothetical latent variable, which he called "organismic viability" (OV). He did this by employing the so-called liability threshold model, a model that is used in genetic epidemiology (e.g., see Faraone, Tsuang, and Tsuang 1999). OV is modeled as a normal variable. An infant with a score beyond a certain value on the variable, the so-called threshold, dies. Jensen posited that the threshold is fixed and that the variance of OV is identical in both populations. He then modeled black/white differences in infant mortality by positing a difference in mean on the latent OV variable. Kaplan finds the assumption of equal variance artificial. This may be so, but in a model like the liability-threshold model one has to introduce some constraints. One cannot allow a difference in both means and variances because this gives rise to problems of identification. Jensen chose to consider the implications of a difference in means. The resulting model may be artificial, but why exactly is Jensen's application of it a misuse of statistics?
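The identification problem mentioned here can be made concrete with a small numerical sketch. It assumes, purely for illustration, that "beyond the threshold" means an OV score below a fixed cut point, that the reference group is standard normal, and that the threshold and the second group's rate take the hypothetical values shown; none of these numbers comes from Jensen or Kaplan.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)


def death_rate(mean, sd, threshold):
    """Share of a normal OV distribution that falls below the threshold."""
    return NormalDist(mean, sd).cdf(threshold)


THRESHOLD = -2.0     # hypothetical fixed threshold on the OV scale
TARGET_RATE = 0.04   # hypothetical observed rate in the second group

z = STANDARD.inv_cdf(TARGET_RATE)

# Explanation A: same variance as the reference group, shifted mean.
MEAN_SHIFT = THRESHOLD - z * 1.0
# Explanation B: same mean as the reference group, larger spread.
SD_ALT = (THRESHOLD - 0.0) / z

print(death_rate(0.0, 1.0, THRESHOLD))         # reference group
print(death_rate(MEAN_SHIFT, 1.0, THRESHOLD))  # explanation A
print(death_rate(0.0, SD_ALT, THRESHOLD))      # explanation B
```

Explanations A and B reproduce the same single observed rate, so one dichotomous outcome per group cannot distinguish a mean difference from a variance difference. That is why some constraint is needed, and also why the choice of constraint is an assumption rather than a finding.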

Unhelpful Criticism

Kaplan, reiterating the point made by Jensen critics Flynn and Brody, explains why sib regression is not convincing support of the plausibility of the genetic hypothesis. I consider this to be constructive criticism because Kaplan explains why such results cannot be used in this fashion (see also the section "Jensen on Normality"). It is, however, unfortunate that Kaplan fails to maintain this degree of constructiveness throughout his criticism. Consider again the section "Jensen on Neurological Accidents." Kaplan views the conceptualization of the latent variable OV as a single quantitative variable as "artificial and probably unhelpful." As criticism, "probably unhelpful" is definitely unhelpful. Moreover, Kaplan considers the conceptualization of the OV variable as a normal variable with equal variances in the black and white populations to be "clearly unjustified." As mentioned, equal variances should be seen as a modeling constraint, which is often employed in the liability-threshold model. Kaplan does not explain why aspects of this model are "probably unhelpful" or "clearly unjustified." We see the same in Kaplan's criticism of Jensen's analysis of nongenetic variance. Kaplan finds this analysis to be "confusing," "odd," and not "very meaningful." Again Kaplan may well be right, but he makes little effort to explain why this is so. As such, I regard these criticisms to be gratuitous and therefore unhelpful.

Is Something Seriously Amiss?

Kaplan juxtaposes his criticism of Jensen's work with the words of praise that Bouchard, Detterman, and Scarr expressed in their contributions to a special issue of Intelligence, which was dedicated in Jensen's honor. The message seems to be that, although Kaplan is highly critical, prominent scholars like Bouchard, Detterman, and Scarr admire Jensen to the point of hero worship. Something, therefore, is "seriously amiss," as Kaplan puts it. I find this facile. To explain this, I shall speak for myself. Jensen proposed in his book The g Factor that g, general intelligence, is the dominant dimension along which blacks and whites differ. Differences on the latent variable g are hypothesized to account for most (but not all) of the differences in psychometric IQ test scores. To investigate this hypothesis, Jensen devised the method of correlated vectors, which in this context involves correlating the factor loadings of the observed subtest scores on g and the standardized black/white differences on the subtests. A high correlation is supposed to be indicative of the importance of g in black/white differences. Jensen called this method "efficient, practical and statistically rigorous" (Jensen 2000, p. 42). I believe that this method is neither efficient, nor statistically rigorous (Lubke, Dolan, and Kelderman 2001). I also believe that there are better ways to investigate this issue (Dolan 2000; Dolan and Hamaker 2001). So I like to think that I am critical. Still, should I agree to contribute to a Jensen Festschrift, I would have no trouble in expressing my appreciation for Jensen's efforts in trying to advance the understanding of the causes of black/white differences in IQ test scores. My point is that one can be appreciative of Jensen's contributions to the study of intelligence but still disagree with him on a variety of issues. I imagine that this is the case with Bouchard, Detterman, and Scarr (e.g., see Scarr 1981, p. 515 ff.). I see no problem here and doubt that in this respect something is "seriously amiss."
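Mechanically, the method of correlated vectors as described here reduces to a single Pearson correlation between two short vectors. The sketch below shows only that arithmetic. The loadings and standardized differences are invented for illustration and are not taken from any study, and `pearson_r` is a hand-written helper rather than anyone's published code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for six subtests; not taken from any study.
G_LOADINGS = [0.45, 0.55, 0.60, 0.70, 0.75, 0.80]
STD_DIFFS = [0.50, 0.40, 0.75, 0.65, 0.95, 0.85]

# One number summarizes the whole comparison: the correlation of the vectors.
print(round(pearson_r(G_LOADINGS, STD_DIFFS), 2))
```

In this sketch the unit of analysis is the subtest, not the person, so the correlation is computed over only as many points as there are subtests.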

Conclusion

If Jack Kaplan's article results in his fellow statisticians addressing the problem of the black/white test score gap, I should consider his article a great success. Few researchers are willing to consider this problem (Jensen is certainly an exception). In the introduction to their recent book on the black-white IQ test score gap, Jencks and Phillips (1998) wrote that their book tries to update the reader's knowledge about many aspects of the problem. However, they stated "... almost every chapter raises as many questions as it answers. This is partly because psychologists, sociologists and educational researchers have devoted far less attention to the test score gap ... than its political and social consequences warranted. Most social scientists have chosen safer topics and hoped the problem would go away. It didn't. We can do better" (p. 47). I agree: We can do better.

Note: References from this comment appear following Jensen's comment.

[I thank Sofie van der Sluis and Ellen Hamaker for their critical but constructive remarks on an earlier version of this comment. The research of CVD has been made possible by a fellowship of the Royal Dutch Academy of Arts and Sciences.]

Comment: Misleading Caricatures of Jensen's Statistics

Arthur R. Jensen

Ignoring the old-hat song-and-dance routine that accompanies Kaplan's specific complaints about my use of statistics, I will comment only on each of his four main points of contention. Because he frequently omits page references to his quotations and paraphrases from my writings, I will provide these for readers so they can see for themselves what I have actually said on these topics. Kaplan's critique aims to create the impression that I have all along been somewhere between naive and bizarre in what he purports to be my misuse of statistics.

Sibling regression

In my discussion of this topic (Jensen 1973, pp. 117-119; 1998, pp. 467-472) I pointed out that a simple polygenic model of IQ differences that predicts the correlations between individuals of various degrees of genetic kinship shows the same sibling regression effect for both whites and blacks, and the regressions are linear throughout the full range of IQs in both racial groups. There is no purely environmental/cultural theory that makes any specific prediction of the degree of regression that would be found for any particular degree of kinship. A specific model that makes quantitative predictions of a phenomenon (in this case sibling regression) is more valuable scientifically than one that can only come up with ad hoc explanations after the fact. Of course regression could reflect either genetic, environmental, or error effects; the point I was making is that genetic theory can make empirically testable quantitative predictions, which purely environmental theories of IQ variance cannot do. Moreover, a polygenic model that shows essentially the same regression effects in representative white and black samples is consistent with my "default hypothesis" - namely, that the difference between the white and black distributions on psychometric g (i.e., the common factor in all cognitive tests) has the same genetic and environmental causes, in about the same degree, as individual differences within each population [Jensen 1998, chap. 12; and note my discussion (p. 457) of the overworked "corn analogy" used by Kaplan]. There is no need to posit unique environmental factors for either group. Hence there is no scientific basis for treating members of various racial populations differently than one would treat comparable individuals within each population. Group differences in g, according to this theory, are just aggregated individual differences.

I refer readers to my most comprehensive explication of Galton's so-called "law of filial regression," in which I state: "The phenomenon of regression is a valid argument to support a hypothesis of genetic inheritance of a given trait only if the amount of regression is closely consistent with an explicit genetic model that predicts the degree of regression that should be theoretically expected for any given degree of kinship.... There is nothing in the phenomenon of regression per se that proves either genetic or environmental causes or some combination of these" (Jensen 1984, p. 312). Clearly, Kaplan's complaint is vacuous and misleading in view of what I have actually written about kinship regressions toward their population mean.

Reproductive casualty

My literature review on this topic (Jensen 1973, pp. 341-348) shows that the risk of reproductive casualty (RC) is higher in the black population than in whites and Asians. The observed effects of RC are not an all-or-none disadvantage but appear as a continuous variable. These disadvantageous prenatal and perinatal effects are associated with the mother's age, lack of prenatal care, drug abuse, immunogenic incompatibilities between mother and fetus, prematurity, low birth weight, poor obstetrics, nutritional deficiencies, and the like. I have suggested that these conditions are among the various environmental factors that may adversely affect later mental development. However, empirical evidence, which I cited, also indicates that the frequency of neurologically detectable RC, although showing significant racial differences, was not great enough to account for more than a relatively small part of the average IQ differences between the major racial populations in the United States. Since I first wrote on this subject, new evidence has appeared that I believe strengthens support for the hypothesis that a variety of prenatal and perinatal conditions are not a negligible causal factor in g variance, including subpopulation differences in g (Jensen 1998, pp. 500-509). This hypothesis, for which there is considerable empirical evidence (cited by Jensen 1997, 1998), is germane to my theory of microenvironmental effects on g (discussed in the last section).

Variance of IQ in the black and white populations

In 65% of 200 samples of whites and blacks that took various IQ tests, whites had the larger variance (Jensen 1973, pp. 211-216). In the largest samples, which show this variance difference most clearly, both the black and white distributions are spread across the full range of test scores; there appears to be no scale artifact that constrains the variance of the score distributions in either sample. This phenotypic difference in variances (or standard deviations) is of interest psychometrically because it enters into the calculation of the "effect size" (ES) of race (e.g., black/white) on IQ scores: ES = the difference between the group means divided by the square root of the within-groups variance. In the disputed work (Jensen 1973) I pointed out that mental test scores, including IQ, are certainly not a ratio scale, and may not even be an interval scale throughout the full range of scores in the population. Without assuming approximate normality of the population distribution of intelligence, IQ scores cannot be treated as other than an ordinal, or rank-order, scale. After pointing out the effects of various transformations on the IQ distributions, I concluded, "Thus, the smaller IQ variance of Negroes than of whites could be merely an artifact of our scale for measuring intelligence" (p. 213). Then I go on to explain why this question itself (assuming an interval or ratio scale) is theoretically important for understanding the nature of the black-white IQ difference in relation to the heritability of IQ within each population, where a heritability analysis (based on various kinship correlations) estimates the proportions of the total variance of a given metric trait attributable to its genetic and nongenetic components. It is suggested that such methods might help to discover the answer to this question by testing whether IQ behaves in kinship regression analyses as theoretically would be expected if the IQ measurements were truly an interval scale. Kaplan's claim that any transformation of scale would result in the same degree of regression toward the mean as predicted by the genetic model is either unclear or incorrect. The degree of regression predicted by the additive genetic model originally proposed by R. A. Fisher (1918), for example, would not predict the same absolute values found with an interval scale if it were subjected to a nonlinear transformation. Moreover, the Pearson r between relatives predicted by Fisher's genetic model would necessarily be affected by any nonlinear transformations of the correlated variables measured on an interval or ratio scale, such as height and weight.
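The effect-size definition given in this section can be written out directly. The sketch below assumes that the "within-groups variance" is the pooled sample variance of the two groups, which is one common reading and not something the text specifies. The scores are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, variance


def effect_size(group_a, group_b):
    """Mean difference divided by the root of the pooled within-groups variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled)


# Hypothetical scores: the same 10-point mean difference in both comparisons.
WIDE = effect_size([90, 100, 110], [80, 90, 100])    # within-group SD of 10
NARROW = effect_size([95, 100, 105], [85, 90, 95])   # within-group SD of 5

print(WIDE, NARROW)
```

With the mean difference held fixed, halving the within-group standard deviation doubles ES, which is why the size of the within-group variances matters for any comparison expressed in these units.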

This section in my 1973 book also discussed in theoretical terms the diverse possible causes - genetic, environmental, and psychometric - of a difference between populations in the variance of a trait. There is nothing at all in this discussion that is in the least contradicted by anything Kaplan has to say about it, and the generality of his one possible explanation for the difference in population variances is unsupported by large studies that fail to show any scale artifact that would restrict the IQ variance within either group, assuming an equal-interval scale throughout (see also Jensen 1980, pp. 98-100). The matter could be settled definitively, of course, if there existed an undisputed interval or ratio scale of mental measurement. Chronometric measures, such as choice reaction time and inspection time, are the only behavioral ratio scales that are significantly correlated with IQ (Jensen 1998, chap. 8).

Normality of the IQ distribution

My most comprehensive and detailed discussion of the normality of the IQ distribution begins with the following sentence: "Nothing of fundamental empirical or theoretical importance is revealed by the frequency distribution per se of the scores on any psychometric test composed of items. This is true regardless of whether we are dealing with raw scores, z scores, or any otherwise transformed scores" (Jensen 1998, pp. 100-103). I further explain (1) how test constructors can manipulate the moments (i.e., mean, SD, skewness, and kurtosis) of any distribution of test scores by item selection based on item difficulty and inter-item and item-total score correlations, or simply by normalizing the z distribution of test scores via their percentile ranks in the normal curve, (2) the purely statistical advantages of approximating a normal (Gaussian) distribution as closely as possible for population "norms," and (3) the several theoretical and empirical arguments for the plausibility of assuming that the latent trait (g) that IQ tests attempt to measure would approximate a normal distribution. I believe that it soon will be possible empirically to test the normality of g in the general population by means of mental chronometry, based on physical, or ratio, scales, such as reaction time in elementary cognitive tasks, and various physiological measurements that are known to be related to g (Jensen 1998, chap. 8).

The puzzle of nongenetic variance

Kaplan refers to my book chapter (Jensen 1997) with this title as my "most bizarre use of statistics" and claims that it merely serves as irrelevant "window dressing" for the conclusions that follow. He is wrong. The analysis I performed on the IQs of monozygotic (MZ) twins is neither "bizarre" nor "window dressing" but is the mainstay of the article. The background for the analysis, spelled out in the article, happens to be one of the most surprising and well-established findings of behavior-genetic research in recent years - namely, that by late adolescence the between-families (BF) component of the nongenetic (or environmental) variance virtually disappears, leaving only the within-families (WF) component of nongenetic variance. The BF variance is called the shared environment because it is attributable to variables on which families differ (social class, ethnicity, culture, income, diet, etc.) but is shared by individuals reared together in the same family. It is the shared environment that increases similarity between siblings (or any other children) who are reared together. The WF source of variance is unshared by children reared together; it makes them less similar to one another. Because the BF variance dwindles to near 0 by late adolescence, the heritability of IQ is around 70%, and measurement error is around 5% of the total variance, we are left with at least 25% of the IQ variance consisting of unshared (WF) variance. The as-yet-unsolved puzzle is the causal nature of this substantial source of nongenetic WF variance. I decided that a good beginning point for hypothesizing on this puzzle was to suppose that there is an indefinitely large number of advantageous (+) and disadvantageous (-) microenvironmental events, each of which has some small effect on mental growth. The hypothesis holds that these random microenvironmental events and their effects on mental development are effectively random with respect to most individuals reared in the same family; the random effects are additive and normally distributed, and some individuals have good luck (+) and others bad luck (-) in the cumulative effect of these microenvironmental events, just as there are winners and losers leaving a casino.

Metric data on MZ twins reared together afford the most direct measure of the WF environmental variance. Because MZ twins have identical genotypes, any difference between a pair of MZ twins reared together that is not attributable to measurement error (which can be separately taken into account) is by definition a WF environmental difference. Therefore, I tested the microenvironmental model by analyzing the distribution of MZ twin differences on the Stanford-Binet IQ. If the model were correct, these IQ differences should closely approximate the chi distribution for 1 df, which is the distribution of absolute differences between all possible pairs of values in a normal distribution.
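The distributional claim in this paragraph can be checked by simulation. The sketch below assumes that each twin's net within-family effect is an independent normal draw with a common standard deviation. The value of `SIGMA` is arbitrary, and no twin data are used. After rescaling by the standard deviation of the difference, the absolute pair differences should behave like a chi variable with 1 df, that is, like the absolute value of a standard normal variable.

```python
import random
from math import pi, sqrt
from statistics import NormalDist, mean

SIGMA = 5.0          # hypothetical SD of the within-family effects
N_PAIRS = 100_000

rng = random.Random(0)
# Draw two independent effects per pair and keep the absolute difference,
# rescaled by the SD of a difference of two draws (sigma * sqrt(2)).
SCALED = [abs(rng.gauss(0.0, SIGMA) - rng.gauss(0.0, SIGMA)) / (SIGMA * sqrt(2))
          for _ in range(N_PAIRS)]

SIM_MEAN = mean(SCALED)
CHI1_MEAN = sqrt(2 / pi)                       # mean of a chi variable, 1 df
SIM_BELOW_1 = sum(x < 1.0 for x in SCALED) / N_PAIRS
CHI1_BELOW_1 = 2 * NormalDist().cdf(1.0) - 1   # P(|Z| < 1)

print(SIM_MEAN, CHI1_MEAN)
print(SIM_BELOW_1, CHI1_BELOW_1)
```

The simulation only confirms what purely additive, normally distributed effects would produce. Whether observed twin differences depart from that shape, and what such a departure would mean, is the point in dispute.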

This analysis revealed at least two important findings that could not have been discovered just by eyeballing a large collection of MZ twin data: (1) The chi distribution model perfectly fits only the 80% of twin pairs who show the smaller IQ differences (<9 points), but 20% of the twin pairs departed from the chi distribution - that is, they showed larger IQ differences than could be accounted for by the summation of entirely random microenvironmental effects - and (2) the WF environmental factors in MZ twins, on average, have larger negative effects than positive effects on IQ; that is, environmental effects are not symmetrically + or - and, at least for MZ twins, the WF environment is more often harmful than beneficial. After presenting the evidence for these quantitative effects in considerable detail, I cited evidence in the literature on factors in the biological environment that seemed most likely to account for these results and suggested them as hypotheses worth testing as a likely explanation for my findings. Isn't this how science works? I would urge readers to consult this article itself if they care to know what I have done and why, rather than relying on Kaplan's confusing caricature of it, in which his sidebar is meaningless absent my article's whole theoretical context.

References and Further Reading from the Comments

Brody, N. (1991), Intelligence (2nd ed.), San Diego: Academic Press.

Dolan, C. V. (2000), "Investigating Spearman's Hypothesis by Means of Multi-group Confirmatory Factor Analysis," Multivariate Behavioral Research, 35, 21-50.

Dolan, C. V., and Hamaker, E. L. (2001), "Investigating Black-White Differences in Psychometric IQ: Multi-group Confirmatory Factor Analyses of the WISC-R and K-ABC and a Critique of the Method of Correlated Vectors," in Advances in Psychological Research (Vol. VI), ed. F. Columbus, Commack, NY: Nova Science Publishers.

Faraone, S. V., Tsuang, M. T., and Tsuang, D. W. (1999), Genetics of Mental Disorders: A Guide for Students, Clinicians, and Researchers, New York: Guilford Press.

Fisher, R. A. (1918), "The Correlation Between Relatives on the Supposition of Mendelian Inheritance," Transactions of the Royal Society of Edinburgh, 52, 399-433.

Flynn, J. R. (1980), Race, IQ and Jensen, London: Routledge & Kegan Paul.

Jencks, C., and Phillips, M. (1998), "The Black-White Test Score Gap: An Introduction," in The Black-White Test Score Gap, eds. C. Jencks and M. Phillips, Washington, DC: Brookings Institute Press, pp. 1-54.

Jensen, A. R. (1973), Educability and Group Differences, London: Methuen.

---. (1980), Bias in Mental Testing, New York: The Free Press.

---. (1984), "Law of Filial Regression," in Encyclopedia of Psychology (Vol. 2), ed. R. J. Corsini, New York: Wiley, pp. 280-281.

---. (1997), "The Puzzle of Nongenetic Variance," in Intelligence, Heredity, and Environment, eds. R. J. Sternberg and E. L. Grigorenko, Cambridge: Cambridge University Press.

---. (1998), The g Factor: The Science of Mental Ability, Westport: Praeger.

---. (2000), "The g Factor: Psychometrics and Biology," in The Nature of Intelligence, eds. G. R. Bock, J. A. Goode, and K. Webb, Chichester, U.K.: Wiley, pp. 37-47.

Lubke, G. H., Dolan, C. V., and Kelderman, H. (2001), "Investigating Group Differences on Cognitive Tests Using Spearman's Hypothesis: An Evaluation of Jensen's Method," Multivariate Behavioral Research, 36, 299-324.

Scarr, S. (1981), Race, Social Class and Individual Differences in IQ, Hillsdale, NJ: Lawrence Erlbaum.

Reply

Jack Kaplan

Dolan

First of all, I want to make clear that allI mean by the term misuses of statisticsisincorrect uses of statistics. Iam not sug­gesting (as Sandra Scarr does concern­ing Stephen Gould and Leon Kamin)that Jensen is being intentionally decep­tive, only that he is mistaken.

Conor Dolan doesn't find my criti­cism helpful, but it isn't clear that heactually disagrees with any of it.

Concerning the point 1made aboutwhether blacks have less variability inintelligence than whites, he writes,"Jensen offers three hypotheses toaccount for black/white differences invariance and Kaplan offers one of hisown ... Even ifKaplaniscompletely right... what misuse of statistics has Jensenactually committed?"

Concerning the point about thedegree to which neurological accidentsprior to or during birth affect intelli­gence, he writes, "Kaplan finds theassumption of equal variance artificial.This may be so, but in a modcllike theliability-threshold model one has tointroduce some constraints ... Theresulting model may be artificial, butwhy exactly is Jensen's application of ita misuse of statistics?"

Concerning the point about non­genetic variance, he writes, "AgainKaplan may well be right, but he makeslittle effort to explain why this is so. A"such, I regard these criticisms to be gra­tuitous and unhelpful."

None of these comments impliesthat I'm actually wrong about anything.

Dolan writes about the example I use to illustrate the point about black versus white variability in intelligence, "Kaplan presents a result in support of his hypothesis, but the absence of any details concerning his calculations makes his results hard to judge." If Dolan (or anyone else) wants further details, I will be happy to supply them.

Dolan writes, "If Jack Kaplan's article results in his fellow statisticians addressing the problem of the black/white test score gap, I should consider his article a great success."

I agree - sort of. Both Dolan and I would like to see a dialogue develop between statisticians and hereditarians. But we envision different outcomes from that dialogue.

Dolan presumably envisions statisticians assisting hereditarians in refining their methods and extending their results. I envision statisticians telling hereditarians that their methods are deeply flawed and that many of the conclusions they've drawn are simply not true. Among their wrong conclusions, in my opinion, are their two most basic ones - that IQ scores are a measure of general intelligence (using any reasonable definition of general intelligence) and that the heritability of IQ scores is greater than 40%.

Jensen

Most of Jensen's comments do not address the specific points I made in my article. Instead he repeatedly digresses into irrelevant discussions of other issues or refers to articles or books he's written other than the one I was commenting on.

Sibling Regression

My article addresses statements Jensen made in his 1973 book, Educability and Group Differences. But his response relies mainly on a quote from an article he wrote in 1984 and a summary of arguments he made in a book in 1998.

My article accurately describes (and gives page references for) what Jensen wrote in 1973. What he wrote then - that regression to the mean provides evidence that the black/white difference in average IQ is at least partly genetic - is clearly a misinterpretation of statistics. Regression to the mean applies just as well to populations that differ for environmental reasons as it does to populations that differ for genetic reasons.
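The statistical point can be illustrated with a small simulation sketch. It is not taken from the article or from Jensen's book: the two population means (100 and 90), the total standard deviation of 15, the sibling correlation of 0.5, and the function name `mean_sibling_score` are all hypothetical choices. The model deliberately contains no genetic term at all, only a shared family-environment effect and an individual effect.

```python
import random
from statistics import fmean

def mean_sibling_score(pop_mean, target, n_families=100_000, seed=0):
    """Average score of siblings of people who score near `target`.

    Hypothetical model with NO genetic component: each score is the
    population mean plus a shared family-environment effect plus an
    individual effect.
    """
    rng = random.Random(seed)
    # Equal shared and individual variances: total SD 15, sibling correlation 0.5.
    shared_sd = unique_sd = 15 / 2 ** 0.5
    sibling_scores = []
    for _ in range(n_families):
        family = rng.gauss(0, shared_sd)
        person = pop_mean + family + rng.gauss(0, unique_sd)
        sibling = pop_mean + family + rng.gauss(0, unique_sd)
        if abs(person - target) <= 2:      # keep people scoring about `target`
            sibling_scores.append(sibling)
    return fmean(sibling_scores)

# Two hypothetical populations whose means differ for purely environmental reasons.
sib_a = mean_sibling_score(pop_mean=100, target=120)
sib_b = mean_sibling_score(pop_mean=90, target=120)
print(f"siblings of 120-scorers, population A: {sib_a:.1f}")
print(f"siblings of 120-scorers, population B: {sib_b:.1f}")
```

Under these assumptions the expected sibling averages are about 100 + 0.5 × (120 − 100) = 110 and 90 + 0.5 × (120 − 90) = 105. Siblings in each population regress toward their own population mean even though nothing genetic appears anywhere in the model.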

Reproductive Casualty

Jensen's comments are not responsive to the point I made in my article.

I don't disagree with the basic concept that "the observed effects of [reproductive casualty] are not an all-or-none disadvantage" and that the higher fetal death rate for blacks as compared to whites suggests that blacks may also suffer more often from less severe problems that don't cause detectable harm but may impair later mental development. What I disagree with is Jensen's jump from population differences in fetal death rates to average population differences on a continuous "organismic viability" variable. That jump requires assuming that fetal death corresponds to a value above some threshold on a variable that is not only continuous but is normally distributed with equal variances in both populations. That is quite an assumption!
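How much work the equal-variance assumption does can be seen in a minimal sketch. The two event rates (2% and 1%) are hypothetical and are not the fetal death rates from the book; the function names and the "liability" framing are likewise assumptions made for illustration. The first function reads the two rates as a mean difference between normal distributions with equal standard deviations; the second reads the same rates as a difference in spread between normal distributions with equal means.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def implied_mean_gap(rate_a, rate_b):
    """Mean gap (in SD units, A minus B) if both groups are normal with EQUAL SDs
    and the event occurs above a common threshold."""
    z_a = STD_NORMAL.inv_cdf(1 - rate_a)  # threshold distance above A's mean
    z_b = STD_NORMAL.inv_cdf(1 - rate_b)  # threshold distance above B's mean
    return z_b - z_a

def implied_sd_ratio(rate_a, rate_b):
    """SD ratio (A / B) if both groups are normal with EQUAL means instead.
    Valid only when both rates are below 0.5."""
    z_a = STD_NORMAL.inv_cdf(1 - rate_a)
    z_b = STD_NORMAL.inv_cdf(1 - rate_b)
    return z_b / z_a

# Hypothetical event rates, not the figures quoted from the book.
rate_a, rate_b = 0.02, 0.01
gap = implied_mean_gap(rate_a, rate_b)
sd_ratio = implied_sd_ratio(rate_a, rate_b)
print(f"equal SDs:   mean gap of {gap:.2f} SD")
print(f"equal means: SD ratio of {sd_ratio:.2f}")
```

Both readings reproduce the same pair of rates exactly, so the rates by themselves cannot distinguish a shift in the mean from a difference in spread, or from a distribution that is not normal in the first place.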

It's true that I paraphrased Jensen on this point rather than quoting him verbatim. The verbatim quote (from page 346 of Educability and Group Differences) is

In 1965, fetal deaths (for gestation periods of twenty weeks or more) nationwide had almost twice as high a rate among Negroes as among whites (25.8 v. 13.3 per 1000 live births) ... Assuming fetal death to be a threshold effect on a normally distributed variable, the Negro and white populations can be said to show a mean difference of 0.46σ on this variable. This is a large difference by any standard. But even if this variable (organismic viability, freedom from impairment, or whatever it is) were perfectly correlated with intelligence, it could account for less than half of the Negro-white difference [italics in original].

Readers can judge for themselves whether my paraphrase of Jensen's comments is in any way misleading.

Variability

Jensen writes, "The generality of [Kaplan's] one possible explanation for the difference in population variances is unsupported by large studies that fail to show any scale artifact that would restrict the IQ variance within either group, assuming an equal-interval scale throughout" [italics added].

If one assumes an interval scale, then my argument makes no sense whatsoever. But there is, in my opinion, no reasonable statistical argument that IQ scores constitute an interval scale.

Weight is an example of an interval scale. The difference in weight between two people who weigh 170 and 175 pounds is the same as the difference between two people who weigh 200 and 205 pounds. But is the difference in "intelligence" (or whatever it is that IQ measures) between two people with IQ scores of 100 and 105 the same as the difference in "intelligence" between two people with IQ scores of 130 and 135? I think not.
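Why the interval-scale question matters for comparing variances can be sketched as follows. This is an illustrative assumption, not a calculation from the article: two hypothetical groups have identical spread on one scale, and the scores are then re-expressed on a second scale through a strictly increasing transformation (here `math.exp`, chosen only for simplicity). The group means, the helper `quantile_sample`, and the choice of transformation are all hypothetical.

```python
import math
from statistics import NormalDist, pstdev

def quantile_sample(mean, sd, n=999):
    """Deterministic stand-in for a normal sample: n evenly spaced quantiles."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Two hypothetical groups with the SAME standard deviation on the first scale.
group_low = quantile_sample(mean=-0.5, sd=1.0)
group_high = quantile_sample(mean=0.5, sd=1.0)

# A strictly increasing rescaling: everyone's rank order is preserved.
rescaled_low = [math.exp(x) for x in group_low]
rescaled_high = [math.exp(x) for x in group_high]

first_scale_ratio = pstdev(group_low) / pstdev(group_high)
second_scale_ratio = pstdev(rescaled_low) / pstdev(rescaled_high)
print(f"SD ratio on the first scale:  {first_scale_ratio:.2f}")
print(f"SD ratio on the second scale: {second_scale_ratio:.2f}")
```

Because the transformation is strictly increasing, every individual keeps the same rank, yet the ratio of standard deviations moves from 1 to exp(-1), roughly 0.37. If only the rank order of scores is meaningful, the data cannot say which of the two variance comparisons is the right one.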

Normality

My article addresses statements Jensen made about normality in 1969 in his famous Harvard Educational Review article (as reprinted in Genetics and Education). But Jensen's response refers entirely to statements he made on the subject in his 1998 book, The g Factor.

What Jensen says about normality in the Harvard Educational Review article is nonsense. I'm not familiar with what he says about normality in The g Factor.

Nongenetic Variance

I don't think bizarre is too strong a word to use to describe Jensen's use of statistics in "The Puzzle of Nongenetic Variance." Readers who doubt this should do three things:

1. Read Jensen's description of his complex and confusing statistical analysis of the IQ's of 180 pairs of identical twins - all 14 pages of it (pp. 61-74).

2. Read Jensen's conclusions (pp. 80-82).

3. Ask themselves, What is the relationship - if any - between the analysis and the conclusions?

The underlying question Jensen is trying to answer is whether nongenetic variation of IQ scores is explained by social environmental factors or by biological environmental factors. His statistical analysis is basically a clumsy attempt to characterize the way in which his dataset departs from a normal distribution.

But how can departures from normality possibly distinguish between social and biological factors? What type of departure from normality would one expect from biological factors? What type of departure would one expect from social factors?

My suspicion is that Jensen, who has spent more than three decades arguing (unpersuasively) that variation in intelligence is explained primarily by genetics, is predisposed to favor a biological rather than a social interpretation for the remaining nongenetic variation. But the distribution of his data can be explained just as readily by either one. His assertion that biological factors are the probable source is nothing more than speculative theorizing. The statistical analysis is just window dressing.

I join with Jensen in urging readers to consult the full article. But I doubt that the article's "whole theoretical context" will make Jensen's analysis seem any more reasonable.
