

Download by: [University of Washington Libraries] Date: 12 October 2017, At: 08:44

CHANCE

ISSN: 0933-2480 (Print) 1867-2280 (Online) Journal homepage: http://www.tandfonline.com/loi/ucha20

Misuses of Statistics in the Study of Intelligence: The Case of Arthur Jensen

Jack Kaplan, Conor V. Dolan & Arthur R. Jensen

To cite this article: Jack Kaplan, Conor V. Dolan & Arthur R. Jensen (2001) Misuses of Statistics in the Study of Intelligence: The Case of Arthur Jensen, CHANCE, 14:4, 14-26, DOI: 10.1080/09332480.2001.10542294

To link to this article: http://dx.doi.org/10.1080/09332480.2001.10542294

Published online: 20 Sep 2012.


Statistics plays a key role in the nature/nurture debate about intelligence.

Misuses of Statistics in the Study of Intelligence: The Case of Arthur Jensen

Jack Kaplan

Few scientific topics generate as much controversy as the study of intelligence. Practically from the moment the first intelligence test was published in France by Alfred Binet in 1905, scholars and members of the general public have debated what intelligence is; whether it is, in fact, reliably measured by IQ tests; the degree to which it is genetically determined; whether some racial, ethnic, and social class groups are on average more intelligent than other groups; the relationship between intelligence and social problems such as crime, poverty, and immorality; and related questions. These debates are, to a great degree, arguments about the proper use and interpretation of statistical methods.

In the United States there have been three periods of particularly intense debate about intelligence. The first such period occurred in the 1920s, when the results of IQ tests given to Army recruits during World War I became ammunition for advocates of immigration restriction who argued that immigrants from Southern and Eastern Europe were lowering the genetic quality of the country's population. These arguments played a role - how much of a role is a matter of dispute - in the passage of the racist Immigration Restriction Act of 1924, which heavily favored immigrants from Western and Northern Europe (members of the so-called Nordic races) over the allegedly inferior Alpine and Mediterranean races from Southern and Eastern Europe who in recent decades had been coming to the country in large numbers.

A second period of intense debate took place in the late 1960s and the 1970s, centered largely on the theories advanced by William Shockley, a Nobel prize-winning physicist who turned to the study of intelligence later in life, and Arthur Jensen, an educational psychologist at the University of California at Berkeley.

[Illustration: title page of the journal Intelligence (Editor: Douglas K. Detterman), Special Issue, "A King Among Men: Arthur Jensen."]

The last period occurred just recently following the 1994 publication of The Bell Curve by Richard Herrnstein and Charles Murray (Simon and Schuster, New York).


The Role of Arthur Jensen

Although Shockley's writings are largely forgotten, Jensen has continued to be a central figure in the IQ debate up until the present day. His writings are still widely cited, and he continues (he is now in his late seventies) to be actively engaged in research. Anyone making even a cursory examination of recent books and articles on intelligence will inevitably run across many citations to his five books and literally hundreds of articles about intelligence. Herrnstein and Murray, for example, cite his 1980 book, Bias in Mental Testing (Free Press, New York), as the primary reference for their claim that certain assertions about intelligence - that intelligence is reliably measured by IQ tests, that intelligence is highly heritable, that IQ tests are not culturally biased, and so forth - have been scientifically proven (or, in their words, "are by now beyond significant technical dispute").

One indication of Jensen's continued influence is that a few years ago the journal Intelligence dedicated an issue in Jensen's honor. The issue was largely devoted to essays effusively praising Jensen, whom the journal extolled on its title page as "A King Among Men" (see illustration on previous page).

Jensen recently published a major book, The g Factor (Praeger, Westport, CT, 1998), and an article of his, "The Puzzle of Nongenetic Variance," was included in a recently published collection, Intelligence, Heredity, and Environment (Cambridge University Press, Cambridge, NY, 1997) edited by Robert Sternberg and Elena Grigorenko, that also included articles by Sternberg, Howard Gardner, H. J. Eysenck, Sandra Scarr, Robert Plomin, John Loehlin, Thomas Bouchard, and other leading intelligence researchers.

Jensen's prominence dates from the publication in 1969 of a long article about intelligence in the Harvard Educational Review, "How Much Can We Boost IQ and Scholastic Achievement?" The article begins with the provocative sentence, "Compensatory education has been tried and it apparently has failed."

Jensen went on to argue that the reason for its failure is that it assumes, incorrectly, that environmental factors are primarily responsible for the low achievement levels of most children from disadvantaged families in the United States. On the contrary, according to Jensen, a child's potential for educational achievement is largely determined by his or her intelligence - what some psychometricians call g - and that g is largely determined by a person's genes and therefore cannot be substantially raised by an improved environment.

Jensen estimated in his article that the heritability of intelligence is around 80%. He quoted approvingly a comment by the psychologist Edward Thorndike in 1905 that "In the actual race of life ... the chief determining factor is heredity." Jensen wrote that "the preponderance of evidence has proved [Thorndike] right, certainly as concerns those aspects of life in which intelligence plays an important part."

Jensen's article provoked a huge controversy, among both scholars and the general public, not unlike the one that occurred more recently following the 1994 publication of The Bell Curve. The two subsequent issues of the Harvard Educational Review were largely devoted to responses, mostly critical, to Jensen's article, and many other articles appeared in other scholarly and general interest publications. Some of these articles were later collected and published as a book, The IQ Controversy: Critical Readings (Pantheon Books, New York, 1976), edited by N. J. Block and Gerald Dworkin.

Over the next decade or so a number of books were written, at least in part as a critical response to Jensen, including The Mismeasure of Man (Norton, New York, 1981) by Stephen Gould; The Science and Politics of IQ (L. Erlbaum Assoc., Potomac, MD, 1974) by Leon Kamin; Not in Our Genes (Pantheon Books, New York, 1984) by R. Lewontin, S. Rose, and L. Kamin; Race, IQ, and Jensen (Routledge and Kegan Paul, Boston, 1980) by J. R. Flynn; The IQ Game (Rutgers University Press, New Brunswick, NJ, 1980) by H. F. Taylor; and Arthur Jensen, Consensus and Controversy (Falmer Press, Philadelphia, 1987) edited by Sohan and Celia Modgil. Jensen himself elaborated his arguments in three books - Educability and Group Differences (Methuen, London, 1973), Bias in Mental Testing, and Straight Talk About Mental Tests (Free Press, New York, 1981) - and dozens of articles. (A fourth book, Genetics and Education [Methuen, London, 1972], reprinted some of Jensen's journal articles, including the one from the 1969 Harvard Educational Review.) Jensen became such a controversial figure personally that his classes and out-of-town lectures were sometimes disrupted by demonstrators. For a while, two plainclothes policemen were assigned to accompany him whenever he walked to and from his classes.

One indication of Jensen's prominence is that the hereditarian school of thought about intelligence is often referred to as "Jensenism." The term is even included in some dictionaries.

Jensen's Use of Statistics

Herrnstein and Murray's use of statistics has been severely criticized by a number of scholars. In a book review of The Bell Curve that was published in 1995 in the Journal of the American Statistical Association, two statisticians (Stephen Fienberg and Kathryn Roeder), a historian (Daniel Resnick), and a psychiatrist (Bernie Devlin) at Carnegie Mellon University wrote the following:

The Bell Curve is superficially a powerful book, with a clear message for all societies. It offers intellectual reinforcement to many beliefs and prejudices that continue to be voiced both publicly and privately. We believe that it should be read, but with a skeptical eye. Where we have delved with care into its arguments and analyses, we have found these to be deeply flawed and misleading ... The premises of [the authors'] main theme need to be strongly qualified at the very least. What's worse, the conclusions that they draw from their premises are not supported by the empirical evidence marshaled in their support. [Italics added] (pp. 1497-1498)

The same authors also wrote an article, "Wringing The Bell Curve," in the Summer 1995 issue of Chance and edited a book, Intelligence, Genes, and Success: Scientists Respond to The Bell Curve (Springer, New York, 1997), that included a number of articles discussing the use and (mostly) misuse of statistics by Herrnstein and Murray. Chance also published another article about Herrnstein and Murray's use of statistics, "A Statistical Error in The Bell Curve," by Jack Kaplan, in the Winter 1997 issue.

Although dozens (if not hundreds) of articles and quite a few books have been written criticizing Jensen, his use of statistics in particular has not been subjected to the sort of scrutiny given to Herrnstein and Murray by Fienberg, Roeder, and other statisticians, and as far as I am aware no articles have been published about Jensen intended primarily for an audience of statisticians. This is unfortunate because Jensen's use of statistics is, in my opinion, at least as flawed as Herrnstein and Murray's, and the fact that (unlike Herrnstein and Murray) he has published numerous journal articles and several books aimed at specialists tends to give his opinions some prima facie credibility.

In reading through Jensen's books and some of his articles, I have numerous times run across statistical statements that are, at a minimum, highly questionable. Described here are four such statements. Not all of them concern issues that are important in and of themselves, but they all serve to illustrate what I would argue is Jensen's tendency to draw clear-cut, strongly stated conclusions from woefully inadequate evidence. The examples concern the following questions about intelligence:

• Is the black/white difference in average IQ scores explained, at least in part, by genetic differences between the populations?

• Is the black/white difference in average IQ scores explained, at least in part, by blacks' being at greater risk of having neurological "accidents" during pregnancy and childbirth?

• Do blacks have less variability in intelligence than whites?

• Is intelligence (as distinct from scores on intelligence tests) normally distributed?

Jensen on Genetics

American blacks on average score about one standard deviation lower than American whites on most IQ tests. Whether this difference is at least partly genetic is probably the most controversial question raised by the study of intelligence.

Jensen's Statement

Jensen wrote in Educability and Group Differences that genetics most likely accounts for between 50% and 75% of the average difference between the two populations. As evidence he pointed to studies of sibling regression to the population mean. Such studies, according to Jensen, show that siblings of whites have IQ scores that regress to the mean IQ of the white population, whereas siblings of blacks have IQ scores that regress to the mean of the black population. For example, if one takes a sample of blacks with IQs of 120, the average IQ of their siblings will be around 100; whereas for a sample of whites with IQs of 120, their siblings will have an average IQ of around 110. Jensen concludes the following:

[This result] is entirely expected if one assumes a genetic model of intragroup and intergroup differences .... [but] seems difficult to reconcile with any strictly environmental theory ... that has yet been proposed. (pp. 117-118)

Discussion

Actually the result is entirely expected either way. Regression to the mean applies regardless of whether the populations differ for genetic or environmental reasons. (This point was made by J. R. Flynn in Race, IQ, and Jensen.)

Consider, for example, planting corn on two plots of land, one with fertile soil and the other with poor soil. Assume you use seed from the same source for both plots. Naturally the yield will be higher for corn planted in the fertile soil. This will be entirely due to the environment (the soil) rather than genetics (the seed), since both plots use the same seed. This is a standard illustration (used by Herrnstein and Murray, among others) that differences between populations can be entirely environmental even when differences within populations are largely genetic.

Now consider what happens with regression to the mean. For purposes of discussion, let us assume the average plant from the fertile plot yields 40 ounces of corn compared to 32 ounces for the average plant from the unfertile plot. Now take a sample of plants from the fertile plot that all yielded exactly 40 ounces of corn, and re-plant the same field with seed from those plants. On average we would expect the new plants to again yield 40 ounces of corn.

Now take a sample of plants from the unfertile plot that also yielded exactly 40 ounces of corn and replant that field with seed from those plants. On average we would expect the new plants to yield something less than 40 ounces of corn, how much less depending on the heritability of yield.

Thus a plant with yield 40 ounces from the fertile plot will tend to have higher-yielding offspring than a plant with yield 40 ounces from the unfertile plot, despite the fact that the two populations are genetically identical. Intuitively, the reason for this is that plants from the unfertile plot that yield 40 ounces of corn will mostly be genetically superior plants whose superior genes are offset by a poor environment. Offspring of such plants will tend to be less genetically superior and thus have an average yield below 40 ounces. Plants from the fertile plot that yield 40 ounces of corn, on the other hand, will mostly be genetically average plants. Offspring will also tend to be average and will therefore have the same 40-ounce average yield.

Regression to different population means is not in any way inconsistent with an environmental explanation for population differences.
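The corn argument can be checked numerically. The sketch below is only an illustration under stated assumptions, none of which come from the article: yield is modeled as plot mean plus a normally distributed genetic value plus normally distributed noise, the total standard deviation is 4 ounces, heritability is 0.5, and offspring inherit the parent's genetic value unchanged. The function names are hypothetical. Both plots draw genetic values from the same distribution, so any difference in offspring yield is purely environmental.

```python
import random


def expected_offspring_yield(plot_mean, parent_yield, heritability):
    """Expected offspring yield under the assumed additive model:
    parents regress toward the mean of their own plot."""
    return plot_mean + heritability * (parent_yield - plot_mean)


def simulate_offspring_mean(plot_mean, target=40.0, h2=0.5, total_sd=4.0,
                            n=100_000, window=0.5, seed=0):
    """Grow n plants on one plot, keep parents yielding about `target`
    ounces, and return the average yield of their offspring."""
    rng = random.Random(seed)
    sd_g = (h2 ** 0.5) * total_sd         # genetic SD (same for both plots)
    sd_e = ((1 - h2) ** 0.5) * total_sd   # within-plot environmental noise SD
    offspring = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        parent = plot_mean + g + rng.gauss(0.0, sd_e)
        if abs(parent - target) <= window:
            # Assumption: the offspring keeps the parent's genetic value
            # and gets fresh environmental noise on the same plot.
            offspring.append(plot_mean + g + rng.gauss(0.0, sd_e))
    return sum(offspring) / len(offspring)


if __name__ == "__main__":
    for label, plot_mean in (("fertile", 40.0), ("unfertile", 32.0)):
        theory = expected_offspring_yield(plot_mean, 40.0, 0.5)
        simulated = simulate_offspring_mean(plot_mean)
        print(f"{label}: theory {theory:.1f} oz, simulated {simulated:.1f} oz")
```

Under these assumptions, parents that yielded 40 ounces on the unfertile plot have offspring that fall back toward 32 ounces, while equally productive parents on the fertile plot have offspring that stay near 40, even though the seed stock is identical.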

Jensen on Neurological Accidents

One possible explanation for black/white IQ differences is that blacks may be at higher risk for suffering neurological damage from adverse events occurring prior to or during birth. Such events could result in a degree of intellectual impairment without being severe enough to cause a detectable abnormality.

There are a number of plausible reasons why blacks may be more vulnerable than whites to such incidents - poorer maternal health care, a higher rate of teen pregnancy, a higher prevalence of substance abuse, greater exposure to lead and other environmental pollutants, and so on. Moreover, blacks are known to experience a much higher rate than whites of measurable pregnancy complications such as prematurity, low birthweight, and infant mortality. It would seem reasonable to expect that whatever factors cause these measurable complications may, in less severe form, cause undetected harm that can impair intellectual functioning later in life.

Jensen's Statement

To evaluate this possibility, Jensen hypothesized in Educability and Group Differences (p. 346) that fetal damage results from stresses that can be measured by a variable that might be called "organismic viability" or "freedom from impairment." He then assumes that, above a certain threshold, values of this variable can lead to observable harm in an individual fetus, whereas values that are somewhat high but below the threshold can lead to nonobservable consequences, including impaired mental capacity.

Jensen then compared nationwide fetal death rates for blacks and whites - 25.8 deaths per 1,000 live births for blacks, 13.3 per 1,000 for whites. He assumes that these deaths correspond to values on the "organismic viability" variable that are above a certain threshold. Assuming that this variable is normally distributed with equal variances in the two populations, one can calculate that the average black value is .46 standard deviations below the average white value.

But blacks and whites differ on IQ scores by a full standard deviation. Jensen concludes that reproductive casualty can at most explain a small fraction of the black/white IQ difference.

Discussion

Although the basic premise of this argument is plausible - factors that in severe form cause observable harm may, in less severe form, cause unobservable mental impairment - to view these factors as values of a single underlying "organismic viability" quantitative variable is artificial and probably unhelpful. To assume furthermore that this variable is normally distributed with equal variances in both the black and white populations seems to me clearly unjustified.

Jensen on Variability

In most studies comparing the IQs of blacks and whites, IQs of the black sample have a smaller standard deviation than IQs of the white sample, as well as a smaller mean.

Jensen's Statement

Jensen, in Educability and Group Differences, spent six pages discussing in considerable detail possible reasons for this (pp. 211-216). He suggested and evaluated several possibilities - that the black population has less genetic variability than the white population, that blacks engage in assortative mating to a lesser extent than the white population, and that blacks experience less variability in their environment than the white population.

Discussion

The smaller standard deviation of IQ scores for blacks compared to whites is most likely nothing more than a statistical artifact - a consequence of the fact that IQ scores have been normalized on a predominantly white population. Since most blacks score lower than the average white score, their scores are mostly clustered in the lower half of the IQ distribution. If IQ scores were normalized on a predominantly black population, scores for whites would be mostly clustered in the upper half of the distribution, in which case the standard deviation for blacks would probably be greater than the standard deviation for whites - although it's not possible to know this for certain without knowing the underlying distributions of each population.

As a test of how this would work out with real data, I analyzed scores on a vocabulary test given to 2,281 black and 4,333 white high school students as part of a study conducted by the federal government. The scores of blacks on the original scale had a mean of 46.1 and a standard deviation of 9.04. The scores of whites had a mean of 55.0 and a standard deviation of 9.46.

Transforming the data so that the scores of whites were normally distributed with mean 100 and standard deviation 15, scores of blacks had a standard deviation of 13.4. But transforming the data so that the scores of blacks were normally distributed with mean 100 and standard deviation 15, the scores of whites had a standard deviation of 14.7.

Which population had more variability thus depended, as expected, on which population was used for normalizing the scores.
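The comparison described above can be sketched in code. What follows is a minimal illustration, not the analysis actually run on the vocabulary data: it assumes a rank-based normalizing transformation (each score is mapped to the normal quantile of its mid-rank in the reference group), and it uses synthetic beta-distributed scores because the original data are not reproduced in the article. The names `normalize_on`, `sd_under_each_norming`, and `make_synthetic_groups` are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist, pstdev


def normalize_on(reference, scores, target_mean=100.0, target_sd=15.0):
    """Rescale `scores` so that `reference` becomes approximately normal
    with the given mean and SD (rank-based normalizing transformation)."""
    ref = sorted(reference)
    n = len(ref)
    std_normal = NormalDist()
    out = []
    for x in scores:
        below = bisect_left(ref, x)
        ties = bisect_right(ref, x) - below
        p = (below + 0.5 * ties) / n                 # mid-rank percentile
        p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)      # keep p inside (0, 1)
        out.append(target_mean + target_sd * std_normal.inv_cdf(p))
    return out


def sd_under_each_norming(group_a, group_b):
    """Return (SD of A, SD of B) under each choice of norming group."""
    return {
        "normed_on_a": (pstdev(normalize_on(group_a, group_a)),
                        pstdev(normalize_on(group_a, group_b))),
        "normed_on_b": (pstdev(normalize_on(group_b, group_a)),
                        pstdev(normalize_on(group_b, group_b))),
    }


def make_synthetic_groups(seed=0):
    """Synthetic raw scores on a bounded scale (an assumption for
    illustration; these are not the article's data)."""
    rng = random.Random(seed)
    group_a = [20 + 60 * rng.betavariate(3, 4) for _ in range(2000)]
    group_b = [20 + 60 * rng.betavariate(5, 3) for _ in range(4000)]
    return group_a, group_b


if __name__ == "__main__":
    a, b = make_synthetic_groups()
    for norming, (sd_a, sd_b) in sd_under_each_norming(a, b).items():
        print(f"{norming}: SD of A = {sd_a:.1f}, SD of B = {sd_b:.1f}")
```

By construction the norming group always ends up with a standard deviation close to 15; the other group's standard deviation depends on where its scores fall on the transformed scale, so the ordering of the two has to be checked for each choice of norming group rather than read off one of them.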

Jensen on Normality

Even assuming that IQ tests do measure intelligence, the question of whether intelligence is normally distributed is different from the question of whether intelligence test scores are normally distributed. Scores are forced to be normally distributed by a normalizing transformation. This can be done for any continuous variable and says nothing about the underlying trait being measured.

Jensen's Statement

Jensen argued in his Harvard Educational Review article that intelligence itself must be approximately normally distributed because the normalized test scores behave like an interval variable:

If we assume that intelligence is "really" normally distributed in the population, and then measure it in such a way that we obtain a normal distribution of scores, our measurements (IQs) can be regarded as constituting an interval scale. If, then, the scale in fact behaves like an interval scale, there is some justification for saying that intelligence itself (not just IQ) is normally distributed. What evidence is there of the IQs behaving like an interval scale? The most compelling evidence, I believe, comes from studies of the inheritance of intelligence, in which we examine the pattern of intercorrelations among relatives of varying degrees of kinship. [Genetics and Education, p. 91]

Jensen went on to compare the statistical behavior of IQ scores of kinship pairs (e.g., parents and children, siblings, cousins, etc.) with the statistical behavior of heights of kinship pairs. He concluded that they both show regression to the mean, with the degree of regression to the mean varying with the closeness of the biological relationship. From this he concluded that IQ, like height, must be an interval variable and that the normal distribution of IQ scores therefore implies the normal distribution of intelligence.

Discussion

What Jensen appears to be saying here is that regression to the mean applies only to variables that are measured on an interval scale. Since IQ scores of related individuals show regression to the mean, according to this view, they must be measured on an interval scale - implying, for example, that the difference in intelligence between a person with an IQ of 100 and a person with an IQ of 105 is the same as the difference between a person with an IQ of 150 and a person with an IQ of 155. Furthermore, if IQ is an interval variable and IQ scores are normally distributed and IQ is a measure of intelligence, then intelligence must be normally distributed.

Apparently Jensen does not understand that regression to the mean is a property that holds true for any variables that have a bivariate normal distribution, regardless of the type of variable. It makes no difference whether they are interval variables, transformed interval variables, or transformed ordinal variables.
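The standard result behind this point can be written out explicitly. The notation is the usual textbook one and is not taken from the article: X and Y are the scores of two relatives, with means \mu_X and \mu_Y, standard deviations \sigma_X and \sigma_Y, and correlation \rho, and the pair is assumed to be bivariate normal.

```latex
\[
  E[\,Y \mid X = x\,] \;=\; \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),
  \qquad\text{so in standardized units}\qquad
  E[\,Z_Y \mid Z_X = z\,] \;=\; \rho\, z .
\]
```

Whenever |\rho| < 1 the expected score of the relative lies closer to the mean than z does, and the amount of regression is governed by \rho alone. The derivation uses only joint normality of the scores as recorded; it makes no reference to whether equal score differences correspond to equal differences in the underlying trait.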

The Puzzle of Nongenetic Variance

The four statistical errors described previously are particularly blatant and obvious ones, but Jensen makes many others as well. They are part of a larger pattern of careless thinking and sloppy analysis.

Perhaps Jensen's most bizarre use of statistics is to be found in "The Puzzle of Nongenetic Variance," the article that was published a few years ago in Intelligence, Heredity, and Environment.

The purpose of the article was to study the sources of IQ variance that cannot be explained genetically. Such variance is usually attributed to a person's social and cultural environment - such things as how a person was raised by his or her parents, the quality and length of a person's education, the type of neighborhood lived in, and so on. But IQ can also be affected by the biological environment - such things as nutrition, exposure to lead, prenatal exposure to alcohol or drugs, trauma before or during birth, and so on.

In an effort to understand the nature of this nongenetic variance, Jensen goes through a lengthy (18-page) and confusing statistical analysis of the IQ scores of 180 pairs of identical twins. The analysis focuses on the nonnormality of the distribution, on comparing the variance of the lower-scoring members of the twin pairs (the IQLs) to the variance of the higher-scoring members (the IQHs) in different subsets of the data, and on comparing the correlation between D, the difference in IQ scores between members of a twin pair, and IQL, the IQ of the lower-scoring member, to the correlation between D and IQH, the IQ of the higher-scoring member. (See sidebar for details of Jensen's analysis.)

Following this odd and, in my opinion, not very meaningful analysis, Jensen somehow manages to arrive at the following conclusions:

The neural basis of mental development is affected in each individual by a limited number of physical events beginning shortly after conception, each with a biologic effect usually too small to be detected individually ... These small biologic events are a random selection from among all such micro-environmental events that may affect development ... Because the net deviations have resulted from many small, independent events, they are normally distributed in the population.

Superimposed on this normal distribution of random environmental effects on IQ is a distribution resulting from a small number of comparatively large environmental effects, more often negative than positive, that "hit" only a fraction of the population. They are attributable to (1) a nonrandom, stochastic snowball effect on a few unlucky individuals ... and (2) the occurrence of rare events with large effects that "hit" only a small fraction of the population. The composite of these two distributions of net environmental effects forms a population distribution that is leptokurtic, with excess frequencies in the two tails, especially in the tail on the negative side ...

In the picture we see emerging from behavior-genetic analyses of mental abilities, the psychometric construct called g appears to be a biological phenomenon.

This is vintage Jensen. After going through a complex, lengthy, and superficially impressive statistical analysis he reaches a clear conclusion that, on closer consideration, has little or no justification. Statistics becomes little more than window dressing for Jensen's speculative theorizing.

Sidebar: Jensen on Nongenetic Variance

In an effort to understand the nature of the nongenetic portion of the variability in IQ scores, Jensen went through the following series of calculations using the IQ scores of 180 pairs of identical twins:

For each pair of twins, Jensen calculated the absolute difference (D) between the two IQ scores. He then graphically compared the distribution of these differences to what would be expected under normality and found that it deviates substantially from the normal.

Next Jensen calculated, for each pair of twins, the deviations of the two individual IQ scores from the mean of the two scores. (For example, if the IQs for a pair of twins are 106 and 112, D is 6, the mean is 109, and the deviations are -3 and +3.) He then calculated the kurtosis of the resulting 360 deviations. The kurtosis works out to 3.93, substantially greater than the value of 3 for a normal distribution.

Jensen next looked at the variance of IQ scores for only the lower-scoring members of the 180 twin pairs (IQLs) and compared it to the variance of scores for the higher-scoring members (IQHs). He calculated the ratio of these variances, which comes to .99.

For each value of D, Jensen then calculated the variance of the IQLs for all pairs of twins with that value of D or larger. He called these the cumulative variances of the IQLs, which he labeled cVIQL. Similarly, he calculated the cumulative variances (cVIQH) for the IQHs. For each value of D, he then calculated the ratio of cVIQL to cVIQH. He found that this ratio is greater than 1 for values of D that are 6 or larger but less than 1 for most values of D that are 5 or smaller.

Jensen then calculated the cumulative means of the IQLs and IQHs, analogous to the cumulative variances. These he labeled cIQL and cIQH. He then calculated the correlation between D and cIQL (.293) and between D and cIQH (-.075). He then recalculated these correlations separately for twin pairs with values of D that are 10 or larger and twin pairs with values of D that are 9 or smaller. In the first group, the correlation between D and cIQL is -.76 and between D and cIQH is -.45. In the second group, the corresponding correlations are .035 and .678.

The calculations of D, cIQL, cIQH, cVIQL, and cVIQH for a hypothetical set of data are illustrated below. Jensen uses the results to argue that the different behavior in the large D pairs indicates large environmental effects (see text for more details). Does this follow from these calculations?

IQL   IQH    D    cIQL    cIQH   cVIQL   cVIQH
 93   113   20    93.0   113.0
108   126   18   100.5   119.5   112.5    84.5
116   130   14   105.7   123.0   136.3    79.0
109   122   13   106.5   122.8    93.7    52.9
 82    89    7   101.6   116.0   190.3   267.5
120   122    2   104.7   117.0   208.7   220.0
109   110    1   105.3   116.0   176.6   190.3
106   106    0   105.4   114.8   151.4   175.6

Jensen's calculations from "The Puzzle of Nongenetic Variance," illustrated using artificial data. The first two columns represent the raw data - IQ scores for eight pairs of identical twins. IQL is the IQ of the lower-scoring twin. IQH is the IQ of the higher-scoring twin. D, cIQL, cIQH, cVIQL, and cVIQH are described above.
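The cumulative statistics in the sidebar (D, cIQL, cIQH, cVIQL, cVIQH) are easy to recompute from the eight artificial twin pairs given there. The sketch below is one way to do so; the function name `cumulative_stats` and the dictionary layout are illustrative choices, and the use of the sample variance (n - 1 denominator) is an assumption, adopted because it matches the tabled values.

```python
from statistics import mean, variance

# Artificial twin-pair data from the sidebar table: (IQL, IQH) for each pair.
PAIRS = [(93, 113), (108, 126), (116, 130), (109, 122),
         (82, 89), (120, 122), (109, 110), (106, 106)]


def cumulative_stats(pairs):
    """For each distinct difference D (largest first), summarize all pairs
    whose difference is D or larger."""
    rows = []
    for d in sorted({hi - lo for lo, hi in pairs}, reverse=True):
        lows = [lo for lo, hi in pairs if hi - lo >= d]
        highs = [hi for lo, hi in pairs if hi - lo >= d]
        enough = len(lows) > 1   # a variance needs at least two scores
        rows.append({
            "D": d,
            "cIQL": mean(lows),
            "cIQH": mean(highs),
            "cVIQL": variance(lows) if enough else None,
            "cVIQH": variance(highs) if enough else None,
        })
    return rows


if __name__ == "__main__":
    for row in cumulative_stats(PAIRS):
        cells = [f"{row['D']:>3}", f"{row['cIQL']:6.1f}", f"{row['cIQH']:6.1f}"]
        for key in ("cVIQL", "cVIQH"):
            cells.append("      " if row[key] is None else f"{row[key]:6.1f}")
        print("  ".join(cells))
```

Each row is computed from a subset that contains every subset above it, so successive cumulative means and variances are heavily dependent on one another by construction, which is worth keeping in mind when correlations with D are computed from them.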

Intensive, Detailed,Exhaustive

That, at any rate, is how I see it. Others see it differently.

One person who sees it differently is Thomas Bouchard of the University of Minnesota, one of the country's most prominent scholars in the area of intelligence testing. In an article entitled "Intensive, Detailed, Exhaustive" that appeared in the 1998 issue of Intelligence that was dedicated in Jensen's honor, Bouchard had this to say:

These three terms [intensive, detailed, exhaustive] capture much of the flavor of Jensen's writings. I should also add fair-minded, temperate, and courageous ... Jensen's writings are virtual tutorials on how to write science and how to deal with controversy - stick to the available evidence in it's [sic] full context, carefully explain the methods, their rationale and the assumptions, acknowledge the lack of evidence when it does not exist and avoid ad hominem arguments.

Douglas Detterman of Case Western Reserve University, the journal's editor, had this to say in the same issue:

I have never known anybody with fewer prejudices [than Jensen] ... Jensen has no loyalty whatsoever to any theory or hypothesis even if they come from his own ideas. He would gladly know the truth even if it proved him wrong. In fact, he would be excited to know the truth.

Sandra Scarr, another prominent scholar in the field, had this to say:

Art Jensen's contributions to psychological science are enormous, and they continue to mount. His work includes the impeccable tome on test bias, the most thoughtful research on learning and intelligence, and some critical studies on race and environment. The massive body of work will persist for generations.

Scarr dismissed Jensen critics Marcus Feldman, Stephen Jay Gould, and Leon Kamin as being "thugs with pens ... politically driven liars ... [and] despicable." Strong language indeed for a scholarly journal, and an indication of just how emotionally charged the whole issue of intelligence testing has become.

Should We Care?

Should statisticians care about all this?

I think we should. It's bad enough when statistics is misused in a book like The Bell Curve, written by nonspecialists for the general public. But when statistics gets extensively misused by a prominent, widely cited scholar who is admired practically to the point of hero worship by other prominent scholars, then clearly something is seriously amiss.

References and Further Reading

Gottfredson, L. S. (1998), "Jensen, Jensenism, and the Sociology of Intelligence," Intelligence, 26, 291-299.

Hirsch, J. (1975), "Jensenism: The Bankruptcy of 'Science' Without Scholarship," Educational Theory, 25, 3-27.

Jacoby, R., and Glauberman, N. (eds.) (1995), The Bell Curve Debate: History, Documents, Opinions, New York: Random House.

Jensen, A. R. (1998), "Jensen on 'Jensenism'," Intelligence, 26, 181-208.

Kempthorne, O. (1978), "Logical, Epistemological, and Statistical Aspects of Nature-Nurture Data Interpretation," Biometrics, 34, 1-23.

Comment: Unhelpful Criticism in the Study of Intelligence

Conor V. Dolan

According to Jack Kaplan, Jensen misuses statistics in his studies of intelligence. To demonstrate this, Kaplan discusses a number of Jensen's results. Although I welcome critical examination of results presented in the study of intelligence, especially if they have a bearing on the issue of black-white differences in IQ, I find Kaplan's criticism largely unhelpful. First, the serious accusation of misuse is hard to judge, because Kaplan fails to state what he means by misuse of statistics. Second, Kaplan criticizes Jensen but at times does so without cogent argumentation. Finally, although this is somewhat removed from the criticism per se, I find Kaplan's juxtaposition of his critical comments with comments made in a special issue of Intelligence in honor of Jensen facile.

What Is a "Misuse of Statistics"?

As the theme of Kaplan's article is Jensen's misuse of statistics, it is unfortunate that Kaplan does not provide the reader with his definition of this term. Consider the following. Imagine a person whose job it is to sell cigarettes. I would consider it a misuse of statistics if this person presents the positive correlation between number of cigarettes smoked a day and lung capacity obtained in a sample of 13- to 18-year-olds as evidence in favor of smoking (conditioning on age will quickly reveal the actual effect of smoking on lung capacity). Assuming that this person is cognizant of the source of the positive correlation, he is misusing a statistic in pursuit of his hidden agenda. It should be clear to anyone who reads Jensen carefully that Jensen does not engage in this kind of misuse.
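The cigarette example can be made concrete with a small simulation. The sketch below uses made-up numbers throughout (the coefficients, noise levels, sample size, and function names are all assumptions chosen for illustration, not estimates from any study). Lung capacity grows with age, older teenagers smoke more, and smoking lowers capacity at any given age. The pooled correlation is then positive even though the correlation within each age group is negative.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_teen_smokers(n_per_age=500, seed=1):
    """Hypothetical rows of (age, cigarettes per day, lung capacity in liters)."""
    rng = random.Random(seed)
    rows = []
    for age in range(13, 19):
        for _ in range(n_per_age):
            # Assumed: older teenagers smoke more on average.
            cigs = max(0.0, 2.0 * (age - 13) + rng.gauss(0, 2))
            # Assumed: capacity rises with age and falls with smoking.
            capacity = 2.5 + 0.3 * (age - 13) - 0.03 * cigs + rng.gauss(0, 0.1)
            rows.append((age, cigs, capacity))
    return rows

rows = simulate_teen_smokers()
overall_r = pearson_r([r[1] for r in rows], [r[2] for r in rows])
within_age_r = {
    age: pearson_r([r[1] for r in rows if r[0] == age],
                   [r[2] for r in rows if r[0] == age])
    for age in range(13, 19)
}

print(f"pooled r = {overall_r:+.2f}")
for age, r in within_age_r.items():
    print(f"age {age}: r = {r:+.2f}")
```

Conditioning on age here simply means computing the correlation separately within each age group, which is the step the parenthetical remark above refers to.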

An examination of Kaplan's actual criticism does not help to pinpoint his meaning of this term. Consider the section "Jensen on Variability." The standard deviations of IQ test scores in black samples are consistently smaller than those in white samples. Jensen suggested a number of hypotheses to account for these differences, including differences in genetic and environmental variance and differences in the degree of assortative mating. Kaplan offers the explanation that the differences are due to the normalization of scores in the white population. Kaplan presents a result in support of his hypothesis, but the absence of any details concerning his calculations makes his results hard to judge. However, with respect to the problem at hand - namely the definition of misuse - we have this: Jensen offers three hypotheses to account for black/white differences in variance and Kaplan offered one of his own. I fail to see the misuse of statistics that Jensen has perpetrated by positing his hypotheses. Even if Kaplan is completely right about the causes of the difference in variance, what misuse of statistics has Jensen actually committed? And why does Jensen's formulation of hypotheses constitute a "particularly blatant and obvious" error?

Next, consider the section "Jensen on Neurological Accidents." Here Jensen related a dichotomy (survival/death among infants) to a hypothetical latent variable, which he called "organismic viability" (OV). He did this by employing the so-called liability threshold model, a model that is used in genetic epidemiology (e.g., see Faraone, Tsuang, and Tsuang 1999). OV is modeled as a normal variable. An infant with a score beyond a certain value on the variable, the so-called threshold, dies. Jensen posited that the threshold is fixed and that the variance of OV is identical in both populations. He then modeled black/white differences in infant mortality by positing a difference in mean on the latent OV variable. Kaplan finds the assumption of equal variance artificial. This may be so, but in a model like the liability-threshold model one has to introduce some constraints. One cannot allow a difference in both means and variances because this gives rise to problems of identification. Jensen chose to consider the implications of a difference in means. The resulting model may be artificial, but why exactly is Jensen's application of it a misuse of statistics?
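The identification problem can be illustrated with a short sketch. It assumes two hypothetical event rates (1% and 2%, chosen only for illustration and not taken from any source), a latent variable that is standard normal in the reference group, and an event that occurs when the latent score exceeds a fixed threshold. Treating the upper tail as the event is a convention adopted here, and the function names are invented for the example. The same pair of rates is reproduced exactly either by a mean shift with equal variances or by a variance difference with equal means, so the two rates alone cannot separate the two accounts.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def mean_shift_given_equal_sd(rate_ref, rate_other):
    """Mean difference (in SD units) implied by two tail rates when the
    threshold is fixed and both groups have the same SD."""
    z_ref = STANDARD_NORMAL.inv_cdf(1 - rate_ref)
    z_other = STANDARD_NORMAL.inv_cdf(1 - rate_other)
    return z_ref - z_other

def sd_ratio_given_equal_mean(rate_ref, rate_other):
    """SD ratio (other / reference) implied by the same two rates when the
    threshold is fixed and both groups have the same mean.
    Only meaningful for rates below 0.5."""
    z_ref = STANDARD_NORMAL.inv_cdf(1 - rate_ref)
    z_other = STANDARD_NORMAL.inv_cdf(1 - rate_other)
    return z_ref / z_other

def tail_rate(mean, sd, threshold):
    """Proportion of a normal(mean, sd) population above the threshold."""
    return 1 - NormalDist(mean, sd).cdf(threshold)

# Hypothetical event rates in the reference group and the other group.
rate_ref, rate_other = 0.01, 0.02
threshold = STANDARD_NORMAL.inv_cdf(1 - rate_ref)

shift = mean_shift_given_equal_sd(rate_ref, rate_other)
ratio = sd_ratio_given_equal_mean(rate_ref, rate_other)

print(f"equal SDs:   mean shift of {shift:.3f} SD gives rate "
      f"{tail_rate(shift, 1.0, threshold):.4f}")
print(f"equal means: SD ratio of {ratio:.3f} gives rate "
      f"{tail_rate(0.0, ratio, threshold):.4f}")
```

Any intermediate combination of mean shift and SD ratio that puts the same proportion beyond the threshold would fit equally well, which is why a constraint of some kind has to be imposed.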

Unhelpful Criticism

Kaplan, reiterating the point made by Jensen critics Flynn and Brody, explains why sib regression is not convincing support of the plausibility of the genetic hypothesis. I consider this to be constructive criticism because Kaplan explains why such results cannot be used in this fashion (see also the section "Jensen on Normality"). It is, however, unfortunate that Kaplan fails to maintain this degree of constructiveness throughout his criticism. Consider again the section "Jensen on Neurological Accidents." Kaplan views the conceptualization of the latent variable OV as a single quantitative variable as "artificial and probably unhelpful." As criticism, "probably unhelpful" is definitely unhelpful. Moreover, Kaplan considers the conceptualization of the OV variable as a normal variable with equal variances in the black and white populations to be "clearly unjustified." As mentioned, equal variances should be seen as a modeling constraint, which is often employed in the liability-threshold model. Kaplan does not explain why aspects of this model are "probably unhelpful" or "clearly unjustified." We see the same in Kaplan's criticism of Jensen's analysis of nongenetic variance. Kaplan finds this analysis to be "confusing," "odd," and not "very meaningful." Again Kaplan may well be right, but he makes little effort to explain why this is so. As such, I regard these criticisms to be gratuitous and therefore unhelpful.

Is Something Seriously Amiss?

Kaplan juxtaposes his criticism of Jensen's work with the words of praise that Bouchard, Detterman, and Scarr expressed in their contributions to a special issue of Intelligence, which was dedicated in Jensen's honor. The message seems to be that, although Kaplan is highly critical, prominent scholars like Bouchard, Detterman, and Scarr admire Jensen to the point of hero worship. Something, therefore, is "seriously amiss," as Kaplan puts it. I find this facile. To explain this, I shall speak for myself. Jensen proposed in his book The g Factor that g, general intelligence, is the dominant dimension along which blacks and whites differ. Differences on the latent variable g are hypothesized to account for most (but not all) of the differences in psychometric IQ test scores. To investigate this hypothesis, Jensen devised the method of correlated vectors, which in this context involves correlating the factor loadings of the observed subtest scores on g with the standardized black/white differences on the subtests. A high correlation is supposed to be indicative of the importance of g in black/white differences. Jensen called this method "efficient, practical and statistically rigorous" (Jensen 2000, p. 42). I believe that this method is neither efficient, nor statistically rigorous (Lubke, Dolan, and Kelderman 2001). I also believe that there are better ways to investigate this issue (Dolan 2000; Dolan and Hamaker 2001). So I like to think that I am critical. Still, should I agree to contribute to a Jensen Festschrift, I would have no trouble in expressing my appreciation for Jensen's efforts in trying to advance the understanding of the causes of black/white differences in IQ test scores. My point is that one can be appreciative of Jensen's contributions to the study of intelligence but still disagree with him on a variety of issues. I imagine that this is the case with Bouchard, Detterman, and Scarr (e.g., see Scarr 1981, p. 515 ff.). I see no problem here and doubt that in this respect something is "seriously amiss."
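For readers unfamiliar with the method of correlated vectors, the computation as described above reduces to a single Pearson correlation across subtests. The sketch below is only an illustration of that arithmetic: the five subtests, their g loadings, and their standardized group differences are hypothetical numbers invented for the example, and the sketch takes no position on whether the resulting coefficient is a sound test of the hypothesis.

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values for five subtests (not taken from any data set).
g_loadings = [0.45, 0.55, 0.62, 0.70, 0.78]                # loading of each subtest on g
standardized_differences = [0.40, 0.55, 0.50, 0.75, 0.80]  # group gap in SD units

r_vectors = pearson_r(g_loadings, standardized_differences)
print(f"correlation between the two vectors: {r_vectors:.2f}")
```

Note that each subtest contributes one data point, so the coefficient rests on as many observations as there are subtests in the battery.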

Conclusion

If Jack Kaplan's article results in his fellow statisticians addressing the problem of the black/white test score gap, I should consider his article a great success. Few researchers are willing to consider this problem (Jensen is certainly an exception). In the introduction to their recent book on the black-white IQ test score gap, Jencks and Phillips (1998) wrote that their book tries to update the reader's knowledge about many aspects of the problem. However, they stated "... almost every chapter raises as many questions as it answers. This is partly because psychologists, sociologists and educational researchers have devoted far less attention to the test score gap ... than its political and social consequences warranted. Most social scientists have chosen safer topics and hoped the problem would go away. It didn't. We can do better" (p. 47). I agree: We can do better.

Note: References from this comment appear following Jensen's comment.

[I thank Sofie van der Sluis and Ellen Hamaker for their critical but constructive remarks on an earlier version of this comment. The research of CVD has been made possible by a fellowship of the Royal Dutch Academy of Arts and Sciences.]

Comment: Misleading Caricatures of Jensen's Statistics

Arthur R. Jensen

Ignoring the old-hat song-and-dance routine that accompanies Kaplan's specific complaints about my use of statistics, I will comment only on each of his four main points of contention. Because he frequently omits page references to his quotations and paraphrases from my writings, I will provide these for readers so they can see for themselves what I have actually said on these topics. Kaplan's critique aims to create the impression that I have all along been somewhere between naive and bizarre in what he purports to be my misuse of statistics.

Sibling regression

In my discussion of this topic (Jensen 1973, pp. 117-119; 1998, pp. 467-472) I pointed out that a simple polygenic model of IQ differences that predicts the correlations between individuals of various degrees of genetic kinship shows the same sibling regression effect for both whites and blacks, and the regressions are linear throughout the full range of IQs in both racial groups. There is no purely environmental/cultural theory that makes any specific prediction of the degree of regression that would be found for any particular degree of kinship. A specific model that makes quantitative predictions of a phenomenon (in this case sibling regression) is more valuable scientifically than one that can only come up with ad hoc explanations after the fact. Of course regression could reflect either genetic, environmental, or error effects; the point I was making is that genetic theory can make empirically testable quantitative predictions, which purely environmental theories of IQ variance cannot do. Moreover, a polygenic model that shows essentially the same regression effects in representative white and black samples is consistent with my "default hypothesis" - namely, that the difference between the white and black distributions on psychometric g (i.e., the common factor in all cognitive tests) has the same genetic and environmental causes, in about the same degree, as individual differences within each population [Jensen 1998, chap. 12; and note my discussion (p. 457) of the overworked "corn analogy" used by Kaplan]. There is no need to posit unique environmental factors for either group. Hence there is no scientific basis for treating members of various racial populations differently than one would treat comparable individuals within each population. Group differences in g, according to this theory, are just aggregated individual differences.

I refer readers to my most comprehensive explication of Galton's so-called "law of filial regression," in which I state: "The phenomenon of regression is a valid argument to support a hypothesis of genetic inheritance of a given trait only if the amount of regression is closely consistent with an explicit genetic model that predicts the degree of regression that should be theoretically expected for any given degree of kinship. ... There is nothing in the phenomenon of regression per se that proves either genetic or environmental causes or some combination of these" (Jensen 1984, p. 312). Clearly, Kaplan's complaint is vacuous and misleading in view of what I have actually written about kinship regressions toward their population mean.

Reproductive casualty

My literature review on this topic (Jensen 1973, pp. 341-348) shows that the risk of reproductive casualty (RC) is higher in the black population than in whites and Asians. The observed effects of RC are not an all-or-none disadvantage but appear as a continuous variable. These disadvantageous prenatal and perinatal effects are associated with the mother's age, lack of prenatal care, drug abuse, immunogenic incompatibilities between mother and fetus, prematurity, low birth weight, poor obstetrics, nutritional deficiencies, and the like. I have suggested that these conditions are among the various environmental factors that may adversely affect later mental development. However, empirical evidence, which I cited, also indicates that the frequency of neurologically detectable RC, although showing significant racial differences, was not great enough to account for more than a relatively small part of the average IQ differences between the major racial populations in the United States. Since I first wrote on this subject, new evidence has appeared that I believe strengthens support for the hypothesis that a variety of prenatal and perinatal conditions are not a negligible causal factor in g variance, including subpopulation differences in g (Jensen 1998, pp. 500-509). This hypothesis, for which there is considerable empirical evidence (cited by Jensen 1997, 1998), is germane to my theory of microenvironmental effects on g (discussed in the last section).

Variance of IQ in the black and white populations

In 65% of 200 white and black samples that took various IQ tests, whites had the larger variance (Jensen 1973, pp. 211-216). In the largest samples, which show this variance difference most clearly, both the black and white distributions are spread across the full range of test scores; there appears to be no scale artifact that constrains the variance of the score distributions in either sample. This phenotypic difference in variances (or standard deviations) is of interest psychometrically because it enters into the calculation of the "effect size" (ES) of race (e.g., black/white) on IQ scores: ES = the difference between the group means divided by the square root of the within-groups variance. In the disputed work (Jensen 1973) I pointed out that mental test scores, including IQ, are certainly not a ratio scale, and may not even be an interval scale throughout the full range of scores in the population. Without assuming approximate normality of the population distribution of intelligence, IQ scores cannot be treated as other than an ordinal, or rank-order, scale. After pointing out the effects of various transformations on the IQ distributions, I concluded, "Thus, the smaller IQ variance of Negroes than of whites could be merely an artifact of our scale for measuring intelligence" (p. 213). Then I go on to explain why this question itself (assuming an interval or ratio scale) is theoretically important for understanding the nature of the black-white IQ difference in relation to the heritability of IQ within each population, where a heritability analysis (based on various kinship correlations) partitions the total variance of a given metric trait into its genetic and nongenetic components. It is suggested that such methods might help to discover the answer to this question by testing whether IQ behaves in kinship regression analyses as theoretically would be expected if the IQ measurements were truly an interval scale. Kaplan's claim that any transformation of scale would result in the same degree of regression toward the mean as predicted by the genetic model is either unclear or incorrect. The degree of regression predicted by the additive genetic model originally proposed by R. A. Fisher (1918), for example, would not predict the same absolute values found with an interval scale if it were subjected to a nonlinear transformation. Moreover, the Pearson r between relatives predicted by Fisher's genetic model would necessarily be affected by any nonlinear transformations of the correlated variables measured on an interval or ratio scale, such as height and weight.
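To show how the within-groups variance enters the effect size defined above, here is a minimal sketch. It assumes the usual pooled (degrees-of-freedom weighted) estimate of the within-groups variance, which the text does not spell out, and it uses two small sets of hypothetical scores invented for the example. Halving the spread of one group while leaving both means untouched increases ES, which is why a difference in variances matters for this statistic.

```python
from statistics import mean, variance

def effect_size(group_a, group_b):
    """Difference between group means divided by the square root of the
    pooled within-groups variance (pooling rule assumed, see lead-in)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5

# Hypothetical scores for two groups (illustrative only).
group_a = [88, 95, 100, 104, 110, 118]
group_b = [80, 86, 92, 95, 99, 106]

# Same mean as group_b, but every deviation from that mean is halved.
mean_b = mean(group_b)
narrower_b = [mean_b + 0.5 * (score - mean_b) for score in group_b]

es_original = effect_size(group_a, group_b)
es_narrower = effect_size(group_a, narrower_b)
print(f"ES with original spread:      {es_original:.2f}")
print(f"ES with group_b spread halved: {es_narrower:.2f}")
```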

This section in my 1973 book also discussed in theoretical terms the diverse possible causes - genetic, environmental, and psychometric - of a difference between populations in the variance of a trait. There is nothing at all in this discussion that is in the least contradicted by anything Kaplan has to say about it, and the generality of his one possible explanation for the difference in population variances is unsupported by large studies that fail to show any scale artifact that would restrict the IQ variance within either group, assuming an equal-interval scale throughout (see also Jensen 1980, pp. 98-100). The matter could be settled definitively, of course, if there existed an undisputed interval or ratio scale of mental measurement. Chronometric measures, such as choice reaction time and inspection time, are the only behavioral ratio scales that are significantly correlated with IQ (Jensen 1998, chap. 8).

Normality of the IQ distribution

My most comprehensive and detailed discussion of the normality of the IQ distribution begins with the following sentence: "Nothing of fundamental empirical or theoretical importance is revealed by the frequency distribution per se of the scores on any psychometric test composed of items. This is true regardless of whether we are dealing with raw scores, z scores, or any otherwise transformed scores" (Jensen 1998, pp. 100-103). I further explain (1) how test constructors can manipulate the moments (i.e., mean, SD, skewness, and kurtosis) of any distribution of test scores by item selection based on item difficulty and inter-item and item-total score correlations, or simply by normalizing the z distribution of test scores via their percentile ranks in the normal curve, (2) the purely statistical advantages of approximating a normal (Gaussian) distribution as closely as possible for population "norms," and (3) the several theoretical and empirical arguments for the plausibility of assuming that the latent trait (g) that IQ tests attempt to measure would approximate a normal distribution. I believe that it soon will be possible empirically to test the normality of g in the general population by means of mental chronometry, based on physical, or ratio, scales, such as reaction time in elementary cognitive tasks, and various physiological measurements that are known to be related to g (Jensen 1998, chap. 8).

The puzzle of nongenetic variance

Kaplan refers to my book chapter (Jensen 1997) with this title as my "most bizarre use of statistics" and claims that it merely serves as irrelevant "window dressing" for the conclusions that follow. He is wrong. The analysis I performed on the IQs of monozygotic (MZ) twins is neither "bizarre" nor "window dressing" but is the mainstay of the article. The background for the analysis, spelled out in the article, happens to be one of the most surprising and well-established findings of behavior-genetic research in recent years - namely, that by late adolescence the between-families (BF) component of the nongenetic (or environmental) variance virtually disappears, leaving only the within-families (WF) component of nongenetic variance. The BF variance is called the shared environment because it is attributable to variables on which families differ (social class, ethnicity, culture, income, diet, etc.) but is shared by individuals reared together in the same family. It is the shared environment that increases similarity between siblings (or any other children) who are reared together. The WF source of variance is unshared by children reared together; it makes them less similar to one another. Because the BF variance dwindles to near 0 by late adolescence, the heritability of IQ is around 70%, and measurement error is around 5% of the total variance, we are left with at least 25% of the IQ variance consisting of unshared (WF) variance. The as-yet-unsolved puzzle is the causal nature of this substantial source of nongenetic WF variance. I decided that a good beginning point for hypothesizing on this puzzle was to suppose that there is an indefinitely large number of advantageous (+) and disadvantageous (-) microenvironmental events, each of which has some small effect on mental growth. The hypothesis holds that these random microenvironmental events and their effects on mental development are effectively random with respect to most individuals reared in the same family; the random effects are additive and normally distributed, and some individuals have good luck (+) and others bad luck (-) in the cumulative effect of these microenvironmental events, just as there are winners and losers leaving a casino.

Metric data on MZ twins reared together afford the most direct measure of the WF environmental variance. Because MZ twins have identical genotypes, any difference between a pair of MZ twins reared together that is not attributable to measurement error (which can be separately taken into account) is by definition a WF environmental difference. Therefore, I tested the microenvironmental model by analyzing the distribution of MZ twin differences on the Stanford-Binet IQ. If the model were correct, these IQ differences should closely approximate the chi distribution for 1 df, which is the distribution of absolute differences between all possible pairs of values in a normal distribution.
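The distributional claim in the preceding paragraph can be checked with a short simulation. The sketch assumes that each twin's within-family environmental deviation is an independent draw from the same normal distribution, with a hypothetical SD of 5 points chosen purely for illustration. Under that assumption the difference between twins is normal with SD equal to the square root of 2 times the within-family SD, and its absolute value follows a scaled chi distribution with 1 df (the half-normal), whose mean and cumulative proportions have simple closed forms.

```python
import math
import random
from statistics import NormalDist

def simulate_abs_twin_differences(n_pairs, sd_within, seed=0):
    """Absolute differences between two independent normal deviations per pair."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0, sd_within) - rng.gauss(0, sd_within))
            for _ in range(n_pairs)]

sd_within = 5.0                       # hypothetical within-family SD (assumption)
sd_diff = math.sqrt(2) * sd_within    # SD of the signed twin difference
diffs = simulate_abs_twin_differences(100_000, sd_within)

# Half-normal (scaled chi, 1 df) theory for the absolute difference.
expected_mean = sd_diff * math.sqrt(2 / math.pi)
observed_mean = sum(diffs) / len(diffs)

cutoff = sd_diff                      # compare the share of pairs below one SD
expected_share = 2 * NormalDist().cdf(cutoff / sd_diff) - 1
observed_share = sum(d < cutoff for d in diffs) / len(diffs)

print(f"mean |difference|: simulated {observed_mean:.2f}, theory {expected_mean:.2f}")
print(f"share below cutoff: simulated {observed_share:.3f}, theory {expected_share:.3f}")
```

A departure of real twin differences from these theoretical values, for example an excess of large differences, is the kind of misfit the analysis described here looks for.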

This analysis revealed at least two important findings that could not have been discovered just by eyeballing a large collection of MZ twin data: (1) The chi distribution model perfectly fits only the 80% of twin pairs who show the smaller IQ differences (under 9 points), but 20% of the twin pairs departed from the chi distribution - that is, they showed larger IQ differences than could be accounted for by the summation of entirely random microenvironmental effects - and (2) the WF environmental factors in MZ twins, on average, have larger negative effects than positive effects on IQ; that is, environmental effects are not symmetrically + or - and, at least for MZ twins, the WF environment is more often harmful than beneficial. After presenting the evidence for these quantitative effects in considerable detail, I cited evidence in the literature on factors in the biological environment that seemed most likely to account for these results and suggested them as hypotheses worth testing as a likely explanation for my findings. Isn't this how science works? I would urge readers to consult this article itself if they care to know what I have done and why, rather than relying on Kaplan's confusing caricature of it, in which his sidebar is meaningless absent my article's whole theoretical context.

References and Further Reading from the Comments

Brody, N. (1991), Intelligence (2nded.), San Diego: Academic Press.

Dolan, C. V. (2000), "Investigating Spearman's Hypothesis by Means of Multi-group Confirmatory Factor Analysis," Multivariate Behavioral Research, 35, 21-50.

Dolan, C. V., and Hamaker, E. L. (2001), "Investigating Black-White Differences in Psychometric IQ: Multi-group Confirmatory Factor Analyses of the WISC-R and K-ABC and a Critique of the Method of Correlated Vectors," in Advances in Psychological Research (Vol. VI), ed. F. Columbus, Commack, NY: Nova Science Publishers.

Faraone, S. V., Tsuang, M. T., and Tsuang, D. W. (1999), Genetics of Mental Disorders: A Guide for Students, Clinicians, and Researchers, New York: Guilford Press.

Fisher, R. A. (1918), "The Correlation Between Relatives on the Supposition of Mendelian Inheritance," Transactions of the Royal Society of Edinburgh, 52, 399-433.

Flynn, J. R. (1980), Race, IQ and Jensen, London: Routledge & Kegan Paul.

Jencks, C., and Phillips, M. (1998), "The Black-White Test Score Gap: An Introduction," in The Black-White Test Score Gap, eds. C. Jencks and M. Phillips, Washington, DC: Brookings Institution Press, pp. 1-54.

Jensen, A. R. (1973), Educability and Group Differences, London: Methuen.

---. (1980), Bias in Mental Testing, New York: The Free Press.

---. (1984), "Law of Filial Regression," in Encyclopedia of Psychology (Vol. 2), ed. R. J. Corsini, New York: Wiley, pp. 280-281.

---. (1997), "The Puzzle of Nongenetic Variance," in Intelligence, Heredity, and Environment, eds. R. J. Sternberg and E. L. Grigorenko, Cambridge: Cambridge University Press.

---. (1998), The g Factor: The Science of Mental Ability, Westport: Praeger.

---. (2000), "The g Factor: Psychometrics and Biology," in The Nature of Intelligence, eds. G. R. Bock, J. A. Goode, and K. Webb, Chichester, U.K.: Wiley, pp. 37-47.

Lubke, G. H., Dolan, C. V., and Kelderman, H. (2001), "Investigating Group Differences on Cognitive Tests Using Spearman's Hypothesis: An Evaluation of Jensen's Method," Multivariate Behavioral Research, 36, 299-324.

Scarr, S. (1981), Race, Social Class and Individual Differences in IQ, Hillsdale, NJ: Lawrence Erlbaum.

Reply

Jack Kaplan

Dolan

First of all, I want to make clear that all I mean by the term misuses of statistics is incorrect uses of statistics. I am not suggesting (as Sandra Scarr does concerning Stephen Gould and Leon Kamin) that Jensen is being intentionally deceptive, only that he is mistaken.

Conor Dolan doesn't find my criticism helpful, but it isn't clear that he actually disagrees with any of it.

Concerning the point I made about whether blacks have less variability in intelligence than whites, he writes, "Jensen offers three hypotheses to account for black/white differences in variance and Kaplan offers one of his own ... Even if Kaplan is completely right ... what misuse of statistics has Jensen actually committed?"

Concerning the point about the degree to which neurological accidents prior to or during birth affect intelligence, he writes, "Kaplan finds the assumption of equal variance artificial. This may be so, but in a model like the liability-threshold model one has to introduce some constraints ... The resulting model may be artificial, but why exactly is Jensen's application of it a misuse of statistics?"

Concerning the point about nongenetic variance, he writes, "Again Kaplan may well be right, but he makes little effort to explain why this is so. As such, I regard these criticisms to be gratuitous and unhelpful."

None of these comments implies that I'm actually wrong about anything.

Dolan writes about the example I use to illustrate the point about black versus white variability in intelligence, "Kaplan presents a result in support of his hypothesis, but the absence of any details concerning his calculations makes his results hard to judge." If Dolan (or anyone else) wants further details, I will be happy to supply them.

Dolan writes, "If Jack Kaplan's article results in his fellow statisticians addressing the problem of the black/white test score gap, I should consider his article a great success."

I agree - sort of. Both Dolan and I would like to see a dialogue develop between statisticians and hereditarians. But we envision different outcomes from that dialogue.

Dolan presumably envisions statisticians assisting hereditarians in refining their methods and extending their results. I envision statisticians telling hereditarians that their methods are deeply flawed and that many of the conclusions they've drawn are simply not true. Among their wrong conclusions, in my opinion, are their two most basic ones - that IQ scores are a measure of general intelligence (using any reasonable definition of general intelligence) and that the heritability of IQ scores is greater than 40%.

Jensen

Most of Jensen's comments do not address the specific points I made in my article. Instead he repeatedly digresses into irrelevant discussions of other issues or refers to articles or books he's written other than the one I was commenting on.

Sibling Regression

My article addresses statements Jensen made in his 1973 book, Educability and Group Differences. But his response relies mainly on a quote from an article he wrote in 1984 and a summary of arguments he made in a book in 1998.

My article accurately describes (and gives page references for) what Jensen wrote in 1973. What he wrote then, that regression to the mean provides evidence that the black/white difference in average IQ is at least partly genetic, is clearly a misinterpretation of statistics. Regression to the mean applies just as well to populations that differ for environmental reasons as it does to populations that differ for genetic reasons.
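The statistical point about regression to the mean can be sketched in a simulation that is silent about causes. The numbers below are assumptions chosen for illustration: two populations with means of 100 and 85, a common SD of 15, and a sibling correlation of 0.5 produced by a shared family component. Nothing in the code says whether the shared component or the gap between the means is genetic or environmental, yet siblings of people scoring near 120 regress toward different averages in the two populations, each toward its own population mean.

```python
import random

def simulate_sibling_pairs(pop_mean, n_families, sib_corr=0.5, sd=15.0, seed=0):
    """Sibling pairs whose scores share a family component (cause unspecified)."""
    rng = random.Random(seed)
    shared_sd = (sib_corr ** 0.5) * sd        # family-level component
    unique_sd = ((1 - sib_corr) ** 0.5) * sd  # individual-level component
    pairs = []
    for _ in range(n_families):
        shared = rng.gauss(0, shared_sd)
        pairs.append((pop_mean + shared + rng.gauss(0, unique_sd),
                      pop_mean + shared + rng.gauss(0, unique_sd)))
    return pairs

def mean_scores_for_selected(pairs, low, high):
    """Mean index score and mean sibling score for index scores in [low, high]."""
    selected = [(a, b) for a, b in pairs if low <= a <= high]
    index_mean = sum(a for a, _ in selected) / len(selected)
    sibling_mean = sum(b for _, b in selected) / len(selected)
    return index_mean, sibling_mean

results = {}
for label, pop_mean in (("population A", 100.0), ("population B", 85.0)):
    pairs = simulate_sibling_pairs(pop_mean, n_families=200_000, seed=42)
    results[label] = mean_scores_for_selected(pairs, 118, 122)
    index_mean, sibling_mean = results[label]
    print(f"{label}: index mean {index_mean:.1f}, sibling mean {sibling_mean:.1f}")
```

In both populations the simulated sibling mean should sit close to the population mean plus half the index person's deviation from it, which is all that the regression formula requires.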

Reproductive Casualty

Jensen's comments are not responsive to the point I made in my article.

I don't disagree with the basic concept that "the observed effects of [reproductive casualty] are not an all-or-none disadvantage" and that the higher fetal death rate for blacks as compared to whites suggests that blacks may also suffer more often from less severe problems that don't cause detectable harm but may impair later mental development. What I disagree with is Jensen's jump from population differences in fetal death rates to average population differences on a continuous "organismic viability" variable. That jump requires assuming that fetal death corresponds to a value above some threshold on a variable that is not only continuous but is normally distributed with equal variances in both populations. That is quite an assumption!

It's true that I paraphrased Jensen on this point rather than quoting him verbatim. The verbatim quote (from page 346 of Educability and Group Differences) is

In 1965, fetal deaths (for gestation periods of twenty weeks or more) nationwide had almost twice as high a rate among Negroes as among whites (25.8 v. 13.3 per 1000 live births) ... Assuming fetal death to be a threshold effect on a normally distributed variable, the Negro and white populations can be said to show a mean difference of 0.46σ on this variable. This is a large difference by any standard. But even if this variable (organismic viability, freedom from impairment, or whatever it is) were perfectly correlated with intelligence, it could account for less than half of the Negro-white difference [italics in original].

Readers can judge for themselves whether my paraphrase of Jensen's comments is in any way misleading.

Variability

Jensen writes, "The generality of [Kaplan's] one possible explanation for the difference in population variances is unsupported by large studies that fail to show any scale artifact that would restrict the IQ variance within either group, assuming an equal-interval scale throughout" [italics added].

If one assumes an interval scale, then my argument makes no sense whatsoever. But there is, in my opinion, no reasonable statistical argument that IQ scores constitute an interval scale.

Weight is an example of an interval scale. The difference in weight between two people who weigh 170 and 175 pounds is the same as the difference between two people who weigh 200 and 205 pounds. But is the difference in "intelligence" (or whatever it is that IQ measures) between two people with IQ scores of 100 and 105 the same as the difference in "intelligence" between two people with IQ scores of 130 and 135? I think not.
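Why the interval-scale question matters for comparing variances can be shown with a hypothetical simulation. In the sketch below, two groups have exactly the same spread on an unobserved latent scale, and the observed score is an order-preserving but non-linear function of it. The scoring rule, the group means, and every name in the code are assumptions invented for this example; nothing here is an estimate of how real IQ tests behave.

```python
import math
import random
import statistics

def observed_score(latent):
    # Hypothetical scoring rule: preserves rank order but is not equal-interval,
    # because it stretches the upper range more than the lower range.
    return 100.0 * math.exp(0.15 * latent)

def latent_and_observed_sd(latent_mean, n, rng):
    """Return the SD on the latent scale and the SD on the observed scale."""
    latent = [rng.gauss(latent_mean, 1.0) for _ in range(n)]
    observed = [observed_score(x) for x in latent]
    return statistics.pstdev(latent), statistics.pstdev(observed)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Two hypothetical groups with the same latent SD (1.0) and different means.
high_latent_sd, high_observed_sd = latent_and_observed_sd(0.0, 50_000, rng)
low_latent_sd, low_observed_sd = latent_and_observed_sd(-1.0, 50_000, rng)

print("latent SDs:", round(high_latent_sd, 2), round(low_latent_sd, 2))
print("observed SDs:", round(high_observed_sd, 1), round(low_observed_sd, 1))
```

The two groups are equally variable on the latent scale, yet the group with the lower mean shows a smaller spread on the observed scale. Whether a difference in observed variances reflects anything beyond the scoring rule therefore depends on whether the scale is in fact equal-interval.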

Normality

My article addresses statements Jensen made about normality in 1969 in his famous Harvard Educational Review article (as reprinted in Genetics and Education). But Jensen's response refers entirely to statements he made on the subject in his 1998 book, The g Factor.

What Jensen says about normality in the Harvard Educational Review article is nonsense. I'm not familiar with what he says about normality in The g Factor.

Nongenetic Variance

I don't think bizarre is too strong a word to use to describe Jensen's use of statistics in "The Puzzle of Nongenetic Variance." Readers who doubt this should do three things:

1. Read Jensen's description of his complex and confusing statistical analysis of the IQs of 180 pairs of identical twins - all 14 pages of it (pp. 61-74).

2. Read Jensen's conclusions (pp. 80-82).

3. Ask themselves, What is the relationship - if any - between the analysis and the conclusions?

The underlying question Jensen is trying to answer is whether nongenetic variation of IQ scores is explained by social environmental factors or by biological environmental factors. His statistical analysis is basically a clumsy attempt to characterize the way in which his dataset departs from a normal distribution.

But how can departures from normality possibly distinguish between social and biological factors? What type of departure from normality would one expect from biological factors? What type of departure would one expect from social factors?

My suspicion is that Jensen, who has spent more than three decades arguing (unpersuasively) that variation in intelligence is explained primarily by genetics, is predisposed to favor a biological rather than a social interpretation for the remaining nongenetic variation. But the distribution of his data can be explained just as readily by either one. His assertion that biological factors are the probable source is nothing more than speculative theorizing. The statistical analysis is just window dressing.

I join with Jensen in urging readers to consult the full article. But I doubt that the article's "whole theoretical context" will make Jensen's analysis seem any more reasonable.

