
ARTICLE IN PRESS

Developmental Review xxx (2004) xxx–xxx

www.elsevier.com/locate/dr

On the law of intelligence

William Lichten*

Koerner Center for Emeritus Faculty, Yale University, New Haven, CT 06520-8368, USA

Received 21 November 2003; revised 26 March 2004

Available online

Abstract

The law of intelligence is presented in test-independent form. Mental abilities, physical brain size, and infant motor capacity follow the same law of growth from birth to adolescence. Mental growth is independent of race, SES, or the Flynn effect. The vitality of the mental age scale calls for a reexamination of Wechsler’s deviation IQ. This paper builds on Yen’s method of standardized differences (1986). The main theoretical advance here is to put development back into intelligence testing and to show a universality among different measures of the growth of the human nervous system.

© 2004 Elsevier Inc. All rights reserved.

This paper suggests a new theoretical structure for psychoeducational measurement and uses it to derive a scale of growth of mental ability. Over the years, many researchers have sought the “law of intelligence,” the growth curve of mental ability, a goal this paper addresses (Bloom, 1964; Bock, 1983; Gesell, 1928; Heinis, 1924; Jensen, 1973; Keats, 1982; Thorndike, Bregman, Cobb, & Woodward, 1927; Thurstone, 1925, 1928; Thurstone & Ackerson, 1929; and many others).

Remarks on natural laws

We consider quantitative relations in physics and psychophysics, fields that mental testers sometimes emulate.

* Fax: 1-203-432-8247.

E-mail address: [email protected].

0273-2297/$ - see front matter © 2004 Elsevier Inc. All rights reserved.

doi:10.1016/j.dr.2004.04.001


Scales

A pound of meat is a pound of meat. It weighs the same on the butcher’s scales whether by itself or added to another piece of meat. The scale divisions are uniform in meaning over the entire range of measurement. Similarly, an inch is the same anywhere on a yardstick. Likewise, 1 s has a simple, well-defined, and measurable meaning at any place or time. Thus the basic units of physics (mass, length, and time) and the laws based on them are measured on uniform, well-standardized scales.

Psychophysical scales measure the relation between subjective sensations (such as loudness, pitch, and brightness) and their objective physical correlates (sound intensity, frequency, and light intensity). The scales of psychophysics purport to be uniform. For example, one asks an observer to vary the intensity of one sound until it seems half as loud as a second sound. In this way, one can set up a loudness scale with equal divisions (Licklider, 1951; Stevens, 1951, 1975; Stevens & Davis, 1938; Woodworth & Schlosberg, 1954). An example of a psychophysical law is the Weber–Fechner law: the sensation of pitch or loudness of a sound, brightness of a light, etc., is proportional to the logarithm of the corresponding physical variable. (For a sampling of the many discussions of this law and alternatives, see Engen, 1971; Luce, Bush, & Galanter, 1963; Luce & Krumhansl, 1988; Luce & Suppes, 2002; Stevens, 1975; Suppes & Zinnes, 1963; Thurlow, 1971; Woodworth & Schlosberg, 1954.)
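A minimal sketch of what the Weber–Fechner law asserts, assuming the form S = k ln(I/I₀) with a threshold intensity I₀ and an arbitrary scale constant k (both our notation, not taken from the sources cited): equal ratios of stimulus intensity, not equal differences, produce equal steps in sensation.

```python
import math


def fechner_sensation(intensity: float, threshold: float, k: float = 1.0) -> float:
    """Sensation magnitude under the Weber-Fechner law, S = k * ln(I / I0).

    `threshold` (I0) is the intensity at which sensation is zero; `k` is an
    assumed, modality-dependent scale constant (1 here for illustration).
    """
    if intensity <= 0 or threshold <= 0:
        raise ValueError("intensities must be positive")
    return k * math.log(intensity / threshold)


# Doubling a weak and a strong stimulus: physical steps of 1 and 10 units.
step_low = fechner_sensation(2.0, 1.0) - fechner_sensation(1.0, 1.0)
step_high = fechner_sensation(20.0, 1.0) - fechner_sensation(10.0, 1.0)
print(round(step_low, 6), round(step_high, 6))  # 0.693147 0.693147
```

Both doublings add the same amount, ln 2, to the sensation, although the physical increments differ tenfold.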

Is a law of intelligence possible?

Quantitative physical and psychophysical laws hinge on measurement scales. Can we set up a law of intelligence by merely following the examples of physics and psychophysics? Unfortunately, the matter is not so simple. As Jensen (1993, p. 141) put it, “There are no existing tests that could render such statements as the following at all meaningful: ‘A person gains half of his adult level of mental ability by the age of five.’” The rationale underlying this statement is the impossibility of directly comparing the growth of intelligence at different ages.

For example, infants are in Piaget’s sensorimotor stage:

. . .during the first year. . .intelligence, strictly speaking, is not yet observed.

(Piaget & Inhelder, 1969, p. 9)

On the other hand, adults are in a formal operational stage. Comparing the two would be a case of apples and oranges.

The earliest intelligence measurements were expressed on a mental age (MA) scale (Binet & Simon, 1916). On Binet’s scale, test scores advanced by even amounts each year. But almost all subsequent mental tests showed a quite different growth pattern. Terman and Merrill (1937) noted that MA was a very uneven scale, with rapid mental growth among infants and young children and near stasis in late adolescence. Thus neither the MA scale nor its grade-level achievement twin can answer Jensen’s rhetorical question. Yet both are quite useful and are still widely used in the clinical, developmental, and educational literature. In Terman and Merrill’s words (1937, p. 25):

The expression of a test result in terms of age norms is simple and unambiguous, resting upon no statistical assumptions. A test so scaled does not pretend to measure intelligence as linear distance is measured by the equal units of a foot-rule, but tells us merely that the ability of a given subject corresponds to the average ability of children of such and such an age.

It was a pity that the MA scale was dropped from IQ testing. This paper will use the MA scale to derive the law of intelligence. For further discussion of MA, see the later section IQ and mental age scales.

Layzer (1972, p. 276) pointed out a related difficulty with IQ as a measure of the deviation of intelligence from the average at a given age: “IQ does not measure an individual phenotypic character like height or weight; it is a measure of the rank order or relative standing of test scores in a given population.” (See also Jensen, 1993.) To illustrate his point, consider this question: which step in intelligence is greater, from 100 to 130 or from 70 to 100 IQ points? The step from 100 to 130 might be the difference between a butcher and a research medical doctor. On the other hand, 100–70 is the gap between average and mentally retarded, which under a recent Supreme Court decision can be the difference between life and death (Atkins vs Virginia, 2002; Greenhouse, 2002). Both intervals span 30 points, but how can you compare the two? Thus equating units at different points on existing mental ability scales is again an apples-and-oranges problem.
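Layzer’s point can be made concrete with a short sketch. Assuming deviation IQ is exactly normal with mean 100 and SD 15 (an idealization), converting IQ to a percentile rank shows that IQ encodes only relative standing: 70 and 130 sit symmetrically near the 2nd and 98th percentiles, and nothing in the conversion says whether the two 30-point steps involve equal amounts of ability.

```python
from statistics import NormalDist

# Idealized deviation-IQ scale: normal, mean 100, SD 15 (an assumption).
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_percentile(iq: float) -> float:
    """Percentile rank of a deviation IQ within its age group."""
    return 100 * IQ_SCALE.cdf(iq)


for iq in (70, 100, 130):
    print(iq, round(iq_percentile(iq), 1))  # 2.3, 50.0, 97.7
```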

Although we may not yet say what a mental growth scale is, we can certainly say what it is not. Mental ability is not like a pound of meat; it cannot be put on a scale where each interval is exactly equal to every other one in meaning and in size. If it were that simple, the problem of finding the law of intelligence would be solved and there would be no need for this paper. We now turn to the measurements that can be the basis of such a law.

The measurement of mental ability (IQ, achievement, etc.)

Without a simple, linear scale, how can we deal quantitatively with intelligence? In the words of Jensen (1969, pp. 5–6):

Intelligence, like electricity, is easier to measure than to define. And if the measurements bear some systematic relationship to other data, it means we can make meaningful statements about the phenomenon we are measuring. There is no point in arguing the question to which there is no answer, the question of what intelligence really is. The best we can do is to obtain measurements of certain kinds of behavior and look at their relationship to other phenomena and see if these relationships make any kind of sense and order.

Luce and Krumhansl (1988) pointed out that the situation is similar to the early days of the study of heat. Nobody knew exactly what temperature was. The concept was based on the subjective feeling of hot and cold. It took centuries of development of the laws of thermodynamics before temperature was really understood. Nevertheless, pioneers went about constructing thermometers based on the expansion of liquids. They marked their instruments at two standard temperatures and divided the interval into equal steps. For example, Fahrenheit’s scale had its zero at a mixture of ice and salt and its 96 at body temperature.

A simple way to compare scales is to match midpoints. For example, consider thermometers with standard temperatures at 0 and 100 °F. The scale midpoints for gas thermometers differ from each other by only a few thousandths of a degree. The midpoint of the mercury scale differs from that of gas thermometers by 0.1 °F and from that of alcohol thermometers by 1 °F. The excellent agreement among most thermometric materials means that thermometer scales are independent of the material used.

Water is an exception. It would make a poor thermometer: it would read 81.3 °F at the midpoint of the 32–100 °F scale (66 °F). The reason for this gross discrepancy is the non-linear expansion of water.

If one were to plot temperature on almost any scale against another, the plot would be a straight line. This linear agreement among scales made it reasonable to use any one of them to define temperature. Such scales preceded, and agree with, the now well-understood laws of thermodynamics.
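The midpoint comparison can be sketched in a few lines. Under the assumption that a thermometer is calibrated at two fixed points and divided into equal steps, its reading is a linear interpolation in the thermometric property. The two expansion functions below are hypothetical stand-ins, not measured data for mercury, alcohol, or water: the linear one reads the true midpoint correctly, while the curved one misses it, as water does in the text.

```python
from typing import Callable


def two_point_reading(prop: Callable[[float], float],
                      t_low: float, t_high: float, t_true: float) -> float:
    """Reading of a thermometer calibrated at two fixed points.

    `prop(t)` is the thermometric property (e.g. column length) at true
    temperature t. Marking t_low and t_high and dividing the interval into
    equal steps amounts to linear interpolation in the property.
    """
    p_low, p_high = prop(t_low), prop(t_high)
    return t_low + (t_high - t_low) * (prop(t_true) - p_low) / (p_high - p_low)


def linear_liquid(t: float) -> float:
    # Hypothetical material whose volume grows linearly with temperature.
    return 1.0 + 1e-3 * t


def curved_liquid(t: float) -> float:
    # Hypothetical material with an added quadratic (non-linear) expansion term.
    return 1.0 + 1e-3 * t + 2e-5 * t ** 2


midpoint = 50.0  # true midpoint of a 0-100 degree scale
print(two_point_reading(linear_liquid, 0.0, 100.0, midpoint))  # about 50
print(two_point_reading(curved_liquid, 0.0, 100.0, midpoint))  # about 33.3
```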

Note that we cannot directly compare different parts of the temperature scale with each other, as we might with two yardsticks by laying one on top of the other. There is no simple, direct way to compare the temperature intervals 0–10 °C and 90–100 °C. Yet the consistency and the linearity among scales make the measurement of temperature exact.

In conclusion, this paper is a search for scales of the growth of mental ability that do not depend on the specifics of the test used to measure it. A simple, practical test of the consistency of such scales is to compare growth midpoints. We follow Jensen (1969) and avoid the claim that this is the way that “intelligence” really grows. Rather, we shall compare the current scale with others in an effort to gain insight into the nature of intelligence and other mental abilities.

Growth and variation. Local vs. global properties

The developmental psychologist Wohlwill (1973) split the growth of any quantitative psychological trait into a universal growth function (Allport’s nomothetic, 1942) and the individual variation about that function (Allport’s idiographic). McCall, Eichorn, and Hagerty (1977, p. 3) noted that developmental psychologists tend to slight one of these two factors:

Ironically, most empirical research has stemmed from an individual difference tradition in which cross age correlations were calculated between indices of mental performance, while the major theorist, Piaget, deals only with developmental function.

McCall’s criticism is particularly germane to IQ testing, where Wechsler’s (1939) deviation scale slights developmental function. One aim here is to overcome this lack.

Luce and Krumhansl (1988, p. 39) distinguished between local psychophysics, “which is concerned. . .with stimuli which are physically little different,” vs. “global . . . sensations over the full dynamic range of the physical stimuli.” In physics, local properties are differential; global features are integral. Thus a differential equation governs the acceleration of a falling body at a given position. Integrating this equation gives the overall motion of the body or projectile. Either form can be derived from the other by means of calculus, but the integral form contains more information (the boundary conditions) than the differential version. Newton’s laws give the acceleration of a projectile at each point of its trajectory, but it takes a whole chapter in physics textbooks to relate this local condition to global information, such as the time the object takes to fall to the ground or the path followed by a projectile.
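To illustrate the local-to-global step, the sketch below integrates the local law (constant acceleration g, air resistance ignored) step by step from assumed boundary conditions: release from rest at a given height. The time-stepping scheme and step size are our choices for illustration; the result approaches the closed-form integral t = √(2h/g).

```python
import math


def fall_time(height_m: float, g: float = 9.81, dt: float = 1e-4) -> float:
    """Integrate the local law dv/dt = g, dx/dt = v in small time steps.

    The boundary conditions (start at rest, `height_m` above the ground) are
    what turn the local, differential law into a global answer: the fall time.
    """
    x, v, t = 0.0, 0.0, 0.0  # distance fallen, speed, elapsed time
    while x < height_m:
        v += g * dt  # local law: the acceleration is g at every position
        x += v * dt
        t += dt
    return t


h = 20.0
print(f"numerical: {fall_time(h):.4f} s, closed form: {math.sqrt(2 * h / 9.81):.4f} s")
```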

Likewise, in psychophysics, Weber’s law, which states that the just noticeable difference (j.n.d.) of a stimulus is proportional to its magnitude, is a local relation stated in differential form. The Weber–Fechner law, which states that sensation is proportional to the logarithm of the stimulus magnitude, is a global version. As in physics, the global and local relations can be derived mathematically from each other. However, in the psychophysical case the global law involves further assumptions. (For references, see the Remarks on natural laws section at the beginning of this paper.)
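The textbook passage from the local to the global law can be written out as follows. This is a sketch in our own notation (Weber fraction c, scale constant k, threshold I₀), and the second line is the extra assumption, usually attributed to Fechner, that every j.n.d. is an equal step in sensation.

```latex
\begin{aligned}
\frac{\Delta I}{I} &= c
  && \text{Weber's law (local): one j.n.d. is a fixed fraction of } I \\
\Delta S &= \text{const. per j.n.d.}
  && \text{Fechner's added assumption} \\
\frac{dS}{dI} &= \frac{k}{I}
  && \text{continuum limit of the two lines above} \\
S(I) &= k \ln\frac{I}{I_0}, \qquad S(I_0) = 0
  && \text{integrate; the threshold } I_0 \text{ is the boundary condition}
\end{aligned}
```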

Importance of growth

Growth was at the heart of the first intelligence tests (Binet & Simon, 1916), which measured on the mental age scale but neglected to measure variation at a given age. On the other hand, the modern deviation IQ scale is based on variation only and gives no information about growth (Wechsler, 1939).

For developmental psychology and education, growth is a sine qua non. Casual observations as far back as Aristotle have shown that two adults of the same age are more alike than a baby and an adult. The total growth of mental ability from birth to adulthood is large compared to population variations occurring at a given age (see Appendix).

The global nature of mental ability goes beyond variation (IQ) at a given chronological age (CA) and also includes the much larger growth and decay over the entire life cycle. We can view IQ and growth over a short term (such as a year) as local and thus as incomplete.

Normally, factor analyses of IQ tests are performed on data from a single CA. The general factor of such analyses is g, general intelligence (Jensen, 1998). If a factor analysis were instead made of data across the range of ages, for example in the WISC (Wechsler Intelligence Scale for Children), the general factor would by far be CA.

The problems of different fields are much more alike than their practitioners think. . .the physical sciences have learned much by storing up amounts, not just directions. . .being so uninterested in our variables that we do not care about their units can hardly be desirable.

Tukey (1969, pp. 83, 86, 89)

The need for units

This paper’s goal is nomothetic: to find universal properties of the growth of ability. Accordingly, it aims to express measurements and derived scales in well-defined units. However, in psychology units are hard to come by. The psychophysicist, in the classical 19th-century tradition of Wundt, Hering, and Helmholtz, measured mental events, such as brightness, loudness, and pitch, in terms of tangible physical quantities like intensity and frequency. During the 20th century the word “psychophysics” became “psychometrics.” Mental testers had one physical variable (CA) upon which to hang their hats. In addition, standardized tests often use deviation-based units like IQ or percentiles.

The present paper is based on normed, aggregated mental test data: the average and SD as a function of CA (for intelligence tests) or grade (for achievement tests). It works equally well and consistently on a variety of ability measures and derived scales: raw scores; MA; IQ; Thurstone (1925), Rasch, and Item Response Theory scales. Examples can be found in the Appendix and in Yen (1986). This paper limits its data to well-standardized mental ability tests. The generic term “mental ability” covers a wide variety of standardized IQ, achievement, and infant development tests. Although some authors treat an even wider range of abilities (Gardner, 1983; Salovey & Mayer, 1990; Sternberg, 1997; Torrance, 1988; Torrance & Goff, 1989), none has made a standardized test that could be used in this monograph (Jensen, 1998). The goal of this paper is to find out to what extent an objective growth scale can be based on these measures.

Intelligence is what the tests test.

Boring (1923, p. 35)

The tests

This paper builds scales from aggregated data (group or population averages), which are “true scores” (Gulliksen, 1987). It uses IQ, infant development, and achievement tests from birth to adolescence. Standardized tests for teenagers and adults, such as the SAT (formerly the Scholastic Aptitude/Assessment Test), ACT (formerly American College Testing Program), GRE (Graduate Record Examination), LSAT (Law School Admission Test), MCAT (Medical College Admission Test), etc., as well as individual (rather than aggregated) scores, show idiosyncratic rather than lawful behavior and thus receive limited consideration in this paper. The tests consist of a nested series of components.

Items

The smallest unit of mental ability tests is the item, which consists of a single question or task.

Subscales

Similar items, such as vocabulary or arithmetic problems, are grouped together to form subscales.


Scales

Subscales in turn are combined to form scales. For example, the verbal scale in the Wechsler IQ tests is a combination of information, similarities, arithmetic, vocabulary, and comprehension subscales; the performance scale combines five subscales such as picture completion and block design.

The true (error-free) raw score T for persons of a certain ability is simply the total number of correctly answered or performed items listed in the norms manual for that group. Testing companies take a representative sample of the population to approach error-free tables in the manual. For example, an average 10-year-old would get 14 items correct on the WISC-III information subtest, which translates into a scale score of 10, according to the manual.

The subscale scores are combined to arrive at scale scores, which are then combined to give an IQ for intelligence. Achievement test scores usually are given in percentiles (a deviation score) or in grade level (a growth measure). The full scale score (or composite score) combines all scales. This paper uses the terms subscales, scales, and full scale. Scale score should be distinguished from the term scaled score or standard score z, which is the deviation of a raw score T from the mean M, expressed in standard deviation units:

$$z = \frac{T - M}{\sigma_T}.$$
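A minimal sketch of the standard score z = (T − M)/σ_T, with the conversion to the deviation-IQ scale (mean 100, SD 15) added for comparison. The function names are ours, and the norms in the usage line are hypothetical: the mean of 14 items echoes the WISC-III example, but the SD of 4 items is an assumed value.

```python
def standard_score(raw: float, mean: float, sd: float) -> float:
    """z = (T - M) / sigma_T: a raw score's deviation from its group mean, in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def deviation_iq(z: float) -> float:
    """Place a standard score on the deviation-IQ scale (mean 100, SD 15)."""
    return 100.0 + 15.0 * z


# Hypothetical norms: group mean 14 items, assumed SD of 4 items.
z = standard_score(18, mean=14, sd=4)
print(z, deviation_iq(z))  # 1.0 115.0
```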

IQ and mental age scales

The IQ and MA scales are test-independent (all error-free tests give the same true IQ and MA). On an easy test, an average 10-year-old may get 75 right answers out of 100 items; on a hard test, the score might be only 25 items correct. On a sufficiently large sample of a representative population, both scores would assign the same IQ and MA to the average 10-year-old, or to a sample of persons of any age with the same mental ability. For intelligence tests, MA is a single, easily understood quantity in well-defined units of years (see Fig. 1), as is grade level for achievement.
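The test independence of MA can be sketched as a lookup in age norms. Assuming a norms table of mean raw scores by age (the two tables below are hypothetical, built to mirror the easy-test and hard-test example of 75 and 25 items out of 100 for the average 10-year-old), interpolating a raw score between ages gives the mental age, and the pre-Wechsler ratio IQ is 100 × MA/CA. Scores beyond either end of the table are rejected, which is the range limitation that MA and grade level share.

```python
from bisect import bisect_left


def mental_age(raw: float, norms: list[tuple[float, float]]) -> float:
    """Age at which the average child earns `raw`, by linear interpolation in the norms.

    `norms` holds (age, mean raw score) pairs sorted by age, with mean scores
    strictly increasing. Scores beyond either end of the table are rejected.
    """
    ages = [age for age, _ in norms]
    means = [m for _, m in norms]
    if not means[0] <= raw <= means[-1]:
        raise ValueError("score lies outside the range covered by the age norms")
    i = bisect_left(means, raw)  # index of the first tabled mean >= raw
    if means[i] == raw:
        return ages[i]
    (a0, m0), (a1, m1) = norms[i - 1], norms[i]
    return a0 + (a1 - a0) * (raw - m0) / (m1 - m0)


def ratio_iq(ma: float, ca: float) -> float:
    """Pre-Wechsler ratio IQ: 100 * MA / CA."""
    return 100.0 * ma / ca


# Hypothetical age norms (mean items correct out of 100) for an easy and a hard test.
EASY = [(8, 60), (9, 68), (10, 75), (11, 81), (12, 86)]
HARD = [(8, 15), (9, 20), (10, 25), (11, 30), (12, 34)]

print(mental_age(75, EASY), mental_age(25, HARD))  # 10 10
print(ratio_iq(mental_age(71.5, EASY), ca=10))     # 95.0
```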

MA’s leveling off in late adolescence is characteristic of mental tests. Terman and Merrill (1937, p. 25) noted:

. . .the mental age unit. . .appears definitely to decrease with age. . .the difference between 1-year and 2-year intelligence [100 IQ points, author’s note] is so great that any one can sense it. . .The difference in intellectual ability between the average child of fifteen and the average child of 16 [then 5 IQ points, author’s note] is so small that it can barely be detected by the most elaborate mental tests.

Conversely, the SD or IQ unit, measured in MA or grade units, increases with age (see Fig. 1 and Eq. (A.2)). At birth, $\sigma_{\mathrm{MA}}$ should vanish.

This behavior may seem strange for MA and its SD (Fig. 1) in isolation. When both are combined to form a standardized growth function (in the Appendix to this paper), MA falls in line with other mental test scales (Fig. 18 and Table 1).

MA (and grade level) have the shortcoming that neither can handle children who fall outside the scale range at either end. The reason is that both MA and grade level can only refer to average abilities, and average ability never reaches above-average children at the top of the scale, nor does it reach below-average children at the bottom of the scale. For example, the top of the California Achievement Test (CAT) is at grade level 12.8 (the eighth month of the 12th grade), the last grade at which exams are given. When a student, class, or school average is higher than that, an arbitrary score of “12.8” is assigned to it. The Iowa Tests of Basic Skills (ITBS) use a fictitious scale going up to the 18th grade to handle above-average 12th graders. In either case, the number assigned has little meaning.

Fig. 1. The mental age (MA) scale (Terman & Merrill, 1937). Vertical bars: ±1 SD.

The growth scale used here is given in units of standard scores for each age. This scale is extended simply by adding standard scores to it. Likewise, deviation IQ has no problem handling exceptional persons at any age.

Since Wechsler (1939) dropped MA, IQ test scales have been deviation-based. These scales measure variation but not growth and thus are subject to the criticism voiced by McCall et al. (1977) and others. Indeed, one must make correlational, longitudinal studies to study mental development (Anderson, 1939; Bloom, 1964; Furfey & Muehlenbein, 1932; McCall et al., 1977). To remedy that lack, this paper uses the MA scale, inter alia, to handle both growth and variation.

Standard scores

The familiar standard scale z equates tests by aligning the population means and standard deviations (SD). For example, IQ has a mean of 100 and SD = 15. Figs. 2 and 3 show another example: distributions for two college entrance examinations, the SAT-Verbal (mean = 505, SD = 111) and the ACT-English (mean = 20.4, SD = 5.4).

Fig. 4 plots both distributions against standard scores. (The vertical heights of both distributions also are normalized.) The distributions of standard scores equate well.
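Standard-score (linear) equating of the two examinations can be sketched with the means and SDs quoted above: a score on one test is mapped to the score on the other with the same z. The function name is ours, and the mapping is trustworthy only where both distributions have the same shape.

```python
def equate(score: float, from_mean: float, from_sd: float,
           to_mean: float, to_sd: float) -> float:
    """Linear (standard-score) equating: keep z fixed, change the scale."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd


SAT_V = (505.0, 111.0)  # mean, SD quoted in the text
ACT_E = (20.4, 5.4)

for sat in (394, 505, 616):
    print(sat, round(equate(sat, *SAT_V, *ACT_E), 1))  # 15.0, 20.4, 25.8
```

An SAT-Verbal score one SD above the mean (616) maps to about 25.8 on ACT-English, and one SD below (394) to about 15.0.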


Fig. 2. SAT-Verbal score distribution.

Fig. 3. ACT English score distribution.


Differences between two sets of measurements with effect sizes below 0.2 SD are considered small (Cohen, 1988). Inspection of Fig. 4 shows both tests to be interchangeable within this precision over the range of z values between −2 and +2.

The SAT and ACT align because both have distributions of the same shape and the huge number of tests taken irons out statistical fluctuations. For most tests, the distributions near the mean ($|z| \le 2$) are close to normal, which makes alignment of standard scores practical. For larger $|z|$ (Figs. 2 and 3), the curves deviate from each other and the results become less comparable.


Fig. 4. Scaled score distributions of SAT-V and ACT English.


Norming samples at each age for IQ tests are only a few hundred at most. For deviations of $|z| \gg 1$ (IQ well outside the normal range of 85–115), there are so few cases in the norming samples that one cannot talk of distributions at all, and it is impossible to compare different IQ tests. For example, the WISC-IV standardization (norming) sample consisted of 200 children at each age. Assignments of exceptional children (gifted or retarded: IQ > 130 or < 70) are therefore based on only about five cases in each of those IQ ranges and are questionable. One shudders to think that life-and-death decisions hinge on such data (Atkins vs Virginia, 2002; Greenhouse, 2002). Especially worrisome are indications of inconsistency between Stanford–Binet and Wechsler tests (Table 2 and Lichten & Wainer, 2004).
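The size of the problem follows from simple arithmetic. Assuming the normal curve holds in the tails (the very assumption that so few cases cannot check), the sketch below gives the expected number of children beyond IQ 130 or below IQ 70 in an age group of 200. The function name is ours.

```python
from statistics import NormalDist


def expected_beyond(sample_size: int, iq_cut: float,
                    mean: float = 100.0, sd: float = 15.0) -> float:
    """Expected number of sample members beyond an IQ cut-off, assuming normality.

    Cut-offs above the mean count the upper tail; cut-offs below count the lower tail.
    """
    dist = NormalDist(mean, sd)
    tail = 1.0 - dist.cdf(iq_cut) if iq_cut > mean else dist.cdf(iq_cut)
    return sample_size * tail


for cut in (130, 70):
    print(cut, round(expected_beyond(200, cut), 2))
```

Each tail holds about 2.3% of the population, or roughly 4.5 of 200 children.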

The method of standardized growth

Testers have long argued over whose scales most accurately reflect ability. Related disputes concerned whether the standard deviation of mental ability really increases or decreases with age. As far back as 1928, Thurstone dismissed statements like “the distribution of intelligence follows the normal curve.” One can only discuss the distribution of observable quantities like the raw score, or of scales derived unambiguously from it, such as the latent traits of IRT (Item Response Theory) or Rasch theory. Likewise, more recently Yen (1986) rejected disputes over which mental ability scales better represented intuitive, operationally undefined concepts.


Yen compared pairs of subtest scores (reading vocabulary and math computation) from two school achievement tests (CAT/C, California Achievement Test, based on Thurstone number-right scaling, and CTBS/U, Comprehensive Tests of Basic Skills, based on IRT) and found them to be inconsistent (Figs. 5–8).

At first glance, these figures are difficult to understand, with an apparently random pattern of change. For example, Figs. 7 and 8 show different patterns of mean change, with much more pronounced early growth in Fig. 8 than in Fig. 7. On the other hand, the changes in the SD (the error bars) seem to go in the opposite direction (small to large in Fig. 7, large to small in Fig. 8). How can one make sense of such a pattern, in which mean and SD seem to work in opposite directions?

Fig. 5. Manufacturers’ scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CAT/C Reading vocabulary. Thurstone number right scaling.

Fig. 6. Manufacturers’ scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CTBS/U Reading vocabulary. IRT scaling.

Fig. 7. Manufacturers’ scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CAT/C Mathematics. Thurstone number right scaling.

Fig. 8. Manufacturers’ scales for achievement subtests (Yen, 1986). Vertical bars: ±1 SD. CTBS/U Mathematics. IRT scaling.

Furthermore, the growth curves for corresponding tests had dissimilar shapes, contrary to the claims that both scales were equal-interval (1 unit represents the same amount of ability at any age). If both scales were equal-interval, the growth curves would have the same shape.

As we turn to Yen’s (1986) method of resolving these contradictions, we make some general observations. First of all, neither the means nor the SDs of either scale can be meaningful by themselves, since both show patterns of growth with grade level that are inconsistent. On the other hand, Yen found a consistent relation between mean and SD, which she showed by comparing two corresponding tests (math vs. math or vocabulary vs. vocabulary). At a given grade, if the SD of one test is relatively large, the growth curve is relatively steep; when the SD of a test is relatively small, the growth curve is flatter, with relatively small growth.

This relation is the basis of Yen’s method of standardized growth. By taking the ratio of growth rate to SD, she found a quantity that did not depend on whether the test was CAT/C or CTBS/U. (For mathematical formulae, see the Appendix.) For the inconsistency of either of two quantities (mean and SD) alone, she substituted lawfulness in the combination of both. This is the genius of Yen’s method, to which we now turn.

In the example of the SAT and ACT scales, which had very different values for means and standard deviations, plotting standard scores z made both distributions the same. In a similar fashion, Yen’s standardized differences made the annual growth on both tests the same.
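Yen’s standardized difference can be sketched as the change in the mean from one grade to the next divided by a pooled SD. The paper’s exact formula is in its Appendix, which is not part of this section; the pooling used here (root mean square of the two grades’ SDs) is our assumption, and the scores are hypothetical. The sketch shows only the trivial half of the argument: a linear rescaling of a test leaves the standardized differences unchanged, whereas Yen’s empirical finding was that they also agreed across the differently constructed CAT/C and CTBS/U scales.

```python
import math


def standardized_differences(means: list[float], sds: list[float]) -> list[float]:
    """Grade-to-grade change in the mean, in units of a pooled SD.

    ASSUMPTION: pooled SD = sqrt((sd_g**2 + sd_next**2) / 2).
    """
    if len(means) != len(sds) or len(means) < 2:
        raise ValueError("need matching means and SDs for at least two grades")
    out = []
    for g in range(len(means) - 1):
        pooled = math.sqrt((sds[g] ** 2 + sds[g + 1] ** 2) / 2)
        out.append((means[g + 1] - means[g]) / pooled)
    return out


# Hypothetical grade means and SDs on one scale...
means_a, sds_a = [300.0, 340.0, 370.0, 390.0], [40.0, 42.0, 44.0, 46.0]
# ...and the same data reported on a linearly rescaled version (2x + 100).
means_b = [2 * m + 100 for m in means_a]
sds_b = [2 * s for s in sds_a]

print([round(d, 3) for d in standardized_differences(means_a, sds_a)])
print([round(d, 3) for d in standardized_differences(means_b, sds_b)])
```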

Yen’s results

Yen applied her method to the results shown in Figs. 5–8 (see Fig. 9, and Yen, 1986, for tables). The standardized differences were nearly the same for corresponding subtests.

The close subtest agreement for each trait made moot the dispute over the relative merits of the test scales. In her words,

standardized differences lead to essentially the same conclusion regardless of which scale is used.

(Yen, 1986, p. 305)

Fig. 9. Standardized growth differences for the same subtests as in Figs. 5–8. SD = 1.


Two remarks on standardized growth

Yen’s method was an important first step towards the law of intelligence. However, it had two limitations:

1. The method considers only local aspects of mental growth. That is, it deals only with the standard deviation and 1 year’s growth of the test score. For a law of intelligence to handle global growth (the entire amount from birth to adulthood), her calculation needs to be extended.

2. Standardization only brought together the growth for like scales (for a single ability, math or vocabulary). Unlike scales, vocabulary and math, did not show the same growth. There is no a priori reason to expect growth functions in tests for different subjects to be the same. This leaves us short of a law of intelligence, which should not depend upon which test is used. Achieving that goal will involve finding a class of mental tests for which standardized growth is the same.

Extension of the method of standardized growth

This part of the present paper constructs a growth measure that extends and broadens Yen's results. Fig. 10 shows growth curves that the author obtained simply by summing Yen's standardized differences from grade to grade. This procedure has two advantages: it converts local measures into global ones, and it irons out statistical fluctuations. (Compare Figs. 9 and 10.)
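The summation step can be sketched in Python. This is a minimal illustration under stated assumptions, not Yen's or the author's procedure: the grade means and SDs are hypothetical, and dividing each annual gain by the root mean square of the two adjacent SDs is our assumption about how the SD is pooled (Yen, 1986, gives her exact definition).

```python
from itertools import accumulate
from math import sqrt


def standardized_differences(means, sds):
    """Local growth: each grade-to-grade gain in the mean, in pooled-SD units.

    Assumption: the pooled SD is the root mean square of the two adjacent SDs.
    """
    diffs = []
    for i in range(len(means) - 1):
        pooled_sd = sqrt((sds[i] ** 2 + sds[i + 1] ** 2) / 2)
        diffs.append((means[i + 1] - means[i]) / pooled_sd)
    return diffs


def growth_function(diffs):
    """Global growth: running sum of the local differences, starting at 0."""
    return [0.0, *accumulate(diffs)]


# Hypothetical scale-score means and SDs for four consecutive grades.
means = [300.0, 330.0, 355.0, 375.0]
sds = [30.0, 30.0, 30.0, 30.0]

diffs = standardized_differences(means, sds)
growth = growth_function(diffs)
print([round(d, 3) for d in diffs])   # [1.0, 0.833, 0.667]
print([round(g, 3) for g in growth])  # [0.0, 1.0, 1.833, 2.5]

# A linear change of units (here 2x + 10) leaves the differences unchanged.
rescaled = standardized_differences([2 * m + 10 for m in means], [2 * s for s in sds])
print(all(abs(a - b) < 1e-12 for a, b in zip(diffs, rescaled)))  # True
```

Because the gain and the SD change by the same factor under a linear change of units, the standardized differences, and hence their running sum, do not depend on the units of the original scale.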

A reminder

As pointed out earlier, we cannot assume that the unit (the SD) keeps the same meaning at all grades. It would be nice if mental ability were that simple, but that is not in the cards at this stage of inquiry. Nevertheless, the growth function, as given here, is unambiguously defined. As such, it is a valid measure.

Fig. 10. Growth curves obtained from Fig. 9.


This paper explores the standardized growth of the composite, or "full scale," score of a mental ability test, which combines all subjects, such as math computation, math concepts, reading ability, vocabulary, spelling, geometric visualization, and factual knowledge. By combining subjects that involve many portions of the brain and occur at every stage of development, one might expect to find a more lawful measure of human development, one that depends less on the specifics of the test.

Remarks on standardized growth

Growth curves for unlike subtests, such as ITBS reading vs. CAT/C math computation, sometimes do not agree; growth curves for subtests of like abilities do agree. The growth calculations involve no adjustable parameters, which makes this a stringent test. Typical coefficients of variation (c.o.v.) of total growth are 25% among unlike subtests, compared with 10% among like subtests. Because variance scales with the square of the c.o.v., standardized growth reduces variance by a factor of about (25/10)² ≈ 6. As noted earlier, cumulative growth is more precise than standardized differences, since it statistically averages out random fluctuations of individual pairs of data (Figs. 9 and 10).
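The factor of 6 follows from the c.o.v. figures because, for a fixed mean, variance is proportional to the square of the c.o.v. A short Python check follows; the three growth totals at the end are hypothetical and only show how a c.o.v. would be computed (here with the sample SD, an assumption).

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Variance is proportional to the squared c.o.v. at a fixed mean,
# so cutting the c.o.v. from 25% to 10% divides the variance by:
variance_factor = (0.25 / 0.10) ** 2
print(round(variance_factor, 2))  # 6.25, i.e. "a factor of 6"

# Hypothetical total-growth values (SD units) for three like subtests.
totals = [7.5, 8.5, 9.0]
print(round(coefficient_of_variation(totals), 3))  # 0.092, about 9%
```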

It matters not so much what the questions ask, as long as they are numerous.

(Binet & Simon, 1916, p. 329)

The indifference of the indicator.

(Spearman, 1927, p. 198)

Full scale tests

The expectation that full scale tests would probe universal aspects of the human mind goes back to the earliest days of mental testing. Ever since Binet and Stern, IQ tests have assessed a broad mixture of mental skills, and the universal results did not depend on the details of the test. This is the rationale for what follows in this paper.

Achievement tests

Fig. 11 plots full scale scores for two widely used achievement tests, the CAT E,F and the Iowa Tests of Basic Skills (ITBS). For comparison, the scales of both tests are transformed linearly so that they coincide at grades 1 and 12. The curves' different shapes affect the growth midpoints. The CAT E,F (based on IRT scaling) midpoint is in the 2nd grade; the ITBS (based on a proprietary scale) rises more gradually and reaches the halfway mark in the 4th grade.
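Locating such a midpoint is a small computation: map each curve linearly so that it runs from 0 at the first grade to 1 at the last, then find where it crosses 0.5. The Python sketch below assumes that linear interpolation between tabulated grades is adequate and that the rescaled curve rises monotonically; the two score series are hypothetical, not the CAT or ITBS norms.

```python
def halfway_point(grades, scores):
    """Grade at which a growth curve first reaches half of its first-to-last rise.

    Scores are rescaled linearly to run from 0 (first grade) to 1 (last grade);
    the 0.5 crossing is found by linear interpolation between adjacent grades.
    """
    lo, hi = scores[0], scores[-1]
    fractions = [(s - lo) / (hi - lo) for s in scores]
    for i in range(len(grades) - 1):
        f0, f1 = fractions[i], fractions[i + 1]
        if f0 <= 0.5 <= f1:
            return grades[i] + (0.5 - f0) / (f1 - f0) * (grades[i + 1] - grades[i])
    raise ValueError("curve never crosses its halfway level")


grades = [1, 2, 3, 4, 5]
fast_early = [0.0, 50.0, 70.0, 85.0, 100.0]  # hypothetical: rapid early rise
gradual = [0.0, 30.0, 60.0, 80.0, 100.0]     # hypothetical: more gradual rise
print(halfway_point(grades, fast_early))         # 2.0
print(round(halfway_point(grades, gradual), 2))  # 2.67
```

The curve that rises more gradually reaches its halfway mark later, which is the kind of shape effect described for the ITBS and CAT E,F curves.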

Fig. 12 shows standardized growth for the two tests. The halfway growth marks

are much closer together than in Fig. 11.

Table 1 compares a sample of widely used achievement tests. The standardized growth mid grades agree better with one another than do the mid grades on the manufacturers' scales. The c.o.v. is again reduced from approximately 25 to 10% by standardization, a reduction of variance by a factor of six. The method works for full scale tests as it did for Yen's subtests.

Fig. 11. Manufacturers' growth curves for composite CAT/E,F and ITBS achievement tests. Arrows: score midpoints.

Fig. 12. Standardized growth curves for ITBS and CAT tests. Compare with Fig. 17.


Table 1

K-12 growth (SD) and halfway growth points for achievement and IQ tests

| Test | Scale | K-12 growth (SD) | Mid grade or mid age: standardized | Mid grade or mid age: original |
|---|---|---|---|---|
| Achievement tests | | | | |
| CAT | IRT | 7.7 | 2.4 grade | 2.4 |
| CTBS | IRT | 8.4 | 2.3 | 2.1 |
| ITBS | ITBS | 8.4 | 2.8 | 5.2 |
| MAT | Rasch | 7.8 | 2.7 | 2.7 |
| SAT | Rasch | 8.7 | 2.3 | 2.5 |
| W-J | Rasch | 10.4 | 2.7 | 2.1 |
| WRAT | Rasch | 8.8 | 2.1 | 2.1 |
| Mean, raw | | 8.6 | 2.5 grade | 2.7 |
| Mean, corrected^a | | 6.9 | 8.2 years | — |
| SD | | 0.7 | 0.26 years | 1.1 |
| IQ tests | | | | |
| Mean | | 7.3 | 7.9 years | |
| SD | | 0.7 | — | |

^a To convert grades for achievement tests to age in years to match IQ tests, corrections were made for retention in grade and for exclusion of special education students from testing.
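The mid-grade summary rows of Table 1 can be recomputed from the seven achievement-test rows. The Python sketch below assumes that the SD row is a sample standard deviation (n − 1 denominator); with that assumption it reproduces the standardized entries (mean 2.5, SD 0.26) and the original-scale entries (mean 2.7, SD 1.1), a roughly fourfold smaller spread after standardization.

```python
from statistics import mean, stdev

# Mid grades from Table 1 (achievement tests), in table order:
# CAT, CTBS, ITBS, MAT, SAT, W-J, WRAT.
standardized = [2.4, 2.3, 2.8, 2.7, 2.3, 2.7, 2.1]
original = [2.4, 2.1, 5.2, 2.7, 2.5, 2.1, 2.1]

for label, grades in (("standardized", standardized), ("original", original)):
    # stdev() is the sample SD (n - 1 denominator); this is an assumption
    # about how the table's SD row was computed.
    print(f"{label:>12}: mean {mean(grades):.1f}, SD {stdev(grades):.2f}")
# standardized: mean 2.5, SD 0.26
#     original: mean 2.7, SD 1.11
```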


The dispute over whether mental tests are unidimensional is nearly a century old, goes back to Spearman and Thurstone, and is still current. However, Wilks' (1938) theorem implies that correlations between different tests approach unity as test lengths become infinite. Thus long, full scale tests may appear to be unidimensional even if the subtest contents are multidimensional. Yen (1985) found evidence of multidimensionality in an achievement test, especially in the mathematics subtests. However, this occurred in the high school grades, an age range excluded from the present paper because growth there does not behave lawfully.

In Table 1, the ITBS appears to be an outlier among the tests on the original scales. However, examination of Figs. 5–8 shows that the Thurstone-scaled CAT C tests differ from the IRT-scaled CAT E,F tests in a similar fashion. Whether the CAT C–CAT E,F difference is caused by the change in scaling from Thurstone to IRT or by the revision of the test is anybody's guess. Rather than try to untangle this complexity, it suffices to say that Yen's method of standardized differences eliminates the problem.

The correlation between different subtests is often relatively low; that between full scale tests is high. Full scale tests consist of many, varied items, and statistical sampling theory tells us that the correlation between full scale tests will be high (Wilks, 1938). Likewise, the statistical averaging in full scale tests irons out differences in subtest growth.
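The tendency of long composites to correlate highly can be seen in a toy simulation. The Python sketch below is our own illustration rather than Wilks's (1938) argument: it assumes a single common factor, with every item equal to that factor plus independent noise (so any two items correlate 0.2), and builds two tests from separate items. Under these assumptions the expected correlation between two k-item tests is k/(k + 4), the Spearman–Brown pattern, which approaches 1 as the tests lengthen.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulated_test_correlation(n_items, n_people=2000, noise_sd=2.0, seed=0):
    """Correlation between two tests built from separate items of the same kind.

    Toy model (an assumption): item score = person's common factor + independent noise.
    """
    rng = random.Random(seed)
    test_a, test_b = [], []
    for _ in range(n_people):
        factor = rng.gauss(0.0, 1.0)
        test_a.append(sum(factor + rng.gauss(0.0, noise_sd) for _ in range(n_items)))
        test_b.append(sum(factor + rng.gauss(0.0, noise_sd) for _ in range(n_items)))
    return pearson(test_a, test_b)


def expected_correlation(n_items, noise_sd=2.0):
    """Closed form for the same model: k / (k + noise variance)."""
    return n_items / (n_items + noise_sd ** 2)


for k in (2, 10, 100):
    print(k, round(expected_correlation(k), 3), round(simulated_test_correlation(k), 3))
```

The expected values are 0.333, 0.714, and 0.962 for 2, 10, and 100 items; the simulated values scatter slightly around them.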

Incidentally, the choice of scale makes little difference in the end: the outcome is a universal growth function. It was predicted, and confirmed, that this function governs brain growth (Lichten, 1993, 1996). Unexpectedly, the function also extends beyond mental ability: it applies to infant motor ability as well (Bayley, 1993).


Intelligence tests

IQ test correlations and growth follow suit. For example, in the Wechsler Intelligence Scale for Children (WISC-III), the average correlation among subtests is r = 0.44; between the verbal and performance scales, each consisting of five subtests, it is r = 0.66. The standardized growth curves for the subtests and for the verbal, performance, and full scale tests are shown in Figs. 13–15. Growth varies among subtests, but performance, verbal, and full scale growth are close to each other. Similarly, different full scale IQ tests usually correlate highly with each other and with other standardized tests.

The standardized growth for IQ tests is calculated from the well-known mental age (MA) definition of IQ given in the Appendix. The standardized growth differences of MA (plotted in Fig. 16) are inversely proportional to chronological age (CA). On the log–log plot of Fig. 16, this inverse relation becomes a straight line with slope −1. Fig. 16 compares commonly used infant and children's IQ tests. The data fit such a line well between 4 months and 12 years, a factor of 36 in age and in growth rate. Table 2 lists, for a variety of IQ tests, the empirical growth coefficient b in the inverse relation G.R. = b/CA (for mathematical details, see the Appendix). The theoretical value of b (based on early 20th century tests) is 6.7, in fair agreement with the data from late 20th century tests.
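The inverse relation G.R. = b/CA becomes log G.R. = log b − log CA, so a straight-line fit on log–log axes should give a slope near −1, and b can be read off the intercept. The Python sketch below fits synthetic points generated from the theoretical value b = 100/15 ≈ 6.7; applying it to a real test would mean substituting that test's standardized differences, which are not reproduced here.

```python
from math import exp, log

B_THEORY = 100 / 15  # about 6.7: the growth constant implied by SD(IQ) = 15


def fit_growth_constant(ages, growth_rates):
    """Least-squares fit of log(G.R.) = intercept + slope * log(CA).

    Returns (slope, b), where b = exp(intercept).
    """
    xs = [log(a) for a in ages]
    ys = [log(g) for g in growth_rates]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, exp(intercept)


# Synthetic data from 4 months (1/3 year) to 12 years, following G.R. = b / CA exactly.
ages = [1 / 3, 1, 2, 4, 6, 8, 10, 12]
rates = [B_THEORY / a for a in ages]

slope, b = fit_growth_constant(ages, rates)
print(round(slope, 3), round(b, 2))  # -1.0 6.67
print(round(12 / (1 / 3)))           # 36: the factor spanned in age (and in growth rate)
```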

Fig. 13. Standardized growth: WISC-III verbal.

Fig. 14. Standardized growth: WISC-III performance.


Fig. 15. WISC-III performance, verbal, and full scale standardized growth. SD = 15.

Fig. 16. Standardized differences for IQ tests and predicted line from Eq. (A.3). See also Table 2.


The growth constant b is slightly larger (ca. 10%) in earlier tests, presumably because less inclusive samples were used for standardization; a less inclusive sample has a smaller SD and therefore, since b is annual growth divided by the SD, a larger b. The Stanford–Binet appears to be the exception that proves the rule; it merits further investigation.

Construction of mental ability growth functions

We construct a growth function J by summing the standardized differences from IQ data from birth onward (Fig. 16). Fig. 17 shows the growth function J, which is logarithmic. Remarkably, J rises to half of its adult value in only a little more than a year. The Appendix gives the mathematics and also a simplified model of standardized growth that shows how such a function is independent of the test.


Table 2

The growth constant b for IQ tests

| Test | Year | Ages | b^a | SEM |
|---|---|---|---|---|
| Binet–Simon (Burt) | 1922 | 3–12 | 7.0 | 0.2 |
| Yerkes point scale | 1923 | 5–12 | 6.6 | 1.1 |
| | | 3½–7 | 5.3 | 0.3 |
| Stutsman (Merrill–Palmer) | 1931 | 1½–5¼ | 6.8 | 0.4 |
| Stanford–Binet (Terman) | 1916 | 4–12 | 7.9 | 0.2 |
| (Terman and Merrill) | 1937 | 2.3–12 | 6.7 | — |
| (R.L. Thorndike) | 1986 | 2.3–12 | 5.2 | 0.3 |
| (G.H. Roid) | 2003 | 2–12 | 4.5^b | 0.5 |
| Wechsler | | | | |
| Wechsler–Bellevue | 1939 | 7.5–12 | 7.1 | 0.6 |
| WPPSI-R | 1989 | 3–6 | 6.2 | 0.2 |
| WISC-III | 1991 | 6–12 | 6.2 | 0.5 |
| Bayley | | | | |
| First publication | 1933 | 0.33–3 | 10.5 | 0.4 |
| BSID | 1969 | 0.33–2.5 | 7.6 | 0.3 |
| BSID-R | 1994 | 0.33–3 | 6.2 | 0.5 |
| CogAT (Riverside) | 1992 | 5–12 | 6.3 | 0.4 |
| Average: all tests | 1917–1994 | 0.33–12 | 6.5 | 0.4 |
| Average: newer tests | 1986–1994 | 0.33–12 | 5.9 | 0.2 |
| Corrected value | 1986–1994 | 0.33–12 | 6.8 | 0.3 |

^a Uncorrected, unless so mentioned.
^b Corrected for reliability and step size.


Unintuitive as this result may seem, it is undeniable. Moreover, this result correlates with physical brain development: Chugani (1994) found, by means of positron emission tomography, that every part of the brain becomes functional in the first year. Along these lines, it can be shown, inter alia, that at birth variation is almost entirely due to maturation, whereas for adults, test score differences reflect ability, not maturation. These conclusions of psychological significance are among those that follow from the growth function developed here, but they do not depend on any comparison of the real size of the SD of mental ability at different ages. Parenthetically, one could derive some of the conclusions in this paper without the growth function, just by using standardized differences. However, the procedures would be much less direct and less transparent.

It should be clear from the discussion given earlier in this paper that this result does not necessarily mean that a 1-year-old has half the intelligence of an adult. Such a statement would presuppose that the unit of the J scale, the SD, means the same thing throughout the life cycle. This paper makes no such apples-and-oranges claim. That such a claim would be unwarranted has been pointed out by many previous investigators (Flanagan, 1951; Jensen, 1969; Schulz & Nicewander, 1997; and especially Yen, 1986, pp. 312 ff.).

Fig. 17. Growth functions for IQ tests from Fig. 16 and for the human brain.

Gesell's (1928) intelligence scale, a prescient conjecture, was close to the present growth function. He modeled it after the Weber–Fechner law, one of psychology's oldest principles, which is discussed in the section Remarks on natural laws at the beginning of this paper. In his model the growth rate of intelligence was inversely proportional to chronological age CA; integrating it, he also obtained a logarithmic relation. However, he did not solve the problem that the logarithm diverges at both ends (ln CA tends to −∞ as CA approaches 0 and grows without bound as CA increases).

Because J is defined in SD units, the extension to non-average test scores is trivial: one simply adds the standard score z to J. For example, an average 10-year-old (IQ = 100) has a growth function J = 29.5. For a 10-year-old with IQ = 85 (z = −1), the growth function is J − 1 = 28.5; for an IQ of 115 (z = +1), the value is J + 1 = 30.5.
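In code, the rule is one line: convert the deviation IQ to a standard score, z = (IQ − 100)/15, and add it to the average J for that age. The Python sketch below takes the average value J = 29.5 for a 10-year-old from the text; the function name is ours.

```python
def individual_growth(j_average, iq, mean_iq=100.0, sd_iq=15.0):
    """Growth function for a non-average child: average J plus the standard score z."""
    z = (iq - mean_iq) / sd_iq
    return j_average + z


J_AVERAGE_AGE_10 = 29.5  # value quoted in the text for an average 10-year-old
for iq in (85, 100, 115):
    print(iq, individual_growth(J_AVERAGE_AGE_10, iq))
# 85 28.5
# 100 29.5
# 115 30.5
```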

Comparison of mental ability and other growth scales

Intelligence and achievement

Fig. 18 and Table 1 compare the standardized growth of achievement and intelligence test scores (averages of the most widely used tests). Other than aligning both functions at the beginning, there are no adjustable constants. The difference in K-12 growth is less than 10%. This is comparable to the coefficient of variation among full scale tests of the same kind (IQ or achievement) and thus shows that the growth of the two types of tests is indistinguishable.

Fig. 18. Standardized growth for IQ and achievement tests. See Table 1.
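Using the Table 1 values, the comparison of intelligence and achievement growth reduces to a one-line calculation. The Python sketch takes the corrected achievement mean (6.9 SD) and the IQ-test mean (7.3 SD) of K-12 growth; dividing by the IQ value is our choice, and dividing by the achievement value instead gives about 5.8%, still well under 10%.

```python
achievement_growth = 6.9  # Table 1: corrected mean K-12 growth, achievement tests (SD)
iq_growth = 7.3           # Table 1: mean K-12 growth, IQ tests (SD)

relative_difference = abs(iq_growth - achievement_growth) / iq_growth
print(f"{relative_difference:.1%}")  # 5.5%, below the 10% bound quoted in the text
```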

Infant mental and motor scales

Fig. 19 compares the standardized growth curves of the Mental and Motor Scales of the Bayley Scales of Infant Development (B.S.I.D., 1993):

The Mental Scale includes items that assess memory, habituation, problem solving, early number concepts, generalization, classification, vocalizations, language, and social skills. The Motor Scales assesses control of the gross and fine muscle groups. This includes. . .rolling, crawling and creeping, sitting, standing, walking, running, and jumping. . .items. . .not concerned with functions generally perceived as 'mental' or included in intelligence scales.

(Bayley, 1993, p. 1)

This claim is supported by the average correlation between the Bayley Mental and Motor Scales, which is only 0.45 (Bayley, 1993). The mental scales correlate well with the WPPSI (Wechsler Preschool and Primary Scale of Intelligence: r = 0.73, typical for two IQ tests); the motor scales correlate less well with the WPPSI (r = 0.41).

Yet the two growth curves are remarkably similar. Here correlation and growth do not go hand in hand. It appears that growth reflects more fundamental and general factors than those shown by correlation.

Fig. 19. Bayley Mental and Motor standardized growth functions.

Discussion: Theme-park psychology?

Growth is independent of race, SES, and the Flynn effect

Sternberg (2000) has criticized what he calls "theme-park psychology," the study of human behavior under narrowly limited conditions. A psychological principle, like any natural law, should hold for all times, places, and cultures. By examining different eras, different races, and varying socio-economic status (SES), the author has gone as far as currently available data allow to avoid theme-park psychology. Table 2 shows that the law of growth has held over the entire history of IQ tests (but see the remarks on Table 2). A comparison of mental growth between industrialized and non-industrial cultures is not available, because the raison d'être of this paper is the well-standardized test, which is a hallmark of the developed world.

Within this sector, a pronounced source of diversity is race. The black–white mental test gap, about 1 SD, has been known for nearly a century (Jencks & Phillips, 1998). Fig. 20 plots achievement as standardized growth (in SD units) for two races, based on the CAT (CTB, 1987). African-Americans have the same standard score at all school ages. When gains in achievement between two ages are compared, the black–white difference therefore cancels, and African-Americans neither lose nor gain relative to the general population. The actual gains from grade K.8 to grade 12.8 are 5.82 SD for the entire population and 5.91 SD for African-Americans, a difference of only 1.5%. Similar results hold for other SES indices, such as income, parental education, ethnicity, suburban–urban–rural, and south–north, which often have standard scores that stay roughly constant across ages.
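The 1.5% figure is the difference between the two gains relative to the population gain. A quick Python check using the two values quoted above:

```python
# Values quoted in the text (CAT, CTB 1987), in SD units.
gain_all = 5.82    # gain from grade K.8 to 12.8, entire population
gain_black = 5.91  # gain from grade K.8 to 12.8, African-American students

relative_gap = (gain_black - gain_all) / gain_all
print(f"{relative_gap:.1%}")  # 1.5%
```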

There are no racial differences in the mental scale at birth (Bayley, 1965). Differences in test scores at school age must then represent corresponding deviations in growth rate at some age. However, these differences are relatively small on the growth scale (expression (A.4)): a 1 SD difference at adolescence represents only a 3% difference in standardized growth over the entire period from birth onward. (The exact meaning of this number, of course, is subject to the apples-and-oranges caveat expressed throughout this paper.) This difference is established approximately in the second year of life, when the infant is learning to talk (Bayley, 1954, Fig. 1; Garber, 1988, Fig. 4-2). This may be connected with the observation that permanent IQ differences are largely formed in the home and depend on the vocabulary and other characteristics of the infant's parent(s) or caregiver (Farkas, 2001; Farkas & Beron, in press; Hart & Risley, 1995).

Fig. 20. Black and White standardized growth for the CAT achievement test is the same (see Fig. 22).
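The 3% figure can be checked roughly in Python. The sketch assumes, as our own simplification, that the text's value J = 29.5 at age 10 can be carried forward with Eq. (A.4), J(CA2) − J(CA1) = 6.7 ln(CA2/CA1), to early adolescence; ages 12 and 14 are illustrative choices, and 14 lies slightly beyond the 0–12 range over which the law was fitted.

```python
from math import log

B = 6.7          # growth constant in Eq. (A.4)
J_AGE_10 = 29.5  # average growth function at age 10, as quoted in the text


def j_average(age):
    """Average J at a given age, extrapolated from age 10 with Eq. (A.4)."""
    return J_AGE_10 + B * log(age / 10)


for age in (12, 14):
    share = 1 / j_average(age)  # a 1 SD shift as a fraction of total growth since birth
    print(age, round(j_average(age), 1), f"{share:.1%}")
# 12 30.7 3.3%
# 14 31.8 3.1%
```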

Like racial and class differences, the Flynn effect, the worldwide secular rise in intelligence test scores over time, is approximately independent of age and thus leaves the standardized growth of mental ability unaffected (Flynn, 1984, 1987; Neisser, 1998).

The present "law of intelligence" is too universal to be dismissed as theme-park psychology. Nevertheless, the tests used were standardized largely on US populations. It would be desirable to see whether the law holds in other countries, especially in developing nations.

How does the mind grow? It grows like the nervous system; it grows with the nervous system.

(Gesell & Ilg, 1943, p. 9)

Physical and mental growth

Physical growth is measured in objective units, such as meters and kilograms. Mental growth is defined here in SD units. Fig. 21 compares typical growth curves for different human body systems:

Lymphoid type: thymus, lymph nodes, and intestinal lymph masses.

Brain and head type: brain and head size, standardized mental ability (redrawn by author).

General type: body as a whole, external dimensions (except head), respiratory and digestive organs, kidneys, aortic and pulmonary trunks, musculature, and blood volume.

Reproductive type: testis, ovary, epididymis, prostate, seminal vesicles, and fallopian tubes.

Fig. 21. Growth curves for several body organ systems (Scammon, 1930).

Fig. 17 compares the mental growth function with that of the human brain. The similarity between the mental and neural growth curves is remarkable and, among the organ systems of Fig. 21, unique.

Objections and trivializations of the growth function

Growth of the mind and of big toes

Skeptics mock Gesell and Ilg along these lines: "The big toe grows with the mind. Therefore the mind resides inside our big toes."

Reply

This joke is a simplification that treats all growth as the same. On the contrary, each body system has a growth spurt at its own age (see Fig. 21). The reproductive system has a growth spurt at puberty, when mental growth has almost stopped; conversely, the mental-motor-brain growth spurt occurs perinatally, when reproductive organ growth is nearly nil. That reproductive and mental ability growth spurts occur at different times in the life cycle jibes with the smallness of male–female IQ differences. Epstein (1979) claimed the existence of non-perinatal growth spurts, but these have not been confirmed.

Skeletal growth shows a very different pattern from that of the brain and mental ability (see Fig. 21). Hence growth patterns confirm Plato's view that the mind resides in our heads, not in our big toes.

Some have even suggested that ability and achievement measures are so much alike as to be

virtually the same thing.

(Gridley & Roid, 1998, p. 257)

Reply

The tests considered in this paper are not necessarily "the same thing." IQ and achievement tests have common features, such as vocabulary, but also differences, such as spelling and maze performance. Tests or subtests can correlate substantially with each other (typically, ca. 0.7), grow in the same way, and yet be quite different (Campbell & Fiske, 1959). Examples are the Wechsler Verbal and Performance scales (r = 0.66; see Fig. 15). The Bayley Mental and Motor Scales correlate only modestly (average value 0.45), but grow together in lock step. Their items (fine hand movements vs. verbal proficiencies) are different, yet they share a common law of growth.


Does J apply to education? An achievement gap

Critics of American education have claimed that racial minorities and the poor are short-changed by the public schools: minority children start school only a few months behind majority children, fall further behind, and finally graduate from high school at only the eighth or ninth grade level on standardized achievement tests (Coleman et al., 1966). On the other hand, Jensen (1973, pp. 97–102) noted that the educational achievement gap, expressed in SD, remained essentially the same throughout the school years (see Fig. 20). Thus, according to this statistic, schools are educating minority pupils as well as the majority.

But on IRT-scaled tests like the CTBS or CAT, the decrease of the SD with grade shows minority students catching up: a gap that stays constant in SD units shrinks when measured in scale-score points. Could this mean that their education is better than that of the majority? Hardly (see Fig. 22). The inconsistency of these conclusions results from the difficulty of comparing an SD at one age with an SD at another age, as emphasized earlier in this paper (see, for example, the section Extension of the method of standardized growth). One cannot accept a conclusion that depends on which test scale one uses. On the contrary, the sameness of standardized growth on full scale tests, despite differences among social, ethnic, and racial groups and eras, merely shows the underlying lawfulness of mental growth; it says little about the schools. One might view skeptically the often-stated but seldom-achieved goal of eliminating group test score differences (Jencks & Phillips, 1998).
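The apparent catching up can be reproduced with invented numbers. Suppose, hypothetically, that the minority group stays exactly 1 SD below the mean at every grade while the scale SD shrinks with grade, as described above for IRT scales. The Python sketch below shows the gap in scale-score points narrowing even though nothing changes in SD units; the SD values are made up for illustration.

```python
# Hypothetical numbers: a constant 1 SD gap on a scale whose SD shrinks with grade.
grades = [1, 4, 8, 12]
scale_sds = [60.0, 45.0, 35.0, 30.0]
gap_in_sd_units = 1.0

# The same 1 SD gap expressed in scale-score points at each grade.
gaps_in_points = [gap_in_sd_units * sd for sd in scale_sds]

for grade, points in zip(grades, gaps_in_points):
    print(f"grade {grade:>2}: gap = {points:.0f} scale points = {gap_in_sd_units:.1f} SD")
```

Read in scale points, the gap falls from 60 to 30; read in SD units, it never moves. Which conclusion one draws depends only on the unit chosen.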

Evaluation

Based on Yen's powerful method of standardized growth, this paper presents a new psychometric yardstick of mental development. We now address several questions to evaluate this measure. Is the new approach really needed? If so, how does it meet the needs? Does it have advantages over the highly developed techniques presently used in psychometrics?

Fig. 22. Based on IRT scales of the CAT E/F, minority achievement catches up (see Fig. 20).

Is it really needed?

This paper has emphasized that growth is a pre-eminent aspect of mental ability,

whether it be intelligence or achievement. The current deviation scale of intelligence

gives no direct measures of growth and must rely on correlational data obtained

from difficult longitudinal studies. The situation is better in achievement testing,

where some methods, such as Rasch and IRT, agree fairly well with each other

and with the standardized growth method, as Table 1 shows.

It appears that the reason for the agreement is that IRT scales are also expressed in SD units. Lord (1975, 1980) pointed out that the IRT method does not lead to a unique ability scale; he showed that, in principle, any transformation of the IRT ability scale can be used. It is proved in the Appendix of this paper that the standardized growth functions based on any transformation of the IRT scale are mathematically identical. Thus the J scale is unique.

However, achievement tests cover only the school ages, which make up only the last 20% of brain and standardized mental growth. The crucial early years, which contain the major developments of language, social, and motor functioning, are left out. Furthermore, psychometric measures of growth were disconnected from one another: prior to this paper, there were few direct comparisons of the growth of intelligence with that of achievement.

Although current item analysis techniques work well for constructing mental ability tests (especially computerized ones), they leave us with little insight into growth. The IRT method of grading a mental ability test is so complex that it requires a computer program that only a specialized coterie of psychometricians can fathom. Furthermore, it should be noted that a typical achievement battery is not just a single test. Because of the rapid growth among young schoolchildren, psychometricians must administer a series of tests, each designed to fit a particular grade level. Linking the scales of these smaller tests into a grand K-12 scale involves vertical scaling procedures (Kolen & Brennan, 1995), which do nothing to remove the mystery of test construction. This disconnect between measurement expertise, on the one hand, and developmental psychology and educational practice, on the other, is unhealthy.

A comprehensive mental growth measure: Birth to adulthood. How it meets the needs

Yen's method of standardized growth translates among mental ability tests and

puts them all on a common footing. It is used here to put growth back into IQ

and to link it with achievement. It also connects infant, child, and adult tests.

Advantages over present techniques

A central point of the method of standardized growth is its simplicity. The author's calculations often were done on a pocket calculator and, at times, with pencil and paper, since only elementary arithmetic was needed. To see how simple the calculations are, read Yen's paper (1986) and follow her calculations, or do the same for the example in the Appendix. Item analysis and test equating may remain useful for test construction, especially for computerized exams. However, standardized growth functions are simpler and more versatile.

The controversy over which tests and scales are best, which should have been ended by Yen's paper (1986), becomes moot. For example, the mental age scale, over its range of applicability (0–12 years) and when its growth is standardized, is as good a measure as any and has been shown here to possess some simplicities and advantages. It is the only function that covers growth from birth to the teens (ca. 90% of the total) in an unambiguous numerical formula.

This paper agrees with the social psychologist Allport (1942), who concluded that neither a purely nomothetic nor a purely idiographic approach to psychology can suffice. Human mental ability, its magnitude, its variation among individuals, and its growth with age are inextricably connected.

Further research. Probing the limits of validity of the law of growth

The present paper has pushed the lower limit of the law of growth (Stern relation: Fig. 1; Eq. (A.1); and $\sigma_{IQ} = 15$) down from its former value of 2 years (Stanford–Binet test) to 4 months (B.S.I.D.). As pointed out by the author (Lichten, 2002), corrections for gestation may account for at least part of the deviations from the Stern relation below a CA of 4 months. The corrected Stern relation may hold at even earlier ages.

Summary

This paper has found a new measure of growth that applies quantitatively, consistently, and lawfully to intelligence, achievement, infant motor ability, and brain development. When measured by total standardized growth, differences on full scale tests among social groups, epochs, and schools are small.

The long-sought law of intelligence, given here in a test-independent form, applies to full scale measurements of mental ability, both IQ and achievement, at all ages up to adolescence. It is expressed in the Appendix in three mathematically equivalent ways.

Acknowledgments

Visitor 1998–1999 at the Educational Testing Service, Fellow 1994–2003 of the

Yale Institution for Social and Policy Studies. The author thanks W. Gilliam, T.

Goldsmith, R. E. Keen, J. Kihlstrom, C. Levinson, L. Mazes, U. Neisser, W. R.

Overton, R. Sternberg, H. Wainer, R. Wyman, K. Wynn, and E. Zigler for helpful

suggestions and encouragement. Preliminary versions of this paper were given by the

author (Lichten, 1993, 1996) and at meetings held by the Eastern Psychological


Association, Wash. DC, 1997 and Boston, MA, 1998; New England Psychological

Association, New London, CT, 1996; American Psychological Association, Wash.,

DC, 1998 and New Orleans, 2002.

Appendix

This appendix collects the mathematical equations and also presents a simplified, non-mathematical model of standardized growth for readers who prefer to skip equations.

IQ, MA, and the mathematical form of the law of intelligence

IQ and growth were related before Wechsler (1939) by the well-known mental age definition of IQ (Stern, 1914):

$$\mathrm{IQ} = 100\,\frac{\mathrm{MA}}{\mathrm{CA}}, \qquad \text{(A.1)}$$

where CA is chronological age and MA is mental age.

Example

A 10-year-old with the mental ability of an 11.5-year-old has

$$\mathrm{IQ} = 100 \times \frac{11.5}{10} = 115.$$

The standard deviation of the IQ distribution was approximately 15 at each age. This fact was the basis of Wechsler's deviation definition of IQ, which dropped the mental age Eq. (A.1) and defined the mean IQ to be 100 and the standard deviation to be 15 points. It is also the basis of the law of intelligence, as presented here.

To compare growth and variation of mental ability, we return to our 10-year-old with MA = 11.5 and IQ = 115, one SD above average, a deviation within the normal range of ability. This deviation is equivalent to 11.5 − 10 = 1.5 years, only 15% of the child's total age. Thus global features of growth overshadow local variations (at a given age) in mental ability.

Strictly speaking, this argument violates the earlier statement in this paper that one cannot equate mental growth scores at different chronological ages. However, growth dominates variation so heavily that the conclusion holds on any scale. For example, a 10-year-old who scored at the level of an average 4-year-old on the Stanford–Binet4 would have an IQ of 40 on the MA scale and of zero on either the deviation scale (standard score ca. −6) or the standardized growth scale (see expression (A.3)). Such an IQ is far outside the normal range of variation in test scores.

The law of intelligence: Derivation and three mathematically equivalent formulae

From expression (A.1), and since $\sigma_{\mathrm{IQ}} = 15$, we have the first statement of the law of intelligence:

$$\sigma_{\mathrm{MA}} = 0.15\,\mathrm{CA}. \qquad \text{(A.2)}$$
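Spelled out, the step from (A.1) to (A.2) uses only the fact that, at a fixed chronological age, IQ is proportional to MA, so the two standard deviations scale by the same factor:

$$
\mathrm{IQ} = \frac{100}{\mathrm{CA}}\,\mathrm{MA}
\quad\Longrightarrow\quad
\sigma_{\mathrm{IQ}} = \frac{100}{\mathrm{CA}}\,\sigma_{\mathrm{MA}}
\quad\Longrightarrow\quad
\sigma_{\mathrm{MA}} = \frac{\sigma_{\mathrm{IQ}}}{100}\,\mathrm{CA} = \frac{15}{100}\,\mathrm{CA} = 0.15\,\mathrm{CA}.
$$

For CA = 10 this gives $\sigma_{\mathrm{MA}} = 1.5$ years, the one-SD deviation of the 10-year-old in the worked example.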

We calculate the standardized growth of the MA scale as follows. To find the standardized differences, divide the annual increase of MA by its SD from expression (A.2). Since $\mathrm{MA}_{\mathrm{av}} = \mathrm{CA}$, its annual growth is 1 year, i.e.,

$$\mathrm{MA}_{\mathrm{av}}(\mathrm{CA}+1) - \mathrm{MA}_{\mathrm{av}}(\mathrm{CA}) = 1\ \text{year}.$$

A combination of these relations gives a second expression: the standardized growth rate of mental ability (the quotient of annual growth and standard deviation) is

$$\mathrm{G.R.} = \frac{100}{15\,\mathrm{CA}}. \qquad \text{(A.3)}$$
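A minimal Python sketch of (A.3), built directly from (A.2) and the one-year annual growth of the mean MA. The function name `growth_rate` is hypothetical; the rate is in SD units per year.

```python
def growth_rate(ca: float) -> float:
    """Standardized growth rate of MA, Eq. (A.3): annual growth / SD, per year."""
    sigma_ma = 0.15 * ca        # Eq. (A.2): SD of mental age at age CA, in years
    annual_growth = 1.0         # MA_av = CA, so the mean rises by 1 year per year
    return annual_growth / sigma_ma   # algebraically equal to 100 / (15 * CA)


for ca in (2, 5, 10, 15):
    print(ca, round(growth_rate(ca), 3))
```

At age 10 the rate is about 0.67 SD per year, and at age 2 about 3.3: the same one-year gain in mean MA counts for less as the spread of MA widens with age.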

Integration of this expression gives the third relation, for the standardized growth $J$ (with $6.7 \approx 100/15$):

$$J = \int \frac{6.7}{\mathrm{CA}}\, d(\mathrm{CA}) = 6.7 \ln(\mathrm{CA}) + \text{const.} \qquad \text{(A.4)}$$

An alternative form gives the standardized growth of mental ability between two ages $\mathrm{CA}_1$ and $\mathrm{CA}_2$:

$$J_2 - J_1 = \frac{100}{15} \ln\!\left(\frac{\mathrm{CA}_2}{\mathrm{CA}_1}\right). \qquad \text{(A.4')}$$
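As a cross-check, integrating (A.3) numerically should reproduce the closed form (A.4'). The Python sketch below does this with a midpoint rule (the step count is an arbitrary choice) and applies it to the earlier example of a 10-year-old scoring like an average 4-year-old. Function names are hypothetical.

```python
import math


def growth_rate(ca: float) -> float:
    """Eq. (A.3): standardized growth rate, in SD units per year."""
    return 100.0 / (15.0 * ca)


def growth_between(ca1: float, ca2: float) -> float:
    """Eq. (A.4'): closed-form standardized growth from age ca1 to age ca2."""
    return (100.0 / 15.0) * math.log(ca2 / ca1)


def growth_numeric(ca1: float, ca2: float, steps: int = 10_000) -> float:
    """Midpoint-rule integral of Eq. (A.3) from ca1 to ca2."""
    h = (ca2 - ca1) / steps
    return sum(growth_rate(ca1 + (k + 0.5) * h) for k in range(steps)) * h


# Gap between the average 4-year-old and the average 10-year-old, in SD units.
print(round(growth_between(4, 10), 2))   # prints 6.11
print(round(growth_numeric(4, 10), 2))   # prints 6.11
```

The gap of about 6.1 SD agrees with the rough "standard score ca. −6" quoted earlier. Because (A.4') depends only on the ratio of ages, growth from age 5 to 10 equals growth from 10 to 20.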

A simplified model of standardized growth

Three imaginary achievement tests measure the growth of a mental ability that steadily increases with grade level. We further imagine the standard deviation of this trait to be one grade level at all ages. The three tests show different patterns of growth in the average raw score, as shown in Fig. 23.

The raw score at a particular grade is the total number of dots from the left-hand point (beginning of grade 0, i.e., kindergarten) to the grade in question. In test "S" the items are evenly spaced. Test "C" has items concentrated at lower grade levels; in test "I" the items crowd together at higher grade levels.

In test "S," since there are simply five dots per grade, the average raw score (the total number of dots up to the grade in question in Fig. 23 and Table 3) increases evenly. The raw score, shown in Fig. 24, falls on a straight line with slope equal to five. The raw-score SD is the number of dots (simply five in this case) in an interval of one standard deviation, i.e., one grade. Since the SD is constant, its graph is a horizontal straight line in Fig. 24.

Fig. 23. A model of standardized growth. Each dot is for a single item; total number: 60.

In test "C" the items are unevenly spaced, crowded together at the beginning grades (see Fig. 23). Correspondingly, the annual steps in the mean raw score become smaller with grade level (see Table 4). The "C" raw-score graph has negative curvature (is concave downward), and its SD graph slopes downward in Fig. 25. Mean and SD are again obtained by a simple count of dots in Fig. 23. Since the items (dots) are crowded to the left, neither plot follows the pattern of test "S": both the slope of the curve for means and the height of the SD decrease toward the right.

In test "I," the item spacing becomes small at higher grades (see Fig. 23 and Table 5). The test score has positive curvature (is concave upward) and the SD slopes upward, as shown in Fig. 26.

Table 3
A model of growth. Test "S": Raw score has uniform growth rate

| Grade | Raw score | Yearly growth | Standard deviation | Standardized growth rate | Standardized growth |
|---|---|---|---|---|---|
| 0 | 0 | | 5 | | 0 |
| 1 | 5 | 5 | 5 | 1 | 1 |
| 2 | 10 | 5 | 5 | 1 | 2 |
| 3 | 15 | 5 | 5 | 1 | 3 |
| 4 | 20 | 5 | 5 | 1 | 4 |
| 5 | 25 | 5 | 5 | 1 | 5 |
| 6 | 30 | 5 | 5 | 1 | 6 |
| 7 | 35 | 5 | 5 | 1 | 7 |
| 8 | 40 | 5 | 5 | 1 | 8 |
| 9 | 45 | 5 | 5 | 1 | 9 |
| 10 | 50 | 5 | 5 | 1 | 10 |
| 11 | 55 | 5 | 5 | 1 | 11 |
| 12 | 60 | 5 | 5 | 1 | 12 |

Fig. 24. Average raw score and standard deviation for test "S" (count dots in Fig. 23).

Table 4
Test "C": Raw score growth slows down with grade

| Grade | Raw score | Yearly growth | Standard deviation | Standardized growth rate | Standardized growth |
|---|---|---|---|---|---|
| K | 0.0 | | 7.4 | | 0 |
| 1 | 7.2 | 7.2 | 7.0 | 1 | 1 |
| 2 | 14.0 | 6.8 | 6.6 | 1 | 2 |
| 3 | 20.3 | 6.4 | 6.2 | 1 | 3 |
| 4 | 26.3 | 6.0 | 5.8 | 1 | 4 |
| 5 | 31.9 | 5.6 | 5.4 | 1 | 5 |
| 6 | 37.1 | 5.2 | 5.0 | 1 | 6 |
| 7 | 41.9 | 4.8 | 4.6 | 1 | 7 |
| 8 | 46.3 | 4.4 | 4.2 | 1 | 8 |
| 9 | 50.4 | 4.0 | 3.8 | 1 | 9 |
| 10 | 54.0 | 3.6 | 3.4 | 1 | 10 |
| 11 | 57.2 | 3.2 | 3.0 | 1 | 11 |
| 12 | 60.0 | 2.8 | 2.6 | 1 | 12 |

Fig. 25. Mean and SD for test "C".

Tables 3–5 and Fig. 27 show the computation and plots of standardized growth for the three types of tests. All three standardized growth curves are identical, and on this scale each SD is one unit; the process of standardization irons out the apparent differences among the three types of tests to reveal the true (test-independent) growth of the trait.
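The tabulated standardized growth rates are consistent with dividing each yearly raw-score gain by the average of the SDs at the two adjacent grades (for test "C", grade 1: 7.2 / ((7.4 + 7.0)/2) = 1). That averaging rule is an inference from Tables 3–5, not something the text states. A minimal Python sketch that applies it to the yearly-growth and SD columns of all three tables:

```python
# Yearly growth (grades 1..12) and SD (grades 0/K..12), copied from Tables 3-5.
TABLES = {
    "S": ([5.0] * 12, [5.0] * 13),
    "C": ([7.2, 6.8, 6.4, 6.0, 5.6, 5.2, 4.8, 4.4, 4.0, 3.6, 3.2, 2.8],
          [7.4, 7.0, 6.6, 6.2, 5.8, 5.4, 5.0, 4.6, 4.2, 3.8, 3.4, 3.0, 2.6]),
    "I": ([2.8, 3.2, 3.6, 4.0, 4.4, 4.8, 5.2, 5.6, 6.0, 6.4, 6.8, 7.2],
          [2.6, 3.0, 3.4, 3.8, 4.2, 4.6, 5.0, 5.4, 5.8, 6.2, 6.6, 7.0, 7.4]),
}


def standardized_growth(yearly_growth, sd):
    """Return (rates, cumulative growth) for grades 1..N.

    Assumption: each year's raw-score gain is divided by the mean of the SDs
    at the start and the end of that year.
    """
    rates, cumulative, total = [], [], 0.0
    for k, gain in enumerate(yearly_growth):
        rate = gain / ((sd[k] + sd[k + 1]) / 2.0)
        rates.append(rate)
        total += rate
        cumulative.append(total)
    return rates, cumulative


for name, (gain, sd) in TABLES.items():
    rates, cumulative = standardized_growth(gain, sd)
    print(name, [round(r, 3) for r in rates[:3]], round(cumulative[-1], 3))
```

For all three tests every rate comes out at 1 and the cumulative growth at grade 12 is 12, matching the last two columns of the tables even though the raw-score curves differ.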

Proof of the equivalence of classical and IRT growth functions

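We connect the true score $T$ of classical test theory and the ability $\theta$ of item response theory, both of which are functions of two variables, CA and $z$ (the standard score within an age group):

$$T(\mathrm{CA}, z) = T_{\mathrm{av}}(\mathrm{CA}) + \sigma_T(\mathrm{CA})\,z, \qquad \theta(\mathrm{CA}, z) = \theta_{\mathrm{av}}(\mathrm{CA}) + \sigma_\theta(\mathrm{CA})\,z. \qquad \text{(A.5)}$$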


Fig. 27. Standardized growth for all three tests is the same.

Table 5
Test "I": Growth speeds up with grade

| Grade | Raw score | Yearly growth | Standard deviation | Standardized growth rate | Standardized growth |
|---|---|---|---|---|---|
| 0 | 0.0 | | 2.6 | | 0 |
| 1 | 2.8 | 2.8 | 3.0 | 1 | 1 |
| 2 | 6.1 | 3.2 | 3.4 | 1 | 2 |
| 3 | 9.7 | 3.6 | 3.8 | 1 | 3 |
| 4 | 13.7 | 4.0 | 4.2 | 1 | 4 |
| 5 | 18.1 | 4.4 | 4.6 | 1 | 5 |
| 6 | 22.9 | 4.8 | 5.0 | 1 | 6 |
| 7 | 28.1 | 5.2 | 5.4 | 1 | 7 |
| 8 | 33.7 | 5.6 | 5.8 | 1 | 8 |
| 9 | 39.7 | 6.0 | 6.2 | 1 | 9 |
| 10 | 46.1 | 6.4 | 6.6 | 1 | 10 |
| 11 | 52.9 | 6.8 | 7.0 | 1 | 11 |
| 12 | 60.0 | 7.2 | 7.4 | 1 | 12 |

Fig. 26. Mean and SD for test "I".

We treat the number-right score on a given test as a function of ability, the test characteristic function $f(\theta)$. That $f$ can be written as a function of a single variable $\theta$ is a consequence of the usual IRT assumption that the test items are unidimensional.
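To make $f(\theta)$ concrete, assume a two-parameter logistic (2PL) item response model, which the text does not specify; the item parameters below are made up for illustration. Under that assumption the test characteristic function is the sum of the item response functions, i.e., the expected number-right score at ability $\theta$:

```python
import math

# Hypothetical 2PL item parameters: (discrimination a, difficulty b).
ITEMS = [(1.0, -2.0), (1.2, -1.0), (0.8, 0.0), (1.5, 1.0), (1.0, 2.0)]


def item_response(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcf(theta: float) -> float:
    """Test characteristic function f(theta): sum of the item response functions."""
    return sum(item_response(theta, a, b) for a, b in ITEMS)


for theta in (-3, -1, 0, 1, 3):
    print(theta, round(tcf(theta), 2))
```

Here $f$ rises monotonically from 0 toward the number of items (five), so a single ability value indexes the expected number-right score.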

The dispute as to whether mental tests are unidimensional is nearly a century old, goes back to Spearman and Thurstone, and is still current. But, as discussed in this paper, Wilks' theorem implies that correlations between different tests approach unity as test lengths become infinite. Thus full-scale, long tests may appear to be unidimensional even if the subtest contents are multidimensional. Yen (1985) found evidence of multidimensionality in an achievement test, especially in the mathematics subtests. However, this occurred in the high school grades, an age range excluded from the present paper because growth there lacks lawful behavior.

We return to the argument and consider the standard growth rate of the test characteristic function:

$$\mathrm{G.R.}(f) = \frac{1}{\sigma_f}\,\frac{df_{\mathrm{av}}}{d(\mathrm{CA})}. \qquad \text{(A.6)}$$

We take differentials,

$$df = \frac{df}{d\theta}\,d\theta, \qquad \text{(A.7)}$$

which imply

$$\sigma_f = \frac{df}{d\theta}\,\sigma_\theta. \qquad \text{(A.8)}$$

We also have, by the chain rule, the relation

$$\frac{df_{\mathrm{av}}}{d(\mathrm{CA})} = \frac{df}{d\theta}\,\frac{d\theta_{\mathrm{av}}}{d(\mathrm{CA})}. \qquad \text{(A.9)}$$

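Combining the last two equations, we obtain the standard growth rate for $f$:

$$\mathrm{G.R.}(f) = \frac{1}{\sigma_f}\,\frac{df_{\mathrm{av}}}{d(\mathrm{CA})} = \frac{1}{\sigma_\theta}\,\frac{d\theta_{\mathrm{av}}}{d(\mathrm{CA})} = \mathrm{G.R.}(\theta). \qquad \text{(A.10)}$$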

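Denoting by $J$ and $H$ the total growth of $f$ and of $\theta$, respectively, between two ages $\mathrm{CA}_0$ and CA, we have

$$J(\mathrm{CA}) - J(\mathrm{CA}_0) = \int_{\mathrm{CA}_0}^{\mathrm{CA}} \frac{1}{\sigma_f}\,\frac{df_{\mathrm{av}}}{d(\mathrm{CA})}\,d(\mathrm{CA}) = \int_{\mathrm{CA}_0}^{\mathrm{CA}} \frac{1}{\sigma_\theta}\,\frac{d\theta_{\mathrm{av}}}{d(\mathrm{CA})}\,d(\mathrm{CA}) = H(\mathrm{CA}) - H(\mathrm{CA}_0). \qquad \text{(A.11)}$$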

The growth function found from the number-right score for any test formed from

a unidimensional set of test items will be the same as the growth function of the

ability variable itself.
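The proof rests on the first-order relations (A.7)–(A.9). The Python sketch below checks (A.11) numerically under that same linearization, using a hypothetical 2PL test, a hypothetical mean-ability curve `theta_av(CA)` and a constant ability SD, none of which come from the text. It integrates G.R.(f) and G.R.(θ) from age 5 to age 12 and compares the totals $J$ and $H$.

```python
import math

# Hypothetical 2PL items (discrimination a, difficulty b); illustration only.
ITEMS = [(1.0, -2.0), (1.2, -1.0), (0.8, 0.0), (1.5, 1.0), (1.0, 2.0)]


def tcf(theta: float) -> float:
    """Test characteristic function f(theta): expected number-right score."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in ITEMS)


def theta_av(ca: float) -> float:
    """Hypothetical mean ability at age CA (any smooth increasing curve would do)."""
    return 2.0 * math.log(ca / 8.0)


def sigma_theta(ca: float) -> float:
    """Hypothetical SD of ability at age CA, held constant for simplicity."""
    return 0.3


def deriv(g, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation to dg/dx."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


def growth_rate_theta(ca: float) -> float:
    """G.R.(theta) = (1 / sigma_theta) * d(theta_av)/d(CA)."""
    return deriv(theta_av, ca) / sigma_theta(ca)


def growth_rate_f(ca: float) -> float:
    """G.R.(f) via Eq. (A.6), with sigma_f from Eq. (A.8)."""
    slope = deriv(tcf, theta_av(ca))                  # df/dtheta at the mean ability
    sigma_f = slope * sigma_theta(ca)                 # Eq. (A.8)
    df_av = deriv(lambda c: tcf(theta_av(c)), ca)     # Eq. (A.9), evaluated numerically
    return df_av / sigma_f


def total_growth(rate, ca0: float, ca1: float, steps: int = 2000) -> float:
    """Midpoint-rule integral of a growth rate from ca0 to ca1, as in Eq. (A.11)."""
    h = (ca1 - ca0) / steps
    return sum(rate(ca0 + (k + 0.5) * h) for k in range(steps)) * h


J = total_growth(growth_rate_f, 5.0, 12.0)
H = total_growth(growth_rate_theta, 5.0, 12.0)
print(round(J, 4), round(H, 4))
```

Both totals come out at about 5.84 SD. Because $\sigma_f$ is taken from the linearized (A.8), the agreement holds to numerical precision; a simulation with a finite spread of abilities would generally show small departures where $f$ is strongly curved. Replacing `tcf` with another increasing function, such as `math.exp`, gives the same totals, in line with the corollary that follows.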

A corollary is that any smooth, increasing transformation $x(\theta)$ of the ability variable has the same growth function as $\theta$ itself: we merely rename the transform $f$ and follow the proof immediately above. Thus the growth function obtained from Yen's standardized differences, whether formed from the classical test score or from the IRT ability variable, is an invariant measure of mental ability. This gives a theoretical basis for the agreement between standardized growth and item analysis shown in Table 1.

References

Allport, G. W. (1942). The use of personal documents in psychological science. New York: Social Science Research Council, Bulletin 49.

Anderson, J. E. (1939). The limitations of infant and preschool tests in the measurement of intelligence.

Journal of Psychology, 8, 351–379.

Atkins v. Virginia (2002). US Supreme Court decision 00-8452.

Bayley, N. (1954). Some increasing parent–child similarities during the growth of children. Journal of

Educational Psychology, 45, 1–21.

Bayley, N. (1965). Comparison of mental and motor test scores for ages 1–15 months by sex, birth order,

race, geographical location and education of parents. Child Development, 36, 379–411.

Bayley, N. (1993). Bayley scales of infant development. Manual (2nd ed.). San Antonio, TX: Psychological

Corporation.

Binet, A., & Simon, T. (1916). The development of intelligence in children (The Binet–Simon scale) (Translated by E. Kite). Baltimore, MD: Williams and Wilkins (reprinted by Ayer, Salem, NH).

Bloom, B. S. (1964). Stability and change in human characteristics. New York: Wiley.

Bock, R. D. (1983). The mental growth curve reexamined. In D. Weiss (Ed.), New horizons in testing.

Latent trait test theory and computerized adaptive testing (pp. 205–218). New York: Academic Press.

Boring, E. G. (1923). Intelligence as the tests test. The New Republic (June 6), 35–37.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-

multimethod matrix. Psychological Bulletin, 56, 81–105.

Chugani, H. (1994). Development of regional brain glucose metabolism in relation to behavior and plasticity. In G. Dawson & K. W. Fischer (Eds.), Human behavior and the developing brain. New York: Guilford Press.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Coleman, J. S., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, F., & York, R. (1966). Equality of educational opportunity (Washington, DC: US Department of Health, Education and Welfare OE-38001, US Government Printing Office, Catalog number FS 5.328.38001).

CTB/McGraw-Hill (1987). California achievement tests. Forms E and F. Levels 10–20. Technical report.

Table 49. (CTB/McGraw Hill, Monterey, CA).

Engen, T. (1971). Psychophysics. In J. W. Kling & L. A. Riggs (Eds.), Woodworth & Schlosberg's experimental psychology (pp. 11–86). New York: Holt, Rinehart & Winston.

Epstein, H. (1979). Growth spurts during brain development: Implications for educational policy. In J. S.

Chall & A. F. Mirsky (Eds.), 1978 Yearbook of the national society for the study of education (pp. 343–

370). Chicago: University of Chicago Press.

Farkas, G. (2001). Family linguistic culture and social reproduction: Verbal skill from parent to child in

the preschool and school years. Paper presented at the session on Consequences of child poverty and

deprivation, at the Annual Meeting of the Population Association of America, Washington, DC,

March 31. Available http://www.pop.psu.edu/~farkas/paa301.pdf.

Farkas, G., & Beron, K. (in press). The detailed age trajectory of oral vocabulary knowledge: Differences by class and race. Social Science Research.

Flanagan, J. C. (1951). Units, scores and norms. In E. F. Lindquist (Ed.), Educational measurement (pp.

695–763). Washington: American Council on Education.

Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932–1978. Psychological Bulletin, 95, 29–

50.


Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin,

101, 171–191.

Furfey, P. H., & Muehlenbein, J. (1932). The validity of infant intelligence tests. Journal of Genetic

Psychology, 40, 219–223.

Garber, H. (1988). The Milwaukee project. Preventing mental retardation in children at risk. Washington:

American Association on Mental Retardation.

Gardner, H. (1983). Frames of mind. The theory of multiple intelligences. New York: Basic Books.

Gesell, A. (1928). Infancy and human growth. New York: MacMillan.

Gesell, A., & Ilg, F. L. (1943). Infant and child in the culture of today. New York: Harper.

Greenhouse, L. (2002). The Supreme Court: The death penalty; citing 'national consensus,' justices bar death penalty for retarded defendants. New York Times, June 21, A1.

Gridley, B. E., & Roid, G. H. (1998). The use of the WISC-III with achievement tests. In A. Prifitera & D. Saklofske (Eds.), WISC-III (pp. 249–288). San Diego, CA: Academic Press.

Gulliksen, H. (1987). Theory of mental tests. Hillsdale, NJ: Erlbaum (reprint of 1950 book).

Hart, B., & Risley, T. (1995). Meaningful differences in the everyday experience of young American children.

Baltimore: Paul Brookes.

Heinis, H. (1924). La loi du développement mental. Archives de Psychologie, 19, 97–127.

Jencks, C., & Phillips, M. (Eds.). (1998). The black–white test score gap. Washington: Brookings.

Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational

Review, 39, 1–123.

Jensen, A. R. (1973). Educability and group differences. New York: Harper & Row.

Jensen, A. R. (1993). Psychometric g and achievement. In B. R. Gifford (Ed.), Policy perspectives on

educational testing (pp. 117–227). Boston: Kluwer.

Jensen, A. R. (1998). The g Factor. The science of mental ability. Westport, CT: Praeger.

Keats, J. A. (1982). Ability measures and theories of cognitive development. In H. A. Wainer & S. Messick

(Eds.), Principals of modern psychological measurement: A festschrift for Frederic M. Lord. Hillsdale,

NJ: Erlbaum.

Kolen, M. J., & Brennan, R. L. (1995). Test equating. Methods and practice. New York: Springer.

Layzer, D. (1972). Science or superstition. A physical scientist looks at the IQ controversy. Cognition:

International Journal of Cognitive Science, 1, 265–299.

Lichten, W. (1993). The big bang model of the growth of intelligence. Proceedings and abstracts of the annual

meeting of the Eastern Psychological Association (p. 66). Arlington, VA: Eastern Psychological

Association (unpublished).

Lichten, W. (1996). The big bang model of the growth of intelligence. Confirmation. Proceedings and

abstracts of the annual meeting of the Eastern Psychological Association (p. 86). Washington, DC:

Eastern Psychological Association (unpublished).

Lichten, W. (2002). Are all men created equal? Unpublished paper delivered at the New Orleans meeting of the American Psychological Society.

Lichten, W., & Wainer, H. (2004). IQ: A matter of life and death. Paper given at the American Psychological Society, 16th Annual Convention, Chicago, IL, May 25–30.

Licklider, J. C. R. (1951). Basic correlates of the auditory stimulus. In S. S. Stevens (Ed.), Handbook of

experimental psychology (1st ed., pp. 985–1039). New York: Wiley.

Luce, R. D., Bush, R. R., & Galanter, E. (1963). Psychological scaling. In R. D. Luce, R. R. Bush,

& E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 245–307). New York:

Wiley.

Luce, R. D., & Krumhansl, C. L. (1988). Measurement, scaling and psychophysics. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology (2nd ed., pp. 3–74). New York: Wiley.

Luce, R. D., & Suppes, P. (2002). Representational measurement theory. In H. Pashler & J. Wixted (Eds.), Stevens' handbook of experimental psychology (3rd ed., Vol. 4, pp. 1–41). New York: Wiley.

McCall, R. B., Eichorn, D. H., & Hagerty, P. S. (1977). Transitions in early development. Monographs of

the Society for Research in Child Development (Ser. No. 171, 42, No. 3).

Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures. Washington:

American Psychological Association.


Piaget, J., & Inhelder, B. (1969). The psychology of the child. New York: Basic.

Salovey, P., & Mayer, J. D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9, 185–

211.

Scammon, R. E. (1930). The measurement of the body in childhood. In J. A. Harris, C. M. Jackson, D. G.

Paterson, & R. E. Scammon (Eds.), The measurement of man (pp. 173–215). Minneapolis: University of

Minnesota.

Schulz, E. M., & Nicewander, W. A. (1997). Grade equivalent and IRT representations of growth. Journal

of Educational Measurement, 34, 315–331.

Spearman, C. (1927). The abilities of man: Their nature and measurement. New York: MacMillan.

Stern, W. (1914). The psychological methods of testing intelligence (translated by G. Whipple). Baltimore:

Warwick & York.

Sternberg, R. (1997). The concept of intelligence and its role in lifelong learning and success. American

Psychologist, 52, 1030–1037.

Sternberg, R. (2000). Theme-park psychology: A case study regarding human intelligence and its

implications for education. Educational Psychology Review, 12, 247–268.

Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (1st ed.). New York: Wiley.

Stevens, S. S. (1975). In G. Stevens (Ed.), Psychophysics: Introduction to its perceptual, neural, and social

prospects. New York: Wiley.

Stevens, S. S., & Davis, H. (1938). Hearing. Its psychology and physiology. New York: Wiley.

Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. R. Bush, & E. Galanter

(Eds.), Handbook of mathematical psychology (Vol. 1, pp. 1–76). New York: Wiley.

Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.

Terman, L. M., & Merrill, M. A. (1937). Measuring intelligence: A guide to the administration of the new

revised Stanford–Binet tests of intelligence. Boston: Houghton-Mifflin.

Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodward, E. (1927). The measurement of intelligence. New York: Bureau of Publications, Teachers College, Columbia University (Reprint Edition 1973, Arno Press).

Thurlow, W. R. (1971). Audition. In J. W. Kling & L. A. Riggs (Eds.), Woodworth & Schlosberg's experimental psychology (pp. 223–272). New York: Holt, Rinehart & Winston.

Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational

Psychology, 16, 433–451.

Thurstone, L. L. (1928). The absolute zero in intelligence measurement. Psychological Review, 35, 175–197.

Thurstone, L. L., & Ackerson, L. (1929). The mental growth curve for the Binet tests. Journal of Educational Psychology, 20, 569–583.

Torrance, E. P. (1988). The nature of creativity as manifest in its testing. In R. J. Sternberg (Ed.), The

nature of creativity. Contemporary psychological perspectives (pp. 43–75). New York: Cambridge

University Press.

Torrance, E. P., & Goff, K. (1989). A quiet revolution. Journal of Creative Behavior, 23, 136–145.

Tukey, J. W. (1969). Analyzing data. Sanctification or detective work? American Psychologist, 24, 83–89.

Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams & Wilkins.

Wilks, S. S. (1938). Weighting systems for linear functions of correlated variables when there is no

dependent variable. Psychometrika, 3, 23–40.

Wohlwill, J. F. (1973). The study of behavioral development. New York: Academic.

Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt, Rinehart &

Winston.

Yen, W. M. (1985). Increasing item complexity: A possible cause of scale shrinkage for unidimensional

item response theory. Psychometrika, 50, 399–410.

Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of

Educational Measurement, 23, 299–325.

