an approach to the psychology of …rca.ucsd.edu/selected_papers/an approach to the psychology...

13
Psychological Bulletin 1972, Vol. 78, No. 1, 49-61 AN APPROACH TO THE PSYCHOLOGY OF INSTRUCTION R. C. ATKINSON» AND J. A. PAULSON Stanford University The relationship between a theory of learning and a theory of instruction is dis- cussed. Examples are presented that illustrate how to proceed from a theoretical description of the learning process to the specification of an optimal strategy for carrying out instruction. The examples deal with fairly simple learning tasks and are admittedly of limited generality. Nevertheless, they clearly define the steps necessary for deriving and testing instructional strategies, thereby providing a set of procedures for analyzing more complex problems. The parameter-dependent optimization strategies are of particular importance because they take into account individual differences among learners as well as differences in difficulty among curriculum units. Experimental evaluations indicate that the parameter-dependent strategies lead to major gains in learning, when compared with strategies that do not take individual differences into account. The task of relating the methods and findings of research in the behavioral sciences to the problems of education is a continuing concern of both psychologists and educators. A few years ago, when our faith in the ability of money and science to cure social ills was at its peak, an educational researcher could content himself with trying to answer the same ques- tions that were being studied by his psycho- logist colleagues. The essential difference was that his studies referred explicitly to educa- tional settings, whereas those undertaken by psychologists strived for greater theoretical generality. There was implicit confidence that as the body of behavioral research grew, applications to education would occur in the natural course of events. When these applica- tions failed to materialize, confidence was shaken. Clearly, something essential was miss- ing from educational research. A number of factors contributed to the feeling that something was wrong with business as usual. Substantial curriculum changes, initiated on a national scale after the Soviet's launching of Sputnik, had to be carried out with only minimal guidance from behavioral scientists. Developers of programmed learning 1 An earlier version of this paper was presented by the first author at a conference on "The Use of Com- puters in Education" organized by the Japanese Ministry of Education in collaboration with the Organization for Economic Cooperation and Develop- ment in Tokyo, July 1970. Requests for reprints should be sent to R. C. Atkinson, Department of Psychology, Stanford Uni- versity, Stanford, California 94305. and computer-assisted instruction faced similar problems. Although the literature in learning theory was perhaps more relevant to their concerns, the questions it treated were still not the critical ones from the viewpoint of instruc- tion. This situation would not have been sur- prising had the study of learning been in its infancy. But far from that, the psychology of learning had a long and impressive history. An extensive body of experimental literature existed, and many simple learning processes were being described with surprising precision using mathematical models. Whatever was wrong, it did not seem to be a lack of scientific sophistication. These issues were on the minds of those who contributed to the 1964 Yearbook of the National Society for the Study of Education, edited by Hilgard (1964). In that book Bruner (1964) summarized the feelings of many of the con- tributors when he called for a theory of in- struction, which he sharply distinguished from a theory of learning. He emphasized that where the latter is essentially descriptive, the former should be prescriptive, setting forth rules specifying the most effective ways of achieving knowledge or mastering skills. This distinction served to highlight the difference in the goals of experiments designed to advance the two kinds of theory. In many instances variations in instructional procedures affect several psy- chological variables simultaneously. Experi- ments that are appropriate for comparing methods of instruction may be virtually impossible to interpret in terms of learning 49

Upload: vudien

Post on 20-Mar-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Psychological Bulletin1972, Vol. 78, No. 1, 49-61

AN APPROACH TO THE PSYCHOLOGY OF INSTRUCTION

R. C. ATKINSON» AND J. A. PAULSON

Stanford University

The relationship between a theory of learning and a theory of instruction is dis-cussed. Examples are presented that illustrate how to proceed from a theoreticaldescription of the learning process to the specification of an optimal strategy forcarrying out instruction. The examples deal with fairly simple learning tasks andare admittedly of limited generality. Nevertheless, they clearly define the stepsnecessary for deriving and testing instructional strategies, thereby providing aset of procedures for analyzing more complex problems. The parameter-dependentoptimization strategies are of particular importance because they take into accountindividual differences among learners as well as differences in difficulty amongcurriculum units. Experimental evaluations indicate that the parameter-dependentstrategies lead to major gains in learning, when compared with strategies that donot take individual differences into account.

The task of relating the methods and findingsof research in the behavioral sciences to theproblems of education is a continuing concernof both psychologists and educators. A fewyears ago, when our faith in the ability ofmoney and science to cure social ills was at itspeak, an educational researcher could contenthimself with trying to answer the same ques-tions that were being studied by his psycho-logist colleagues. The essential difference wasthat his studies referred explicitly to educa-tional settings, whereas those undertaken bypsychologists strived for greater theoreticalgenerality. There was implicit confidence thatas the body of behavioral research grew,applications to education would occur in thenatural course of events. When these applica-tions failed to materialize, confidence wasshaken. Clearly, something essential was miss-ing from educational research.

A number of factors contributed to thefeeling that something was wrong with businessas usual. Substantial curriculum changes,initiated on a national scale after the Soviet'slaunching of Sputnik, had to be carried outwith only minimal guidance from behavioralscientists. Developers of programmed learning

1 An earlier version of this paper was presented bythe first author at a conference on "The Use of Com-puters in Education" organized by the JapaneseMinistry of Education in collaboration with theOrganization for Economic Cooperation and Develop-ment in Tokyo, July 1970.

Requests for reprints should be sent to R. C.Atkinson, Department of Psychology, Stanford Uni-versity, Stanford, California 94305.

and computer-assisted instruction faced similarproblems. Although the literature in learningtheory was perhaps more relevant to theirconcerns, the questions it treated were still notthe critical ones from the viewpoint of instruc-tion. This situation would not have been sur-prising had the study of learning been in itsinfancy. But far from that, the psychology oflearning had a long and impressive history. Anextensive body of experimental literatureexisted, and many simple learning processeswere being described with surprising precisionusing mathematical models. Whatever waswrong, it did not seem to be a lack of scientificsophistication.

These issues were on the minds of those whocontributed to the 1964 Yearbook of the NationalSociety for the Study of Education, edited byHilgard (1964). In that book Bruner (1964)summarized the feelings of many of the con-tributors when he called for a theory of in-struction, which he sharply distinguished froma theory of learning. He emphasized that wherethe latter is essentially descriptive, the formershould be prescriptive, setting forth rulesspecifying the most effective ways of achievingknowledge or mastering skills. This distinctionserved to highlight the difference in the goalsof experiments designed to advance the twokinds of theory. In many instances variationsin instructional procedures affect several psy-chological variables simultaneously. Experi-ments that are appropriate for comparingmethods of instruction may be virtuallyimpossible to interpret in terms of learning

49

50 R. C. ATKINSON AND J. A. PAULSON

theory because of this confounding of variables.The importance of developing a theory ofinstruction justifies experimental programs de-signed to explore alternative instructionalprocedures, even if the resulting experimentsare difficult to place in a learning-theoreticframework.

The task of going from a description of thelearning process to a prescription for optimizinglearning must be clearly distinguished from thetask of finding the appropriate theoreticaldescription in the first place. However, there isa danger that preoccupation with finding pre-scriptions for instruction may cause us tooverlook the critical interplay between the twoenterprises. Developments in control theory(Bellman, 1961) and statistical decision theory(Raiffa & Schlaiffer, 1968) provide potentially-powerful methods for discovering optimaldecision-making strategies in a wide variety ofcontexts. In order to use these tools it isnecessary to have a reasonable model of theprocess to be optimized. As noted earlier, somelearning processes can already be describedwith the required degree of accuracy. Thisarticle examines an approach to the psychologyof instruction which is appropriate when thelearning is governed by such a process.

STEPS IN THE DEVELOPMENT OF OPTIMALINSTRUCTIONAL STRATEGIES

The development of optimal strategies canbe broken down into a number of tasks thatinvolve both descriptive and normative analy-ses. One task requires that the instructionalproblem be stated in a form amenable to adecision-theoretic analysis. While the detailedformulations of decision problems vary widelyfrom field to field, the same formal elementscan be found in most of them. It will be a usefulstarting point to identify these elements in thecontext of an instructional situation.

The formal elements of a decision problemthat must be specified are:

1. The possible states of nature;2. The actions that the decision maker can

take to transform the state of nature;3. The transformation of the state of nature

that results from each action;4. The cost of each action;

5. The return resulting from each state ofnature.

Statistical aspects occur in a decision problemwhen uncertainty is associated with one ormore of these elements. For example, the stateof nature may be imperfectly observable orthe transformation of the state of nature whicha given action will cause may not be completelypredictable.

In the context of the psychology of instruc-tion, most of these elements divide naturallyinto two groups, those having to do with thedescription of the underlying learning processand those specifying the cost-benefit dimen-sions of the problem. The one element thatdoes not fit is the specification of the set ofactions from which the decision maker mustmake his choice. The nature of this element canbe indicated by an example.

Suppose one wants to design a supplementalprogram of exercises for an initial readingprogram. Most reasonable programs of initialreading instruction include both training insight-word identification and training inphonics. Let us assume that on the basis ofexperimentation two useful exercise formatshave been developed, one for training on sightwords, the other for phonics. Given theseformats, there are many ways to design anoverall program. A variety of optimizationproblems can be generated by fixing somefeatures of the design and leaving the others tobe determined in a theoretically optimalmanner. For example, it may be desirable todetermine how the time available for instruc-tion should be divided between phonics andsight-word recognition, with all other featuresof the design fixed. A more complicated ques-tion would be to determine the optimal order-ing of the two types of exercises in addition tothe optimal allocation of time. It would beeasy to continue generating different optimiza-tion problems in this manner. The point isthat varying the set of actions from which thedecision maker is free to choose changes thedecision problem, even though the otherelements remain the same.

For the decision problems that arise ininstruction it is usually natural to identifythe states of nature with learning states of thestudent. Specifying the transformation of the

PSYCHOLOGY OF INSTRUCTION 51

states of nature caused by the actions of thedecision maker is tantamount to constructinga model of learning for the situation underconsideration.

The role of costs and returns is more formalthan substantive for the class of decision prob-lems considered in this article. The specificationof costs and returns in instructional situationstends to be straightforward when examined ona short-time basis, but virtually intractableover the long term. In the short term, one canassign costs and returns for the mastery of,say, certain basic reading skills, but sophisti-cated determinations for the long-term valueof these skills to the individual and society aredifficult to make. There is an important rolefor detailed economic analysis of the long-termimpact of education, but such studies deal withissues at a more global level than we require.In this article analysis is limited to those costsand returns directly related to the specificinstructional task being considered.

After a problem has been formulated in away amenable to decision-theoretic analysis,the next step is to derive the optimal strategyfor the learning model which best describes thesituation. If more than one learning modelseems reasonable a priori, then competingcandidates for the optimal strategy can bededuced. When these steps have been accom-plished, an experiment can be designed todetermine which strategy is best.

There are several possible directions in whichto proceed after the initial comparison ofstrategies, depending on the results of theexperiment. If none of the supposedly optimalstrategies produces satisfactory results, thenfurther experimental analysis of the assump-tions of the underlying learning models isindicated. New issues may arise even if one ofthe procedures is successful. In one case thatwe discuss, the successful strategy producedan usually high error rate during learning,which is contrary to a widely accepted principleof programmed instruction. When anomaliessuch as this occur, they suggest new lines ofexperimental inquiry, and often require areformulation of the axioms of the learningmodel. The learning model may have providedan excellent account of data for a range ofexperimental conditions but can prove totallyinadequate in an optimization condition where

special features of the procedure magnify in-accuracies of the model that had previouslygone undetected.

AN OPTIMIZATION PROBLEM THAT ARISES INCOMPUTER-ASSISTED INSTRUCTION

One application of computer-assisted instruc-tion that has proved to be very effective in theprimary grades involves a regular program ofpractice and review specifically designed tocomplement the efforts of the classroomteacher (Atkinson, 1969). Some of the cur-riculum materials in such programs take theform of lists of instructional units or items.The objective of the computer-assisted instruc-tion programs is to teach students the correctresponse to each item in a given list. Typically,a sublist of items is presented each day in oneor more fixed exercise formats. The optimiza-tion problem that arises concerns the selectionof items for presentation on a given day.

The Stanford Reading Project is an exampleof such a program in initial reading instruction(Atkinson & Fletcher, 1972). The vocabulariesof several of the commonly used basal readerswere compiled into one dictionary, and avariety of exercises using these words were de-veloped to teach reading skills. These exerciseswere designed principally to strengthen thestudent's decoding skills, with special emphasison letter identification, sight-word recognition,phonics, spelling patterns, and word compre-hension. The details of the teaching procedurevary from one exercise to another, but mostinclude a sequence in which a curriculum itemis presented, eliciting a response from thestudent, followed by a short period for study-ing the correct response. For example, oneexercise in sight-word recognition has thefollowing format :

Teletype DisplayNUT MEN RED

Audio MessageType red.

Three words are printed on the teletype,followed by an audio presentation of one of thewords. Control is then turned over to thestudent; if he types the correct word a rein-forcing message is given, and the computerprogram then proceeds to the next presenta-tion. If the student responds incorrectly orexceeds the time, the teletype prints the

52 R. C. ATKINSON AND J. A. PAULSON

correct word simultaneously with its audiopresentation and then moves to the nextpresentation. Under an early version of theprogram, items were presented in predeter-mined sublists, with an exercise continuing ona sublist until a specified criterion has been met.

Strategies can be found that will improve onthe fixed order of presentation. Two studiesdescribed below are concerned with the devel-opment of such strategies. One study examinesalternative presentation strategies for teachingspelling words to elementary school children,and the other examines strategies for teachingSwahili vocabulary items to college-levelstudents. The optimization problems in bothstudies were essentially the same. A list of Nitems is to be learned, and a fixed number ofdays, D, are allocated for its study. On eachday a sublist of items is presented for test andstudy. The sublist always involves M items,and each is presented only once for testfollowed by a study period. The total set ofN items is extremely large with regard to anysublist of M items. Once the experimenter hasspecified a sublist for a given day, its order ofpresentation is random. After the D days ofstudy are completed, a posttest is given overall items. The parameters N, D, and M arefixed, and so is the instructional format on eachday. Within these constraints the problem isto maximize performance on a posttest by anappropriate selection of sublists from day today. The strategy for selecting sublists fromday to day is dynamic (or response sensitive,using the terminology of Groen & Atkinson,1966) to the extent that it depends upon thestudent's prior history of performance.

Three Models of the Learning Process

Two extremely simple learning models areconsidered first. Then a third model, whichcombines features of the first two, is described.

In the first model, the state of the learnerwith respect to each item is completely deter-mined by the number of times the item hasbeen studied. At the start of the experimentan item has some initial probability of error;each time the item is presented its error prob-ability is reduced by a factor a, which is lessthan one. Stated as a difference equation, theprobability of an error on the n + 1st presenta-tion of an item is related to its probability on

the wth presentation as follows:

= aqn. [1]

Note that the error probability for a given itemdepends on the number of times it has beenreduced by the factor a; that is, the number oftimes it has been presented. Learning is thegradual reduction in the probability of error byrepeated presentations of items. This model issometimes called the linear model because theequation describing change in response prob-ability is linear.

In the second model, mastery of an item isnot at all gradual. At any point in time astudent is in one of two states with respect toeach item: the learned state or the unlearnedstate. If an item in the learned state is pre-sented, the correct response is always given;if an item is in the unlearned state, an incorrectresponse is given unless the student makes acorrect response by guessing. When an un-learned item is presented, it may move intothe learned state with probability c. Stated asa difference equation,

_ f <?», with probability 1 — c ,- -.1 { 0, with probability c.

Once an item is learned, it remains in thelearned state throughout the course of instruc-tion. Some items are learned the first time theyare presented; others may be presented severaltimes before they are finally learned. There-fore, the list as a whole is learned gradually.But for any particular item, the transition fromthe unlearned to the learned state occurs on asingle trial. The model is sometimes called theall-or-none model because of this characteriza-tion of the possible states of learning.

The third model to be considered is therandom-trial increments model and representsa compromise between the linear and all-or-none model (Norman, 1964). For this model:

_ ( qn, with probability 1 — c r -,\aqn, with probability c.

If c = 1, the random-trial increments modelreduces to the linear model; if a = 0, it reducesto the all-or-none model. However, for c < 1and a > 0, the random-trial increments modelgenerates predictions that are quite distinctfrom both the linear and the all-or-none models.

PSYCHOLOGY OF INSTRUCTION 53

For all three models the probability of anerror on the first trial is a parameter that mayneed to be estimated in certain situations; toemphasize this point the initial error prob-ability is written as q' henceforth. It should benoted that the all-or-none model and therandom-trial increments model are responsesensitive in that the learner's particular historyof correct and incorrect responses makes adifference in predicting performance on thenext presentation of an item. In contrast, thelinear model is response insensitive; its predic-tion depends only on the number of priorpresentations and is not improved by a know-ledge of the learner's response history.

Cost/Benefit Structure

At the present level of analysis, it willexpedite matters if some assumptions are madeto simplify the appraisal of costs and benefitsassociated with various strategies. It is tacitlyassumed that the subject matter being taughtis sufficiently beneficial to justify allocating afixed amount of time to it for instruction. Sincethe exercise formats and the time allocated toinstruction are the same for all strategies, it isreasonable to assume that the costs of instruc-tion are the same for all strategies as well. Ifthe costs of instruction are equal for allstrategies, then for purposes of comparisonthey may be ignored and attention focused onthe comparative benefits of the variousstrategies. This is an important simplificationbecause it affects the degree of precision neces-sary in the assessment of costs and benefits. Ifboth costs and benefits are significantly vari-able in a problem, then it is essential that bothquantities be estimated accurately. This isoften difficult to do. When one of these quan-tities can be ignored, it suffices if the other canbe assessed accurately enough to order thepossible outcomes. This is usually fairly easyto accomplish. In the present problem, forexample, it is reasonable to consider all theitems equally beneficial. This implies thatbenefits depend only on the overall probabilityof a correct response, not on the particularitems known. It turns out that this specificationof cost and benefit is sufficient for the learningmodels to determine optimal strategies.

The above cost/benefit assumptions permitus to concentrate on the main concern of this

article, the derivation of the educationalimplications of learning models. Also, they areapproximately valid in many instructionalcontexts. Nevertheless, it must be recognizedthat in the majority of cases these assumptionswill not be satisfied. For instance, the assump-tion that the alternative strategies cost thesame to Implement usually does not hold. Itonly holds as a first approximation in the casebeing considered here. In the present formula-tion of the problem, a fixed amount of time isallocated for study and the problem is tomaximize learning, subject to this time con-straint. An alternative formulation that ismore appropriate in some situations fixes aminimum criterion level for learning. In thisformulation, the problem is to find a strategyfor achieving this criterion level of performancein the shortest time. As a rule, both costs andbenefits must be weighed in the analysis, andfrequently subtopics within a curriculum varysignificantly in their importance. Sometimesthere is a choice among several exerciseformats. In certain cases, whether or not acertain topic should be taught at all is thecritical question. Smallwood (1971) has treateda problem similar to the one considered in thisarticle in a way that includes some of thesefactors in the structure of costs and benefits.

Deducing Strategies from the LearningModels

Optimal strategies can be deduced for thelinear and all-or-none models under the as-sumption that all items have the same learningparameters and initial error probabilities. Thesituation is more complicated in the case ofthe random-trial increments model. An ap-proximation to the optimal strategy for therandom-trial increments case will be discussedlater; in this case the strategy explicitlyallows for individual differences in parametervalues.

For the linear model, if an item has beenpresented n times, the probability of an erroron the next presentation of the item is an~lq'\when the item is presented, the error prob-ability is reduced to anq'. The size of thereduction is thus an^(i — a)q'. Observe thatthe size of the decrement in error probabilitygets smaller with each presentation of the item.This observation can be used to deduce that

54 R. C. ATKINSON AND J. A. PAULSON

the following procedure is optimal:

On a given day, form the sublist of M items by selectingthose items that have received the fewest presentationsup to that point. If more than M items satisfy thiscriterion, then select items at random from the setsatisfying the criterion.

Upon examination, this strategy is seen tobe equivalent to the standard cyclic presenta-tion procedure commonly employed in experi-ments on paired-associate learning. It amountsto presenting all items once, randomly reorder-ing them, presenting them again, and repeatingthe procedure until the number of days allo-cated to instruction have been exhausted.

According to the all-or-none model, once anitem has been learned there is no further reasonto present it. Since all unlearned items areequally likely to be learned if presented, it isintuitively reasonable that the optimal pre-sentation strategy selects the item least likelyto be in the learned state for presentation. Inorder to discover a good index of the likelihoodof being in the learned state, consider astudent's response protocol for a single item.If the last response was incorrect, the item wascertainly in the unlearned state at that time,although it may then have been learned duringthe study period that immediately followed.If the last response was correct, then it is morelikely that the item is now in the learned state.In general, the more correct responses there arein the protocol since the last error on the item,the more likely it is that the item is in thelearned state.

The preceding observations provide a heu-ristic justification for an algorithm whichKarush and Dear (1966) have proved is in factthe optimal strategy for the all-or-none model.The optimal strategy requires that for eachstudent a bank of counters be set up, one foreach word in the list. To start, M differentitems are presented each day until each itemhas been presented once and a 0 has beenentered in its counter. On all subsequent days,the strategy requires that we conform to thefollowing two rules:

1. Whenever an item is presented, increase itscounter by 1 if the subject's response is correct, butreset it to 0 if the response is incorrect.

2. Present the M items whose counters are lowestamong all items. If more than M items arc eligible, thenselect randomly as many items as are needed to com-plete the sublist of size M from those having the same

highest counter reading, having selected all items withlower counter values.

For example, suppose six items are presentedeach da)', and after a given day a certainstudent has four items whose counters are 0,four whose counters are 1, and higher valuesfor the rest of the counters. His study list wouldconsist of the four items whose counters are 0,and two items selected at random from thefour whose counters are 1.

It has been possible to find relatively simpleoptimal strategies for the linear and all-or-nonemodels. It is noteworthy that neither strategydepends on the values of the parameters of therespective models (i.e., on a, c, or </'). Anotherexceptional feature of these two models is thatit is possible to condense a student's responseprotocol to one index per item without losingany information relevant to presentation deci-sions. Such condensations of response protocolsare referred to as sufficient histories (Groen &Atkinson, 1966). Roughly speaking, an indexsummarizing the information in a student'sresponse protocol is a sufficient history if anyadditional information from the protocol wouldbe redundant in the determination of thestudent's state of learning. The concept isanalogous to a sufficient statistic. If one takes asample of observations from a population withan underlying normal distribution and wishesto estimate the population mean, the samplemean is a sufficient statistic. Other statisticsthat can be calculated (such as the median,the range, and the standard deviation) cannotbe used to improve on the sample mean as anestimate of the population mean, though theymay be useful in assessing the precision of theestimate. In statistics, whether or not data canbe summarized by a few simple sufficientstatistics is determined by the nature of theunderlying distribution. For educational ap-plications, whether or not a given instructionalprocess can be adequately monitored by asimple sufficient history is determined by themodel representing the underlying learningprocess.

The random-trial increments model appearsto be an example of a process for which theinformation in the subject's response protocolcannot be condensed into a simple sufficienthistory. It is also a model for which the optimalstrategy depends on the values of the model

PSYCHOLOGY OF INSTRUCTION 55

parameters. Consequently, it is not possibleto state a simple algorithm for the optimalpresentation strategy for this model. Sufficeit to say that there is an easily computableformula for determining which item has thebest expected immediate gain, if presented.The strategy that presents this item should bea reasonable approximation to the optimalstrategy (Calfee, 1970). More is said laterregarding the problem of parameter estimationand some of its ramifications.

If the three models under consideration areto be ranked on the basis of their ability toaccount for data from laboratory experimentsemploying the standard presentation proce-dure, the order of preference is clear. Theall-or-none model provides a better account ofthe data than the linear model, and therandom-trial increments model is better thaneither of them (Atkinson & Crothers, 1964).This does not necessarily imply, however, thatthe optimization strategies derived from thesemodels will receive the same ranking. Thestandard cyclic presentation procedure used inmost learning experiments may mask certaindeficiencies in the all-or-none or random-trialincrements models which would manifest them-selves when the optimal presentation strategyspecified by one or the other of these modelswas employed.2

2 This type of result was obtained by Dear, Silberman,Estavan, and Atkinson (1967). They used the all-or-none model to generate optimal presentation scheduleswhere there were no constraints on the number of timesa given item could be presented for test and studywithin an instructional period. Under these conditionsthe model generates an optimal strategy that has a highprobability of repeating the same item over and overagain until a string of correct responses occurs. Intheir experiment the all-or-none strategy proved quiteunsatisfactory when compared with the standard pre-sentation schedule. The problem was that the all-or-none model provides an accurate account of learningwhen the items are well spaced, but fails badly underhighly massed conditions. Laboratory experiments priorto the Dear et al. (1967) study had not employed amassing procedure, and this particular deficiency of theall-or-none model had not been made apparent. Theimportant remark here is that the analysis of instruc-tional problems can provide important information inthe development of learning models. In certain casesthe set of laboratory tasks that the psychologist dealswith may be such that it fails to uncover that particularcondition which would cause the model to fail. Byanalyzing optimal learning conditions we are imposinga somewhat different test on a learning model, whichmay provide a more sensitive measure of its adequacy.

EVALUATION OF THE ALL-OR-NONE STRATEGY

Lorton3 compared the all-or-none strategywith the standard procedure in an experimentin computer-assisted spelling instruction withelementary school children. The former strat-egy is optimal if the learning process is indeedall-or-none, whereas the latter is optimal ifthe process is linear. The experiment was onephase of the Stanford Reading Project usingcomputer facilities at Stanford Universitylinked via telephone lines to student terminalsin the schools.

Individual lists of 48 words were compiledin an extensive pretest program to guaranteethat each student would be studying words ofapproximately equal difficulty, which he didnot already know how to spell. A within-sub-jects design was used in an effort to make thecomparison of strategies as sensitive as possible.Each student's individualized list of 48 wordswas used to form two comparable lists of 24words, one to be taught using the all-or-nonestrategy and the other using the standardprocedure.

Each day a student was given training on 16words, 8 from the list for standard presentationand 8 from the list for presentation accordingto the all-or-none strategy. There were 24training sessions followed by 3 days for testingall the words; approximately 2 weeks later 3more days were spent on a delayed retentiontest. Using this procedure, all words in thestandard presentation list received exactly onepresentation in successive 3-day blocks duringtraining. Words in the list presented accordingto the all-or-none algorithm received from zeroto three presentations in successive 3-dayblocks during training, with one presentationbeing the average. A flow chart of the dailyroutine is given in Figure 1.

The results of the experiment are sum-marized in Figure 2. The proportions of correctresponses are plotted for successive 3-dayblocks during training, followed by the firstoverall test and then the 2-week delayed test.Note that during training the proportion cor-rect is always lower for the all-or-none pro-cedure than for the standard procedure, but on

3 P. Lorton, Jr. Computer-based instruction inspelling: An investigation of optimal strategies forpresenting instructional material. Unpublished doctoraldissertation, Stanford University, 1972.

56 R. C. ATKINSON AND J. A. PAULSON

LfSSON IMPLEMENTATIONPROGRAM

LESSON OPTIMIZATIONPROGRAM

PRINT- X

( AND CORRECT SPELLING )

PRINT

SYSTEM PREPARESTO PRESENTNEXT WORD

TRANSFER DATATO LESSONOPTIMIZATION PROGRAM

. _l

FIG. 1. Daily list presentation routine.

both the final test and the retention test theproportion correct is greater for the all-or-nonestrategy. Analysis of variance tests verifiedthat these results are statistically significant.The advantage of approximately 10 percentagepoints on the posttests for the all-or-noneprocedure is of practical significance as well.

The observed pattern of results is exactlywhat would be predicted if the all-or-nonemodel does indeed describe the learning pro-cess. As was shown earlier, final test per-formance should be better when the all-or-noneoptimization strategy is adopted as opposedto the standard procedure. Also the greaterproportion of error for this strategy duringtraining is to be expected. The all-or-nonestrategy presents the items least likely to be in

the learned state, so it is natural that moreerrors would be made during training.

TEST OF A PARAMETER-DEPENDENT STRATEGY

As noted earlier, the strategy derived for theall-or-none model in the case of homogeneousitems does not depend on the actual values ofthe model parameters. In many situationseither the assumptions of the all-or-none modelor the assumption of homogeneous items orboth are seriously violated, so it is necessary toconsider strategies based on more generalmodels. Laubsch4 considered the optimization

4 J. H. Laubsch. An adaptive teaching system foroptimal item allocation. Unpublished doctoral disserta-tion, Stanford University, 1969.

PSYCHOLOGY OF INSTRUCTION 57

1.0

.9

oQ_

LUcr

o 6UJ

8 - 5u.o

_DO .3<CD

g.2a.

1 I I

STANDARD PROCEDURE

ALL-OR-NONE STRATEGY

I

I 2 3 4 5 6 7 8SUCCESSIVE 3-DAY TRAINING BLOCKS

INITIAL DELAYEDPOST- POST-TEST TEST

FIG. 2. Probability of correct response in Lorton's experiment.

problem for cases where the random-trialincrements model is appropriate. He madewhat is perhaps a more significant departurefrom the assumptions of the all-or-none strat-egy by allowing the parameters of the model tovary with students and items. The followingdiscussion is based upon Laubsch's work, butintroduces a more satisfactory formulation ofindividual differences. This change and theestimation of initial condition parameters pro-duce experimental measures of the effectivenessof optimization procedures that are signifi-cantly greater than those reported by Laubsch.

It is not difficult to derive an approximationto the optimal strategy for the random-trialincrements model that can accommodatestudent and item differences in parametervalues, if these parameters are known. Sinceparameter values must be specified in orderto make the necessary calculations to deter-mine the optimal study list, it makes littledifference whether these numbers are fixed orvary with students and items. However,making estimates of these parameter valuesin the heterogeneous case presents somedifficulties.

When the parameters of a model are homo-geneous, it is possible to pool data from

different subjects and items to obtain preciseestimates. Estimates based on a sample ofstudents and items can be used to predict theperformance of other students or the samestudents on other items. When the parametersare heterogeneous, these advantages no longerexist unless variations in the parameter valuestake some known form. For this reason it isnecessary to formulate a model stating thecomposition of each parameter in terms of asubject and item component.

Let T.-y be a generic symbol for a parametercharacterizing student i and item j. Anexample of the kind of relationship desired isa fixed-effects Subjects X Items analysis ofvariance model:

E(itij) = m + a, + dj £41

where m is the mean, a,- is the ability of studenti, and dj is the difficulty of item j. Because thelearning model parameters we are interestedin are probabilities, the above assumption ofadditivity is not met; that is, there is noguarantee that Equation 4 would yield esti-mates bounded between 0 and 1. But there is atransformation of the parameter that circum-vents this difficulty. In the present context,

58 R. C. ATKINSON AND J. A. PAULSON

this transformation has an interesting intuitivejustification.

Instead of thinking directly in terms of theparameter in,-, it is helpful to think in terms ofthe odds ratio Try/1 — inj. Allow two assump-tions: (a) the odds ratio is proportional tostudent ability; (6) the odds ratio is inverselyproportional to item difficulty. This can beexpressed algebraically as

1-<*»•

K^'(tj[S]

where K is a proportionality constant. Takinglogarithms on both sides yields

log1 Try

= log K + log a, — log dj. [6]

The logarithm of the odds ratio is usuallyreferred to as the "logit." Let log K = M,log a,- = A i, and —log dj — Dj. Then Equation6 becomes

logit Try = M + Ai + [7]

Thus, the two assumptions made above leadto an additive model for the values of theparameters transformed by the logit function.Equation 7, by defining a subject-item param-eter Try in terms of a subject parameter Atapplying to all items and an item parameterDj applying to all subjects, significantly reducesthe number of parameters to be estimated. Ifthere are N items and S subjects, then themodel requires only N + 5 parameters tospecify the learning parameters for N X Ssubject items. More importantly, it makes itpossible to predict a student's performance onitems he has not been exposed to from theperformance of other students on them. Thisformulation of learning parameters is essen-tially the same as the treatment of an analogousproblem in item analysis given by Rasch(1966). Discussion of this and related modelsfor problems in mental test theory is givenby Birnbaum (1968).

Given data from an experiment, Equation 7can be used to obtain reasonable parameterestimates, even though the parameters varywith students and items. The parameters ?ryare first estimated for each student-itemprotocol, yielding a set of initial estimates.

Next the logistic transformation is applied tothese initial estimates, and then using thesevalues subject and item effects (At and D3) areestimated by standard analysis of varianceprocedures. The estimates of student and itemeffects are used to adjust the estimate of eachtransformed student-item parameter, which inturn is transformed back to obtain the final esti-mate of the original student-item parameter.

The first students in an instructional pro-gram that employs a parameter-dependentoptimization scheme like the one outlinedabove do not benefit maximally from theprogram's sensitivity to individual differencesin students and items; the reason is that theinitial parameter estimates must be based onthe data from these students. As more andmore students complete the program, estimatesof the DJ& becomes more precise until finallythey may be regarded as known constants ofthe system. When this point has been reached,the only task remaining is to estimate A{ foreach new student entering the program. Sincethe Dft are known, the estimates of Try for anew student are of the right order, althoughthey may be systematically high or low untilthe student component can be accuratelyassessed.

Parameter-dependent optimization programswith the adaptive character just described arepotentially of great importance in long-terminstructional programs. Of interest here is therandom-trial increments model, but the methodof decomposing parameters into student anditem components would apply to other modelsas well. We turn now to an experimental testof the adaptive optimization program based onthe random-trial increments model. In thiscase the parameters a, c, and q' of the random-trial increments model were separated intoitem and subject components following thelogic of Equation 7. That is, the parameters forsubject i working on item j were defined asfollows:

logit ctij = /*<«> + A

logit en = M ( C ) + Alogit q'n = M

(9 / ) +

+

[8]

Note that AiM, Ai(c\ andyl,-'9'' are measuresof the ability of subject i and hold for all

PSYCHOLOGY OF INSTRUCTION 59

items, whereas ZV°> DjM, and A''3'' aremeasures of the difficulty of item j and holdfor all subjects.

The instructional program was designed toteach 300 Swahili vocabulary items to college-level students. Two presentation strategieswere employed: (a) the all-or-none procedureand (b) the adaptive optimization procedurebased on the random-trial increments model.As in the Lorton study, a within-subjectsdesign was employed in order to provide asensitive comparison of the strategies. For eachstudent two sublists of 150 items were formedat random from the master list; instruction onitems from one sublist was governed by theall-or-none strategy and by the adaptiveoptimization strategy for the other sublist.Each day a student was tested on and studied100 items presented in a random order; 50items were from the all-or-none sublist chosenusing the all-or-none strategy, and 50 from theadaptive optimization list selected accordingto that strategy. A Swahili word would bedisplayed, and the student was required togive its English translation. Reinforcementconsisted of a printout of the correct Swahili-English pair. Twenty such training sessionswere involved, each lasting for approximately1 hour. Two or three days after the last trainingsession an initial posttest was administeredover all 300 items; a delayed posttest was givenapproximately 2 weeks later.

The lesson optimization program for therandom-trial increments model was more com-plex than those described earlier. Each nightthe response data for that day were enteredinto the system and used to update parameterestimates; in this case an exact record of thecomplete presentation sequence and responsehistory had to be preserved. A computer-basedsearch algorithm was used to estimate param-eters and thus the more accurate the previousday's estimates, the more rapid was the searchfor the updated parameter values. Once up-dated estimates had been obtained, they wereentered into the optimization program to selectindividual lists for each student to be run thenext day. Early in the experiment (beforeestimates of DM, D(c), and D(q>) had stabi-lized) the computation time was fairly lengthy,but it rapidly decreased as more data accumu-

AL L - OR - NONE S TRA TEG Y

ADAPTIVE STRATEGY

INITIALPOST-TEST

DELAYEDPOST-TEST

FIG. 3. Posttest performance for the all-or-nonestrategy and for the parameter-dependent strategy ofthe random-trial increments model.

lated, and the system homed in on preciseestimates of item difficulty.

Figure 3 presents the final test results andindicates that for both the initial and delayedposttests the parameter-dependent strategyof the random-trial increments model wasmarkedly superior to the all-or-none strategy;on the initial posttest the relative improvementwas 41% and 67% on the delayed posttest. Itis apparent that the parameter-dependentstrategy was more sensitive than the all-or-none strategy in identifying and presentingthose items that would benefit most fromadditional study. Another feature of the experi-ment was that students were run in successivegroups, each starting about one week after theprior group. As the theory would predict, theoverall gains produced by the parameter-dependent strategy increased from one groupto the next. The reason is that early in theexperiment estimates of item difficulty werecrude, but improved with each successive wave

60 R. C. ATKINSON AND J. A. PAULSON

of students. Near the end of the experimentestimates of item difficulty were quite exact,and the only task that remained when a newstudent came on the system was to estimatehis particular A M, A (c), and A («° values.

CONCLUDING REMARKS

The studies reported here illustrate oneapproach that can contribute to the develop-ment of a theory of instruction. This is not tosuggest that the strategies they tested representa complete solution to the problem of optimalitem selection. The models upon which thesestrategies are based ignore several potentiallyimportant factors, such as short-term memoryeffects, interitem relationships, and motivation.Undoubtedly, strategies based on learningmodels that take some of these variables intoaccount would be superior to those analyzedso far.

The task and learning models considered inthis article are extremely simple and of re-stricted generality; nevertheless, there are atleast two reasons for studying them. First,this type of task occurs in many fields ofinstruction and needs to be understood in itsown right. No matter what the pedagogicalorientation, it is hard to conceive of an initialreading program or foreign language coursethat does not involve some form of list-learningactivity. In this respect it should be noted thatthe parameter-dependent strategy of therandom-trial increments model has been incor-porated into several parts of the Stanfordcomputer-assisted instruction program in initialreading; although no formal evaluation has yetbeen made, the strategy appears to be highlyeffective.

There is a second reason for the type ofanalysis reported here. By making a carefulstudy of a few cases that can be understoodin detail, it is possible to develop prototypicalprocedures for analyzing more complex opti-mization problems. At present, analyses com-parable to those reported here cannot be madefor many problems of central interest to educa-tion, but by having examples of the above sortit is possible to specify with more clarity thesteps involved in devising optimal procedures.Three aspects need to be emphasized: (a) thedevelopment of an adequate description of the

learning process, (b) the assessment of costsand benefits associated with possible instruc-tional actions and states of learning, and (c)the derivation of optimal strategies based onthe goals set for the student. The examplesconsidered here deal with each of these factorsand point out the issues that arise.

It has become fashionable in recent years tocriticize learning theorists for ignoring theprescriptive aspects of instruction, and somehave argued that efforts devoted to thelaboratory analysis of learning should beredirected to the study of learning as it occursin real-life situations. These criticisms are notentirely unjustified for in practice psychologistshave too narrowly defined the field of learning,but to focus all effort on the study of complexinstructional tasks would be a mistake. Somesuccesses might be achieved, but, in the longrun, understanding complex learning situationsmust depend upon a detailed analysis of theelementary perceptual and cognitive processesthat make up the human information handlingsystem. The trend to press for relevance oflearning theory is healthy, but if the surge inthis direction goes too far, we will end up witha massive set of prescriptive rules and notheory to integrate them.

It needs to be emphasized that the inter-pretation of complex phenomena is prob-lematical, even in the best of circumstances.The case of hydrodynamics is a good examplefor it is one of the most highly developedbranches of theoretical physics. Differentialequations expressing certain basic hydro-dynamic relationships were formulated byEuler in the eighteenth century. Special casesof these equations sufficed to account for awide variety of experimental data. Thesesuccesses prompted Lagrange to assert thatthe success would be universal were it not forthe difficulty in integrating Euler's equationsin particular cases. Lagrange's view is stillwidely held, in spite of numerous experimentsyielding anomalous results. Euler's equationshave been integrated in many cases, and the re-sults were found to disagree dramatically withobservation, thus contradicting Lagrange'sassertion. The problems involve more thanmere fine points, and raise serious paradoxeswhen extrapolations are made from resultsobtained in the laboratory to actual conditions.

PSYCHOLOGY OF INSTRUCTION 61

The following quotation from Birkhoff (1960)should strike a sympathetic chord among thosetrying to relate psychology and education:

These paradoxes have been the subject of manywitticisms. Thus, it has been said that in the nineteenthcentury, fluid dynamicists were divided into hydraulicengineers who observed what could not be explained,and mathematicians who explained things that couldnot be observed. It is my impression that manysurvivors of both species are still with us [p. 26].

Research on learning appears to be in asimilar state. Educational researchers are con-cerned with experiments that cannot be readilyinterpreted in terms of learning theoreticconcepts, while psychologists continue todevelop theories that seem to be applicableonly to the phenomena of the laboratory.Hopefully, work of the sort described here willbridge this gap and help lay the foundationsfor a theory of instruction.

REFERENCES

ATKINSON, R. C. Computer-assisted learning in action.Proceedings of the National A cademy of Sciences, 1969,63, 588-594.

ATKINSON, R. C., & CROTHERS, E. J. A comparison ofpaired-associate learning models having differentacquisition and retention axioms. Journal oj Mathe-matical Psychology, 1964, 2, 285-315.

ATKINSON, R. C. & FLETCHER, J. D. Teaching childrento read with a computer. The Reading Teacher, 1972,25, 319-327.

BELLMAN, R. Adaptive control processes. Princeton:Princeton University Press, 1961.

BIRKHOFF, G. Hydrodynamics: A study in logic, fad,and similitude. (2nd ed.) Princeton: Princeton Uni-versity Press, 1960.

BIRNBAUM, A. Some latent trait models and their usein inferring an examinee's ability. In F. M. Lord &M. R. Novick (Eds.), Statistical theories of mentaltest scores. Reading, Mass.: Addison-Wesley, 1968.

BRUNER, J. S. Some theorems on instruction statedwith reference to mathematics. In E. R. Hilgard(Ed.), Theories of learning and instruction: The Sixty-Third Yearbook of the National Society for the Studyof Education. Chicago: University of Chicago Press,1964.

CALFEE, R. C. The role of mathematical models inoptimizing instruction. Scientia: Revue Internationalede Synthese Scientifique, 1970, 105, 1-25.

DEAR, R. E., SILBERMAN, H. F., ESTAVAN, D. P., &Atkinson, R. C. An optimal strategy for the presenta-tion of paired-associate items. Behavioral Science,1967,12,1-13.

GROEN, G. J., & ATKINSON, R. C. Models for optimizingthe learning process. Psychological Bulletin, 1966, 66,309-320.

HILGARD, E. R. (Ed.) Theories of learning and instruc-tion: The Sixty-Third Yearbook of the NationalSociety for the Study of Education. Chicago: Universityof Chicago Press, 1964.

KARUSH, W., & DEAR, R. E. Optimal stimulus presenta-tion strategy for a stimulus sampling model of learn-ing. Journal of Mathematical Psychology, 1966, 3,19-47.

NORMAN, M. F. Incremental learning on random trials.Journal of Mathematical Psychology, 1964, 2, 336-350.

RAIFFA, H., & SCHLAIFFER, R. Applied statisticaldecision theory. Cambridge, Mass.: M.I.T. Press,1968.

RASCH, G. An individualistic approach to item analysis.In P. F. Lazarsfeld & N. W. Henry (Eds.), Readingsin mathematical social science. Chicago: Science Re-search Associates, 1966.

SMAIXWOOD, R. D. The analysis of economic teachingstrategies for a simple learning model. Journal ofMathematical Psychology, 1971, 8, 285-301.

(Received October 15, 1970)