


Psychological Bulletin, 1966, Vol. 66, No. 4, 309-320

MODELS FOR OPTIMIZING THE LEARNING PROCESS 1

G. J. GROEN AND R. C. ATKINSON

Stanford University

This paper is concerned with showing how certain instructional problems can be reformulated as problems in the mathematical theory of optimization. A common instructional paradigm is outlined and a notational system is proposed which allows the paradigm to be restated as a multistage decision process with an explicit mathematical learning model embedded within it. The notion of an optimal stimulus presentation strategy is introduced and some problems involved in determining such a strategy are discussed. A brief description of dynamic programming is used to illustrate how optimal strategies might be discovered in practical situations.

Although the experimental work in the field of programmed instruction has been quite extensive, it has not yielded much in the way of unequivocal results. For example, Silberman (1962), in a summary of 80 studies dealing with experimental manipulations of instructional programs, found that 48 failed to obtain a significant difference among treatment comparisons. When significant differences were obtained, they seldom agreed with findings of other studies on the same problem. The equivocal nature of these results is symptomatic of a deeper problem that exists not only in the field of programmed instruction but in other areas of educational research.

An instructional program is usually devised in the hope that it optimizes learning according to some suitable criterion. However, in the absence of a well-defined theory, grave difficulties exist in interpreting the results of experiments designed to evaluate the program. Usually the only hypothesis tested is that the program is better than programs with different characteristics. In the absence of a theoretical notion of why the program tested should be optimal, it is almost impossible to formulate alternative hypotheses in the face of inconclusive or contradictory results. Another consequence of an atheoretical approach is that it is difficult to predict the magnitude of the difference between two experimental treatments. If the difference is

1 Support for this research was provided by the National Aeronautics and Space Administration, Grant No. NGR-05-020-036, and by the Office of Education, Grant No. OES-10-050.

small, then it may not turn out to be significant when a statistical test is applied. However, as Lumsdaine (1963) has pointed out, lack of significance is often interpreted as negative evidence.

What appears to be missing, then, is a theory that will predict the conditions under which an instructional procedure optimizes learning. A theory of this type has recently come to be called a theory of instruction. It has been pointed out by several authors (e.g., articles in Gage, 1963, and Hilgard, 1964) that one of the chief problems in educational research has been a lack of theories of instruction. Bruner (1964) has characterized a theory of instruction as a theory that sets forth rules concerning the most efficient way of achieving knowledge or skill; these rules should be derivable from a more general view of learning. However, Bruner made a sharp distinction between a theory of learning and a theory of instruction. A theory of learning is concerned with describing learning. A theory of instruction is concerned with prescribing how learning can be improved. Among other things, it prescribes the most effective sequence in which to present the materials to be learned and the nature and pacing of reinforcement.

While the notion of a theory of instruction is relatively new, optimization problems exist in many other areas and have been extensively studied in a mathematical fashion. Within psychology, the most prominent example is the work of Cronbach and Gleser (1965) on the problem of optimal test selection for personnel decisions. Outside psychology, optimization problems occupy an important place in areas as diverse as physics, electrical engineering, economics, and operations research. Despite the fact that the specific problems vary widely from one field to another, several mathematical techniques have been developed that can be applied to a broad variety of optimization problems.

The purpose of this paper is to indicate how one of these techniques, dynamic programming, can be utilized in the development of theories of instruction. Dynamic programming was developed by Bellman and his associates (Bellman, 1957, 1961, 1965; Bellman & Dreyfus, 1962) for the solution of a class of problems called multistage decision processes. Broadly speaking, these are processes in which decisions are made sequentially, and decisions made early in the process affect decisions made subsequently.

We will begin by formalizing the notion of a theory of instruction and indicating how it can be viewed as a multistage decision process. This formalization will allow us to give a precise definition of the optimization problem that arises. We will then consider in a general fashion how dynamic programming techniques can be used to solve this problem. Although we will use some specific optimization models as illustrations, our main aim will be to outline some of the obstacles that stand in the way of the development of a quantitative theory of instruction and indicate how they might be overcome.

Multistage Instructional Models

The type of multistage process of greatest relevance to the purposes of this paper is the so-called discrete N-stage process. This process is concerned with the behavior of a system that can be characterized at any given time as being in State w. This state may be univariate but is more generally multivariate and hence is often called a state vector (the two terms will be used interchangeably). The state of this system is determined by a set of decisions. In particular, every time a decision d (which may also be multivariate) is made, the state of the system is transformed. The new state is determined by both d and w and will be denoted by T(w, d). The process consists of N successive stages. At each of the first N − 1 stages, a decision d is made. The last stage is a terminal stage in which no decision is made. The process can be viewed as proceeding in the following fashion: Assume that, at the beginning of the first stage, the system is in state w1. An initial decision d1 is made. The result is a new state w2 given by the relation:

w2 = T(w1, d1).

We are now in the second stage of the process, so a second decision d2 is made, resulting in a new state w3 determined by the relation

w3 = T(w2, d2).

The process continues in this way until finally:

wN = T(w(N-1), d(N-1)).

If each choice of d determines a unique new state, T(w, d), then the process is deterministic. It is possible, however, that the new state is probabilistically related to the previous state. In this nondeterministic case, it is also necessary to specify for each Stage i a probability distribution Pr(wi | w(i-1), d(i-1)).

In a deterministic process, each sequence of decisions d1, d2, ..., d(N-1) and states w1, w2, ..., wN has associated with it a function that has been termed the criterion or return function. This function can be viewed as defining the utility of the sequence of decisions. The optimization problem one must solve is to find a sequence of decisions that maximizes this criterion function. The optimization problem for a nondeterministic process is similar except that the return function is some suitable type of expectation.
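The deterministic case can be made concrete with a short sketch. Everything specific here (the numerical toy state, the two decisions, the particular T and return function) is invented for illustration; only the structure comes from the text: apply T stage by stage, then score the terminal state, and pick the decision sequence with the largest return.

```python
from itertools import product

def best_strategy(w0, decisions, T, phi, N):
    """Enumerate every sequence of decisions d1..d(N-1) for a deterministic
    N-stage process and return the sequence that maximizes the terminal
    return phi(wN), together with that return."""
    best = None
    for seq in product(decisions, repeat=N - 1):
        w = w0
        for d in seq:              # w_{i+1} = T(w_i, d_i)
            w = T(w, d)
        if best is None or phi(w) > best[1]:
            best = (seq, phi(w))
    return best

# Toy instance: the state is a number, decision 'a' adds 1, 'b' doubles it,
# and the terminal return is the state itself.
seq, ret = best_strategy(
    w0=1,
    decisions=['a', 'b'],
    T=lambda w, d: w + 1 if d == 'a' else 2 * w,
    phi=lambda w: w,
    N=4,   # four stages, hence three decisions
)
```

The exhaustive loop over `product(decisions, repeat=N - 1)` is exactly the tree enumeration discussed later in the paper, and it illustrates why the approach fails for large N: the number of sequences grows exponentially.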

In order to indicate how an instructional process can be considered as an N-stage decision process, a fairly general instructional paradigm will be introduced. This paradigm is based on the type of process that is encountered in computer-based instruction (Atkinson & Hansen, 1967). In instruction of this type, a computer is programmed to decide what will be presented to the student next. The decision procedure, which is given in diagrammatic form in Figure 1, is based on the past stimulus-response history of the student. It should be noted that this paradigm contains, as special cases, all other programmed instructional techniques currently in vogue. It may also correspond to the behavior of a teacher making use of a well-defined decision procedure.

It will be assumed that the objective of the instructional procedure is to teach a set of concepts and that the instructional system has available to it a set of stimulus materials regarding these concepts. A stage of the process will be defined as being initiated when a decision is made regarding which concept is to be presented and terminated when the history file is updated with the outcome of the decision. In order to completely define the instructional system we need to define:

1. The set S of all possible stimulus presentations.

2. The set A of all possible responses that can be made by the student.

3. The set H of histories. An element of H need not be a complete history of the student's behavior. It may only be a summary.

4. A function δ of H onto S. This defines the decision procedure used by the system to determine the stimulus presentation on the basis of the history.

5. A function μ of S × A × H onto H. This function updates the history.

Thus, at the beginning of Stage i, the history can be viewed as being in State hi. A decision is then made to present si = δ(hi), a response ai is made to si, and the state of the system is updated to h(i+1) = μ(si, ai, hi).
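The stage-by-stage cycle just described can be sketched as a short loop. All concrete names here (`run_session`, `respond`, the particular history and decision rule) are illustrative assumptions, not from the paper; the real student is replaced by a random stand-in.

```python
import random

def run_session(delta, mu, h0, respond, n_stages):
    """One pass through the paradigm of Figure 1: at each stage the decision
    function delta maps the current history to a stimulus, the student
    responds, and mu updates the history.  No decision is made at the
    terminal stage, so there are n_stages - 1 iterations."""
    h = h0
    outcomes = []
    for _ in range(n_stages - 1):
        s = delta(h)               # s_i = delta(h_i)
        a = respond(s)             # student's response a_i
        h = mu(s, a, h)            # h_{i+1} = mu(s_i, a_i, h_i)
        outcomes.append((s, a))
    return h, outcomes

# Minimal example: the history is a table of presentation counts, and the
# decision rule always presents the least-presented stimulus -- a simple
# response-insensitive strategy.
stimuli = ['s1', 's2']
h, outcomes = run_session(
    delta=lambda h: min(stimuli, key=lambda s: h[s]),
    mu=lambda s, a, h: {**h, s: h[s] + 1},
    h0={'s1': 0, 's2': 0},
    respond=lambda s: random.random() < 0.5,   # stand-in for the student
    n_stages=5,
)
```

With this rule the two stimuli simply alternate, so after four decisions each has been presented twice regardless of the responses.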

In a system such as this, the stimulus set S is generally predetermined by the objectives of one's instructional procedure. For example, if the objective is to teach a foreign language vocabulary, then S might consist of a set of words from the language. The response set A is, to a great extent, similarly predetermined. Although there may be some choice regarding the actual response mode utilized (e.g., multiple choice vs. constructed response), this problem will not be considered here. The objectives of the instructional procedure also determine some criterion of optimality. For example, in our vocabulary example this might be the student's performance on a test given at the end of a learning session. The optimization problem that is the main concern of this paper is to find a suitable decision procedure for deciding which stimulus si to present at each stage of the process, given that S, A, and the optimality criterion are specified in advance. Such a decision procedure is called a strategy. It is determined by the set of possible histories H, the decision function δ, and the updating function μ.

FIG. 1. Flow diagram for an instructional system. (The steps are: start the instructional session; initialize the student's history for this session; determine, on the basis of the current history, which stimulus is to be presented next; present the stimulus to the student; record the student's response; update the history by entering the last stimulus and response; and, if Stage N of the process has been reached, terminate the instructional session, otherwise return to the decision step.)

For a particular student, the stimulus presented at a given stage and the response the student makes to that stimulus can be viewed as the observable outcome of that stage of the process. For an N-stage process, the sequence (s1, a1, s2, a2, ..., s(N-1), a(N-1)) of outcomes at each stage can be viewed as the outcome of the process. The set of all possible outcomes of an instructional procedure can be represented as a tree with branch points occurring at each stage for each possible stimulus presentation and each possible response to a stimulus. An example of such a tree is given in Figure 2 for the first two stages of a process with two stimulus presentations s, s′ and two responses a, a′.

The most complete history would contain, at the beginning of each stage, a complete account of the outcome of the procedure up to that stage. Thus, hi would consist of some sequence (s1, a1, s2, a2, ..., s(i-1), a(i-1)). Ideally, one could then construct a decision function δ which specified, for each possible outcome, the appropriate stimulus presentation si. However, two problems emerge. The first is that the number of outcomes increases rapidly as a function of the number of stages. For example, at the tenth stage of a process such as that outlined in Figure 2, we would have 4^10

outcomes. The specification of a unique decision for each outcome would clearly be a prohibitively lengthy procedure. As a result, any practical procedure must classify the possible outcomes in such a way as to reduce the size of the history space. Apart from the problem of the large number of possible outcomes, one is also faced with the problem that many procedures do not store as much information as others. For example, in a linear program in which all students are run in lockstep, it is not possible to make use of information regarding the student's responses.

In general, instructional systems may be classified into two types: those that make use of the student's response history in their stage-by-stage decisions, and those that do not. The resulting strategies may be termed response sensitive and response insensitive. A response insensitive strategy can be specified by a sequence of stimulus presentations (s1, s2, ..., s(N-1)). A response sensitive strategy can be represented by a subtree of the tree of possible outcomes. An example is given in Figure 2. There are two chief reasons for making this distinction. The first is that response insensitive strategies are less complicated to derive. The second is that response insensitive strategies can be completely specified in advance and so do not require a system capable of branching during an actual instructional session.

FIG. 2. Tree diagram for the first two stages of a process with two stimuli s and s′, and two responses a and a′. (The dotted lines enclose the subtree generated by a possible response sensitive strategy.)

While this broad classification will be useful in the ensuing discussion, it is important to note that several types of history spaces are possible within each class of strategy. Even if the physical constraints of the system are such that only response insensitive strategies can be considered, it is possible to define many ways of "counting" stimulus presentations. The most obvious way is to define the history at Stage i as the number of times each stimulus has been presented. A more complicated procedure (which might be important in cases where stimuli were easily forgotten) would be to also count for each stimulus the number of other items that had been presented since its most recent presentation.
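The two "counting" schemes just mentioned might be sketched as follows; the function names and the sample presentation sequence are ours, not the paper's.

```python
def count_history(presentations, stimuli):
    """Simplest response-insensitive history: how many times each
    stimulus has been presented so far."""
    return {s: presentations.count(s) for s in stimuli}

def count_with_recency(presentations, stimuli):
    """Richer history: (presentation count, number of other items shown
    since the stimulus's most recent presentation).  The second component
    matters when items are easily forgotten; it is None for an item that
    has never been presented."""
    h = {}
    for s in stimuli:
        n = presentations.count(s)
        if n == 0:
            h[s] = (0, None)
        else:
            last = max(i for i, p in enumerate(presentations) if p == s)
            h[s] = (n, len(presentations) - 1 - last)
    return h

seq = ['s1', 's2', 's2', 's1', 's2']
```

For this sequence, `count_history` records two presentations of s1 and three of s2, while `count_with_recency` additionally records that one item has intervened since s1 was last shown and none since s2.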

The discussion up to this point has been concerned mainly with the canonical representation of an instructional system, and a deliberate effort has been made to avoid theoretical assumptions. While this leads to some insight into the nature of the optimization problem involved, the multistage process cannot be sufficiently well-defined to yield a solution to the optimization problem without imposing theoretical assumptions. It will be recalled that, in order to define a multistage process, it is necessary to specify a transformation w(i+1) = T(wi, di) given the state and decision at Stage i. In order to optimize the process, it is also necessary to be able to state at each stage the effect of a decision upon the criterion function. The simplest criterion function to use is one which depends only on the final state of the system. This function will be called the terminal return function and denoted φ(wN). An intuitive interpretation of φ(wN) is the score on a final test.

If the transformation T were deterministic, then the sequence of optimum decisions could be determined by enumerating in a tree diagram all possible outcomes and computing φ(wN) for each path. A path that maximized φ(wN) would yield a sequence of decisions corresponding to appropriate nodes of the tree. If T were nondeterministic, then a strategy σ would yield a tree. Each tree would define a probability distribution over the wN, and thus an expected terminal return

E[φ(wN) | σ]   [1]

could be computed for each strategy σ. In either case, this process of simple enumeration of the possible branches of a tree is impossible in any practical situation since too many alternative paths exist, even for a reasonably small N. The problem of developing a feasible computational procedure will be discussed in the next section. The problem of immediate concern is the most satisfactory way of defining the state space W and the transformations T(wi, di).

At first sight, it would seem that wi could be defined as the history at Stage i and T(wi, di) as the history updating rule. However, while this might be feasible in cases where the history space is either specified in advance or subject to major constraints, it has the severe disadvantage that it necessitates an ad hoc choice of histories and, without the addition of theoretical assumptions, it is impossible to compare the effectiveness of different histories. Even if the history space is predetermined, such as might be the case in a simple linear program where all a history can do is "count" the occurrences of each stimulus item, it is necessary to make some theoretical assumption regarding the precise form of φ(wN).

One way to avoid problems such as this is to introduce theoretical assumptions regarding the learning process explicitly in the form of a mathematical model. In the context of an N-stage process, a learning model consists of: (a) a set of learning states Y; (b) a usually nondeterministic response rule which gives (for each state of learning) the probability of a correct response to a given stimulus; and (c) an updating rule which provides a means of determining the new learning state (or distribution of states) that results from the presentation of a stimulus, the response the student makes, and the reinforcement he receives. At the beginning of Stage i of the process, the student's state is

denoted by yi. After Stimulus si is presented, the student makes a response ai, the probability of which is determined by si and yi. The learning state of the student then changes to y(i+1) = T(yi, si). Models of this type have been found to provide satisfactory descriptions for a variety of learning phenomena in areas such as paired-associate learning and concept formation. Detailed accounts of these models and their fit to empirical phenomena are to be found in Atkinson, Bower, and Crothers (1965), Atkinson and Estes (1963), and Sternberg (1963).

In the example of the learning of a list of vocabulary items, the two simplest models that might provide an adequate description of the student's learning process are the single-operator linear model (Bush & Sternberg, 1959) and the one-element model (Bower, 1961; Estes, 1960). In the single-operator linear model, the set Y is the closed unit interval [0,1]. The states are values of response probabilities. Although these probabilities can be estimated from group data, they are unobservable when individual subjects are considered. If, for a particular stimulus j, qi(j) is the probability of an error at the start of Stage i and that item is presented, then the new state (i.e., the response probability) is given by

q(i+1)(j) = a qi(j)   (0 < a < 1). [2]

If q1(j) is the error probability at the beginning of the first stage, then it is easily shown that

qi(j) = q1(j) a^ni(j)   [3]

where ni(j) is the number of times Item j has been presented and reinforced prior to Stage i. The response rule is simply that if a subject is in state qi with respect to a stimulus and that stimulus is presented, then he makes an error with probability qi. This model has two important properties. The first is that, although the response rule is nondeterministic, the state transformation that occurs as the result of an item presentation is deterministic. The second is that the model is response insensitive since the same transformation is applied to the state whether a correct or incorrect response occurs. The only information that can be used to predict the state is the number of times an item has been presented.
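Equations 2 and 3 are easy to check numerically: iterating the operator of Equation 2 must give the closed form of Equation 3. A small sketch (the parameter values are arbitrary):

```python
def q_iterative(q1, a, n):
    """Apply the single-operator linear rule q <- a*q (Equation 2)
    for n reinforced presentations of the item."""
    q = q1
    for _ in range(n):
        q = a * q
    return q

def q_closed_form(q1, a, n):
    """Equation 3: q_i = q_1 * a**n, where n is the number of reinforced
    presentations of the item prior to Stage i."""
    return q1 * a ** n

# The two agree, e.g. with q1 = .6, a = .5 and three presentations:
assert abs(q_iterative(0.6, 0.5, 3) - q_closed_form(0.6, 0.5, 3)) < 1e-12
```

The check also makes the response insensitivity of the model vivid: nothing about the student's responses enters either computation.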


In the one-element model, an item can be in one of two states: a learned state L and an unlearned state L̄. If an item is in State L, it is always responded to correctly. If it is in State L̄, it is responded to correctly with probability g. The rule giving the state transformation is nondeterministic. If an item is in State L̄ at the beginning of a stage and is presented, then it changes its state to L with probability c (where c remains constant throughout the procedure). Unlike the linear model, the one-element model is response sensitive. If an error is made in response to an item then that item was in State L̄ at the time that the response was made. To see how this fact influences the response probability, it is convenient to introduce the following random variable:

Xn(j) = 1, if an error occurs on the nth presentation of Item j; 0, if a success occurs on the nth presentation of Item j.

Then

Pr(Xn(j) = 1) = (1 − g)(1 − c)^(n-1),   [4]

but

Pr(Xn(j) = 1 | X(n-1)(j) = 1) = (1 − g)(1 − c).   [5]

In contrast, for the single-operator linear model

Pr(Xn(j) = 1) = Pr(Xn(j) = 1 | X(n-1)(j) = 1) = q1(j) a^(n-1).   [6]

Although these two models cannot be expected to provide the same optimization scheme in general, they are equivalent when only response-insensitive strategies are considered. This is due to the fact that if a is set equal to 1 − c and q1(j) to 1 − g, then identical expressions for Pr(Xn(j) = 1) result for both models.
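The equivalence under the mapping a = 1 − c, q1(j) = 1 − g can be verified directly from Equations 4 and 6; a quick numerical check with arbitrary parameter values:

```python
def p_error_one_element(g, c, n):
    """Equation 4: unconditional error probability on the nth
    presentation of an item under the one-element model."""
    return (1 - g) * (1 - c) ** (n - 1)

def p_error_linear(q1, a, n):
    """Equation 6: error probability on the nth presentation of an item
    under the single-operator linear model."""
    return q1 * a ** (n - 1)

# With a = 1 - c and q1 = 1 - g the marginal probabilities coincide for
# every n, which is why the two models prescribe the same
# response-insensitive strategies.
g, c = 0.2, 0.3
for n in range(1, 10):
    assert abs(p_error_one_element(g, c, n)
               - p_error_linear(1 - g, 1 - c, n)) < 1e-12
```

Note that only the marginal probabilities agree: the conditional probability of Equation 5 differs from Equation 6, which is exactly where the one-element model's response sensitivity becomes usable.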

With the introduction of models such as these, the state space W and the transformation T can be defined in terms of some "learning state" of the student. For example, in the case of the linear model and a list of m stimulus items, we can define the state of the student at Stage i as the m-tuple

qi = (qi(1), qi(2), ..., qi(m)),   [7]

where qi(j) denotes the current probability of making an error to item sj, and define T(qi, sj) as the vector obtained by replacing qi(j) with a qi(j). This notation is illustrated in Figure 3. If the behavioral optimization criterion were a test on the m items administered immediately after Stage N of the process, then the return function would be the expectation of the test score, that is,

φ(qN) = (1 − qN(1)) + (1 − qN(2)) + ... + (1 − qN(m)),   [8]

where 1 − qN(j) is the probability of a correct response to item sj at the end of the instructional process. It is not necessary, however, that W be the actual state space of the model. It may, instead, be more convenient to define wi as some function of the parameters of the model. For example, if the process of learning a list of simple items can be described

FIG. 3. Outcome tree of response probabilities for the linear model with a = .5, q1(1) = .6, q1(2) = .9.


by a one-element model, then wi can be defined as the m-tuple whose jth element is either L or L̄. However, if one is interested in some criterion that can be expressed in terms of probabilities of the items being in State L, then it may be computationally more convenient to consider wi as an m-tuple whose jth element is the probability that stimulus sj is in State L at the beginning of Stage i.
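For the linear-model example of Figure 3 (a = .5, q1 = (.6, .9)), the expected test score of Equation 8 can be computed for every response-insensitive presentation sequence by brute force. A sketch, assuming three decisions; item indices 0 and 1 stand for s1 and s2:

```python
from itertools import product

def final_score(q1, a, seq):
    """Expected end-of-session test score (Equation 8) after presenting
    the sequence `seq` of item indices, under the linear model."""
    q = list(q1)
    for j in seq:
        q[j] *= a            # Equation 2 applied to the presented item
    return sum(1 - qj for qj in q)

# Parameters of Figure 3: two items, a = .5, q1 = (.6, .9).
q1, a, n_decisions = (0.6, 0.9), 0.5, 3
best = max(product(range(2), repeat=n_decisions),
           key=lambda seq: final_score(q1, a, seq))
```

With these parameters the best allocation presents the harder item (q1 = .9) twice and the easier one once, for an expected score of 1.475 out of 2; note that because the model is response insensitive, only the number of presentations of each item matters, not their order.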

It is clear from these examples that a learning model can impose a severe constraint upon the history space in the sense that information regarding observable outcomes is rendered redundant. For example, if q1(j) is known on a priori grounds (for each j), then the linear model renders the entire response history redundant. This is because the response probability of each item is completely determined by the number of times it has been presented. With the one-element model, the nature of the constraint on the history is not immediately clear.

In general, the problem of deciding on an appropriate history, hi, is similar to the problem of finding an observable statistic that provides an adequate basis for inferring the properties of the distribution of states. A desirable property for such a history would be for it to summarize all information concerning the state so that no other history would provide additional information. A history with this property can be called a sufficient history. The most appropriate sufficient history to use would be that which was the most concise.

In the theory of statistical inference, a statistic with an analogous property is called a sufficient statistic. Since wi is a function of the parameters of the model, it would seem reasonable to expect that if a sufficient statistic exists for these parameters, then a sufficient history would be some function of the sufficient statistic. For a general discussion of the role of sufficient statistics in reducing the number of paths that must be considered in trees resulting from processes similar to those considered here, the reader is referred to Raiffa and Schlaifer (1961).

Optimization Techniques

Up to now, the only technique we have considered that enables us to find an optimal strategy is to enumerate every path of the tree generated by the N-stage process. Although the systematic use of learning models can serve to reduce the number of paths that must be considered, much too large a number of paths still remains in most problems where a large number of different stimuli are used. The main success with a direct approach has been in the case of response insensitive strategies (Crothers, 1965; Dear, 1964; Dear & Atkinson, 1962; Suppes, 1964). In these cases, either the number of stimulus types is drastically limited or the problem is of a type where the history can be simplified on a priori grounds. The techniques used in these approaches are too closely connected with the specific problem treated and the models used for any general discussion of their merits.

The theory of dynamic programming provides a set of techniques that reduce the portion of a tree that must be searched. These techniques have the merit of being model-free. Moreover, they provide a computational algorithm that may be used to discover optimal strategies by numerical methods in cases where analytic methods are too complicated. The first application of dynamic programming to the design of optimal instructional systems was due to Smallwood (1962). Since then, several other investigators have applied dynamic programming techniques to instructional problems of various types (Dear, 1964; Karush & Dear, 1966; Matheson, 1964). The results obtained by these investigators are too specific to be reviewed in detail. The main aim in this section is to indicate the nature of the techniques and how they can be applied to instructional problems.

Broadly speaking, dynamic programming is a method for finding an optimal strategy by systematically varying the number of stages and obtaining an expression which gives the return for a process with N stages as a function of the return from a process with N − 1 stages. In order to see how this is done, it is necessary to impose a restriction on the return function and define a property of optimal policies. Following Bellman (1961, p. 54), a return function is Markovian if, for any K < N, the effect of the remaining N − K stages of the N-stage process upon the return depends only upon: (a) the state of the system at the end of the Kth decision, and (b) whatever subsequent decisions are made. It is clear that the return function φ(w_N) possesses this property. Another type of return function that possesses this property is one of the form:

g(w_1, d_1) + g(w_2, d_2) + . . . + g(w_(N-1), d_(N-1)) + φ(w_N).

A return function of this latter form may be important when cost as well as final test performance is an important criterion in designing the system. For example, in a computer-based system, g(w_i, d_i) might be the cost of using the computer for the amount of time required to make decision d_i and present the appropriate stimulus. Since the expressions resulting from a function of this form are somewhat more complicated, we will limit our attention to return functions of the form φ(w_N). However, it should be borne in mind that essentially the same basic procedures can be used with the more complicated return function.

If a deterministic decision process has this Markovian property, then an optimal strategy will have the property expressed by Bellman (1961, p. 57) in his optimality principle: whatever the initial state and the initial decision are, the remaining decisions constitute an optimal policy with regard to the state resulting from the first decision. To see how this principle can be utilized, let f_N(w) denote the return from an N-stage process with initial state w if an optimal strategy is used throughout, and let us assume that T is deterministic and W is discrete. Since the process is deterministic, the final state is completely determined by the initial state w and the sequence of decisions d_1, d_2, . . . , d_(N-1) (it should be recalled that no decision takes place during the last stage). If D_N denotes an arbitrary sequence of N − 1 successive decisions (D_1 being the empty set), then the final state resulting from w and D_N can be written as w'(D_N, w). The problem that must be solved is to find the sequence D_N which maximizes φ[w'(D_N, w)]. If such a sequence exists, then

f_N(w) = max over D_N of φ[w'(D_N, w)].  [8]

While the solution of such a problem can be extremely complicated for an arbitrary N, it is easily shown that for N = 1

f_1(w) = φ(w).  [9]

As a result, if a relation can be found that connects f_i(w) with f_(i-1)(w) for each i ≤ N, then f_N(w) can be evaluated recursively by evaluating f_i(w) for each i. Suppose that, in an i-stage process, an initial decision d is made. Then w is transformed into a new state, T(w, d), and the decisions that remain can be viewed as forming an (i − 1)-stage process with initial state T(w, d). The optimality principle implies that the maximum return from the last i − 1 stages will be f_(i-1)[T(w, d)]. Moreover, if D_i = (d, d_2, . . . , d_(i-1)) and D_(i-1) = (d_2, d_3, . . . , d_(i-1)) then

φ[w'(D_i, w)] = φ[w'(D_(i-1), T(w, d))].  [10]

Suppose that D_(i-1) is the optimal strategy for the (i − 1)-stage process. Then the right-hand side of this equation is equal to f_(i-1)[T(w, d)]. An optimal choice of d is one which maximizes this function. As a result, the following basic recurrence relation holds:

f_n(w) = max over d of f_(n-1)[T(w, d)]   (2 ≤ n ≤ N)  [11]

f_1(w) = φ(w).  [12]

Equations 11 and 12 relate the optimal return from an n-stage process with the optimal return from a process with only n − 1 stages. Formally, n may be viewed as indexing a sequence of processes. Thus, the solution of these equations provides us with a maximum return function f_n(w) for each process and (also for each process) an initial decision d which ensures that this maximum will be attained if optimal decisions are made thereafter. It is important to note that both d and f_n(w) are functions of w and that w should, in general, range over all values in the state space W. In particular, the initial state and initial decision of a typical member of the sequence of processes we are considering should not be confused with the initial state and initial decisions of the N-stage process we are trying to optimize. In fact, the initial decision of the two-stage process corresponds to the last decision d_(N-1) of the N-stage process; the initial decision of the three-stage process corresponds to the next-to-last decision d_(N-2) of the N-stage process, and so on.

The linear model of Equation 2 with the state space defined in Equation 7 provides an example of a deterministic process. The use of Equations 11 and 12 to find an optimal strategy for the special case of a four-stage process with two items is illustrated in Table 1. The state at the beginning of Stage i is defined by the vector (q_i^(1), q_i^(2)). The optimization criterion is the score on a test administered at the end of the instructional process. Since Item j will be responded to correctly with probability 1 − q_N^(j), the terminal return function for an N-stage process is 2 − (q_N^(1) + q_N^(2)). The calculation is begun by viewing the fourth stage as a one-stage process and obtaining the return for each possible state by means of Equation 12. The possible states at this fourth stage are obtained from Figure 3. The third and fourth stages are then viewed as a two-stage process and Equation 11 is used to determine the return that results from presenting each item for every possible state that can occur in Stage 3, the previously computed result for a one-stage process being used to complete the computations. For each state, the item with the maximum return represents the optimal decision to make at Stage 3. The three-stage process beginning at Stage 2 is analyzed in the same way, using the previously computed results for the last two stages. The result is an optimal decision at Stage 2 for each possible state, assuming optimal decisions thereafter. Finally, the procedure is repeated for the four-stage process beginning at Stage 1. The optimal strategies of item presentation that result from this procedure are given at the bottom of Table 1.
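The backward induction just described can be sketched directly in code. The transition rule and the decrement parameter α = .5 are inferred from the state values in Table 1, and the function names are ours; this is an illustrative sketch of Equations 11 and 12, not the authors' own computation.

```python
from functools import lru_cache

ALPHA = 0.5  # decrement parameter inferred from the state transitions in Table 1

def T(w, d):
    """Linear-model transition: presenting item d multiplies q(d) by ALPHA."""
    q = list(w)
    q[d] *= ALPHA
    return tuple(q)

def phi(w):
    """Terminal return 2 - (q1 + q2): expected number correct on the final test."""
    return 2.0 - sum(w)

@lru_cache(maxsize=None)
def f(n, w):
    """Optimal return of an n-stage process from state w (Equations 11 and 12)."""
    if n == 1:
        return phi(w)                                  # Equation 12
    return max(f(n - 1, T(w, d)) for d in (0, 1))      # Equation 11

def optimal_first_items(n, w):
    """Items (numbered from 1) whose presentation attains the optimal return."""
    return [d + 1 for d in (0, 1) if f(n - 1, T(w, d)) == f(n, w)]

w0 = (0.60, 0.90)                 # initial state of the worked example
print(f"{f(4, w0):.3f}")          # 1.475, i.e., the 1.48 of Table 1 after rounding
print(optimal_first_items(4, w0)) # -> [1, 2]: either item may be presented first
```

The memoization (`lru_cache`) is what distinguishes this from brute-force enumeration: each reachable state is evaluated once per stage count, rather than once per path.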

TABLE 1

CALCULATION OF OPTIMAL STRATEGY FOR EXAMPLE OF FIGURE 3 USING DYNAMIC PROGRAMMING

Number of stages   Initial      Initial      Next state    Final return of optimal    Optimal
in process N       state w      decision d   T(w,d)        (N-1)-stage process,       decision
                                                           f_(N-1)[T(w,d)]
--------------------------------------------------------------------------------------------
1                  (.08, .90)                                   1.02
                   (.15, .45)                                   1.40
                   (.30, .22)                                   1.48
                   (.60, .11)                                   1.29
2                  (.15, .90)       1        (.08, .90)         1.02
                                    2        (.15, .45)         1.40                     2
                   (.30, .45)       1        (.15, .45)         1.40
                                    2        (.30, .22)         1.48                     2
                   (.60, .22)       1        (.30, .22)         1.48                     1
                                    2        (.60, .11)         1.29
3                  (.30, .90)       1        (.15, .90)         1.40
                                    2        (.30, .45)         1.48                     2
                   (.60, .45)       1        (.30, .45)         1.48                   1 or 2
                                    2        (.60, .22)         1.48
4                  (.60, .90)       1        (.30, .90)         1.48                   1 or 2
                                    2        (.60, .45)         1.48

Optimal strategies
             Strategy 1    Strategy 2    Strategy 3
Stage 1      Item 1        Item 2        Item 2
Stage 2      Item 2        Item 1        Item 2
Stage 3      Item 2        Item 2        Item 1

Note.--For N = 1 no decision is made; the entry in the return column is the terminal return f_1(w) = φ(w).

With a nondeterministic process, the situation is considerably more complicated. The transformation T is some type of probability distribution and the final return is a mathematical expectation. While arguments based on the optimality principle allow one to obtain recursive equations similar in form to Equations 11 and 12, both the arguments used to obtain the equations and the methods used to solve them can contain many subtle features. A general review of the problems encountered in this type of process was given by Bellman (1961), and some methods of solution were discussed by Bellman and Dreyfus (1962). For the case where the transformation defines a Markov process with observable states, Howard (1960) has derived a set of equations together with an iterative technique of solution which has quite general applicability. However, in the case of instructional processes, it has so far tended to be the case that either the learning model used has unobservable states or the process can be reduced to a deterministic one (as is the case with the linear model discussed in the example above).

A response insensitive process can often be viewed as a deterministic process. This is not, in general, possible with a response sensitive process. The only process of this type that has been extensively analyzed is that in which a list of stimulus-response items is to be learned, the return function is the score on the test administered at the end of the process, and the learning of each item is assumed to occur independently and obey the assumptions of the one-element model. An attempt to solve this problem by means of a direct extension of Howard's techniques to Markov processes with unobservable states has been made by Matheson (1964). However, this approach appears to lead to somewhat cumbersome equations that are impossible to solve in any nontrivial case. A more promising approach has been devised by Karush and Dear (1966). As in our example of the linear model, the states of the process are defined in terms of the current probability that an item is in the conditioned state, and a similar (though somewhat more general) return function is assumed. An expression relating the return from an (N − 1)-stage process to the return from an N-stage process is then derived. The main complication in deriving this expression results from the fact that the outcome tree is more complicated, the subject's responses having to be explicitly considered. Karush and Dear proceeded to derive certain properties of the return function and proved that in an N-trial experiment² with items s_1, s_2, . . . , s_m (where N > m) and arbitrary initial conditioning probabilities (λ(1), λ(2), . . . , λ(m)), an optimal strategy is given by presenting at any trial an item for which the current conditioning probability is least. In most applications, the initial probabilities λ(i) can be assumed to be zero. In this case, an observable sufficient history can be defined in terms of a counting process. An optimal strategy is initiated by presenting the m items in any order on the first m trials, and a continuation of this strategy is optimal if and only if it conforms to the following rules:

1. For every item, set the count at 0 at the beginning of Trial m + 1.

2. Present an item at a given trial if and only if its count is least among the counts for all items at the beginning of the trial.

3. Following a trial, increase the count for the presented item by 1 if the response was correct, but set it at 0 if the response was incorrect.
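These rules are simple enough to implement directly. The sketch below is ours, not Karush and Dear's: Rule 2 only requires that a presented item have a least count, so ties may be broken arbitrarily, and here they are resolved by taking the lowest-numbered item; the scripted response sequence is purely illustrative.

```python
def present_next(counts):
    """Rule 2: present an item whose count is least (ties: lowest item number)."""
    least = min(counts.values())
    return min(i for i, c in counts.items() if c == least)

def update(counts, item, correct):
    """Rule 3: increment the count on a correct response, reset it to 0 on an error."""
    counts[item] = counts[item] + 1 if correct else 0

def run_session(n_trials, responses):
    """Drive the strategy with a scripted sequence of correct/incorrect responses.

    responses[t] is True if the response on trial t is correct. Rule 1: all
    counts start at 0, the m items being assumed already presented once each
    on the first m trials, before this counting phase begins.
    """
    counts = {1: 0, 2: 0, 3: 0}
    schedule = []
    for t in range(n_trials):
        item = present_next(counts)
        schedule.append(item)
        update(counts, item, responses[t])
    return schedule

# The item missed on Trial 3 keeps a count of 0 and is immediately re-presented.
print(run_session(6, [True, True, False, True, True, True]))  # -> [1, 2, 3, 3, 1, 2]
```

Note that the counts constitute the observable sufficient history: the schedule depends on the response record only through them.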

Discussion

In this paper we have attempted to achieve two main goals. The first has been to provide an explicit statement of the problems of optimal instruction in the framework of multistage decision theory. The main reason for introducing a somewhat elaborate notational system was the need for a clear distinction between the optimization problem, the learning process that the student is assumed to follow, and the method of solving the optimization problem. The second goal has been to indicate, using dynamic programming as an example, how optimization problems can be solved in practice. Again, it should be emphasized that dynamic programming is not the only technique that can be used to solve optimization problems. Many response insensitive problems are solvable by simpler, though highly specific, techniques. However, dynamic programming is the only technique that has so far proved useful in the derivation of response sensitive strategies.

² Here the term N-trial experiment refers to an anticipatory paired-associate procedure which involves N presentations. To each stimulus presentation, the subject makes a response and then is told the correct answer for that stimulus.


In describing dynamic programming, an attempt has been made to emphasize two basic features: the optimality principle, and the backward-induction procedure by means of which an optimal strategy is obtained by starting, in effect, at the last stage. It should be noted that these can be used independently. For example, it is possible to combine the optimality principle with a forward induction procedure which starts at the first stage of the process.

In any attempt to apply an optimization theory in practice, one must ask the question: how can it be tested experimentally? In principle, it is easy to formulate such an experiment. A number of strategies are compared—some theoretically optimal, others theoretically suboptimal. A test is administered at the end of the process that is designed to be some observable function of the final return. However, the only experiment that has been explicitly designed to test an optimization theory is that by Dear, Silberman, Estavan, and Atkinson (1965), although in the case of response insensitive theories, it is often possible to find experiments in the psychological literature which provide indirect support.

The experiment reported by Dear et al. was concerned with testing the strategy proposed by Karush and Dear (1966) for the case outlined in the preceding section. The major modification of this strategy was to prohibit repeated presentations of the same item by forcing separations of several trials between presentations of individual items.⁸ Each subject was presented with two sets of paired-associate items. The first set of items was presented according to the optimization algorithm. Items in the second set were presented an equal number of times in a suitable random order. (It will be recalled that this strategy is optimal if the linear model is assumed.) It was found that while the acquisition data (e.g., rate of learning) tended to favor items in the first set, no significant difference was found in posttest scores between items of the two sets.

⁸ This modification was necessary because it has been shown experimentally that if the same item is presented on immediately successive trials then the subject's response is affected by considerations of short-term memory.

It follows from the result of this experiment that, even for a simple problem such as this, an optimization theory is needed that assumes a more complicated learning model. At least one reason for this is that, in simple paired-associate experiments that yield data fitted by the one-element model, any systematic effects of stimulus-presentation sequences are usually eliminated by presenting different subjects with different random sequences of stimuli. When a specific strategy is used, it may be the case that either the assumption of a forgetting process or of some short-term memory state becomes important in accounting for the data (Atkinson & Shiffrin, 1965).

Unfortunately, the analytic study of the optimization properties of more complex models, at least by dynamic programming techniques, is difficult. The only major extension of response sensitive models has been a result of Karush and Dear (1965) which shows that the optimal strategy for the one-element model is also optimal if it is assumed that the probability of a correct response in the conditioned state is less than one. However, there are ways by means of which good approximations to optimal strategies might be achieved, even in the case of extremely complex models. Moreover, in many practical applications one is not really critically concerned about solving for an optimal procedure, but would instead be willing to use an easily determined procedure that closely approximates the return of the optimum procedure. The main means of achieving a good approximation is by analyzing the problem numerically, computing the optimal strategy for a large number of special cases. A useful general algorithm for doing this is the backward induction procedure described in the preceding section. Table 1 illustrates how this algorithm can be used to find an optimal strategy for one particular case. Dear (1964) discussed the use of this algorithm in other response insensitive problems.

The chief disadvantage of the backward induction algorithm is that it can only be used for optimal strategy problems involving a fairly small number of stages. Although its use can eliminate the need to search every branch of a tree, the computation time still increases as a function of the number of possible final states that can result from a given initial state. However, a backward induction solution for even a small number of stages would provide a locally optimal policy for a process with a large number of stages, and this locally optimal strategy might provide a good approximation to an optimal strategy. To decide how "good" an approximation such a strategy provided, its return could be evaluated and compared with the returns of alternative strategies.
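One way to use such a locally optimal policy can be sketched as follows: each decision is chosen by running backward induction over only a short horizon of k stages rather than over the whole process, and the resulting return is compared with the fully optimal one. The linear model and parameter values of the Table 1 example are again assumed, and the names are ours.

```python
ALPHA = 0.5  # decrement parameter of the linear model, as in the Table 1 example

def T(w, d):
    """Presenting item d multiplies its error probability by ALPHA."""
    q = list(w)
    q[d] *= ALPHA
    return tuple(q)

def phi(w):
    """Terminal return: expected number correct on a two-item test."""
    return 2.0 - sum(w)

def f(n, w):
    """Full backward induction (Equations 11 and 12): optimal n-stage return."""
    if n == 1:
        return phi(w)
    return max(f(n - 1, T(w, d)) for d in (0, 1))

def receding_horizon_return(w, n_stages, k):
    """Make each decision by optimizing only k stages ahead, then move on."""
    for stages_left in range(n_stages, 1, -1):
        horizon = min(k, stages_left)
        d = max((0, 1), key=lambda a: f(horizon - 1, T(w, a)))
        w = T(w, d)
    return phi(w)

w0 = (0.60, 0.90)
print(f"{f(6, w0):.4f}")                           # globally optimal return
print(f"{receding_horizon_return(w0, 6, 2):.4f}")  # short-horizon approximation
```

For this particular model the two returns coincide, so the local policy happens to lose nothing; in general the gap between the two printed values is exactly the evaluation the text proposes.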

REFERENCES

ATKINSON, R. C., BOWER, G. H., & CROTHERS, E. J. An introduction to mathematical learning theory. New York: Wiley, 1965.

ATKINSON, R. C., & ESTES, W. K. Stimulus sampling theory. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. Vol. 2. New York: Wiley, 1963. Pp. 121-268.

ATKINSON, R. C., & HANSEN, D. N. Computer-assisted instruction in initial reading: The Stanford Project. Reading Research Quarterly, 1967, in press.

ATKINSON, R. C., & SHIFFRIN, R. M. Mathematical models for memory and learning. Technical Report 79, Institute for Mathematical Studies in the Social Sciences, Stanford University, 1965. (To be published in D. P. Kimble, Ed., Proceedings of the Third Conference on Learning, Remembering, and Forgetting. New York: New York Academy of Sciences, 1966, in press.)

BELLMAN, R. Dynamic programming. Princeton: Princeton University Press, 1957.

BELLMAN, R. Adaptive control processes. Princeton: Princeton University Press, 1961.

BELLMAN, R. Functional equations. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. Vol. 3. New York: Wiley, 1965. Pp. 487-513.

BELLMAN, R., & DREYFUS, S. E. Applied dynamic programming. Princeton: Princeton University Press, 1962.

BOWER, G. H. Application of a model to paired-associate learning. Psychometrika, 1961, 26, 255-280.

BRUNER, J. S. Some theorems on instruction stated with reference to mathematics. In E. R. Hilgard (Ed.), Theories of learning and theories of instruction. (63rd National Society for the Study of Education Yearbook, Part 1) Chicago: University of Chicago Press, 1964. Pp. 306-335.

BUSH, R. R., & STERNBERG, S. H. A single operator model. In R. R. Bush & W. K. Estes (Eds.), Studies in mathematical learning theory. Stanford: Stanford University Press, 1959. Pp. 204-214.

CRONBACH, L. J., & GLESER, G. Psychological tests and personnel decisions. Urbana: University of Illinois Press, 1965.

CROTHERS, E. J. Learning model solution to a problem in constrained optimization. Journal of Mathematical Psychology, 1965, 2, 19-25.

DEAR, R. E. Solutions for optimal designs of stochastic learning experiments. Technical Report SP-1765/000/00. Santa Monica: System Development Corporation, 1964.

DEAR, R. E., & ATKINSON, R. C. Optimal allocation of items in a simple, two-concept automated teaching model. In J. E. Coulson (Ed.), Programmed learning and computer-based instruction. New York: Wiley, 1962. Pp. 25-45.

DEAR, R. E., SILBERMAN, H. F., ESTAVAN, D. P., & ATKINSON, R. C. An optimal strategy for the presentation of paired-associate items. Technical Memorandum TM-1935/101/00. Santa Monica: System Development Corporation, 1965.

ESTES, W. K. Learning theory and the new mental chemistry. Psychological Review, 1960, 67, 207-223.

GAGE, N. L. (Ed.) Handbook of research on teaching. Chicago: Rand McNally, 1963.

HILGARD, E. R. (Ed.) Theories of learning and theories of instruction. (63rd National Society for the Study of Education Yearbook, Part 1) Chicago: University of Chicago Press, 1964.

HOWARD, R. A. Dynamic programming and Markov processes. New York: Wiley, 1960.

KARUSH, W., & DEAR, R. E. Optimal procedure for an N-state testing and learning process—II. Technical Report SP-1922/001-00. Santa Monica: System Development Corporation, 1965.

KARUSH, W., & DEAR, R. E. Optimal stimulus presentation strategy for a stimulus sampling model of learning. Journal of Mathematical Psychology, 1966, 3, 19-47.

LUMSDAINE, A. A. Instruments and media of instruction. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963. Pp. 583-683.

MATHESON, J. Optimum teaching procedures derived from mathematical learning models. Unpublished doctoral dissertation, Stanford University, 1964.

RAIFFA, H., & SCHLAIFER, R. Applied statistical decision theory. Cambridge, Mass.: Harvard University, Graduate School of Business Administration, 1961.

SILBERMAN, H. F. Characteristics of some recent studies of instructional methods. In J. E. Coulson (Ed.), Programmed learning and computer-based instruction. New York: Wiley, 1962. Pp. 13-24.

SMALLWOOD, R. D. A decision structure for teaching machines. Cambridge, Mass.: MIT Press, 1962.

STERNBERG, S. H. Stochastic learning theory. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology. Vol. 2. New York: Wiley, 1963. Pp. 1-120.

SUPPES, P. Problems of optimization in learning a list of simple items. In M. W. Shelly, II, & G. L. Bryan (Eds.), Human judgments and optimality. New York: Wiley, 1964. Pp. 116-126.

(Received February 2, 1966)