
Journal of Computer Assisted Learning (1987) 3, 194-203

Intelligent tutoring systems

P. ROSS, Department of Artificial Intelligence, University of Edinburgh

Abstract This is a non-specialist introduction to intelligent tutoring systems, one way in which artificial intelligence is being applied to education. It presumes little knowledge of AI. The aim is to explain what an ITS consists of, in general terms, and to give some feel for the current products of this kind of research.

Keywords: Artificial intelligence; Tutorial; Computer-assisted instruction; Learner model; Teaching strategy; Domain model.

Introduction

Traditional CAL has been with us for a good while now, and its pros and cons are beginning to be well understood. Most CAI programs that are any good are the result of much effort and are hard to modify, even though the design principles are fairly superficial in educational terms - the builder must 'compile' the educational decisions in his head beforehand. However, once built and thoroughly tested, they can then be used by large numbers of people. The simple 'linear' types of CAI are rooted in Skinnerian stimulus-response psychology. Crowder (1959) introduced the ideas behind 'branching' CAI, namely

'. . . The student's response serves primarily as a means of determining whether the communication process has been effective and at the same time allows appropriate corrective action to be taken.'

The basic branching types of CAI can be used to present basic facts, concepts and terminology. It is much harder to build programs for teaching new skills. These typically need some explicit model of the skills to be taught together with explicit models of what can go wrong with the learning, so that the skills can be taught in a synthetic way rather than by example. Not many teachers (and almost no educational program designers) can anticipate all the likely misconceptions that a learner may acquire. As a personal example which still surprises me, a programming student of mine once told me that he knew that the computer was objecting to his program, but he felt that his way ought to be the right way and so he'd try the same thing a few more times in case the computer relented (and he did). It was a statement not of despair but of his firm belief that computers could be persuaded. The surprising part was the combination of articulacy and irrationality.
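To make the branching idea concrete, here is a toy sketch in Python (the frame names and wording are invented for illustration; nothing like this appears in Crowder's work): each frame presents some material, and the user's response selects the next frame, so a wrong answer can be routed through corrective material.

    # A toy Crowder-style branching program. '*' marks the default branch.
    frames = {
        "q1": ("Is current measured in volts? (y/n)",
               {"n": "done", "y": "fix", "*": "q1"}),
        "fix": ("No: current is measured in amps. Back to the question.",
                {"*": "q1"}),
        "done": ("Right - on to the next frame.", {}),
    }

    frame = "q1"
    while frames[frame][1]:                  # stop at a frame with no branches
        text, branches = frames[frame]
        reply = input(text + " ").strip().lower()
        frame = branches.get(reply, branches["*"])
    print(frames[frame][0])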

Invited paper. Correspondence: Peter Ross, Department of AI, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, Scotland. Originally this was an invited talk at the Fourth International Technology and Education Conference, Fort Worth, Texas, April 1987.




Recently, public interest in the application of artificial intelligence ideas to computers used in education has blossomed, in part because of sensational but misguided publicity about AI. This paper concerns one such application, intelligent tutoring systems or ITSs. These, it is often claimed, will provide new and effective ways of teaching skills and will help to make the teaching more individualized than is possible with normal CAI. This claim rests substantially on the notion that current AI techniques can permit the ITS itself to solve the problems which it sets for the user, in a human-like and appropriate way, and then reason about the solution process and make comments on it. ITSs could be viewed as being one step further along the road to achieving Crowder's objectives.

Education is one of the most demanding fields to which AI might be applied. It offers scope to the AI researcher for work in all the basic aspects of his subject: how to represent knowledge, how to reason using that knowledge, and how to control the interplay between these. It also seems to offer a chance to apply existing ideas. The typical AI researcher uses considerable computing power in his daily work, and much effort has been spent on creating languages, tools and environments that make it easy to use that power for experiment in AI rather than as an isolated exercise in computer science. One spin-off of these attempts to make life easier has been the creation and development of languages such as LOGO, which have undoubtedly had an impact on the world of education. AI does have a lot to offer when it comes to creating environments that offer a suitable viewpoint on some domain, and allow a person to explore it without having to know much about computers or programming. This seam of AI research has yet to be properly mined. Intelligent tutoring systems show less promise as yet, although the argument is that these have a place in educational research even if their time is not yet ripe. At least they offer a more controllable framework for experiments than is normally possible, even if a more restrictive one too. However, many of the early efforts have been educationally naïve.

A digression about AI

That use of the word 'intelligent' causes a lot of problems too; the AI community tends to store up trouble for its public image by using readily misunderstandable jargon. That someone is intelligent is, after all, a subjective consensus judgement when the word is used in its normal sense. X may say that Y and Z are both intelligent, but Y may not say that of Z. If Z (or Y) is a computer program, who can justify calling it intelligent to the world at large? My own stance is that AI is largely about questions of control theory: how can 'programs', in the widest sense, be self-adapting, and how wide is the range of unforeseen circumstances to which they could be engineered to adapt? Besides the commercial view of AI as something new in software engineering, there are two other schools from which to take your pick. One considers AI to be about the building and testing of artificial models of (parts of) existing intelligent systems, in particular humans; the other considers AI to be about the building of intelligent artefacts, without there needing to be any existing parallels. The first is more glamorous, the second is more practicable.

The skeleton of an ITS

Intelligent tutoring systems are intended to take over some of the part of a human teacher or tutor, and to take at least some of the initiative in an educational dialogue. A famous paper by Hartley & Sleeman (1973) specified the ingredients that an ITS ought to have:
- knowledge of the domain;
- knowledge of the person being taught (a student model);
- knowledge of teaching strategies;
- knowledge of how to apply the knowledge of teaching strategies to the needs of an individual.

Humans need these, and the definitions of these ingredients can of course be stretched to the point where these are all that a human teacher needs. Professional arguments about this shopping list are still common in the literature; for example, O'Shea and co-workers (1984) come closest to specifying a general toolkit for building ITSs and base a 'five-ring model' on these ingredients, offering a slightly different emphasis to the one above:
- a student history;
- a student model;
- a set of teaching strategies;
- a teaching generator, that orchestrates information from the above three to produce output;
- a teaching administrator component responsible for overall control.

Specifications such as these can be made a little more formal, in a way that helps to show the kind of work that needs to be done to construct an ITS. First, it is important to realize that AI is largely to do with pushing symbols about, representing relationships between meaningless symbols and devising methods for producing new relationships from old, in such a way that both the old and the new are analogous to aspects of the real world. The AI worker's first task is often to pick an appropriate set of meaningless symbols to be the basic vocabulary. His next task is to describe the grammar rules of how they can be combined into structures that might stand for relationships between their elements. After that, he must worry about the semantics of these structures, how they might make sense together. For example, I might choose:
- a vocabulary in which 'father' and 'daughter' were to be the names of relationships, and 'peter' and 'rebecca' were to be mere symbols;
- a grammar which allowed structures such as (relation-name symbol1 symbol2 ...) to represent a relationship between the given symbols;
- semantic rules requiring that 'father' and 'daughter' are relations concerning two different symbols, that the relationships are either true or false and that (father X Y) and (daughter Y X) are both true or both false.

This gets trickier if I want to extend this unreal microcosm to cater for sons too - Y might be a son, after all.
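To fix ideas, the microcosm above can be rendered in a few lines of Python (the rendering is mine, purely for illustration; the choice of tuples as the grammar is arbitrary):

    # Vocabulary: relation names and mere symbols. Grammar: a relationship is
    # a tuple (relation-name, symbol1, symbol2). Truth is membership of FACTS.
    FACTS = {("father", "peter", "rebecca"), ("daughter", "rebecca", "peter")}

    def consistent(facts):
        """Semantic rule: (father X Y) and (daughter Y X) are both true or both false."""
        return (all(("daughter", y, x) in facts for (r, x, y) in facts if r == "father")
                and all(("father", y, x) in facts for (r, x, y) in facts if r == "daughter"))

    print(("father", "peter", "rebecca") in FACTS)   # True: the relationship holds
    print(consistent(FACTS))                         # True - but add a son and the
                                                     # rule encoded above becomes wrong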

Let VGS stand for 'vocabulary, grammar and semantics'. An ITS could be characterized as consisting of:
1 A VGS_domain about the domain in question, such as elementary electricity. This might include notions of current, voltage, resistance, and so on. To leave out notions such as valency, electron orbitals and the skin effect would obviously make it simpler; it is an educational decision as to whether anything serious is lost by that. The VGS_domain stands for what can be said about the domain, and by its nature it will characterize the kind of misconceptions - falsehoods about the domain - that the ITS could address. Much data gathering and educational thought has to be put in at this point in the design of such a system.
2 A set of expressions composed in VGS_domain that express the true facts and rules of the domain, such as Ohm's law and Kirchhoff's laws. This set of expressions forms the subject matter for the ITS.
3 A VGS_user about the class of intended users. This might be no more than VGS_domain, but to be read as a representation of what the student can know. It might subsume VGS_domain, permitting statements about typing competence or the range of a user's goals in using the ITS, for example. There might even be no overlap between VGS_domain and VGS_user at all. Devising VGS_user calls for studying the significant characteristics of the class of users; this is sometimes confusingly referred to as user modelling and might be better named as user prototyping.
4 An initial set of expressions in VGS_user, representing the initial assumptions about an individual user, together with a method of modifying the set in the light of information extracted from the user's inputs. This is also referred to as user modelling.
5 A VGS_teaching about teaching the subject. This characterizes the teaching styles of the system. In the case of basic electricity, it might permit statements about specific ways of teaching Ohm's law as well as general statements about introduction of new material, about motivation, diagnosis and remediation and other 'soft' concepts. Thus VGS_teaching might subsume both VGS_domain and VGS_user, particularly if the approach is to have some explicit representation of ideal knowledge and to steer the user towards acquiring this.
6 A set of expressions in VGS_teaching that dictate the system's actual teaching strategies.
7 An algorithm that, given the current and past user input, blends all this together to produce tutorial output. This might take many forms: constructive or hostile criticism, questions, Socratic utterances, illustrative examples, even just temporizing remarks equivalent to silence. The provision of decent input and output is usually a major engineering job in itself.
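The seven ingredients can be pictured as the slots of a single structure. A minimal sketch in Python follows (the slot names are mine; no real ITS is organized this literally):

    class ITS:
        def __init__(self, vgs_domain, domain_facts, vgs_user, user_model,
                     vgs_teaching, strategies, tutor):
            self.vgs_domain = vgs_domain      # 1: what can be said about the domain
            self.domain_facts = domain_facts  # 2: the true facts and rules, e.g. Ohm's law
            self.vgs_user = vgs_user          # 3: what can be said about a user
            self.user_model = user_model      # 4: current beliefs about this user
            self.vgs_teaching = vgs_teaching  # 5: what can be said about teaching
            self.strategies = strategies      # 6: the actual teaching strategies
            self.tutor = tutor                # 7: the algorithm that blends it all
                                              #    into tutorial output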


The cycle of operations of an intelligent teaching system is usually something like this, after some initial setting up:
1 The system sets a problem.
2 The user inputs his answer, or part of it. What counts as a unit of input depends very much on the subject and approach. It may be a complete computer program as in the case of PROUST (Johnson & Soloway, 1984) or just a single word or character of an answer.
3 The system generates possible analyses of the input using its knowledge of the subject, of the problem itself and of the user and his past input. The analyses will include such details as whether it was syntactically or semantically correct and what knowledge might have been used in its construction.
4 If necessary, the system gets confirmation from the user as to which analysis was the true one, by asking as few questions as possible to discriminate. Such confirmation may not be needed - for instance, there may be one clear choice, or the system may be designed to permit the user to get into a mess, so that it has the option of biding its time waiting for further evidence from later inputs.
5 If the confirmed analysis is satisfactory (which may mean only that it is good enough, not 100%), the system then decides what to do next, using algorithms that may depend on a syllabus and on what is believed about the user.
6 If the analysis shows that the input was somehow unsatisfactory, the system may need to start some teaching activity. This can mean various things, for example:
- start a dialogue about the problem and the user's answer;
- show some test cases that make the flaws stunningly clear;
- force the user to go step by step through an example that focusses on the weak points in the answer.

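As a control structure, the cycle amounts to the following loop. This is a sketch under the assumption that the six steps are supplied as functions; the names are invented, and any real system buries enormous complexity inside them:

    def tutoring_cycle(set_problem, read_input, analyse, discriminate,
                       satisfactory, decide_next, teach):
        problem = set_problem()                      # 1: set a problem
        while problem is not None:
            answer = read_input(problem)             # 2: user's (partial) answer
            analyses = analyse(problem, answer)      # 3: candidate analyses
            analysis = (analyses[0] if len(analyses) == 1
                        else discriminate(analyses)) # 4: confirm with the user
            if satisfactory(analysis):
                problem = decide_next(analysis)      # 5: consult syllabus and model
            else:
                teach(problem, answer, analysis)     # 6: dialogue, test cases,
                                                     #    step-by-step examples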

As an example, consider PROUST (Johnson & Soloway, 1984). It analyses beginners' PASCAL programs, and runs on a VAX-11/750. The source code consists of more than 15 000 lines of LISP. It needs to know what problem the program is supposed to solve - this is given in the form of a statement of the goals and conditions that the program must meet, using a highly specialized language. PROUST holds knowledge of the various ways in which a programmer might satisfy prototypes of each possible requirement expressible in that language, and can use it to generate possible answers to the specific problem. This is the set of expressions in the VGS_domain. It tries to fit such answers to the user's input, producing lists of discrepancies. If none fit, it tries to 'explain' the best fits by employing a library of the common misconceptions and errors found in PASCAL programs, to see if any of these can account for the discrepancies found; if so, it comments appropriately. The 'bug' library forms the set of expressions in the VGS_user. It does not (yet) engage in a dialogue at all, so it is better thought of as a component of an ITS although it is useful as it stands. It has been tested on thousands of students.
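The heart of this is a matching process: fit each stored way of meeting the goals to the student's program, and if none fits, see whether a known bug transforms one of them into what the student actually wrote. A toy version follows (mine, not PROUST's - real programs are not flat tuples of statements, and real matching is far more tolerant):

    def diagnose(student_code, plans, bug_library):
        for plan in plans:                      # try the anticipated correct answers
            if plan == student_code:
                return "correct", None
        for plan in plans:                      # try to 'explain' the discrepancies
            for bug_name, apply_bug in bug_library:
                if apply_bug(plan) == student_code:
                    return "buggy", bug_name
        return "unrecognized", None

    plans = [("read x", "validate x", "use x")]
    bugs = [("validation before read", lambda p: (p[1], p[0]) + p[2:])]
    print(diagnose(("validate x", "read x", "use x"), plans, bugs))
    # -> ('buggy', 'validation before read')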

GREATERP (Reiser et al., 1985) teaches elementary LISP, and has many features in common with PROUST's methods, although its unit of input is a single LISP atom - that is, one of the lowest-level units of syntax such as a bracket or a text item such as 'member' or 'car'. It comments as soon as it is unable to explain an atom, so the system is no use for teaching debugging skills because the user cannot get himself into a really serious mess. It is an unusual example, because it has reached the marketplace - it is sold by Advanced Computer Tutoring Inc., Pittsburgh, under the name LISP-ITS. In tests, it has shown impressive shortenings of learning time. The VGSs are all subsets of a specially created language called GRAPES (Goal Restricted Production System), which was developed at CMU as part of an effort to model how people learn LISP. GRAPES is a realization of certain parts of Anderson's ACT* theory of the acquisition of cognitive skills (Anderson, 1983). Four conclusions, consistent with ACT* and supported by empirical evidence, underpin the design of GREATERP (see Anderson et al. (1984) for details of the studies of LISP programming):
1 in learning LISP, the problem-solving is hierarchically organized around a set of goals and subgoals;
2 initially, problem-solving is guided by structural analogy to actual examples;
3 learning processes of composition and proceduralization use this knowledge to create internal 'compiled' procedures specific to programming;
4 the learner's working memory capacity sets limits on how successfully he can apply analogical and programming processes at any stage.
This work is among the most principled and documented of current ITS research. Although Anderson's ACT* theory seems rather behaviourist in outlook, the results are impressive.
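The atom-by-atom style can be sketched as model tracing: at each step, production rules for the current goal predict which tokens a correct (or known-buggy) student would type next, and the tutor comments the moment the student's token cannot be explained. The rules below are invented for illustration and are not GRAPES:

    RULES = {
        # goal: (tokens a correct solution predicts, buggy tokens -> comment)
        "define-function": ({"(defun"}, {"(define": "In this LISP, use defun."}),
    }

    def trace_atom(goal, token):
        correct, buggy = RULES[goal]
        if token in correct:
            return "ok"
        return buggy.get(token, "I can't see why you typed that here.")

    print(trace_atom("define-function", "(define"))  # -> 'In this LISP, use defun.'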

Systems such as these call for many of the techniques that AI has made its own, such as knowledge representation, specialized searching and matching techniques, inference and dialogue handling. Unfortunately these tend to consume a great deal of computing resources, so preventing widespread economic use. ITSs tend to be very hard to construct, too, and so their practical use can only be economically justified in special cases. A further problem is that it is often very hard to come to any firm conclusions about what the user does know - techniques of cognitive modelling of individuals are not that good. Even the ideas within existing systems have limited applicability. For instance, it is not so easy to apply the ideas within GREATERP or PROUST to the domain of teaching PROLOG - the syntax of PROLOG is so simple that there is no way to tell whether a program is complete except by considering its meaning. Since there is no real distinction between data and program in PROLOG, and since an absence of data (or program) is not an error, it may even be that parts of a program are non-existent by design! User modelling methods still tend to rest on one of these three assumptions:
1 the student only ever suffers from a lack of knowledge;
2 (or) the student may have incorrect versions of the knowledge;
3 (or) the student may have the correct knowledge, but fail to make correct use of it.
The trouble with adopting more than one of these assumptions at once is that when a student does make a mistake, how is it to be accounted for in any systematic way? This is still an interesting research issue.
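The first assumption, for instance, leads to the well-known 'overlay' style of student model: the student's knowledge is assumed to be a subset of the ideal knowledge, and diagnosis just marks which pieces have been evidenced. A sketch (the rule names are invented):

    IDEAL = {"ohms-law", "series-resistance", "parallel-resistance"}

    def update_overlay(known, rules_used_correctly):
        """Assumption 1: mistakes only ever reflect missing knowledge."""
        return known | (rules_used_correctly & IDEAL)

    known = update_overlay(set(), {"ohms-law"})
    print(IDEAL - known)   # what the tutor still has to teach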

Another flavour of ITS is exemplified by the Recovery Boiler Tutor (Woolf et al., 1986), commercially available from the American Paper Institute Inc., New York, and developed by J.H. Jansen Inc., Woodinville, WA. It provides a detailed simulation of the kind of recovery boiler used to recycle chemicals needed in paper making, and can tax the user with various kinds of operating problems and emergencies. A tutoring component criticizes his actions as he tries to fix the problems. This system runs on a PC/AT. Systems of this ilk, consisting of a good simulation coupled with a tutoring component, are perhaps the most promising at the moment. The simulation need not be of a physical process. BUGGY (Brown & Burton, 1978) is a good example: it simulates a pupil who is having trouble with basic addition or subtraction. The user's job is to set simple problems that will help him to diagnose the bugs in the pupil's work. When he believes he has the answer, BUGGY sets him a test in which he has to replicate the effects of the bugs. BUGGY, and its companions DEBUGGY and IDEBUGGY, have been frequently cited as success stories and the ideas applied to other procedural skills. For example, the set of sums in Fig. 1 was automatically generated by a BUGGY-like system created as part of an MSc project at Edinburgh (Cawsey, 1986); it distinguishes between 15 types of error in decimal addition - on the assumption that those are the only types of error that may be made.

    5.6   + 4.0  =
    8.66  + 9.28 =
    6.79  + 8.72 =
    1.45  + 3    =
    32.2  + 1    =
    8.8   + 58   =
    4.31  + 1.6  =
    11.05 + 6.7  =
    1.51  + 1.54 =
    4.01  + 2.03 =

Fig. 1. An automatically generated diagnostic test.
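A flavour of how such a test discriminates: consider the (hypothetical, not Cawsey's actual rule) bug of right-justifying the digits and ignoring the decimal points, so that 1.45 + 3 comes out as 1.48. Items in which both operands have the same number of decimal places, such as 6.79 + 8.72, cannot reveal this bug; items such as 1.45 + 3 can.

    def buggy_add(a, b):
        """Add the digit strings right-justified, ignoring the decimal points,
        then restore the first operand's decimal position - a made-up bug rule."""
        digits = lambda s: int(s.replace(".", ""))
        total = str(digits(a) + digits(b))
        places = len(a) - a.index(".") - 1 if "." in a else 0
        return total[:-places] + "." + total[-places:] if places else total

    print(buggy_add("1.45", "3"))    # '1.48' - the bug shows (correct: 4.45)
    print(buggy_add("5.6", "4.0"))   # '9.6'  - this item does not discriminate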

How does the diagnostic system builder, or indeed any ITS designer, get a handle on the range of possible misconceptions? One of the first steps is normally protocol analysis. This usually produces a useful harvest. Merely 'hardwiring' these into the design tends to lead to an inflexible system, of course. There have been some valiant attempts to produce theories that can generate possible misconceptions in certain domains. These often fall foul of two criteria - firstly, they must generate most of the observed problems and not too many others, and secondly, they must not be at too general a level. If they are, then they are usually too easy to bend to fit by adding special cases. One interesting approach (Van Lehn, 1983) has been based on the observation that misconceptions are themselves often learnt rather than innate or analogized. Introductory textbooks rarely present algorithms as such; they usually present illustrative examples. There are some teaching maxims behind such examples:
- the examples are sufficient to learn from;
- where there is a choice possible in the implicit algorithm, only one branch of the choice is taught per lesson, to avoid confusion of goals;
- in introductory work, nothing is invisible or unstated;
- learning takes place by assimilation rather than restructuring.
It is possible to disagree with this, and anyway teachers often present algorithms even if books don't. Nevertheless, Van Lehn's SIERRA system attempts to learn from sets of examples which fit the above criteria. After each lesson, a subsystem tries to predict the range of bugs that might be observed in students who had reached the same point. In tests with real schoolchildren, the correspondence between observed and predicted bugs was impressive.

However, diagnostic systems such as those mentioned above are often built on the notion that a pupil who is doing arithmetic is obeying an essentially meaningless algorithm, which may be a flawed version of a valid one. This 'content-free' hypothesis is convenient but wrong, and extremely limiting. Moreover, it is not entirely clear that the output of such a diagnostic system, namely the nature of the student's misconceptions, is such a big step on the road to fixing them.

The range of prototype ITSs

Table 1 lists examples of ITSs that have been or are being developed. It is reasonably representative, and shows that the AI world is fairly inward-looking when it comes to picking experimental domains.

Most of these do a lot less than the topic indication suggests and are only experimental, with the earliest ones being in general the least sophisticated. Hardly any have been properly tested on more than a very few people. Fortunately, computing power is becoming sufficiently cheap and powerful that we can expect to see some interesting tests of the merits of ITSs within the next decade.


Table 1. Some examples of intelligent tutoring systems and environments

Name                        Date      Topic
ACE                         1982      Interpretation of NMR spectra
ALGEBRA TUTOR               1983      Solving simple linear equation systems
ALGEBRALAND                 1985      Algebraic proofs tool
BANDAID                     1978      Introductory BASIC programming
BIP                         1976      BASIC programming
BUGGY/DEBUGGY/IDEBUGGY      1978/82   Diagnosis in basic arithmetic
EUROHELP                    1987      Tutorial help, initially for UNIX mail
EXCHECK                     1983      Simple logic and set theory
FGA                         1985      Basic French grammar
FLOW                        1977      FLOW computer language
GEOMETRY TUTOR              1985      Geometric proofs tool
GERMAN LANGUAGE TUTOR       1978      Simple German syntax/vocabulary
GUIDON (& -DEBUG, -WATCH)   1979/86   Medical diagnosis
INTEGRATION TUTOR           1976      Basic integral calculus
LISP-ITS                    1985      LISP programming
MACSYMA ADVISOR             1979      Use of MACSYMA
MALT                        1973      Basic machine language programming
MENO-II                     1983      Very simple PASCAL programming
NEOMYCIN                    1981/84   Medical diagnosis
PIXIE                       1983      Algebra equation solving
PROUST                      1985      PASCAL programming
QUADRATIC TUTOR             1975      Solving quadratic equations
QUEST                       1984/87   Simple automotive electrics
SCHOLAR                     1973      Facts of South American geography
SIERRA                      1983/87   Learning arithmetic procedures
SOPHIE-I,II,III             1976/82   Electronic troubleshooting
SPADE                       1982      Simple LOGO programming
SPIRIT                      1984      Probability theory
STEAMER                     1982/87   Marine steam propulsion plant
TALUS                       1985      Basic LISP programming
THEVENIN                    1985      Simple electrical circuits
TRILL                       1983      Basic concepts of LISP
TUTOR                       1985      British Highway Code
U of T LISP TUTOR           1985      LISP programming
VP2                         1985      Basic English grammar for Spanish
WEST                        1982      Simple arithmetic skills
WHY                         1982      Basic processes in meteorology
WUSOR                       1982      Expertise in a maze game


The educational approach is still, in some cases, very simplistic - little more than 'present it, test it, assess it, maybe reteach it'. The move towards the use of theories such as ACT* and STEP is a heartening one, although an ITS may still succeed or fail because of some incidental interface issues rather than because of its design principles. It will be a good while before the teaching rises above the purely methodical towards the inspirational, too. In the meantime, ITSs have at least given rise to many interesting practical studies in the course of protocol gathering, and to much argument about educational theories.

For further reading, see O’Shea & Self (1983), Sleeman & Brown (1982) or Wenger (1987).

References

Anderson, J.R. (1983) The Architecture of Cognition. Harvard University Press.

Anderson, J.R., Farrell, R. & Sauers, R. (1984) Learning to program in LISP. Cognitive Science, 8, 87-129.

Brown, J.S. & Burton, R.R. (1978) Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-192.

Cawsey, A. (1986) Bugs in decimal addition: model, applications and explanations. AISB Quarterly, 59, 14-16.

Crowder, N.A. (1959) Automatic tutoring by means of intrinsic programming. In Automatic Teaching: the State of the Art (ed. E. Galanter). Wiley.

Hartley, J.R. & Sleeman, D.H. (1973) Towards more intelligent teaching systems. International Journal of Man-Machine Studies, 5, 215-236.

Johnson, W.L. & Soloway, E. (1985) PROUST. Byte, April 1985, 179-190. See also Johnson, W.L. (1986) Intention-Based Diagnosis of Novice Programming Errors. Morgan Kaufmann, 304pp.

O'Shea, T. & Self, J. (1983) Learning and Teaching with Computers: Artificial Intelligence in Education. Harvester Press.

O'Shea, T., Bornat, R., du Boulay, B., Eisenstadt, M. & Page, I. (1984) Tools for creating intelligent computer tutors. In Human and Artificial Intelligence (eds A. Elithorn & R. Banerji). North-Holland.

Reiser, B.J., Anderson, J.R. & Farrell, R. (1985) Dynamic student modelling in an intelligent tutor for LISP programming. Proceedings of IJCAI-85, 8-14. Morgan Kaufmann.

Sleeman, D. & Brown, J.S. (1982) Intelligent Tutoring Systems. Academic Press.

Van Lehn, K. (1983) Felicity conditions for human skill acquisition: validating an AI-based theory. PhD thesis, Department of Computer Science, MIT. See also: Human procedural skill acquisition: theory, model and psychological validation. Proceedings of AAAI-83, 420-423, 1983.

Wenger, E. (1987) Artificial Intelligence and Tutoring Systems: Computational Approaches to the Communication of Knowledge. Morgan Kaufmann.

Woolf, B., Blegen, D., Jansen, J.H. & Verloop, A. (1986) Teaching a complex industrial process. Proceedings of AAAI-86, 722-729. Morgan Kaufmann.