
EQUIVALENCE CLASS FORMATION: A METHOD FOR TEACHING STATISTICAL INTERACTIONS

LANNY FIELDS

THE GRADUATE CENTER OF THE CITY UNIVERSITY OF NEW YORK

QUEENS COLLEGE OF THE CITY UNIVERSITY OF NEW YORK

ROBERT TRAVIS

THE GRADUATE CENTER OF THE CITY UNIVERSITY OF NEW YORK

DEBORAH ROY

UNIVERSITY OF ULSTER, COLERAINE

EYTAN YADLOVKER AND LILIANE DE AGUIAR-ROCHA

THE GRADUATE CENTER OF THE CITY UNIVERSITY OF NEW YORK

AND

PETER STURMEY

THE GRADUATE CENTER OF THE CITY UNIVERSITY OF NEW YORK

QUEENS COLLEGE OF THE CITY UNIVERSITY OF NEW YORK

Many students struggle with statistical concepts such as interaction. In an experimental group, participants took a paper-and-pencil test and then were given training to establish equivalence classes containing four different statistical interactions. All participants formed the equivalence classes and showed maintenance when probes contained novel negative exemplars. Thereafter, participants took a second paper-and-pencil test. Participants in the control group received two versions of the paper-and-pencil test without equivalence-based instruction. All participants in the experimental group showed increased paper-and-pencil test scores after forming the interaction-indicative equivalence classes. Class-indicative responding also generalized to novel exemplars and to the novel question format used in the paper-and-pencil test. Test scores did not change with repetition for control group participants. Implications for behavioral diagnostics and teaching technology are discussed.

DESCRIPTORS: college students, computer-based training, equivalence classes, generalization to novel exemplars

_______________________________________________________________________________

The abilities to manipulate, interpret, and describe data are key skills needed to evaluate published empirical work, plan experimental research, and function effectively in the natural and social sciences (Mulhern & Wylie, 2004; Ward & Kaflowitz, 1986). In addition, these skills can enhance a person’s ability to understand the complex information encountered in everyday settings in our increasingly sophisticated world. For example, health and longevity can be influenced in complex ways by variables such as genetic background, exercise, diet, years of marriage, and so on. The enhancement of longevity and health, then, might depend on an ability to understand what it means for these factors to interact and how those interactions might inform the implementation of beneficial changes in lifestyle.

Address correspondence to Lanny Fields, Department of Psychology, Queens College/CUNY, 65-30 Kissena Boulevard, Flushing, New York 11367 (e-mail: [email protected]).

doi: 10.1901/jaba.2009.42-575

JOURNAL OF APPLIED BEHAVIOR ANALYSIS 2009, 42, 575–593 NUMBER 3 (FALL 2009)

For many individuals, notions of interaction are introduced in college courses in statistics. Therefore, the concepts imparted in a statistics course could have a beneficial influence on an individual’s quality of life. Many college students, however, find it difficult to master the content of a statistics course (Rosenthal, 1992; Simon & Bruce, 1991). Explanations of these difficulties include interference with performance by affective variables such as anxiety (Nasser, 1999), deficiencies of the primarily lecture-based instructional methods used to teach concepts in statistics (Christopher & Marek, 2002; Peden, 2001), and deficiencies in mathematical skills (Mulhern & Wylie, 2004). A cooperative learning approach to teaching statistics that combines in-class group activities with conceptual material provided during lectures appears to improve performance in (Hinde & Kovac, 2001) and student ratings of (Davidson & Kroll, 1991) statistics courses. These studies, however, did not operationalize how the teaching factors influenced the learning of the statistical concepts. In another study, students in a traditionally taught statistics course learned to manipulate definitions and algorithms but often were unable to apply these concepts to real-world problems (Bradstreet, 1996). Finally, Seipel and Apigian (2005) noted that a better understanding of the “behavioral weaknesses” of students might lead to new instructional modes designed to correct these deficits. The present experiment sought to address these shortcomings by applying an equivalence class analysis to a difficult topic in statistics: interaction.

Equivalence classes. Three or more physically disparate stimuli are equivalent when the presentation of any stimulus from the set evokes selection of any other stimulus in the same set (Fields & Reeve, 2000; Sidman, 1971). The procedural variables that lead to the formation of equivalence classes in laboratory settings have been well documented (Fields, Reeve, Adams, & Verhave, 1991; Fields & Verhave, 1987; Fields, Verhave, & Fath, 1984; Sidman, Kirk, & Willson-Morris, 1985; Sidman & Tailby, 1982; Smeets & Barnes-Holmes, 2005). Such procedures have also been used in applied settings to establish equivalence classes indicative of reading repertoires in individuals with developmental disabilities (Connell & Witt, 2004; de Rose, de Souza, & Hanna, 1996; Sidman, 1971; Sidman & Cresson, 1973), facial recognition in adults with brain damage (Cowley, Green, & Braunling-McMorrow, 1992; Guercio, Podolska-Schroeder, & Rehfeldt, 2004), geographic relations in children with autism (LeBlanc, Miguel, Cummings, Goldsmith, & Carr, 2003), and fraction-decimal relations in children (Lynch & Cuvo, 1995). Thus, similar procedures might be effective for teaching relations among the complex stimuli typically encountered by college students in statistics.

576 LANNY FIELDS et al.

Statistical interaction. Our personal observations and those of many colleagues who have taught courses in statistics and experimental psychology indicate that many college students have difficulty recognizing representations of the combined effects of two independent variables on some dependent variable (i.e., statistical interaction). Specifically, when two independent variables are manipulated simultaneously, two possible outcomes can occur. First, an alteration in the value of one independent variable can shift the dependent variable by the same amount at every value of a second independent variable. In this case, the effects of the two independent variables are said to be additive (i.e., the effect of the second variable on the dependent variable is constant across manipulations of the first). Second, an alteration in the value of one independent variable can modulate the effect of a second independent variable on a dependent variable. In this case, the effect of one variable on a dependent variable is determined by the value of the other variable. When the manipulations of independent variables produce such an outcome, the effect is referred to as an interaction. In addition, a change in the value of one variable can reverse, enhance, or diminish the effects of a second independent variable. Finally, each type of interaction can be depicted in many ways (e.g., as a graph, a textual description, a definition, and a name).
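The additive-versus-interactive distinction above can be made concrete for the simplest 2 × 2 case. The following Python sketch is illustrative only and is not part of the study's materials. It assumes a table of cell means and measures the effect of factor B at each level of factor A. It then treats a nonzero difference between those two simple effects (the interaction contrast) as an interaction. The labels "reversed," "enhanced," and "diminished" follow the wording of the paragraph. They are not the stimulus classes used in the experiment.

```python
# Sketch: classify a 2x2 pattern of cell means as additive or interactive.
# means[i][j] is the mean of the dependent variable at level i of factor A
# and level j of factor B. Labels are illustrative, not the study's classes.

def simple_effects(means):
    """Effect of factor B (level 1 minus level 0) at each level of factor A."""
    return [row[1] - row[0] for row in means]


def interaction_contrast(means):
    """Difference between the two simple effects; zero means additive."""
    low, high = simple_effects(means)
    return high - low


def describe(means, tol=1e-9):
    low, high = simple_effects(means)
    if abs(high - low) <= tol:
        return "additive"
    if low * high < 0:
        return "reversed"    # effect of B changes sign across levels of A
    if abs(high) > abs(low):
        return "enhanced"    # effect of B is larger at the second level of A
    return "diminished"      # effect of B is smaller at the second level of A


if __name__ == "__main__":
    print(describe([[10, 14], [20, 24]]))  # parallel lines
    print(describe([[10, 20], [20, 10]]))  # crossing lines
    print(describe([[10, 12], [10, 20]]))
```

With parallel lines both simple effects equal 4, so the contrast is 0. With crossing lines the simple effects are +10 and -10, so the contrast is -20.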

The representations of a statistical interaction can be viewed as four different stimuli that are presented to a student during instruction. Comprehending a particular type of statistical interaction can be operationally defined as selecting any stimulus from a given set of representations when presented with any other stimulus from the same four-member interaction set. This goal can be achieved by the establishment of interaction-indicative equivalence classes. To illustrate, assume that the stimuli for a type of statistical interaction are a graph (A), a written description of the data in the graph (B), the label of the type of interaction (C), and its definition (D). Matching-to-sample training can be used to establish the relations for each class of four stimuli representing a particular interaction: A-B, B-C, and C-D. Grasping a statistical interaction can be inferred when a student responds in a class-consistent manner to the trained and untrained relations among the stimuli in the set. Specifically, a student must select the correct description (B) when given its graph (A-B), the correct graph when given the description (B-A), the correct label (C) when presented with the corresponding description (B-C), the description (B) when presented with the correct label (C-B), the correct label when given the correct graph (A-C), and vice versa (C-A). Further, a student should be able to select the correct definition when given its corresponding graph (A-D), description (B-D), or label (C-D) and vice versa (D-A, D-B, and D-C). Thus, the emergence of the three symmetrical (B-A, C-B, D-C), three transitive (A-C, A-D, B-D), and three equivalence (C-A, D-A, D-B) relations would indicate the formation of a four-member equivalence class after the training of only three baseline conditional discriminations (A-B, B-C, and C-D).
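The claim that three trained relations yield nine derived ones can be checked mechanically. The Python sketch below is a hypothetical helper, not software from the study. It assumes linear-series training (A-B, B-C, C-D) and labels each untrained pair with the same scheme as the text:

- A reversal of a trained pair is symmetry.
- An untrained pair that runs in the training direction is transitivity.
- The reverse of a transitive pair is equivalence.

```python
from itertools import permutations

# Baseline conditional discriminations trained in a linear series.
BASELINE = [("A", "B"), ("B", "C"), ("C", "D")]


def classify_relations(baseline):
    """Label every ordered (sample, comparison) pair of class members.

    Assumes the baseline forms one linear chain whose order matches the
    training order (A before B before C ...).
    """
    trained = set(baseline)
    reversed_trained = {(comparison, sample) for sample, comparison in baseline}
    # Position of each stimulus along the training chain: A=0, B=1, ...
    chain = [baseline[0][0]] + [comparison for _, comparison in baseline]
    position = {stimulus: i for i, stimulus in enumerate(chain)}

    labels = {}
    for sample, comparison in permutations(chain, 2):
        if (sample, comparison) in trained:
            labels[(sample, comparison)] = "baseline"
        elif (sample, comparison) in reversed_trained:
            labels[(sample, comparison)] = "symmetry"
        elif position[sample] < position[comparison]:
            labels[(sample, comparison)] = "transitivity"
        else:
            labels[(sample, comparison)] = "equivalence"
    return labels


if __name__ == "__main__":
    labels = classify_relations(BASELINE)
    for kind in ("baseline", "symmetry", "transitivity", "equivalence"):
        pairs = sorted(f"{s}-{c}" for (s, c), k in labels.items() if k == kind)
        print(kind, pairs)
```

For this baseline, each label covers three pairs. That gives 3 trained plus 9 derived relations, which is every ordered pair among four stimuli.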

To be of practical value, the selection of any stimulus in a class that represents an interaction would also have to generalize to new variations of each member of that class. For example, presenting some novel graphic or textual variant of an A or a B stimulus as a sample should occasion selection of the stimuli in the class that had been used as comparisons during training, and vice versa. A graphic variant (A) would be an interaction graph that contained functions with slopes and intercepts that differed from those used in training and also had different independent and dependent variables. A variant of a description (B) would be text that paraphrased the trained descriptions. Further, presentation of any of these novel stimulus variants as samples should also occasion selection of any novel stimulus variant as a comparison. Such an outcome would demonstrate that the perceptually distinct exemplars of a given class, along with their variants, were functioning as members of a generalized equivalence class (Fields & Reeve, 2001). Finally, these performances would indicate generalization among stimuli within a class and discrimination between stimuli in different classes, the behavior-analytic definition of concept formation (Keller & Schoenfeld, 1950). These data, then, would operationally define the establishment of the concept of interaction.

EQUIVALENCE CLASS FORMATION AND STATISTICS 577

The present study addressed four questions. First, can computer-based procedures that are known to form equivalence classes with arbitrary stimuli also be used to establish classes of stimuli that represent four types of statistical interaction, in which each class contains different depictions of the designated type of interaction? Second, would the trained and derived relations in the equivalence classes be maintained when tested in the context of novel negative exemplars, a form of generalization across contexts? Third, would the trained and derived relations in the equivalence classes generalize to novel representations of statistical interactions in a novel paper-and-pencil testing format that contained more choices than those used during class formation? Fourth, would students have a preference for the procedure used to establish the interaction-based equivalence classes (i.e., social validity)? These questions were answered in a two-group pretest–posttest design. An experimental group received a paper-and-pencil pretest on statistical interactions, computer-based equivalence class formation training, and then a paper-and-pencil posttest. A control group received only the pretest and posttest. Outcomes were determined by comparing the scores obtained from the pretests and posttests for both groups.

METHOD

Participants

Twenty-one students, enrolled in a class in introductory psychology, satisfied one of the course requirements by participation in the present experiment. To participate, a student first signed an informed consent statement for the 3- to 3.5-hr experiment. All participants received the same credit toward satisfaction of the course requirement.

Apparatus

Setting and hardware. All computer training phases took place in cubicles (1.8 m by 1.5 m) that contained an IBM computer, a keyboard, a dot matrix printer, and a desk and chair. All stimuli were presented on the computer monitor, and all responses to the stimuli involved pressing specific keys on the computer keyboard.

Software. A customized DOS-based program written in Visual Basic controlled all aspects of computer-based training and testing. It also recorded the relations presented for training and testing, the choices made by the participant, reaction times, and the feedback provided on every trial. All stimuli measured 5 cm by 5 cm and were presented on a 380-mm SVGA computer monitor.

Stimuli used in equivalence class formation. The four members of each statistical stimulus class used during computer-based equivalence training are shown in Figure 1. Each stimulus class contained four different stimulus types that were assigned a letter designation. The A stimuli were line graphs depicting four types of statistical interactions. The B stimuli were textual descriptions of the interactions depicted in each graph. The C stimuli were labels of each interaction or no interaction. The D stimuli were textual definitions of each type of interaction. Each stimulus class was also numbered (1 = no interaction, 2 = crossover interaction, 3 = divergent interaction, and 4 = synergistic interaction). For example, the A1 stimulus was a line graph from the no-interaction class, and the D3 stimulus was a definition from the divergent class.

Procedure

Experimental design. The experiment used a pretest–posttest design with control and experimental groups. Participants in the control group received two versions of the paper-and-pencil test without intervening establishment of equivalence classes. Participants in the experimental group received a paper-and-pencil test, computer training to form four four-member equivalence classes, and then a second version of the paper-and-pencil test. Participants were matched on pretest scores and then randomly assigned to the experimental or control group by the flip of a coin. This reduced between-groups variability by ensuring that participants in both conditions performed essentially equally before the intervention. Because 1 participant dropped out of the experimental group after group assignment, the two conditions contained unequal numbers of participants. The dependent variable was performance on the paper-and-pencil test. Finally, all participants completed a social validity questionnaire to evaluate four aspects of the experiment.
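The matching-then-coin-flip assignment can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the study's exact procedure:

- The participant IDs are hypothetical.
- Participants are paired by adjacent pretest ranks.
- An odd participant out is assigned by a single coin flip. The text does not say how that case was handled.

```python
import random


def matched_assignment(pretest_scores, seed=None):
    """Pair participants with adjacent pretest ranks, then flip a coin within
    each pair to decide who joins the experimental group.

    pretest_scores maps a (hypothetical) participant ID to a pretest score.
    Assumption: with an odd count, the lowest-ranked participant is assigned
    by one coin flip on their own.
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    groups = {"experimental": [], "control": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        if rng.random() < 0.5:  # the coin flip
            pair.reverse()
        groups["experimental"].append(pair[0])
        groups["control"].append(pair[1])
    if len(ranked) % 2 == 1:
        leftover = ranked[-1]
        side = "experimental" if rng.random() < 0.5 else "control"
        groups[side].append(leftover)
    return groups


if __name__ == "__main__":
    scores = {"p1": 10, "p2": 12, "p3": 15, "p4": 9, "p5": 20}
    print(matched_assignment(scores, seed=1))
```

Pairing on rank before randomizing keeps the two groups close on pretest scores. The coin flip still decides which member of each pair receives the intervention.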

Paper-and-pencil pretest. The paper-and-pencil tests contained 24 multiple-choice questions about statistical interactions, each with four answer options (a, b, c, and d). A participant answered the questions by entering the letter corresponding to the correct answer on a standard Scantron sheet that was scored electronically. Of the 24 questions, two were included from each possible stimulus relation (A-B, B-A, B-C, C-B, A-C, C-A, C-D, D-C, A-D, D-A, B-D, D-B), with six questions from each stimulus class. The questions in the paper-and-pencil tests contained statements and graphs that differed in textual and graphic content from those used as stimuli for the computer-induced equivalence classes. Thus, a B1-A1 question contained a description of a graph that was similar to, but differed from, the description of the B1 stimulus depicted in Figure 1. In addition, the answer options consisted of four graphs that were similar to, but differed from, those used as the A1 through A4 stimuli depicted in Figure 1.

Figure 1. An example of the four members of each class of stimuli used during the equivalence training.

These distinctions are illustrated in Figure 2. All of the B stimuli used in the computer training depicted the effects of age and sleep deprivation on aggressive responses. In contrast, the B2 stimulus used in the B2-A2 question in the paper-and-pencil test described the effects of light exposure and water intake on plant growth. The B2 stimulus used in training included the phrase “intersected at an intermediate level of sleep deprivation,” whereas the B2 stimulus in the paper-and-pencil test contained the phrase “did intersect.” Similarly, the four A graphs used in the paper-and-pencil test were in the same format as those used for computer-based training. However, they contained functions with slopes and intercepts that differed from those of the A stimuli in Figure 1. Three faculty members in the Department of Psychology at Queens College/CUNY, each recognized as an expert teacher of statistics, assessed the validity of the test and concluded that it would accurately measure knowledge of each type of statistical interaction.

Although unlikely, it was possible that answers on the second test could be controlled by the order of the questions, or by the order of the answer options, if both versions listed them in the same order. To obviate such a source of control, the two versions of the test listed the same questions in different orders and listed the answers to each question in a different order. (The tests can be obtained from the first author.)
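One way to generate a second version with both orders changed is sketched below in Python. The tuple layout `(stem, options, correct_index)` and the function names are hypothetical. The sketch forces every question and every answer option to move to a new position (a derangement), which is one reading of "in different orders." It also remaps the answer key so that scoring stays correct.

```python
import random


def derangement(n, rng):
    """Random permutation of range(n) in which no index keeps its place."""
    if n < 2:
        raise ValueError("need at least two items to move every position")
    while True:  # rejection sampling; succeeds with probability about 1/e
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def make_second_version(questions, seed=0):
    """questions: list of (stem, options, correct_index) tuples.

    Returns a new list in which the question order and the option order
    within every question both differ from the original, with the index
    of the correct option remapped to its new position.
    """
    rng = random.Random(seed)
    version = []
    for qi in derangement(len(questions), rng):
        stem, options, correct = questions[qi]
        perm = derangement(len(options), rng)
        new_options = [options[p] for p in perm]  # new slot j holds old option perm[j]
        version.append((stem, new_options, perm.index(correct)))
    return version


if __name__ == "__main__":
    qs = [(f"Q{i}", ["a", "b", "c", "d"], i % 4) for i in range(6)]
    for stem, options, key in make_second_version(qs, seed=3):
        print(stem, options, "correct:", options[key])
```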

All participants in both experimental and control groups were randomly assigned to receive A or B versions of the paper pretest in alternating orders. The test was conducted in a classroom and given to all participants at the same time in a group format. Instructions for completing the test were dictated from a typed sheet. All participants were given a maximum of 45 min to complete the test. After completion, experimental participants were led to cubicles and began computer-based training, and control group participants were given a 1.5-hr break before returning to take the posttest.

The two versions of the test were randomly assigned as pretest and posttest, with the constraint that each version served about equally often as the pretest and as the posttest. The sequence of test administrations was nearly balanced across both groups. Because of the odd number of participants, the A-then-B test sequence was presented one more time than the B-then-A test sequence. Thus, differences in scores on the pretest and posttest could not be attributed to the particular version of the test.
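The sketch below illustrates the alternating assignment of test sequences and the one-sequence surplus it produces with an odd group size. The function name is hypothetical.

```python
from collections import Counter


def order_sequences(n_participants):
    """Alternate A-then-B and B-then-A test orders across participants.

    With an odd number of participants, A-then-B occurs one more time,
    which is the imbalance described in the text.
    """
    return [("A", "B") if i % 2 == 0 else ("B", "A") for i in range(n_participants)]


if __name__ == "__main__":
    print(Counter(order_sequences(5)))  # A-then-B three times, B-then-A twice
```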

Computer-based procedure. Equivalence classes were established with trials presented in matching-to-sample format. Three stimuli were presented on the computer screen in an equilateral triangular array, with the sample at the top of the triangle and the two comparisons at the bottom left and right. A trial began by pressing the ENTER key, which produced the sample stimulus. Pressing the space bar then produced the two comparison stimuli. All three stimuli remained on the screen until the participant selected the comparison on the left by pressing the 1 key or the comparison on the right by pressing the 2 key. Immediately thereafter, the stimuli disappeared and were replaced with one of two informative feedback messages or a noninformative feedback message. Correct and incorrect choices produced the words “right” and “wrong,” respectively. If a trial was scheduled for noninformative feedback, the letter E appeared on the screen. The feedback messages remained on the screen until the participant pressed the R key in the presence of right, the W key in the presence of wrong, and the E key in the presence of E.

Figure 2. Two examples of questions on the paper pretest and posttest. The first question tests a B-A relation from Class 2, and the second question tests a D-C relation from Class 4.

All training and testing were conducted in blocks of trials presented in a randomized order without replacement. For training, a block was repeated until performance reached a mastery criterion. Trials in a block were conducted in one of three conditions: (a) informative feedback on 100% of the trials; (b) informative feedback on 75%, 25%, or 0% of the trials; or (c) noninformative feedback. During blocks that tracked the emergence of derived relations, all trials produced noninformative feedback.
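A minimal sketch of the feedback logic follows. The three messages and their acknowledgment keys (R, W, E) come from the text, but the function names are hypothetical. Placing the informative trials at random positions within a block is an assumption, because the text gives only the percentages.

```python
import random

# On-screen message -> key that dismisses it (from the procedure description).
ACK_KEY = {"right": "R", "wrong": "W", "E": "E"}


def feedback_schedule(n_trials, proportion, rng):
    """Flag which trials in a block give informative feedback.

    Exactly round(n_trials * proportion) trials are informative; their
    positions are shuffled (an assumption, not a documented rule).
    """
    n_informative = round(n_trials * proportion)
    flags = [True] * n_informative + [False] * (n_trials - n_informative)
    rng.shuffle(flags)
    return flags


def feedback_message(correct, informative):
    """Return the message shown after a choice and the key that clears it."""
    if not informative:
        message = "E"
    else:
        message = "right" if correct else "wrong"
    return message, ACK_KEY[message]


if __name__ == "__main__":
    rng = random.Random(0)
    print(feedback_schedule(8, 0.75, rng))  # six informative trials out of eight
    print(feedback_message(True, True), feedback_message(True, False))
```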

Keyboard familiarization. Training began with a procedure to teach participants the sequence of responses needed to negotiate the matching-to-sample trials used throughout the experiment (Fields et al., 1997). The stimuli were two sets of three semantically related words. Each trial in the block consisted of a sample and a positive comparison from the same semantically related set and a negative comparison from the other set. In addition, the response keys were indicated with onscreen prompts. If the performance criterion of 100% accuracy was achieved in a block of trials, the next block contained fewer prompts; the prompts were faded in four steps. Familiarization training was complete once a block of trials without any prompts produced the mastery level of responding.

Equivalence class formation. At the completion of keyboard familiarization training, participants in the experimental group were exposed to a computer-based protocol to induce four four-member interaction-indicative equivalence classes (Class 1 = no interaction, Class 2 = crossover interaction, Class 3 = divergent interaction, and Class 4 = synergistic interaction). Trials were presented in the same matching-to-sample format used during keyboard familiarization, but with no prompts. The sequence of training and testing blocks followed the simple-to-complex protocol (Adams, Fields, & Verhave, 1993; Imam, 2006).

Because the participants were university students, it was assumed that a generalized identity-matching repertoire was already present (i.e., if given an A1 stimulus, participants would be able to select the A1 comparison because it was identical to the sample stimulus); therefore, identity relations were not tested.

During all training and testing phases, unless otherwise noted, stimuli from one class were locked with stimuli from a specific corresponding class as negative comparisons. Stimuli from Class 1 served as negative comparisons for Class 4 members, Class 2 stimuli served as negative comparisons for Class 3 members, and vice versa. For example, when A1 was trained to B1, the negative comparison was the B member of Class 4 (B4). Trials used to train or test for each relation are listed in Table 1.
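The locked negative comparisons fully determine the trial list in Table 1. The Python sketch below uses hypothetical helper names. It regenerates the Sample/Co+/Co− triples for any relation and builds a shuffled block. Alternating the positive comparison between left and right across repeats is an assumption that mirrors the table note.

```python
import random

# Locked negative-comparison classes: Class 1 <-> Class 4, Class 2 <-> Class 3.
LOCKED = {1: 4, 4: 1, 2: 3, 3: 2}
CLASS_ORDER = (1, 4, 2, 3)  # row order used in Table 1


def trials_for(relation):
    """(sample, Co+, Co-) triples for a relation such as "A-B", in Table 1 order."""
    sample_set, comparison_set = relation.split("-")
    return [(f"{sample_set}{k}", f"{comparison_set}{k}", f"{comparison_set}{LOCKED[k]}")
            for k in CLASS_ORDER]


def make_block(relation, presentations, rng):
    """Repeat each trial `presentations` times, alternating the side on which
    Co+ appears, then shuffle (randomized order without replacement)."""
    block = [(sa, pos, neg, "left" if rep % 2 == 0 else "right")
             for sa, pos, neg in trials_for(relation)
             for rep in range(presentations)]
    rng.shuffle(block)
    return block


if __name__ == "__main__":
    print(trials_for("A-B"))
    print(len(make_block("A-B", 4, random.Random(0))))  # 100%-feedback training block
    print(len(make_block("B-A", 2, random.Random(0))))  # symmetry probe block
```

Four presentations give the 16-trial training blocks, and two presentations give the 8-trial feedback-reduction and probe blocks.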

Training for baseline conditional discriminations and testing for the emergence of derived relations began with establishing the baseline A-B relations, using a block that contained 16 trials: four presentations of each trial listed in the A-B section of Table 1. Training continued with 100% feedback until the mastery criterion was achieved. Thereafter, feedback in successive blocks was systematically reduced from 100% to 75% to 25% and then to 0% of trials as long as performance was maintained at the mastery level of responding. These blocks contained only eight A-B trials. This method established the baseline conditional discriminations using 100% feedback and assessed the maintenance of these relations with the reduction of feedback.
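The descending-feedback sequence for one baseline relation can be written as a small state machine. This Python sketch takes observed block accuracies as input and reports which blocks would be run. The phase list and the 100% criterion come from the text. The text does not describe what follows a failed reduced-feedback block, so the return to full feedback here is a hypothetical rule.

```python
# Phases for one baseline relation: (trials per block, proportion of trials
# with informative feedback), as described in the text.
PHASES = [(16, 1.00), (8, 0.75), (8, 0.25), (8, 0.00)]
MASTERY = 1.0  # 100% correct


def run_baseline_training(block_accuracies):
    """Step through the phases using observed block accuracies.

    Returns (blocks_run, completed). Hypothetical rule: a failed
    reduced-feedback block sends training back to the 100% phase.
    """
    phase = 0
    blocks_run = []
    for accuracy in block_accuracies:
        blocks_run.append(PHASES[phase])
        if accuracy >= MASTERY:
            phase += 1
            if phase == len(PHASES):
                return blocks_run, True
        elif phase > 0:
            phase = 0  # hypothetical: retrain with full feedback
    return blocks_run, False


if __name__ == "__main__":
    print(run_baseline_training([0.9, 1.0, 1.0, 1.0, 1.0]))
```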

The maintenance of the A-B relations was followed by tests for the emergence of the symmetrical properties of the A-B relation with B-A probes. This B-A test block contained eight B-A trials: two presentations of each trial listed in the B-A section of Table 1. These trials were presented with no informative feedback. The block was repeated up to three times or until a participant responded in a class-indicative manner on all trials (the mastery criterion of 100% accuracy). After passing the B-A test, participants were trained on the B-C relations in the same manner as the A-B relations. The block used for training with 100% feedback contained 16 B-C trials: four presentations of each trial listed in the B-C section of Table 1. Maintenance of B-C relations during feedback reduction used a block that contained eight B-C trials: two presentations of each trial listed in the B-C section of Table 1. This was followed by a test for C-B symmetry that was conducted in the same manner as the B-A test. The C-B testing block contained two presentations of each trial listed in the C-B section of Table 1. After passing the C-B test, a maintenance test of both symmetrical relations was conducted by presenting the B-A and C-B relations together in one block of 16 trials. This block contained two presentations of each trial listed in the B-A and C-B sections of Table 1. This was followed by a test for transitivity with a block of eight A-C trials: two presentations of each trial listed in the A-C section of Table 1. Finally, the emergence of the equivalence relations was assessed with a test block that contained eight C-A trials (C-A section of Table 1). The occurrence of class-consistent responding on all training and probe blocks would indicate the formation of four three-member equivalence classes. For each derived-relations test, a block was repeated up to three times or until a participant responded in a class-indicative manner on all trials within a block (the mastery criterion of 100% accuracy).

Table 1

Symbolic Representation of Samples (Sa), Positive Comparisons (Co+), and Negative Comparisons (Co−) Used During Equivalence Class Formation

Rel   Type   Trials (Sa/Co+/Co−)

Three-member classes
A-B   BL     A1/B1/B4   A4/B4/B1   A2/B2/B3   A3/B3/B2
B-A   SYM    B1/A1/A4   B4/A4/A1   B2/A2/A3   B3/A3/A2
B-C   BL     B1/C1/C4   B4/C4/C1   B2/C2/C3   B3/C3/C2
C-B   SYM    C1/B1/B4   C4/B4/B1   C2/B2/B3   C3/B3/B2
A-C   TTY    A1/C1/C4   A4/C4/C1   A2/C2/C3   A3/C3/C2
C-A   EQV    C1/A1/A4   C4/A4/A1   C2/A2/A3   C3/A3/A2

Four-member classes
C-D   BL     C1/D1/D4   C4/D4/D1   C2/D2/D3   C3/D3/D2
D-C   SYM    D1/C1/C4   D4/C4/C1   D2/C2/C3   D3/C3/C2
B-D   TTY    B1/D1/D4   B4/D4/D1   B2/D2/D3   B3/D3/D2
A-D   TTY    A1/D1/D4   A4/D4/D1   A2/D2/D3   A3/D3/D2
D-B   EQV    D1/B1/B4   D4/B4/B1   D2/B2/B3   D3/B3/B2
D-A   EQV    D1/A1/A4   D4/A4/A1   D2/A2/A3   D3/A3/A2

Note. Entries in the Rel column indicate the stimulus–stimulus pairs in the equivalence classes. Entries in the Type column indicate the kind of relation served by each stimulus–stimulus pair. BL indicates the trials used to train the baseline relations, SYM indicates symmetry probe trials, TTY indicates transitivity probe trials, and EQV indicates equivalence probe trials. Each Sa/Co+/Co− trial was presented two times per block, once with the positive comparison on the left and once with it on the right, with the negative comparison on the opposite side.
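The stopping rule for derived-relation probes can be sketched as below: repeat the block up to three times and stop at the first block with 100% accuracy. The function name is hypothetical. The sketch only reports the outcome, because the text does not say what followed three failed blocks.

```python
MAX_PROBE_BLOCKS = 3
MASTERY = 1.0  # every trial in the block class-consistent


def probe_outcome(block_accuracies):
    """Apply the probe rule to a list of block accuracies.

    Returns (passed, blocks_used). Accuracies after the third block are
    ignored because the block is repeated at most three times.
    """
    for i, accuracy in enumerate(block_accuracies[:MAX_PROBE_BLOCKS], start=1):
        if accuracy >= MASTERY:
            return True, i
    return False, min(len(block_accuracies), MAX_PROBE_BLOCKS)


if __name__ == "__main__":
    print(probe_outcome([0.75, 1.0]))              # passes on the second block
    print(probe_outcome([0.5, 0.75, 0.875, 1.0]))  # fails within three blocks
```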

The next phase was a three-mix probe test that involved the presentation of A-B, B-A, B-C, C-B, A-C, and C-A trials in a single test. Each relation was presented eight times, all with no informative feedback. This test was presented in three blocks, each of which contained 16 trials. The presentation of each relation was balanced across the three blocks, and each class appeared an equal number of times within and across these three blocks. Class-consistent performances on these blocks would indicate the maintenance of the four three-member interaction classes when all baseline relations and derived relations were presented together.

Once maintenance of the three-member classes was established, class membership was expanded by training C-D relations for all four equivalence classes in blocks of descending feedback. After C-D training, participants were presented with a four-mix test that included all possible relations: A-B, B-A, B-C, C-B, A-C, C-A, C-D, D-C, A-D, D-A, B-D, and D-B. Each relation was assessed with the presentation of eight trials, as listed in Table 1. This test consisted of 96 trials that represented all possible stimulus relations in the four classes. To avoid participant fatigue, the trials were presented in four separate blocks of 24 trials each. Progress through each testing block was not dependent on performance.
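The four-mix test's 96 trials (12 relations × 4 classes × 2 comparison positions) divide evenly into four blocks of 24. The sketch below shows one allocation, a rotation of classes across blocks, that gives every block 2 trials per relation and 6 per class. It is a hypothetical design for illustration; the text does not describe how trials were allocated to blocks.

```python
import random
from collections import Counter

RELATIONS = ["A-B", "B-A", "B-C", "C-B", "A-C", "C-A",
             "C-D", "D-C", "A-D", "D-A", "B-D", "D-B"]
LOCKED = {1: 4, 4: 1, 2: 3, 3: 2}  # locked negative-comparison classes
CLASSES = (1, 2, 3, 4)
N_BLOCKS = len(CLASSES)  # the rotation balances only when these are equal


def four_mix_blocks(rng):
    """Return four shuffled blocks of 24 four-mix trials.

    Each trial is (relation, class, sample, Co+, Co-, side of Co+).
    Class index ci of relation number r goes to block (ci + r) % 4, so each
    block gets one class per relation (both sides) and three relations
    per class.
    """
    blocks = [[] for _ in range(N_BLOCKS)]
    for r, relation in enumerate(RELATIONS):
        sample_set, comparison_set = relation.split("-")
        for ci, k in enumerate(CLASSES):
            b = (ci + r) % N_BLOCKS
            for side in ("left", "right"):
                blocks[b].append((relation, k, f"{sample_set}{k}",
                                  f"{comparison_set}{k}",
                                  f"{comparison_set}{LOCKED[k]}", side))
    for blk in blocks:
        rng.shuffle(blk)  # randomized order within each block
    return blocks


if __name__ == "__main__":
    for i, blk in enumerate(four_mix_blocks(random.Random(1)), start=1):
        print(i, len(blk), dict(Counter(t[1] for t in blk)))
```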

In all previous training and testing blocks, a sample stimulus on a trial was presented with a positive comparison from the same class and a negative comparison that was drawn from one specific class (i.e., the locked class: Class 1 with Class 4 and Class 2 with Class 3). Because the positive and negative comparisons were from invariant classes, it was possible that the four classes would remain intact only in the context of the stimuli used as negative comparisons. Alternatively, the classes might have remained intact regardless of the stimuli used as negative comparisons. These possibilities were evaluated with the next battery of probes, called a four-mix-plus test.

The four-mix-plus test presented trials containing all of the relations used in the four-mix test, with the following extension. Each sample and positive comparison (Co+) from the same class was now presented with negative comparisons (Co−) drawn from the two classes that had not been paired with that class during class formation. For samples and positive comparisons drawn from Classes 1 and 4, the negative comparisons were drawn from Classes 2 and 3; for samples and positive comparisons drawn from Classes 2 and 3, the negative comparisons were drawn from Classes 1 and 4. In addition, the new Co−s were used on different trials. For example, in the four-mix test, the A1 stimulus was presented with B1 as the positive comparison and B4 as the only negative comparison in every trial. By contrast, in the four-mix-plus test, A1 and B1 as the sample and positive comparison were presented with the novel negative comparisons B2 and B3 in two separate trials, but never with B4.
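The substitution rule for novel negative comparisons can be sketched as follows. The locked pairs (1 with 4, 2 with 3) come from the text. The assumption, which is ours, is that each sample/Co+/novel-Co− combination was presented twice, as each sample/Co+ pairing was in the four-mix test; this reproduces the reported total of 192 trials. The function names are hypothetical.

```python
from itertools import permutations

CLASSES = (1, 2, 3, 4)
# Locked pairs used in training and the four-mix test: 1 <-> 4, 2 <-> 3.
LOCKED = {1: 4, 4: 1, 2: 3, 3: 2}


def novel_negative_classes(cls):
    """The two classes never used as negative comparisons for `cls` before this test."""
    return [k for k in CLASSES if k not in (cls, LOCKED[cls])]


def four_mix_plus_trials(members="ABCD"):
    """Return (sample, Co+, Co-) tuples for all 12 relations and 4 classes.

    Assumption: each combination is presented twice, which reproduces the
    reported 192-trial total.
    """
    trials = []
    for s, c in permutations(members, 2):            # A-B, A-C, ..., D-C
        for cls in CLASSES:
            for neg in novel_negative_classes(cls):
                for _ in range(2):                   # assumed duplicate presentation
                    trials.append((f"{s}{cls}", f"{c}{cls}", f"{c}{neg}"))
    return trials
```

For the A1 sample with B1 as Co+, the generated trials use B2 or B3 as the negative comparison and never B4, matching the example in the text.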

To avoid participant fatigue, the 192 trials were presented in 16 blocks of 12 trials; each trial was presented once, in the same order for all participants. The correct comparison appeared with equal probability in the left and right positions in each block. In addition, each stimulus relation was represented by three questions in each block. Because three questions cannot be divided equally among four classes, the number of questions from each stimulus class could not be balanced within a block. Instead, if one block contained fewer questions from a certain class, the following block corrected the imbalance by presenting more questions from that class. This created an imbalance within another block, which was again corrected in the subsequent block. Thus, the questions from each relation were balanced within each block, but the number of questions drawn from each class was balanced only over the entire 16 blocks.
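Two of the blocking constraints above, a single fixed trial order for every participant and an equal number of left and right Co+ positions in each block, can be sketched as follows. This is illustrative only. The function name and seed are arbitrary choices, and the sketch does not model the authors' block-to-block class-balancing scheme.

```python
import random


def make_blocks(trials, block_size=12, seed=0):
    """Split trials into fixed-order blocks and counterbalance the Co+ position.

    A fixed seed gives every participant the same trial order. Within each
    block, exactly half of the trials put the correct comparison on the left.
    """
    if len(trials) % block_size or block_size % 2:
        raise ValueError("trials must fill whole, even-sized blocks")
    order = list(trials)
    random.Random(seed).shuffle(order)              # one fixed order for everyone
    blocks = []
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        sides = ["left", "right"] * (block_size // 2)
        random.Random(seed + start).shuffle(sides)  # position order within the block
        blocks.append(list(zip(block, sides)))
    return blocks
```

Calling `make_blocks(list(range(192)))` yields 16 blocks of 12 trials, each with six left and six right positions, and repeated calls return the same order.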

Paper-and-pencil retest. The second version of the paper-and-pencil test was administered after completion of computer-based class induction for participants in the experimental group, and about 90 min after administration of the first paper-and-pencil test for participants in the control group.

Social validity questionnaire. A social validity questionnaire assessed the goals, methods, and outcome of the experiment. Participants answered four questions by assigning scores from 1 to 7 on a Likert scale, with 1 and 7 being the lowest and highest ratings, respectively. Once the questionnaire was completed, participants were debriefed, given the opportunity to ask questions, provided with a means of contacting the experimenters in the future, and issued course credit.

RESULTS

Time spent in the experiment. Participants in the class formation group spent from 2.8 to 3.5 hr in the experiment; about 1.5 hr of this was spent forming the interaction-indicative equivalence classes. Participants in the control group were given a 1.5-hr delay between completing the first paper-and-pencil test and receiving the second. Thus, the time between test administrations was equivalent for participants in both conditions.

Formation of three-member equivalence classes. All participants in the experimental group formed four three-member interaction-indicative equivalence classes. Therefore, equivalence class formation was depicted using group means for each phase of training and testing (Figure 3). A minimum of four blocks was needed to establish each baseline relation. The A-B and B-C relations were acquired rapidly, in means of 5.5 and 5.6 blocks, respectively. The narrow standard error bars indicate that performance was similar across participants. With few exceptions, all emergent-relations tests (B-A, C-B, A-C, and C-A) produced mastery-level responding in the first block of a test. The few participants who needed to repeat test blocks met the mastery criterion on the second presentation of the block. Together with the mastery-level responding on the baseline relations (A-B and B-C), these performances documented the formation of four three-member equivalence classes.

Maintenance of the three-member equivalence classes. In all cases, the three-mix probes produced mastery-level responding on the first presentation of the test blocks, in which all relations were mixed together rather than presented individually in separate test blocks. These performances, obtained with all 10 participants, demonstrated the maintenance of all four three-member classes. Thus, performance on the emergent relations was not compromised by presenting them together in the same test blocks.

Figure 3. The mean number of blocks needed for all experimental group participants to achieve the mastery criterion during computer-based equivalence training. Each training and testing phase of equivalence class formation appears as a separate bar, and the left-to-right position of each bar corresponds to the order in which each relation was trained or tested. The height of each bar indicates the mean number of blocks needed to form a baseline relation or to pass an emergent-relations test.


Expansion to four-member equivalence classes. The C-D baseline relations were acquired in a mean of 4.7 blocks (Table 2). Equivalence class formation was defined by an experimenter-selected criterion of at least 90% class-consistent comparison selections, averaged across all four blocks of the four-mix test. Nine of the 10 participants met this criterion, which demonstrated the formation of the four-member interaction-indicative equivalence classes. The stability of accuracy scores across the four blocks of the four-mix test also demonstrated the immediate emergence of all four interaction-indicative equivalence classes. One participant (3273) narrowly missed the criterion for class formation (89% correct).
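The 90% criterion can be expressed as a one-line check. This is a minimal sketch. The per-block correct counts used with it are hypothetical, and it assumes the four-mix blocks each held 24 trials, as described in the Method. Because the blocks are equal in size, pooling correct selections gives the same value as averaging the four block percentages.

```python
def class_formation_met(correct_per_block, trials_per_block=24, threshold=0.90):
    """Pool class-consistent selections across four-mix blocks; compare with 90%.

    With equal-sized blocks, pooling is the same as averaging block percentages.
    """
    accuracy = sum(correct_per_block) / (trials_per_block * len(correct_per_block))
    return accuracy, accuracy >= threshold
```

Hypothetical counts of 24, 23, 22, and 24 correct give about 97% and meet the criterion. Counts of 21, 21, 21, and 22 give about 89% and do not.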

Four-member classes with novel negative comparisons. The emergence of the three- and four-member classes could have been contextually limited to the particular negative comparisons used for training and testing. The four-mix-plus test evaluated that possibility by presenting all trials with negative comparisons from the classes that had not previously been paired with the sample's class (Table 2). In the first three blocks of the four-mix-plus test, performance was typically 100% accurate for all 10 participants. The maintenance of criterion-level responding after the introduction of the four-mix-plus test demonstrated that the relations among the stimuli in the four interaction-indicative equivalence classes were maintained in the presence of new Co−s in the baseline and emergent-relations test trials. Notably, these class-indicative performances were maintained even with the sudden substitution of trials that contained new comparisons. These performances thus demonstrated one level of generalization of the four interaction-indicative equivalence classes.

With continued four-mix-plus testing, different patterns of responding emerged for different participants. Six of the 10 participants responded at the mastery level throughout the four-mix-plus test, which demonstrated the maintenance of the classes with extensive testing in the absence of informative feedback. Two of the 4 remaining participants (3311 and 3309) showed minor performance breakdowns in some of the test blocks (shaded cells in Table 2). For Participants 3311 and 3309, the performance breakdowns were more precipitous and occurred with increasing frequency in the later test blocks (shaded cells). For them, the classes did not remain intact. Additional research will be needed to identify the factors responsible for the maintenance of equivalence relations with continued testing and with novel negative comparisons.

Table 2

Scores in Test Blocks on the Two Posttraining Computer Tests

Figure 4. The mean pretest–posttest scores for both experimental (filled squares) and control (open circles) groups. The I beams that bracket each data point indicate ±1 SE.

Figure 5. A scattergram showing posttest scores plotted as a function of pretest scores for each participant in the experimental (filled circles) and control (open circles) groups. Two participants in the experimental group produced identical pretest and posttest scores, indicated by the arrow. Separate regression lines are also shown for the data obtained from participants in the experimental and control groups.

Overall effects: Paper-and-pencil test scores. Figure 4 depicts the overall effects of the two independent variables by plotting the mean scores on the paper-and-pencil tests for participants in the experimental and control groups on the first and second administrations of the test. By design, the pretest scores of the two groups were very similar. Thus, any post-class-formation differences could not be attributed to participant-based variables. In the control group, the mean posttest score was only 2% greater than the pretest score, and the overlapping standard errors suggested that this difference was not significant. In the experimental group, the mean posttest score was 37% higher than the mean pretest score. When posttest scores were compared, participants in the experimental group had paper-and-pencil posttest scores that were 35% greater than those of participants in the control group. The difference between groups on the posttest was significant after controlling for any differences between the groups on pretest scores (ANCOVA, df = 19, n = 21, F = 42.56, p < .000004). In addition, r² = 0.775 indicated that more than 77% of the variance in the dependent variable was accounted for by the experimental intervention. Finally, the effect size was d = 2.3 (Cohen, 1992). Because effect sizes greater than 0.8 are considered large, the effect size obtained in the present experiment is exceptionally large.
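Cohen's d and the benchmarks cited above can be computed as follows. This is a minimal sketch that assumes the common pooled-standard-deviation form of d. The text does not state which formula produced d = 2.3, and this version ignores the pretest covariate, so it is not guaranteed to reproduce the reported value. The function names are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (b minus a) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd


def size_label(d):
    """Cohen's (1992) conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"
```

Under these benchmarks, the reported d of 2.3 is nearly three times the 0.8 threshold for a large effect.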

Performances by matched participants. The data in Figure 4 do not permit a comparison of individual participants matched in terms of initial knowledge of interaction. That information is presented in Figure 5, which plots posttest scores as a function of pretest scores for each participant. The diagonal line with a slope of 1 passing through the origin indicates a one-to-one correspondence between pre- and posttest scores. Pretest scores ranged from 29% to 83% for participants in both conditions. For participants in the control condition, posttest scores were quite similar to pretest scores. These scores straddled the diagonal line, indicating a nearly one-to-one correspondence of pretest and posttest scores.

For participants in the experimental condition, posttest scores were reliably higher than the scores of matching participants in the control condition. Although the posttest scores were similar to each other, there was a small increase in posttest score that was directly correlated with pretest score, as indicated by the shallow positive slope of the regression line fitted to the data from the experimental condition. The weakness of this correlation was documented by the fact that only 42% of the variance in posttest scores was accounted for by pretest scores. For these participants, the increment of posttest over pretest score became smaller as pretest scores increased. This ceiling effect was inevitable because high pretest scores precluded large increases in posttest scores.
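The per-group regression lines in Figure 5 and the proportion of variance they explain can be reproduced with ordinary least squares. This is a sketch that assumes plain OLS of posttest on pretest, fitted separately for each group. The `headroom` helper is a hypothetical illustration of the ceiling effect. On a test scored out of 100%, a participant who starts at 83% can gain at most 17 points.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, plus r squared.

    Assumes x and y each vary (nonzero spread); otherwise division by zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)   # share of posttest variance explained
    return slope, intercept, r_squared


def headroom(pretest):
    """Largest possible gain on a test scored out of 100%: the source of the ceiling."""
    return 100 - pretest
```

The identity line in Figure 5 corresponds to slope 1 and intercept 0. A group fitted well below slope 1 with r² near 0.42 matches the pattern described for the experimental condition.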

The data presented in Figure 5 can also be viewed in terms of the traditional letter grades earned on a typical classroom quiz. Test scores of at least 80% correct correspond to letter grades of A and B. Test scores no greater than 69% correct correspond to letter grades of D and F. As can be seen in the posttest data in Figure 5, 10 of the 11 participants with grades in the A and B range were in the experimental group, and one was in the control group. The 1 participant in the control group who obtained a high grade had already scored a passing grade on the pretest. By contrast, all 8 participants with grades in the D and F range were in the control group, and none were in the experimental group. The exact probability of obtaining differences this extreme by chance was .0001 (Fisher's exact test). If grades on an examination can be used to assess social validity in an academic setting, this analysis indicates a high level of social validity for equivalence-based instruction.
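The Fisher's exact test can be sketched as follows. The 2 × 2 table is our reconstruction from the counts in the paragraph. Rows are the experimental and control groups, columns are the A/B and D/F grade bands, and participants with C grades are left out, which is our reading of the analysis. The function is a from-scratch two-sided test using only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def weight(x):  # count of tables with top-left cell x, given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / denom


# Experimental: 10 A/B, 0 D/F; control: 1 A/B, 8 D/F (C grades omitted).
p = fisher_exact_two_sided(10, 0, 1, 8)
```

Here `p` equals 11/92378, about 0.00012, which rounds to the reported .0001. Integer weights keep the comparison with the observed table exact.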

Social validity. The four questions on the social validity questionnaire yielded the following mean ratings. “Please rate your current understanding of statistical interactions” produced a mean rating of 6 (SE = 0.5) for participants in the experimental group and 3 (SE = 0.5) for participants in the control group. “Are you happy with the methods used in this study?” produced a mean rating of 6 (SE = 1.0) for the experimental group and 3 (SE = 0.33) for the control group. “Are the methods used in this study acceptable?” produced a mean rating of 6 (SE = 0.66) for the experimental group and 3 (SE = 0.66) for the control group. Thus, participants in the experimental group reported that the computer training was acceptable and effective. “Is it a good goal to use effective teaching methods to teach the concept of statistical interactions to students?” produced a mean rating of 6 (SE = 0.5) for the experimental group and 6 (SE = 0.66) for the control group. Finally, during a postexperiment debriefing, participants in the experimental group reported feeling more confident in their understanding of statistical interactions, and several reported that they would like to see a similar teaching format used for other difficult concepts in statistics.

DISCUSSION

Formation of interaction-indicative equivalence classes. Knowledge of statistical interaction was evaluated with a paper-and-pencil test that determined whether an individual could match four different representations of interactions with each other for four different types of interaction. Before the intervention, college students enrolled in an introductory psychology course answered about 54% of the pretest questions correctly, indicating a low level of knowledge regarding the interchangeability of representations of each type of interaction. Participants in the experimental condition were exposed to a computer-based program that resulted in the formation of four interaction-indicative equivalence classes. In this part of the experiment, after three stimulus–stimulus relations were trained in each of the four classes, nine new relations among the stimuli in each class emerged immediately and without the benefit of direct training; together with the three trained relations, these make up the 12 relations assessed in the four-mix test. Further, the paper-and-pencil test administered after class formation yielded scores that were on average 37% higher than pretest scores. Thus, equivalence class induction procedures established knowledge of the interchangeability of perceptually distinct representations of four different forms of statistical interaction. Because the representations used in the paper-and-pencil tests differed from those used as members of the trained equivalence classes, the participants generalized what they learned during training to novel exemplars. By implication, these participants should also be able to apply what they learned to new examples encountered in real-world settings.

Prior research has shown that test repetition can increase scores on a test without any explicit intervention (Lievens, Buyse, & Sackett, 2005; Wing, 1980). Therefore, the increase in paper-and-pencil test scores after the establishment of the equivalence classes could have been influenced by repetition of the test. The present experiment used a control condition that measured the effect of test repetition in the absence of an intervention. Any effect of test repetition could thus be factored out by subtracting the pretest-to-posttest gain in the control group from the gain obtained in the experimental condition. Repetition of the test in the control group produced a 2% increase in test scores. When this estimate is subtracted from the improvement in experimental group scores (37%), the computer-based equivalence intervention accounted for a 35% mean improvement in posttest scores. Thus, test repetition had a minimal effect on the increase in test scores after the establishment of equivalence classes, and the increase for participants in the experimental group can be attributed to the induction of the four interaction-indicative equivalence classes.
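The subtraction argument can be written as a difference in mean gains. In this notation, which is ours, \(\bar{X}\) is a group's mean percent-correct score, the subscripts E and C denote the experimental and control groups, and EBI denotes equivalence-based instruction. The expression assumes that the reported changes are percentage-point differences and that the retest effect has the same size in both groups and simply adds to the intervention effect.

```latex
\begin{aligned}
\Delta_E &= \bar{X}^{\mathrm{post}}_E - \bar{X}^{\mathrm{pre}}_E = 37,\\
\Delta_C &= \bar{X}^{\mathrm{post}}_C - \bar{X}^{\mathrm{pre}}_C = 2,\\
\Delta_{\mathrm{EBI}} &= \Delta_E - \Delta_C = 37 - 2 = 35.
\end{aligned}
```

This difference in gains is a descriptive estimate. The ANCOVA reported in the Results provides the formal test of the group difference.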

Social validity and pedagogical implications. The study ended with an evaluation of social validity for the participants in the experimental group. They indicated that the treatment goals were valid, the procedures were acceptable, and their changes in test scores were important. During the debriefing phase of the experiment, many experimental group participants reported that they “finally got” what constituted a statistical interaction. This verbal report is supported by their improved performance from pre- to posttest. In summary, these postexperimental comments support the use of the procedure to teach this subject matter.

Equivalence class formation was an effective method for teaching individuals to identify equivalent representations of the combined effects of two variables on a dependent variable. Given that many students struggle with statistics in particular, the stimulus control technology embodied in the establishment of equivalence classes can make an important contribution to the longstanding debate about how to remedy the skill deficits these students present. Indeed, establishing classes of equivalent stimuli may be an effective technology for remedying both students' inability to apply the concepts learned in instruction to real-world problems and the “behavioral weaknesses” identified by Bradstreet (1996) and Seipel and Apigian (2005), respectively.

Factors that influence the likelihood of class formation. Sidman (1987), Carrigan and Sidman (1992), and Johnson and Sidman (1992) have argued that the use of only two comparison stimuli could lead to responding away from a Co− (called a reject relation) rather than responding to an experimenter-defined sample–Co+ relation (called a select relation). If so, responding would give the illusion of control by the relation between a sample and a comparison from the same class, and thus the illusion of class formation. In the present experiment, although training and testing were conducted with two comparisons per trial, class-consistent performance was maintained during the four-mix-plus tests, which presented each sample with two additional, previously unused negative comparisons. Responding, then, had to be controlled by the relations between the samples and the comparisons from the same class as the samples (i.e., by select relations). Thus, four four-member equivalence classes were formed using only two comparisons per trial. This finding is consistent with recent data showing similar likelihoods of class formation using two, three, and six comparisons per trial (Saunders, Chaney, & Marquis, 2005). Perhaps the establishment of classes using locked pairs is one parameter that increases the likelihood of forming equivalence classes using only two comparisons.
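The logic that separates reject control from select control can be illustrated with a toy model of expected accuracy on two-choice A-B trials. This model is hypothetical and is not part of the original analysis. It assumes a purely reject-controlled learner knows only which comparison to avoid for each sample (the locked-class Co−) and guesses when that comparison is absent. It also ignores mixtures of reject and select control.

```python
# Hypothetical toy model of reject vs. select control on two-choice A-B trials.
# Locked pairs during training: Class 1 with Class 4, Class 2 with Class 3.
TRAINED_NEGATIVE = {"A1": "B4", "A4": "B1", "A2": "B3", "A3": "B2"}


def reject_accuracy(sample, positive, negative):
    """Expected accuracy if the learner only knows which comparison to avoid."""
    if negative == TRAINED_NEGATIVE[sample]:
        return 1.0   # rejecting the familiar Co- leaves only Co+
    return 0.5       # Co- is novel, so rejection gives no guidance: guess


def select_accuracy(sample, positive, negative):
    """Expected accuracy if the learner picks the comparison from the sample's class."""
    same_class = positive[1:] == sample[1:]
    return 1.0 if same_class else 0.0


def mean_accuracy(rule, trials):
    """Average expected accuracy of a response rule over (sample, Co+, Co-) trials."""
    return sum(rule(*t) for t in trials) / len(trials)


four_mix_trials = [("A1", "B1", "B4"), ("A2", "B2", "B3")]
four_mix_plus_trials = [("A1", "B1", "B2"), ("A1", "B1", "B3"),
                        ("A2", "B2", "B1"), ("A2", "B2", "B4")]
```

Both rules predict perfect accuracy on four-mix trials. On four-mix-plus trials, reject-only control predicts about 50% and select control predicts 100%. The near-perfect four-mix-plus scores therefore favor select relations.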

Many studies have shown that equivalence class formation is optimized with classes that have only one nodal stimulus and that have sample-as-node or comparison-as-node training structures instead of linear series training structures (Arntzen & Holth, 1997; Green & Saunders, 1998; Saunders & Green, 1999). In the present experiment, however, all participants rapidly formed four four-member equivalence classes even though the classes contained two nodal stimuli instead of one and had linear series training structures. These results raise questions about the generality of the view described above. Perhaps the simple-to-complex training and testing protocol and the use of semantically meaningful stimuli were responsible for the reliable and rapid formation of equivalence classes that contained two nodes and had a linear series training structure.

Generalization of relations in equivalence classes. In many situations, it is necessary to establish behavioral repertoires that are expected to occur in contexts other than those in which the behavior was trained (Stokes & Baer, 1977). In education, a student is expected to respond correctly to appropriate novel examples that differ from the stimuli or relations used during formal instruction.

In the present experiment, the generalization of the relations in the interaction-indicative equivalence classes to novel exemplars was assessed in five ways. The first was whether the within-class relations remained intact when tested in the context of novel negative comparisons, which was evaluated with the four-mix-plus test. Specifically, the relations among the stimuli in one equivalence class might remain intact only when tested in the presence of the negative comparisons used in the training trials. The results of the four-mix-plus tests showed that the relations in each class remained intact even when tested with comparisons drawn from classes not used as negative comparisons during training. These data thus demonstrated the generalization of the emergent relations to new contexts that varied in terms of negative exemplars. Other tests assessed the generalization of relational control when the within-class relations contained stimuli that were variants of the stimuli used to establish the classes, when test trials were presented in a format that differed from that used during computer-based instruction, when test trials contained a different number of choices from which to select the comparison from the same class as the sample, and when the order of test questions was controlled by the participant rather than by the experimenter. Generalization to these four modes of testing was assessed concurrently through performance on the post-class-formation paper-and-pencil test. Specifically, the test contained questions that differed in content from the stimuli used to form the corresponding equivalence classes during computer-based instruction. The format of the questions in the paper-and-pencil test differed in many ways from the trial format used during equivalence class formation. If the choices in the test are equated with the comparisons presented in the class-formation procedures, the two differed in their number: two comparisons per computer trial versus four choices per test question. Finally, whereas participants did not control the order of trial presentation during the computer-based four-mix and four-mix-plus tests, they were free to scan the questions in the paper-and-pencil test in any order and to change their answer to any question before submitting their answers. In most cases, participants responded with high levels of accuracy on the post-class-formation paper-and-pencil test.

Generalized equivalence classes. The performances described above demonstrated the generalization of the relations among the stimuli in each of the equivalence classes to novel exemplars presented in novel formats. This sort of generalization is also characteristic of generalized equivalence classes, that is, classes that contain sets of perceptually disparate stimuli as well as other stimuli that are perceptual variants of the former (Adams et al., 1993; Belanich & Fields, 2003; Branch, 1994; Fields & Reeve, 2000; Lane, Clow, Innis, & Critchfield, 1998). Thus, the classes that emerged in the present experiment were generalized equivalence classes.

The generalization to novel stimulus exemplars that occurred in the present experiment was also reported by Ninness et al. (2006) but not by Cowley et al. (1992) or Lynch and Cuvo (1995). A number of studies have identified training and testing parameters that broaden the range of variants that come to function as members of generalized equivalence classes (Belanich & Fields, 2003; Fields et al., 1991, 2002; Fields & Reeve, 2001; Galizio, Stewart, & Pilgrim, 2004). Perhaps the generalization problems reported by Cowley et al. and by Lynch and Cuvo could be overcome by including these parameters in replications of their experiments.

Limitations of the present study. This experiment had four limitations. First, it formed classes with only four exemplars. An interaction, however, can have other representations, such as bar graphs, tables of data, and summary statements of factorial ANOVAs. Expanding the classes to include these exemplars and their variants would extend a student's ability to identify the wide range of representations of interactions encountered in natural settings. Second, to understand interactions, a student should be able both to identify different representations of an interaction, which uses a selection-based or receptive repertoire, and to describe an interaction verbally or in writing, which uses a production-based or expressive repertoire. The present study explored the emergence of the former but not the latter repertoire. Third, the present experiment did not determine how different modes of instruction, such as equivalence class formation, traditional lectures, and self-study of textbook material, affect the acquisition of knowledge of statistical interactions. Fourth, the present study demonstrated the feasibility of using equivalence-based instruction to teach one particular college-level subject matter. A similar approach might be used to establish understanding of other academic subject matters. Additional research would be needed to address each of these limitations.

Individualized education and behavioral diagnostics. Equivalence class procedures can isolate specific relational deficits among the stimuli that should be functioning as members of a particular interaction-indicative equivalence class. For example, although a student may accurately identify a particular type of interaction when given a graph and a description of the effects of the variables depicted in that graph, the same student might not identify that type of interaction when given a description of the graph. Once discovered, that information might be used to design a minimal intervention that should induce all of the deficient or missing relations in a class. In short, a system of behavioral diagnostics (Sidman, 1986) could be used to develop tailor-made training programs that correct the stimulus control deficiencies in an individual's behavioral repertoire with a minimal amount of training and testing. This kind of individualized instruction is largely absent from standardized, group-oriented curricula. Such a strategy should lead to the development of a technology of teaching (Skinner, 1968) and a personalized system of instruction (Keller, 1968; Pear & Crone-Todd, 1999).
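The diagnostic idea can be sketched as a first step: locate the relations that fall below criterion, then flag the stimuli most often involved in them as candidate targets for further training. Everything here is hypothetical, including the accuracy profile, the function names, and the most-frequent-stimulus heuristic. Designing the minimal intervention itself is beyond this sketch.

```python
from collections import Counter
from itertools import permutations


def all_relations(members="ABCD"):
    """Every ordered relation in one class, e.g. 'A-B' or 'D-A'."""
    return [f"{s}-{c}" for s, c in permutations(members, 2)]


def deficient_relations(accuracy, criterion=1.0):
    """Relations whose test accuracy falls below the mastery criterion."""
    return sorted(rel for rel, acc in accuracy.items() if acc < criterion)


def implicated_stimuli(failing):
    """Stimuli that occur most often in failing relations (candidate training targets)."""
    counts = Counter(s for rel in failing for s in rel.split("-"))
    if not counts:
        return []
    top = max(counts.values())
    return sorted(s for s, n in counts.items() if n == top)


# Hypothetical profile: every relation involving stimulus D is weak.
profile = {rel: (0.5 if "D" in rel else 1.0) for rel in all_relations()}
failing = deficient_relations(profile)
```

In this profile, the six failing relations all involve D, so `implicated_stimuli(failing)` returns `["D"]`. That result would point to retraining a relation that links D to the rest of the class.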

REFERENCES

Adams, B. J., Fields, L., & Verhave, T. (1993). Formationof generalized equivalence classes. The PsychologicalRecord, 43, 553–566.

Arntzen, E., & Holth, P. (1997). Probability of stimulusequivalence as a function of training design. ThePsychological Record, 47, 309–320.

Belanich, J., & Fields, L. (2003). Generalized equivalenceclasses as response transfer networks. The PsychologicalRecord, 53, 373–414.

Bradstreet, T. E. (1996). Teaching introductory statisticscourses so that nonstatisticians experience statisticalreasoning. The American Statistician, 50, 69–78.

Branch, M. (1994). Stimulus generalization, stimulusequivalence, and response hierarchies. In S. C. Hayes,L. J. Hayes, M. Sato, & K. Ono (Eds.), Behavioranalysis of language and cognition (pp. 51–70). Reno,NV: Context Press.

Carrigan, P. F., & Sidman, M. (1992). Conditionaldiscrimination and equivalence relations: A theoret-ical analysis of control by negative stimuli. Journal ofthe Experimental Analysis of Behavior, 58, 183–204.

Christopher, A. N., & Marek, P. (2002). A sweet tastingdemonstration of random occurrences. Teaching ofPsychology, 29, 122–125.

Cohen, J. (1992). A power primer. Psychological Bulletin,112, 155–159.

Connell, J. E., & Witt, J. C. (2004). Applications ofcomputer-based instruction: Using specialized soft-ware to aid letter-name and letter-sound recognition.Journal of Applied Behavior Analysis, 37, 67–71.

Cowley, B. J., Green, G., & Braunling-McMorrow, D.(1992). Using stimulus equivalence procedures to teachname-face matching to adults with brain injuries.Journal of Applied Behavior Analysis, 25, 461–475.

Davidson, G. V., & Kroll, D. L. (1991). An overview ofresearch on cooperative learning related to mathe-matics. Journal of Research in Mathematics Education,22, 362–365.

de Rose, J. C., de Souza, D. G., & Hanna, E. S. (1996).Teaching reading and spelling: Exclusion and stim-ulus equivalence. Journal of Applied Behavior Analysis,29, 451–469.

Fields, L., Matneja, P., Varelas, A., Belanich, J., Fitzer, A.,& Shamoun, K. (2002). The formation of linkedperceptual classes. Journal of the Experimental Analysisof Behavior, 78, 271–290.

Fields, L., & Reeve, K. F. (2000). Synthesizing equiva-lence classes and natural categories from perceptualand relational classes. In J. C. Leslie & D. Blackman(Eds.), Experimental and applied analysis of humanbehavior (pp. 59–83). Reno, NV: Context Press.

Fields, L., & Reeve, K. F. (2001). A methodologicalintegration of generalized equivalence classes, naturalcategories, and cross-modal perception. The Psycho-logical Record, 51, 67–87.

Fields, L., Reeve, K. F., Adams, B. J., & Verhave, T.(1991). Stimulus generalization and equivalenceclasses: A model for natural categories. Journal of theExperimental Analysis of Behavior, 55, 305–312.

Fields, L., Reeve, K. F., Rosen, D., Varelas, A., Adams, B. J.,Belanich, J., et al. (1997). Using the simultaneousprotocol to study equivalence class formation: Thefacilitating effects of nodal number and size of previouslyestablished equivalence classes. Journal of the ExperimentalAnalysis of Behavior, 67, 367–389.

Fields, L., & Verhave, T. (1987). The structure ofequivalence classes. Journal of the ExperimentalAnalysis of Behavior, 48, 317–332.

Fields, L., Verhave, T., & Fath, S. (1984). Stimulusequivalence and transitive associations: A methodo-logical analysis. Journal of the Experimental Analysis ofBehavior, 42, 143–157.

Galizio, M., Stewart, K. L., & Pilgrim, C. (2004).Typicality effects in contingency-shaped generalizedequivalence classes. Journal of the ExperimentalAnalysis of Behavior, 82, 253–273.

592 LANNY FIELDS et al.

Page 19: EQUIVALENCE CLASS FORMATION: A METHOD FOR TEACHING STATISTICAL

Green, G., & Saunders, R. R. (1998). Stimulusequivalence. In K. A. Lattal & M. Perone (Eds.),Handbook of research methods in human operantbehavior (pp. 229–262). New York: Plenum.

Guercio, J. M., Podolska-Schroeder, H., & Rehfeldt, R. A. (2004). Stimulus equivalence technology to teach emotion recognition skills to adults with acquired brain injury. Brain Injury, 18, 593–601.

Hinde, R. J., & Kovac, J. (2001). Student active learning methods in physical chemistry. Journal of Chemical Education, 78, 93–99.

Imam, A. (2006). Experimental control of nodality via equal presentations of conditional discriminations in different equivalence protocols under speed and no-speed conditions. Journal of the Experimental Analysis of Behavior, 85, 107–124.

Johnson, C., & Sidman, M. (1992). Conditional discriminations and equivalence relations: Control by negative stimuli. Journal of the Experimental Analysis of Behavior, 59, 333–347.

Keller, F. S. (1968). Good-bye teacher. Journal of Applied Behavior Analysis, 1, 79–89.

Keller, F. S., & Schoenfeld, W. N. (1950). Principles of psychology. New York: Appleton-Century-Crofts.

Lane, S. D., Clow, J. K., Innis, A., & Critchfield, T. S. (1998). Generalization of cross-modal stimulus equivalence classes: Operant processes as components in human category formation. Journal of the Experimental Analysis of Behavior, 70, 267–280.

LeBlanc, L. A., Miguel, C. F., Cummings, A. R., Goldsmith, T. R., & Carr, J. E. (2003). The effects of three stimulus-equivalence testing conditions on emergent US geography relations of children diagnosed with autism. Behavioral Interventions, 18, 279–289.

Lievens, F., Buyse, T., & Sackett, P. R. (2005). Retest effects in operational selection settings: Development and test of a framework. Personnel Psychology, 58, 981–1007.

Lynch, D. C., & Cuvo, A. J. (1995). Stimulus equivalence instruction of fraction-decimal relations. Journal of Applied Behavior Analysis, 28, 115–126.

Mulhern, G., & Wylie, J. (2004). Changing levels of numeracy and other core mathematical skills among psychology undergraduates between 1992 and 2002. British Journal of Psychology, 95, 355–370.

Nasser, F. (1999). Prediction of statistics achievement. In Proceedings of the International Statistical Institute 52nd Conference (Vol. 3, pp. 7–8). Helsinki, Finland.

Ninness, C., Barnes-Holmes, D., Rumph, R., McCuller, G., Ford, A. M., Payne, R., et al. (2006). Transformations of mathematical and stimulus functions. Journal of Applied Behavior Analysis, 39, 299–321.

Pear, J. J., & Crone-Todd, D. E. (1999). Personalized systems of instruction in cyberspace. Journal of Applied Behavior Analysis, 32, 205–209.

Peden, B. F. (2001). Correlational analysis and interpretation: Graphs prevent gaffes. Teaching of Psychology, 28, 129–131.

Rosenthal, B. (1992). No more sadistics, no more sadists, no more victims [editorial]. UMAP Journal, 13, 281–290.

Saunders, R. R., Chaney, L., & Marquis, J. G. (2005). Equivalence class establishment with two, three, and four-choice matching-to-sample by senior citizens. The Psychological Record, 55, 539–559.

Saunders, R. R., & Green, G. (1999). A discrimination analysis of training-structure effects on stimulus equivalence outcomes. Journal of the Experimental Analysis of Behavior, 72, 117–137.

Seipel, S. J., & Apigian, C. H. (2005). Perfectionism in students: Implications in the instruction of statistics. Journal of Statistics Education, 13. Retrieved February 16, 2006, from http://www.amstat.org/publications.jse/v13n2/seipel.html

Sidman, M. (1971). Reading and audio-visual equivalences. Journal of Speech and Hearing Research, 14, 5–13.

Sidman, M. (1986). The measurement of behavioral development. In N. A. Krasnegor, D. B. Gray, & T. Thompson (Eds.), Advances in behavioral pharmacology: Vol. 5. Developmental behavioral pharmacology (pp. 43–52). Hillsdale, NJ: Erlbaum.

Sidman, M. (1987). Two choices are not enough. Behavior Analysis, 22, 11–18.

Sidman, M., & Cresson, O., Jr. (1973). Reading and crossmodal transfer of stimulus equivalence in severe retardation. American Journal of Mental Deficiency, 77, 515–523.

Sidman, M., Kirk, B., & Willson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior, 43, 21–42.

Sidman, M., & Tailby, W. (1982). Conditional discrimination vs. match-to-sample: An expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior, 37, 5–22.

Simon, J. L., & Bruce, P. (1991). Resampling: A tool for everyday statistical work. Chance, 4, 23–32.

Skinner, B. F. (1968). The technology of teaching. Englewood Cliffs, NJ: Prentice Hall.

Smeets, P. M., & Barnes-Holmes, D. (2005). Establishing equivalence classes in preschool children with one-to-many and many-to-one training protocols. Behavioural Processes, 69, 281–293.

Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization. Journal of Applied Behavior Analysis, 10, 349–367.

Ward, L. G., & Kaflowitz, N. G. (1986). Issues in research training … again? Counseling Psychologist, 14, 139–145.

Wing, H. (1980). Practice effects with traditional mental test items. Applied Psychological Measurement, 4, 141–155.

Received January 7, 2008
Final acceptance June 18, 2008
Action Editor, Chris Ninness