THE TESTING EFFECT: APPLICATIONS IN COMPOSITION PEDAGOGY

By

Adam Danger Channel

A Project Presented to

The Faculty of Humboldt State University

In Partial Fulfillment of the Requirements for the Degree

Master of Arts in English: Teaching Writing

Committee Membership

Dr. Corey Lewis, Committee Chair

Dr. Suzanne Scott, Committee Member

Dr. Nikola Hobbel, Graduate Coordinator

May 2014

ABSTRACT

THE TESTING EFFECT: APPLICATIONS IN COMPOSITION PEDAGOGY

Adam Channel

This project advocates the use of frequently administered, low-stakes tests to enhance student learning of the disciplinary content of composition. Though there is widespread disdain for the role of standardized tests in education today, not all forms of testing are the same, and some forms of testing can be very effective teaching tools. Tests should ideally be locally generated, relevant to class content, frequently administered, and low-stakes, with feedback provided shortly after testing. This project lays the groundwork for how testing can dovetail into the student-centered dialogic classroom, a common practice in composition today. Theories of human learning (including the testing effect and the spacing effect) show that active retrieval is the best way to ensure long-term retention and understanding. Frequent tests provide active-retrieval opportunities for students, which should enhance learning and retention. The project concludes by describing how to build tests specifically for first-year college composition.

TABLE OF CONTENTS

ABSTRACT

TABLE OF CONTENTS

TABLE OF FIGURES

CHAPTER 1: INTRODUCING TEST-ENHANCED LEARNING

Retrieval-Based Learning

Composition History and the Move Away from Testing

The Banking Model of Education

Critiques of Standardized Tests and High-Stakes Assessment

CHAPTER 2: THE STUDENT-CENTERED COMPOSITION CLASSROOM

Moving towards a Dialogic Classroom

The Writing Process in a Dialogic Classroom

Building Writing Fluency with Journal Writing

The Dialogic Classroom and the Academic Discourse Community

CHAPTER 3: HUMAN LEARNING

The Physiology of Learning and Memory

Cognitive Definitions of Learning and Memory

Three Theories of Learning

Social Constructivist Theories of Learning and Testing

CHAPTER 4: THE TESTING EFFECT AND THE SPACING EFFECT

What is the Testing Effect?

Studies Reporting a Testing Effect

The Spacing Effect

CHAPTER 5: TEST-ENHANCED LEARNING IN COMPOSITION

Indirect Benefits of Testing

Testing and Grades

The Benefits of Tests According to Three HSU Professors

Building, Administering, and Grading Tests

Recognition and Free-Response Questions

WORKS CITED

TABLE OF FIGURES

Figure 1: “Neuron”

Figure 2: “Neuronal Communication”

Figure 3: “Stage Theory of Memory”

Figure 4: “Performance on Immediate and Delayed Tests”

Figure 5: “Testing Schedule Shows a Forgetting Curve”

Figure 6: “Study-Test-Study-Test (STST) Most Effective Learning Strategy”

Figure 7: “Proportion Correct in Immediate and Delayed Recall”

Figure 8: “Word Recall on Immediate and Delayed Tests”

Figure 9: “Student Performance Averaged across Unit Exams”

Figure 10: “Hypothetical Forgetting Curve 1”

Figure 11: “Hypothetical Forgetting Curve 2”

Figure 12: “Hypothetical Forgetting Curve 3”

CHAPTER 1: INTRODUCING TEST-ENHANCED LEARNING

Retrieval-Based Learning

This project argues that student learning will be enhanced if teachers frequently administer low-stakes tests. This chapter first defines what a “test” is, then gives a brief history of testing in composition, and finally accounts for the lack of testing in composition practice today. In particular, this chapter argues that the lack of testing stems from three widely held misconceptions: (1) tests promote the “banking model of education,” (2) tests do not encourage critical thinking, and (3) all forms of tests are subject to the same deficiencies as standardized tests.

Our understanding of the cognition of learning has advanced considerably in recent decades. Today we have a research-based theory of learning grounded in physiological and empirical data gathered from brain-imaging and cognitive studies. Despite this, there remains a schism between the laboratory and the field, a gap between theory and practice (see …). Recent advances in cognitive science and in studies of memory and learning do not seem to have had a significant impact on composition pedagogy. This thesis argues that two theories of learning in particular are of the utmost importance but remain little known: the testing effect and the spacing effect. Put simply, these theories hold that long-term retention is improved through repeated testing over time. Composition practice today relies on a varied set of skills and knowledge, and students’ learning and retention of this course content can be enhanced by introducing frequently administered, low-stakes tests.

It is important to define what this project means by the word “test,” especially because the term carries negative connotations for many people. The words “test” and “testing” make many people think of summative, high-stakes assessment or of standardized tests imposed from the top down. Both of these types of testing have serious deficiencies, which are discussed later in this chapter. Testing, however, can take many different forms. To capture how tests operate cognitively, this project defines “tests” or “testing” as “an induced act of retrieval”: any material or question that requires a “retrieval” action on the part of the reader. Retrieval is the process of accessing information stored in memory and articulating it, generally in response to a question. You are asked a question, and you provide a written answer: that is the meaning of a test in this project.

To understand how taking a test can enhance learning, we must distinguish between two types of learning: passive and active. Cognitive researchers describe reading a textbook or reviewing class notes as “passive learning,” because reading is an input-only activity (Roediger and Karpicke 181; Knight and Wood 298). Listening to a lecture counts as passive learning for the same reason. However, if learners were to convert their notes and annotations into flash cards and test themselves on them (with the possibility of failing to recall), that would be “active learning,” because it would require an act of retrieval on the part of the learner. As another example, when we read the statement “Black holes are the remnants of the gravitational collapse of a star,” our brains connect it to existing knowledge structures and schemata. We can read that same statement numerous times and recognize it each time; this type of learning is passive because it makes no demand on the brain to reproduce the information. Active learning, by contrast, occurs when the same statement is turned into a question by occluding keywords, as in “_________ are the remnants of the gravitational collapse of a star” or “Black holes are the remnants of _______.” In this case the brain must fill in the blank with the correct information, which requires an act of retrieval.

This act of retrieval will succeed if students remember the relevant information (as in the black hole example), but, as we all know, memory is not perfect. There is a clear difference in difficulty between the two types of learning. Passive learning requires only recognition and comprehension; active learning requires retrieval, with an increased possibility of failure. The act of retrieval required to answer fill-in-the-blank questions (also called “cloze deletion”) will produce better long-term retention than passively reading the same statement. H. L. Roediger, a veteran scholar of learning and a professor at Washington University in St. Louis, describes it this way: “We are much more likely to remember something again if we actively retrieve it than if we are passively exposed to it in restudying” (“Advice from Cognitive Psychologist…”). This result has been found in many studies and is referred to in the literature as “the testing effect” (these claims are referenced and further substantiated in Chapters 3 and 4).
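As a purely illustrative aside (this project does not prescribe any software), the cloze-deletion procedure described above, occluding a keyword so that a statement becomes a retrieval prompt, can be sketched in a few lines of Python. The function names `make_cloze` and `check_answer`, the seven-underscore blank, and the lenient case-insensitive grading are hypothetical choices made for this sketch, not part of the project's method.

```python
import re


def make_cloze(statement: str, keyword: str, blank: str = "_______") -> tuple[str, str]:
    """Occlude the first occurrence of `keyword` in `statement` (hypothetical helper).

    Returns (prompt, answer): the prompt with the keyword replaced by a blank,
    and the keyword exactly as it appeared, for checking the learner's response.
    """
    # Case-insensitive literal match: an assumption made for this sketch.
    match = re.search(re.escape(keyword), statement, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"{keyword!r} does not occur in the statement")
    prompt = statement[: match.start()] + blank + statement[match.end():]
    return prompt, match.group(0)


def check_answer(response: str, answer: str) -> bool:
    """Hypothetical lenient grading: ignore case and surrounding whitespace."""
    return response.strip().lower() == answer.lower()


statement = "Black holes are the remnants of the gravitational collapse of a star."
prompt, answer = make_cloze(statement, "Black holes")
print(prompt)  # _______ are the remnants of the gravitational collapse of a star.
print(check_answer(" black holes ", answer))  # True
```

Occluding "the gravitational collapse of a star" instead yields the second prompt in the example, "Black holes are the remnants of _______." Each call produces one prompt per occluded term, so a teacher could generate several retrieval questions from a single statement.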

Like all fields, composition has a set of specialized terms. One reason retrieval practice is important is that when students reach into memory and recall a term or phrase from class lectures or course readings, it is no longer just the term that was on the board or in the reading; it has begun to become their own term. Once students have internalized a term or phrase, they will be able to reproduce it in their own writing and thinking. In composition, these terms (comma splice, topic sentence, thesis, dependent clause, etc.) help us communicate more precisely about writing. Chapters 3 and 4 show that our ability to remember terms like these generally depends on the number of times we have retrieved them. Peer review is an integral part of modern composition practice, and internalizing these specialized terms will help students improve their peers’ writing as well as their own. Since it is necessary, or at least beneficial, to know these terms when discussing writing, a pedagogical method that effectively helps students remember them would be of great benefit in composition. This project argues that teachers can best facilitate students’ learning and long-term retention of course content in composition by giving students retrieval opportunities through frequently administered, low-stakes tests.

Composition History and the Move Away from Testing

For most of human history, storytellers in oral cultures recited the extensive myths and epics of their people from memory. This level of memorization required great skill and tenacity and could only be reinforced through continual testing and retrieval. In classical rhetoric, Memoria (memory) was one of the five canons, and Aristotle, in his essay on memory, wrote, “Exercise in repeatedly recalling a thing strengthens the memory” (202). Scholars have known the power of testing and have written about it for centuries. Francis Bacon wrote in The New Organon, published in 1620, “If you read a piece of text through twenty times, you will not learn it by heart so easily as if you read it ten times while attempting to recite from time to time and consulting the text when your memory fails” (143). In The Principles of Psychology (1890), William James also argued for the power of testing through active recitation, writing:

A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. I mean that in learning (by heart, for example), when we almost know the piece, it pays better to wait and recollect by an effort from within, than to look at the book again. If we recover the words in the former way, we shall probably know them the next time; if in the latter way, we shall very likely need the book once more (646).

These authors, among others, have written about the power of retrieval. Retrieval practice has long been an acknowledged part of learning, and the understanding that retrieval is essential to learning should be a guiding principle in our teaching practice.

This project advocates retrieval practice through frequent tests; however, the role of testing in education is currently a fiercely debated subject. The passage of the No Child Left Behind Act of 2001 and the use of high-stakes standardized testing have pushed many people, including teachers, to reject testing. This rejection seems especially apparent among teachers in composition programs, many of whom follow Expressivist practices. The Expressivist movement of the 1960s and 1970s brought with it a rejection of testing and drilling in composition practice. Christopher Burnham, in his chapter in A Guide to Composition Pedagogies, describes the divergence of two composition pedagogies: Expressivism and Current-Traditional Rhetoric (CTR). He describes Expressivism as “The movement [that] originated […] as a set of values and practices opposing current-traditional rhetoric” (Tate 21). CTR, a school of composition instruction developed in the early nineteenth century, relied heavily on the prescriptive notion that there are syntactically and stylistically correct and incorrect ways to write. The goal of CTR was to teach students to conform to those standards through drilling and regular testing. The Expressivist movement took a sharp turn away from testing and focused on “writers writing” rather than “writer’s writing,” or process rather than product.

Burnham describes two leading figures in the Expressivist movement, Donald Murray and Ken Macrorie, as opponents of rules and directive feedback. He writes of Murray’s A Writer Teaches Writing (1968) that “Murray's use of non-directive feedback from both teacher and students turns the responsibility for writing back to the student” (23). Expressivism is characterized by a focus on language as a tool for personal rather than social expression. The Expressivists believed that the best teaching practice was simply to get students to write and write and write, uninhibited by top-down rules. The movement recognized that good writing needs more than good mechanics and syntax, so the focus of instruction shifted away from the then-prevalent practices of drilling and testing students on rules of grammar and style. Composition instruction today owes a great deal to Expressivism, and many of our current practices stem directly from the Expressivist school of thought. However, given the current lack of testing in composition, this project poses the question: “Did we throw the baby out with the bathwater?”

Though some Expressivists would disagree, this project argues that there is value in teaching rules of writing. This disciplinary content is valuable because it allows us to engage in meta-discourse about the processes and products of writing. Teaching the disciplinary content of composition through test-enhanced learning is not guaranteed to improve students’ writing abilities, but I argue that there should be positive transfer between the two activities. For example, students who can identify and correct a run-on sentence on a test are more likely to find and prevent run-on sentences in their own writing. As another example, composition classes teach that writing is a process with different stages, and that each stage has certain activities a writer can use. During prewriting, we can use freewriting, brainstorming, mind mapping, and outlining to help our writing. We teach these activities, but if students do not have retrieval opportunities (i.e., if they do not practice them outside class or on tests), they are unlikely to remember all of them. Teachers use retrieval practice for multiplication tables and for learning a new language, but the idea that it can be used for more complex ideas is not widely appreciated. For example, when students learn vocabulary and verb conjugations in another language, Spanish for instance, they often use flash cards with the English word on one side and the Spanish equivalent on the other, and teachers say, “Practice until you really know it. Practice until it’s completely automatic.” In composition practice we want the same kind of automaticity with writing terms and practices. This project argues that frequently administered, low-stakes tests will produce this kind of automaticity.

The Banking Model of Education

In the 1970s, the influential educator Paulo Freire critiqued the common practice of lecturing, arguing that contemporary educational systems follow “the banking model of education.” In the banking model, the teacher is the exclusive authority who stands at the front of the class while students sit in desks, all facing forward, positioned as empty vessels waiting to be filled with knowledge by the teacher. Freire critiques such authoritarian models of education, writing:

Instead of communicating, the teacher issues communiques and makes deposits which the students patiently receive, memorize, and repeat. This is the "banking" concept of education, in which the scope of action allowed to students extends only as far as receiving, filing, and storing the deposits (58).

Freire argues that this model does not foster critical thinking skills in students and conditions them to trust teachers and classroom content unquestioningly. Unfortunately, many consider teaching through tests to be a variation on the banking model, arguing either that it leads students to converge on a single answer given by an authority figure or that it does not develop critical thinking. The banking model of education positions students as passive recipients of education and is therefore not structured to encourage the development of critical thinking skills. This history of valid critiques of the banking model has led many to dismiss all forms of testing as falling within it. This blanket dismissal is problematic because it discards forms of testing that can improve critical thinking and enhance student authority.

Our educational systems today, and the role of tests as a teaching tool, are shaped by history. Gordon Wells of the University of Toronto, writing in the anthology Vygotskian Perspectives on Literacy Research, explains that universal public education through mandatory school attendance is a “historically and culturally localized activity system that owes more to models of industrial mass production than to that of development through assisted participation in social activity” (Lee 59). In his widely recognized 2010 TED talk, “Changing Education Paradigms” (with over 13 million views to date on YouTube and TED.com), Ken Robinson draws a parallel between the school and the factory assembly line: both have hyper-specialization, ringing bells, and separate facilities. In this model, testing is seen primarily as a means of assessment. Like a quality-control stamp in a factory, tests are commonly used to categorize and rank students by their performance. The factory and banking models of education, which many students today have experienced, position students as the inheritors of knowledge rather than the co-producers of understanding.

However, teaching through tests is not necessarily a banking-model practice. If done well, test-enhanced learning can encourage creativity and facilitate the student-directed classroom that modern composition instruction relies on. Teaching with tests may seem opposed to the development of critical thinking skills; perhaps this in part explains the current lack of testing in composition. Tests should not be considered a cure-all; instead, they should be seen as a supplement to a wide variety of other instructional methods. Frequent tests will enhance learning, which will increase student authority, and, used alongside a variety of other teaching practices, frequent testing should also support the development of critical thinking.

Critiques of Standardized Tests and High-Stakes Assessment

Many of today’s abundant critiques of testing appear to extend the problems of standardized tests to all forms of testing. There are, however, many different forms and uses of tests that avoid the myriad disadvantages of top-down standardized tests. The type of testing this project advocates avoids these problems because every test is designed by the local teacher specifically for his or her students and curriculum. In addition to being locally generated, tests should be frequently administered. This section considers some of the most significant deficiencies of standardized tests and the changes that standardized testing has brought to the education system. Such a consideration is necessary because it shows by example what not to do and helps account for the current lack of testing in composition.

Standardized testing has been tied to federal K-12 education policy since the passage of the Elementary and Secondary Education Act of 1965. The 2001 reauthorization of that act, No Child Left Behind, increased the importance of standardized-test results in determining the allocation of federal funding for schools, and, by extension, the importance of receiving federal funding has driven schools to pressure teachers to emphasize these standardized tests in their classrooms. According to Diane Ravitch, former U.S. Assistant Secretary of Education, “So now we have schools being closed and people getting bonuses all around the student test scores. It’s made testing, somehow, the central activity of American public schools today, which is just so wrong” (Kastenbaum “The high stakes of standardized testing”). Ravitch, a former government official, is a frequent commentator on testing and education in the media today. She is representative of the many people who feel that standardized tests encroach on the rights of local teachers.

Focusing instruction on standardized tests imposed from the top down limits teachers’ choices of class materials and methods of instruction and encourages “teaching to the test.” Because schools are held accountable for their students’ performance on standardized tests, schools blame their teachers when students perform poorly. In an article titled “Education Reform Starts with Community Reform,” Andre Perry offers a succinct critique of standardized tests that is worth quoting in full:

We currently use standardized tests well beyond what they were designed to do, which is measure a few areas of academic achievement. Achievement tests were not designed for the purposes of promoting or grading students, evaluating teachers or evaluating schools. In fact, connecting these social functions to achievement test data corrupts what the tests are measuring. In statistics this is called Campbell's Law. In other words, what does a score measure after it has been connected to a teacher's pay or job status? In education talk, this is called teaching to the test, hiring to the test, and getting paid to the test (“Education Reform”).

Many of us are critical of standardized tests because top-down tests not only infringe on local school boards’ ability to design curricula but also alter the very nature of public schooling.

In addition, according to a 2012 New York Times article, the increased emphasis on standardized test scores has contributed significantly to a 20-year low in teacher morale:

The slump in the economy, coupled with the acrimonious discourse over how much weight test results and seniority should be given in determining a teacher’s worth, have conspired to bring morale among the nation’s teachers to its lowest point in more than 20 years, according to a survey of teachers, parents and students released on Wednesday (Santos “Teacher Survey Shows Morale Is at a Low Point”).

The article attributes this low morale to unprecedented economic hardship for teachers, compounded by the heated debate over the role of testing in education and over how students’ test results reflect on their teachers’ performance. Given these facts, low teacher morale is not surprising. Perhaps it is also not surprising, then, that there is widespread disdain for testing among teachers and the general public today.

This chapter has introduced the wider context of testing and its contentious role in education today. Though many teachers see testing as another example of the banking model, there is no intrinsic deficiency in the practice of testing. Analyzing the role of standardized tests in education today shows that tests are frequently misused and too often play too large a role in assessment. Though tests can be a useful form of assessment, their greater purpose is to provide retrieval practice. Learning through retrieval has been an accepted practice since antiquity. Rather than relying on a few high-stakes exams as the exclusive means of assessment, the best practice is to administer low-stakes tests frequently, providing numerous retrieval opportunities for students. This will encourage students to study frequently rather than cram, and this process of frequent studying and frequent testing will reinforce and enhance student learning of course content.

CHAPTER 2: THE STUDENT-CENTERED COMPOSITION CLASSROOM

Moving towards a Dialogic Classroom

“What kind of learning do we want?” Every teacher should thoughtfully consider this question. More specifically, this chapter is guided by the question “What kind of learning do we want in a first-year college composition course?” A review of the practices and ideologies of modern composition will help us understand how testing can dovetail into that practice. Many composition courses use a student-centered model of instruction in which the classroom is a community where students work and learn collaboratively. Though classes in other disciplines are frequently lecture-based, in composition the dialogic class generates peer engagement, which helps establish a spirit of collaborative inquiry that should foster creativity. Writing abilities develop gradually through successive drafts of major essay assignments, with peer review between drafts, and through in-class writing exercises. Student-centered instruction emphasizes peer work and collaborative learning. As John-Steiner and Meehan conclude in the anthology Vygotskian Perspectives, “Social interaction and mutual support lead to creativity in a multi-directional dynamic exchange” (Lee 40). The composition classroom fosters this creativity through collaborative learning methods such as peer workshops, group and partner work, and group discussion.

Collaborative engagement is already supported in the scholarly literature as an effective learning method. Drawing on the studies of Dr. M. L. J. Abercrombie, for example, Kenneth Bruffee, in “Collaborative Learning and the ‘Conversation of Mankind,’” shows the benefits of collaborative learning. Abercrombie’s Anatomy of Judgment (1964) synthesized ten years of observation and research on medical interns studying diagnosis. Typically the interns worked as individuals, but Abercrombie asked them to examine the patient together, discuss the symptoms as a group, and arrive at a diagnosis on which all could agree. After observing the interns diagnose both individually and as a group, Abercrombie concluded that the collaborative model resulted in more accurate diagnoses and faster acquisition of diagnostic skills on the part of the interns. Bruffee argues that the same model of collaboration can extend to the writing classroom, there too bringing about better learning and, in this case, better writing.

In addition, Bruffee argues that writing is a conversation, not only with those

around us, but also with ourselves. In our daily lives, "we internalize conversation as

thought; and then by writing, we re-immerse conversation in its external, social medium"

(88). Many forms of thinking are acts of conversation in which our thoughts are revealed through either the written or the spoken word. The conversation in our minds is enhanced by outside conversation and collaboration; following this maxim, composition practice today favors group work and discussion over lecture as a means of instruction.

One of the great advantages of these forms of collaborative learning is that they are active rather than passive forms of learning. As noted earlier, active forms of learning, such as taking a test or participating in a discussion, are more effective than passive forms. Using frequent tests in composition, then, accords with current theory and practice in the field.

Discussion with the class at large, and in small groups, leads to better learning

than lecturing. This is the position espoused by Donald Finkel in his book Teaching with Your Mouth Shut, in which he quotes a summary of research by the National Council on Education (NCE):

Research clearly favors discussion over the lecture as an instructional method when the variables studied are retention of information after a course is over, transfer of knowledge to novel situations, development of skill in thinking or problem solving, or achievement in affective outcomes, such as motivation for additional learning [...]—in other words, the kinds of learning we most care about (qtd. in Finkel 3).

The NCE’s research provides a persuasive argument in favor of a dialogic classroom—a

classroom in which dialogue is the primary means of instruction rather than lecture. In

the dialogic composition classroom, readings are discussed rather than lectured on, and

peer review serves as a means of engaging in conversations about writing strategies.

Gordon Wells, of the University of Toronto, in his chapter in the anthology Vygotskian

Perspectives, describes a dialogic class as one where “the classroom is seen as a

collaborative community. The community works towards shared goals of achievement

and its success is dependent on the group rather than individuals” (Lee 65). In this

collaborative community, Wells argues that “the teacher should be involved as a co-

inquirer with the students” rather than an exclusive authority in the class. The goal in

such a class is to create what Wells calls “an ethos of collaborative inquiry” (ibid).

The notion of decentralizing power challenges long-entrenched beliefs about

teaching; our cultural image of a teacher is akin to that of a performer: a brilliant

lecturer who captivates an audience of students. However, modern composition

recognizes that this cultural construct is limiting. The authority of the professor, in large

part, is derived from their extensive study and knowledge. Decentralizing authority in the

classroom requires that students have some understanding of course material. The

dialogic class provides more opportunities for active recall than a lecture-based class, which is a great step in the right direction, but the teacher can further enhance student learning by providing a structure for recalling the key terms used in class through frequently administered low-stakes tests. Frequent tests engage students in active learning, and since students need to understand disciplinary terms in order to participate effectively in these discussions, testing can be used to enhance student understanding, which will in turn increase student authority.

Creating a successful dialogic classroom requires the teacher to take certain steps

to facilitate collaboration. For example, the first day of class is extremely important: this is when first impressions are formed, so the teacher should work to make these impressions

positive. An excellent way to immediately foster the spirit of collaborative inquiry is by

setting ground rules for discussion as a group. The instructor can solicit feedback from

the group, write suggestions on the board, and then, through a discussion of the purpose

and objectives of the class, the group as a whole can come to an agreement on which ground rules are most important. Some instructors even take the process a step further

and involve students in syllabus formation. I know one composition instructor who includes her students in deciding how heavily assignments are weighted and what reading material the class should cover. Students’ involvement in formulating the rules will likely also give them motivation and incentive to abide by these rules. In these dialogic classrooms, then, students are not only active learners but also active participants in creating classroom rules. The same participatory principle can extend to testing. One method to improve the diversity of tests, increase student engagement with tests, and encourage students to consider test questions thoughtfully would be to have students devise their own test questions. Instructors can have each student write four or five multiple-choice questions as a homework assignment. Writing items will help students learn the material and get involved in the test-making process. The instructor could select from this set of questions and, if appropriate, present them to the class as a student-generated exam. In that way, students would remain exposed to a variety of authorities, and the class would move closer toward student-centered practice while also gaining frequent opportunities to retrieve course content.

In his book Teaching with Your Mouth Shut, Donald Finkel shares some of his

teaching practices that create a dialogic classroom. Though he teaches literature, not

composition, there are many qualities of his dialogic class that instructors can directly

apply in their composition courses. Finkel’s literature class contains twenty-five students

and a teacher who meet around a single round table. There is no clear locus of attention in

the class—no lectern or podium—and the teacher sits at a different seat every day. A

typical day begins with a student soliciting the other students’ discussion questions that

they have brought to class and writing them on a whiteboard. The class then decides in

what order they would like to discuss the different questions. Once the decision is

reached, the facilitating student sits down and the discussion begins. The benefit of having a student rather than the teacher solicit questions is that it directs focus away from the teacher and onto the questions the class will be discussing. Like frequent tests, these strategies require students to be active, not passive, learners.

Finkel argues that beginning class by soliciting questions has four additional advantages. The first is that it prevents the first student

willing to talk from taking the reins for the rest of the class. This is a crucial step towards

creating a space for individuals with different communicative styles—from shy students

who only chime in after lingering pauses to boisterous students who do not tolerate the

slightest conversational gap. The second advantage is that it allows students to hear a

number of questions about texts before discussing any one of them. Some students’

questions will invariably be well-formed and others will not, which leads to the third

advantage: it emphasizes the importance of bringing thoughtful questions to class.

Hearing the multiple interpretations of the reading brought in by different students is a

good practice to develop divergent thinking—the ability to see multiple solutions to a

problem. Hearing a single interpretation, on the other hand, given to the students by the

authority of the teacher, may encourage the development of convergent thinking—

convergence on a single possible answer. Additionally, knowing that their discussion questions will be evaluated by their peers motivates students to bring well-considered, well-formed questions to class. The fourth advantage Finkel

describes is that the ritual of writing questions on the board helps transition students from

the hustle and bustle of wherever they were before class into an atmosphere of thoughtful

inquiry.

With all of this focus on the students, one may ask what the teacher’s role is in

this dialogic class. If the class functions seamlessly without direction from the teacher, is a teacher needed at all? Finkel answers this question as follows: “‘Teaching with your

mouth shut’ does not entail teacher passivity; it requires different kinds of activities from

teachers” (17). Finkel further argues that “Good teaching is the creating of those

circumstances that lead to significant learning in others” (8). To create those

circumstances during class discussion, the teacher should serve more as a facilitator

than a director. A facilitator coaxes students towards learning and has an idea of how that

will happen, but is not overly attached to a single course of action so long as learning is

taking place. A facilitator is open to “teachable moments” in class, and is willing to cede

authority when it is appropriate. A director, on the other hand, is at the top of a clear

hierarchy and seeks to impose his or her own particular order. A facilitator can benefit the class in several ways. If class discussion begins to die down or becomes unfocused, the teacher can recommend a different topic; if an insightful comment is ignored, the teacher can act as a spotlight and redirect attention toward it. Additionally, the teacher is involved in the conversation and can ask

his or her own well-formed questions, which can serve as a model for students. Finally, at

the end of the day, the instructor can summarize the results of the discussion and

emphasize some key points that were made.

The dialogic classroom shifts focus away from the instructor and creates a student-centered class. Since group interaction is a focus in composition, students frequently

interact with their peers without the teacher being directly involved. For these

interactions to be successful, students must have some competency with the disciplinary

content of composition. As teachers, we want to prepare our students for peer review and

group discussion by teaching them the terms that allow them to critically evaluate and

discuss writing. The dialogic classroom puts students “at the wheel,” so to speak, which

requires disciplinary fluency from students. To facilitate the most productive

conversations in composition, teachers want to ensure that students have a solid

understanding of course content, and are able to reproduce it in their own words. Students

may initially be exposed to terms of critical analysis through a mini-lecture, readings, or class exercises, and ideally they will continue to use those terms during

group discussion and peer review. However, in addition to these teaching techniques, the

teacher can provide frequently-administered, low-stakes tests to help ensure students’

long-term retention of course content. As can be seen, then, the practice of teaching

through frequent low-stakes tests accords with current theory and practice in

composition. It is an active learning strategy that will enhance student knowledge and

authority and support the student-centered classroom.

The Writing Process in a Dialogic Classroom

Lad Tobin, writing in A Guide to Composition Pedagogies, describes two modern approaches to writing instruction and defines them as “process” and “post-process”

writing pedagogy. Process-based pedagogy urges instructors to devote much class time to

peer review of student works-in-progress. The writing process is also explicitly taught,

and the methods that successful writers use for invention, focus, and organization are

taught as models for students to follow. Much time is devoted to in-class writing

exercises to help students practice free-form writing on demand. The process-oriented class is guided by an emphasis on pre-writing strategies and revision after feedback.

Many instructors teach the process by requiring multiple drafts and conducting peer

review between successive drafts, which models the process as recursive. Tobin distinguishes this from the post-process model of instruction, which differs in several key respects.

Post-process teachers typically assign more reading and devote class time to group

discussion rather than peer review. Readings are analyzed in class to identify the effective

characteristics of each piece, and class time is also devoted to the teaching of rhetorical

conventions and genre analysis in a post-process model. Regardless of whether one

focuses on process or post-process pedagogies, we can, in addition to testing students on

grammatical rules and writing vocabulary, also use tests to ensure students understand the

different stages of the writing process and the kinds of rhetorical moves writers use in

different genres.

In addition to being dialogic, the modern composition classroom tends to include

a large amount of active writing time, and most composition instructors teach writing as a recursive process. Whereas the linear model of (1) Outline, (2) Draft, (3)

Revise, (4) Proofread, (5) Publish, has historically been the prevalent model taught in

schools, it is today accepted that this linear process does not reflect the reality of writing

for most people. The linear model encourages a “once-and-done” method of writing.

Rather than this linear model, composition instructors model the writing process as

recursive by requiring multiple drafts of a single paper. Requiring multiple drafts helps ensure

that students actually engage in the writing process rather than wait for “inspiration,” or

the night before the deadline.

Requiring student papers to be submitted in multiple drafts encourages

incremental improvement over time and helps students manage their time and avoid

procrastination. Ralph Keyes, author of The Writer’s Book of Hope (2003), advises us to

strive towards consistency in our writing practice: “Serious writers write, inspired or not.

Over time they discover that routine is a better friend to them than inspiration” (49). I

argue that this concept can be extended to testing: just as writers benefit from a schedule

and routine, so too will students’ learning be enhanced through frequent tests. Over the

course of a semester, as students learn more and work with feedback on their writing,

their writing will gradually improve. The essay becomes a living thing that grows over

time. The linear model of writing leads to the common reality of students writing their

essays during all-night sessions the evening before they are due. If an instructor does not

require additional drafts, students take that to mean that no further revision is

necessary.

Finkel's multi-centered or dialogic form of discussion is a natural outgrowth of the peer-review workshop model made famous by the Iowa Writers' Workshop. The

workshop is a common instructional approach in composition, and, like the dialogic

classroom, it relies on the teacher as a facilitator and on student-generated knowledge.

Classrooms structured around a community of equivalent peers benefit students' learning

because they allow students to have conversations about writing. Like other

specializations, composition has specialized terms and knowledge, and facility with these

terms is critical if workshops are to be effective. By testing students on these terms and on the writing process itself, we can help ensure that they get the greatest possible benefit from our workshops.

This external conversation also benefits our internal conversations and enhances

our cognition. Mastery of disciplinary terms supports meta-cognition, “the process of reflecting on and directing one’s own thinking” (National Research Council 78), a key skill that enhances learning. The language that students practice during peer review also allows

them to engage in meta-cognition about their own writing. Peer review between

successive drafts helps students improve their writing gradually. The practice of peer

review acknowledges that fellow students can be sources of knowledge in a classroom as

well as teachers. Students are only in composition classes for a semester or two, and presumably we want them to grow past needing the teacher; that will only happen if students are allowed to retrieve and reproduce class material by expressing it in their own words. Conducting peer review and maintaining a student-centered class should help

produce better writers and encourage creative and critical thinking. Frequently

administered tests, given alongside writing assignments over the course of the semester,

will supplement the peer-review process by enhancing student learning and retention of

new terms and writing techniques, making students more capable peer reviewers.

Building Writing Fluency with Journal Writing

Gerald Fleming and Meredith Pike-Baky together authored Rain, Steam, and

Speed, a book devoted to improving writing skills through journal writing. In the book

they define fluency as “the ease with which one communicates in each of the language

skills” (14). For an orator, fluency is the ability to deliver a good speech, whereas for a

reader, fluency is the ability to read steadily and understand what is read. Writing fluency

is described in the book as “practiced, prolific writing [that] keeps language and

perceptions flowing past the fidgets, self-distractions, and bogeys that the mind

occasionally throws out when it doesn’t care to work” (14). Practicing sustained writing

is an important step to improving written communication skills; it is something like

endurance training with words. The training metaphor helps us envision writing as a

developmental process. Just as an athlete trains to build strength and stamina, a writer needs sustained, focused practice to improve writing fluency. In addition to journals, students can be directed to do in-class

writing through short-answer or essay questions on frequently administered tests. In this

way, the same training metaphor applies to tests, which suggests that testing dovetails with

our current practice.

In-class writing through prompts, along with journal writing, is a valuable method of instruction, in line with Expressivist theory. Expressivist pedagogy maintains that

all students have interesting things to say and only need the teacher to “get out of the

way,” so to speak, so they can write in their own authentic voice. For the Expressivist, top-down rules serve as blocks to progress: an Expressivist might say that with every stroke of the red pen, teachers tell students that they cannot write, and then teachers complain, “Students don't want to write. How can we motivate them?” Lad Tobin, writing in A Guide to Composition Pedagogies, argues,

Children want to write. They want to write the first day they attend school. This is no accident. Before they went to school they marked up walls, pavements, newspapers with crayons, chalk, pens or pencils . . . anything that makes a mark. The child’s mark says, "I am." "No, you aren't," say most school approaches to the teaching of writing (Tate 19).

Expressivist pedagogy is about rekindling this passion seen in children and getting

students excited about writing.

Expressivists maintain that teaching writing is not about teaching new strategies or rules, but about facilitating the unrestricted production of copious amounts of text. Peter Elbow, a leading figure of the Expressivist movement, writes in his book Writing Without Teachers,

I try for two things: (1) to help you actually generate words better--more freely, lucidly, and powerfully: not make judgments about words but generate them better; (2) to help you improve your ability to make your own judgment about which parts of your own writing to keep and which parts to throw away (vii-viii).

Tobin summarizes Elbow’s suggestions for how to accomplish these goals as follows:

“Elbow suggests that writers free-write (write non-stop without worrying about

correctness, form, logic, etc.); play with words and ideas; form writing groups; and rely

less on doubting and more on believing, less on criticism, more on imagination” (Tate 3).

This “believing game,” as Elbow calls it, begins with students believing in

themselves, and with teachers encouraging that belief.

Journal writing and timed writing exercises are common practices in modern

composition. The seminar-based class is a great place for students to develop critical

thinking skills and come up with innovative ideas to write about. In-class free-writes and

journal writing, in which students simply practice writing without any assessment, help make students comfortable with the process of writing. The frequent, repetitive writing of journals can also be done in class through directed writing prompts and short- or long-answer questions on quizzes. In a similar manner, frequent tests can be seen as a way to formalize or supplement some of our current practices in composition, such as the use of journals or in-class writing.

The Dialogic Classroom and the Academic Discourse Community

The student-centered dialogic classroom requires that students have enough

knowledge to work together without the single authority of a professor. In order to

collaborate, students need a common vocabulary and basic understanding of course

content. This project argues that teaching through frequently-administered low-stakes

tests can help ensure fluency with these concepts through active retrieval. In a similar

manner, I contend that composition instructors need to teach the conventions of Standard

American English (SAE) and can use tests to do this.

Explicit instruction in SAE has been the subject of much debate in composition. Expressivist theorists argue that students should focus solely on generating content, writing without constraints and rules. I argue, however, that we do students a disservice by neglecting to teach the rules and conventions of academia and SAE. In our

own composition program at Humboldt State University, students need to submit a

writing portfolio, which is assessed to determine whether students pass the course. Adherence to style guidelines like MLA and to the conventions of SAE is a necessary component of passing the portfolio. Additionally, students will no doubt be expected to write fluently in SAE in the rest of the academic world and in the business world beyond the composition course. Part of the goal of a first-year composition course, therefore, should be to integrate students into this aspect of the larger academic community as well. Teachers should view fluency with SAE as a valuable ingredient of good writing, though not the only one.

For many students, mastery of these academic arts can be challenging. The

authors of They Say, I Say, Gerald Graff and Cathy Birkenstein, treat academic writing as a process that can be learned in stages. As its title page describes, the book aims to “demystify the moves that matter in academic writing.” Each chapter discusses

different rhetorical strategies that authors can use and offers templates that utilize those

strategies. Writing templates are a great way for students to translate their own internally persuasive dialogue, developed through practice with journal writing or free-writes, into academic language. The templates in this book help students learn how to make logical transitions and how to make their sentences work together cohesively to convey a larger idea. Many first-year college composition classes now use sentence templates as models. Students’

development as writers is assisted by these templates, which show them how to turn their

own language into academic writing. They Say, I Say is only one popular example, but

there are many composition textbooks containing sentence templates that could easily be adapted for use in tests.

Connors and Lunsford (1988) published a study of the historical frequency of

errors made in freshman papers from colleges around the United States. They compared

these results with a review of the most common errors from 1917 to 1988. The study

found that the frequency of errors has remained roughly constant over time, at an average of

2.2 errors per 100 words. The study reports that the most common contemporary errors

(at publication) include spelling, missing inflections, apostrophes and commas, and

misused homophones. Though we can use this list of frequent errors, and the many others

like it, as guidelines, good grammar instruction is effective only when it is individualized

and adapted to context. For example, it is standard practice for a teacher who sees that a large portion of the class has difficulty with comma splices to run a mini-lesson on sentence-boundary rules. After the mini-lesson, students workshop their drafts to search for

errors related to that rule during peer review. Later, the teacher could administer tests with

questions on sentence boundaries and comma splices and with incorrect examples that

need correction. This additional use of tests could enhance student learning and dovetail

easily with current teaching practices.
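To make the Connors and Lunsford average concrete, the arithmetic below scales their reported rate of 2.2 errors per 100 words to a single paper. The 1,000-word essay length is a hypothetical figure chosen only for illustration, not one drawn from their study:

$$
\frac{2.2\ \text{errors}}{100\ \text{words}} \times 1000\ \text{words} = 22\ \text{errors}
$$

Because 2.2 is an average across many papers, any individual essay may contain considerably more or fewer errors than this estimate suggests.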

Like spoken language, writing requires continuous adjustment to the customs,

constraints, and expectations of different discourse communities. Language that is effective in the classroom differs from the language used in the courtroom, the home, or the sports bar, and it will not necessarily be effective in those contexts. As speakers, we

intuitively switch between these various registers and modify our speech to meet the

needs of the situation. In writing, these conventions and formal expectations are

described as genres. Critical analysis of different genres requires a specialized language

(e.g., audience, purpose, and the use of logos, ethos, and pathos). Internalization of that language

is a key step for adequate performance within a given genre.

Michael Carter and colleagues, in a white paper published by North Carolina

State University, argue that students learn best from genre models when instruction

includes explicit analysis of the features of a genre. They write that “students may learn

these genres through repeated exposure and trial and error, but explicit instruction can

help them negotiate a variety of genres much more quickly and effectively” (6). They

conclude that “there is little evidence to suggest that students will notice relevant features

and apply them to their own writing situations without such intervention” (9). To help

students transfer skills successfully between different genres, composition courses focus on

analyzing the conventions of these diverse discourses and replicating them in written

assignments. However, as Carter and his colleagues note, this may not be enough. Each

genre is made up of multiple components, among them audience and purpose.

Fluency in one genre does not necessarily translate to fluency in another genre. Therefore

it is important to teach students that there are different expectations for writing and

thinking across different genres and disciplinary and professional cultures. To aid a

student’s transfer of skills from one writing context to another, many composition

instructors have adopted a genre-based approach to writing instruction. If we want to

prepare students as best we can to meet the diverse writing needs of the university in the

short course of a college semester, then we should focus instruction on genres that tend to

cross disciplinary boundaries—such as the research paper, the summary of assigned

reading material, the professional letter, or the persuasive essay. Time spent on

instruction in these types of writing will be well spent because these skills can easily be

adapted for transfer across disciplines, and, of course, the conventions of these genres can

be explicitly taught through lecture and student retention can be reinforced through

frequent tests.

Some degree of fluency with SAE is a basic expectation for published works,

along with the ability to conform to research style guidelines (MLA, APA, etc.) and the

conventions of different genres of academic writing (a formal research paper, an

annotated bibliography, a business letter, etc.). A student-centered classroom can help

foster creative exchange, and frequent writing will help build familiarity with the writing

process. However, these ingredients are not enough. To become capable of effective peer review and to meet the needs of the university, students also have to internalize a

great deal of disciplinary knowledge. The research in the following chapters will show

that our ability to learn and retain the disciplinary knowledge described relies in large

part on our active retrieval and usage of these terms. Learning and retention of the arts of

logic, rhetoric, and grammar, and an understanding of the various stages of the writing

process can all be facilitated through the administration of frequent, low-stakes tests, and,

as I’ve argued in this chapter, the use of such tests accords well with current composition

theory and practice and can easily be adopted in today’s composition classroom.

CHAPTER 3: HUMAN LEARNING

The Physiology of Learning and Memory

Any conversation about effective teaching must include a consideration of how

students learn. This chapter provides an interdisciplinary overview of human learning

with an up-to-date description of how we understand learning to occur from a

neurological perspective, as well as from cognitive and social-constructivist perspectives.

Knowing how the brain captures, retains, and retrieves information will help teachers to

design assignments and activities that are instructionally effective. The focus of this

chapter will be on those aspects of human learning that help us understand why retrieval

practice (which tests facilitate) is essential for learning. Neurology studies the physical

structure of the brain and nervous system, and cognitive neuroscience studies our thought

processes. The brain is the center of learning and memory, so it seems obvious that

educators, who are primarily concerned with learning and memory, would want to stay

abreast of whatever discoveries have been made in this burgeoning field. Of course

wanting to and having time to keep up with the deluge of new research that is constantly

being generated are two very different things. The sheer volume of research available can

be overwhelming, so this chapter will give an overview of some of this new research in

an accessible format.


Neuronal communication is the basis for learning and memory. The brain is made up of billions of cells called neurons. For our purposes, a neuron can be divided into two main parts: the cell body and the axon. On the far left of Figure 1 is the cell body, which has multiple dendrites protruding from it. The axon is connected to the cell body at the axon hillock and ends at the axon terminals, which branch out into a network of up to 10,000 other neurons. The axon terminals connect to other neurons’ dendrites. Axon terminals send out neurotransmitters and dendrites listen for them, so the dendrites could be considered the ears of the neuron and the axon terminals its mouth.

Figure 1: “Neuron”

Source: (Sapolsky 10)

In order to communicate, neurons expend a great deal of energy redistributing ions across their membranes, which allows them to move between two electrical states: a “resting potential” and an “action potential.” It is the contrast between these two states that allows neurons to communicate. This is similar to binary logic: like a light switch that is either on or off, a neuron is either firing or not firing. When a neuron has something to say, so to speak, it fires an action potential, sending an electrical signal down the axon to the axon terminals, from which it is relayed to the neural network.

There are two methods that neurons use to trigger an action potential: temporal summation and spatial summation ("Synaptic Transmission in the Central Nervous System"). Temporal summation occurs when the same input is triggered over and over in rapid succession. Spatial summation occurs when numerous dendrites are stimulated at once. Either method can concentrate enough excitation at the axon hillock to trigger an action potential. When signals are received by the dendrites, channels open and ions begin to move, changing the electrical state of the neuron. When an action potential reaches the axon terminals, the neuron releases a flood of chemical messengers called neurotransmitters. These neurotransmitters cross the synapse, the junction between the axon terminal of the sending neuron and a dendrite of the receiving neuron, and are picked up by the dendrites of the surrounding neurons. So long as a neuron keeps firing, it will continue to release neurotransmitters across its synapses. If the original event that sparked the first action potential is strong enough, a chain reaction is sustained.


Figure 2: “Neuronal Communication”

Source: ("The Basics: Ion Channels Underlie Neuron Communication")
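The two triggering routes described above can be illustrated with a toy "leaky integrator" neuron, a deliberately simplified sketch rather than a physiological model. Its assumptions are not drawn from the sources cited here: inputs arrive in discrete time steps, the neuron keeps a fixed fraction (`retain`) of its accumulated excitation from one step to the next, and it fires once the total reaches `threshold`. The function name and every number are hypothetical.

```python
def first_firing_step(inputs_per_step, threshold=1.0, retain=0.8):
    """Toy leaky integrator: return the first step at which the neuron fires, or None.

    inputs_per_step has one entry per time step; each entry is a list of input
    strengths arriving at that step (one value per stimulated dendrite).
    """
    potential = 0.0
    for step, inputs in enumerate(inputs_per_step):
        # keep part of the previous excitation (the rest leaks away), then add new input
        potential = potential * retain + sum(inputs)
        if potential >= threshold:
            return step
    return None


# Spatial summation: five dendrites stimulated at once in a single step.
spatial = [[0.25] * 5]
# A single weak input on its own never reaches threshold.
single = [[0.25]]
# Temporal summation: the same input repeated in rapid succession.
rapid = [[0.4]] * 4
# The same input, but with gaps long enough for the excitation to leak away.
spaced = [[0.4], [], [], [], [0.4], [], [], [], [0.4]]

print(first_firing_step(spatial))
print(first_firing_step(single))
print(first_firing_step(rapid))
print(first_firing_step(spaced))
```

In this toy, repeated inputs summate only when they arrive close together; widely spaced inputs leak away first. The time scale here is that of a single neuron (milliseconds), not a course calendar.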

The neurons that sustain these chain reactions are referred to as a neural network. A single neuron can connect to as many as 10,000 neurons through its axon terminals and can receive transmissions from up to 10,000 neurons through the dendrites of its cell body. Thus, neural networks are capable of an enormous degree of complexity. Neural networks could be thought of as “screenshots” of a particular moment of cognition or perception (Sapolsky 10). Patterns of neural activity are thought to correspond to particular mental states or mental representations. Under this model, learning can broadly be defined as changes in connectivity, either through changes in potentiation at the synapse or through the strengthening or pruning of connections in a neural network. Neural networks are constantly assembled and disassembled in our brains as we learn and forget.

Neural networks can be made long-lasting through repeated stimulation. Every time a neuron fires an action potential, it undergoes a physical change that makes it more excitable within a given network, so that less excitation is required to induce later action potentials. Repeated stimulation thus makes a neuron hypersensitive: a weaker stimulus in the future will be enough to activate the associated neural network, which means that it is easier to recall whatever that network represents. The state in which a synapse remains hyper-responsive, or potentiated, for long periods is called Long-Term Potentiation (LTP). Increasing the strength of synaptic communication through LTP is the physiological basis for learning and memory as we understand it today (Sapolsky 15).
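The idea that each successful firing leaves the network more responsive, so that a weaker cue later suffices, can be sketched with a toy synapse. The update rule is an assumption made for illustration (a simple Hebbian-style rule in which each successful activation moves the synaptic weight part of the way toward a ceiling); it is not taken from Sapolsky, and the class and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToySynapse:
    """Hypothetical synapse whose weight grows each time it helps trigger firing."""
    weight: float = 0.5       # starting synaptic strength
    max_weight: float = 2.0   # ceiling on potentiation
    rate: float = 0.25        # fraction of the remaining gap closed per activation
    threshold: float = 1.0    # excitation needed to fire the receiving neuron

    def activate(self, cue: float) -> bool:
        """Present a cue of the given strength; return True if the neuron fires."""
        fired = cue * self.weight >= self.threshold
        if fired:
            # in this sketch, potentiation only follows successful firing
            self.weight += self.rate * (self.max_weight - self.weight)
        return fired


synapse = ToySynapse()
print(synapse.activate(0.8))   # a weak cue is not enough at first
for _ in range(3):
    synapse.activate(2.0)      # three strong, successful activations
print(synapse.activate(0.8))   # the same weak cue now suffices
```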

Unless they undergo the physical process of LTP, neural networks will gradually dissipate. However, networks that do undergo LTP decay very little over time and, with sufficient cues, can be retrieved many years later (“Remembering and Forgetting”). Forgetting occurs in long-term memory when the formerly strengthened synaptic connections among the neurons in a neural network become weakened, or when the activation of a new network is superimposed over an older one, causing interference with the older memory (ibid.).

Earlier I described two ways to initiate an action potential: spatial and temporal summation. You’ll remember that an action potential is a necessary step towards LTP, the process that makes memories stick. Considered from a teacher’s perspective, spatial summation may be driven by the student’s interest, motivation, and comprehension. We can influence these variables to some degree, but they always depend on the student. Temporal summation, on the other hand, is much more within a teacher’s control: we can stimulate it by administering tests. If tests are given frequently, so that temporal summation is regularly stimulated through repetition, then LTP can be achieved. This framework from neuroscience helps us understand that test-enhanced learning can physically change the structure of the neural networks associated with course material, making it more likely that students will permanently retain course information. Test-enhanced learning can be seen as a way of providing retrieval opportunities for students in the classroom. Therefore, frequently administered, low-stakes tests spaced out over time can be a beneficial teaching practice that helps students retain course information.
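For a teacher, the practical upshot is a calendar of short quizzes that keeps returning to earlier material. The sketch below is one hypothetical way to lay out such a calendar: each unit is re-quizzed a fixed number of class sessions after it is first taught. The offsets (1, 3, and 7 sessions), the function name, and the unit names are illustrative assumptions, not recommendations drawn from the research discussed here.

```python
def quiz_schedule(unit_start, offsets=(1, 3, 7), last_session=None):
    """Map each class session number to the units that get a low-stakes quiz in it.

    unit_start maps a unit name to the session in which it is first taught.
    Each unit is quizzed again `offset` sessions later, for every offset.
    Sessions after last_session (if given) are dropped.
    """
    schedule = {}
    for unit, start in unit_start.items():
        for offset in offsets:
            session = start + offset
            if last_session is not None and session > last_session:
                continue
            schedule.setdefault(session, []).append(unit)
    # return the sessions in chronological order
    return dict(sorted(schedule.items()))


units = {"rhetorical appeals": 1, "thesis statements": 3, "MLA citation": 5}
for session, quizzed in quiz_schedule(units, last_session=10).items():
    print(session, quizzed)
```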


Cognitive Definitions of Learning and Memory

Writing in the textbook Learning and Memory: An Integrated Approach, John Anderson explains that “Learning refers to the process of adaptation of behavior to experience, and memory refers to the permanent records that underlie this adaptation” (6). Learning is a process, and memory is a record. Memory is divided into three types: sensory, short-term, and long-term. A dominant paradigm for describing these types of memory is called “stage theory,” which comes to us from Atkinson and Shiffrin (1968). The relationship between the three stages is shown in the flow chart below, which begins with an external stimulus activating sensory memory. There the information is either forgotten or goes through initial processing and enters short-term memory. With repetition it will stay in short-term memory, and with elaboration and coding it will enter long-term memory; otherwise it will be forgotten. After a short time, the information will leave short-term memory (STM) and can only be brought back from long-term memory (LTM) through retrieval. The figure also shows that our response to a situation can only be aided by information in short-term memory: information cannot jump directly from LTM into current use.


Figure 3: “Stage Theory of Memory”

Source: (“Three Stages of Memory”)
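The flow in Figure 3 can be restated as a small state model. This is a simplified sketch of stage theory as summarized above, with one labeled assumption: the “initial processing” that moves information out of sensory memory is modeled as a single `attend()` step. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """Tracks which stores hold one piece of information (toy stage-theory model)."""
    sensory: bool = True       # every external stimulus first activates sensory memory
    short_term: bool = False
    long_term: bool = False

    def attend(self) -> None:
        # initial processing moves the item from sensory memory into short-term memory
        if self.sensory:
            self.short_term = True
        self.sensory = False   # the sensory trace is gone either way

    def elaborate(self) -> None:
        # elaboration and coding copy a short-term item into long-term memory
        if self.short_term:
            self.long_term = True

    def time_passes(self, rehearsed: bool = False) -> None:
        # sensory traces fade; short-term items survive only with repetition
        self.sensory = False
        if not rehearsed:
            self.short_term = False

    def retrieve(self) -> None:
        # a retrieval cue brings a long-term item back into short-term memory
        if self.long_term:
            self.short_term = True

    def usable_in_response(self) -> bool:
        # only information currently in short-term memory can guide a response
        return self.short_term


learned = MemoryItem()
learned.attend()
learned.elaborate()
learned.time_passes()
print(learned.usable_in_response())  # not usable until it is retrieved
learned.retrieve()
print(learned.usable_in_response())
```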

The following analogy will help illustrate these three types of memory. Imagine three types of writing: one is drawn directly on the surface of water, the next is written in wet sand at the beach, and the third is chiseled into the stone of a mountain. When we draw directly on water, the surface immediately changes and erases whatever we drew; the message disappears in a moment. This is similar to sensory memory. We take in vast amounts of information every moment through our senses, and much of it we stop attending to once the moment passes. Sensory memory is soon forgotten, so these “recordings” have neither retrieval strength nor storage strength. Short-term memory is like drawing in the wet sand at the beach. The record is clear and accessible, but it will fade when the tide washes it away: it has high retrieval strength but low storage strength. Finally, long-term memories that have undergone significant encoding and retrieval are deeply etched in our minds, like words chiseled in stone.

Bjork and Bjork (1992) distinguish two qualities of long-term memory that together determine the likelihood of successful retrieval (e.g., answering a test question or remembering relevant information when cued): (1) retrieval strength and (2) storage strength. Retrieval strength is the accessibility of a memory at a given moment. Storage strength is how deeply a memory is embedded in the mind. Imagine you have just learned a stranger’s name. If you heard it clearly and repeated it, then at that moment it is very fresh in your mind, so it has high retrieval strength. However, if you do not meet that person again or have a chance to retrieve their name again, you will likely forget it, because its storage strength is weak. On the other hand, consider the name of a close relative who died long ago. As time passes, you will likely think about that person less frequently, so the name will have low retrieval strength. However, the name is deeply embedded, and you are unlikely to forget it even though you have not used it recently, because it has high storage strength.

As described earlier in the physiological account of memory, the storage strength of memories is improved through repeated retrieval, which brings about LTP. The retrieval strength of memories is contingent on the retrieval cues that bring that knowledge to the forefront (such as being asked a question on a test, or a sensory experience that triggers remembering). A third quality we have not yet considered is the encoding strength of memories, which is best understood through schema theory.

The term “schema” was first used by Piaget in 1926. R. C. Anderson, a respected educational psychologist, expanded its meaning and developed schema theory. This learning theory views organized knowledge as an elaborate network of abstract mental structures that represent one’s understanding of the world. Schemata are linkages of prior knowledge, and they influence how much and how proficiently we learn. Schemata can be added to, and as an individual gains experience, schemata develop to include more variables and more specificity. Each schema is embedded in other schemata and itself contains subschemata. Schemata change moment by moment as information is received, and they may be reorganized when incoming data reveal a need to restructure a concept. Schema theory suggests that the encoding strength of memories should increase when meaningful connections are made between schemata, and when information is retrieved in new situations and transferred to new circumstances.

How students organize knowledge influences how they learn and apply what they know. When we are paying attention, we naturally make connections between new knowledge and the existing schemata in our minds. When those connections form knowledge structures that are accurately and meaningfully organized, we are better able to retrieve and apply that knowledge effectively and efficiently. In contrast, when knowledge is connected in inaccurate or random ways, we can fail to retrieve or apply it appropriately. In this way, our prior knowledge can help or hinder learning (Ambrose et al. 4). Teachers work to help students make meaningful connections between new knowledge and prior learning; teachers also strive to ensure that students build accurate and efficient schemata that interconnect disciplinary content and writing practice. Teachers can enhance these practices, and help structure student learning, through tests.

Writing in How Learning Works: Seven Research-Based Principles for Smart Teaching, Ambrose et al. define learning as a process that leads to change, which occurs as a result of experience and increases the potential for improved performance and future learning (3). Learning is the result of how students interpret and respond to their experiences, and therefore learning can bring about changes in knowledge, behaviors, beliefs, and attitudes. Herbert Simon, Nobel Laureate and one of the founders of the field of cognitive science, argues that “Learning results from what the student does and thinks and only from what the student does and thinks. The teacher can advance learning only by influencing what the student does to learn” (qtd. in Ambrose et al. 3). Learning is a process, not a product, and because it happens within each student, instructors can only infer that learning has taken place from students’ products or performance. Learning is not something that an instructor can do to students; rather, it is a process that students themselves carry out.

David Ausubel (1968) coined the term “meaningful learning.” In Ausubel’s view, to learn meaningfully, students must relate new knowledge (concepts and propositions) to what they already know. Under this model, new knowledge must be internalized in relation to what is already understood. Undesirable learning, on the other hand, consists of repeating an item without fully understanding its meaning or how it connects to other knowledge. According to Ausubel, when meaningful learning occurs, disparate facts are understood in relation to each other, and therefore recollection of any single fact will prime the mind to recollect the related facts. This is similar to the schema theory described earlier. In practical classroom terms, meaningful learning occurs when learners construct their knowledge in their own words. It requires that teachers give students the opportunity to engage in personally meaningful written and verbal expression.
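Ausubel’s point that recalling one well-connected fact primes related facts can be pictured with a small network. The sketch below borrows a spreading-activation style of reasoning (a modeling approach from cognitive psychology that is not part of Ausubel’s own account) and treats “primed” as “reachable within a few links.” The function names, facts, and links are hypothetical examples from a composition course.

```python
from collections import deque


def connect(pairs):
    """Build an undirected network from (fact, fact) pairs."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def primed_facts(links, recalled, max_steps=2):
    """Facts within max_steps links of the recalled fact (breadth-first search)."""
    distance = {recalled: 0}
    queue = deque([recalled])
    while queue:
        fact = queue.popleft()
        if distance[fact] == max_steps:
            continue  # do not spread any further from this fact
        for neighbor in links.get(fact, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[fact] + 1
                queue.append(neighbor)
    return set(distance) - {recalled}


meaningful = connect([
    ("ethos", "rhetorical appeals"),
    ("pathos", "rhetorical appeals"),
    ("logos", "rhetorical appeals"),
    ("logos", "syllogism"),
])
rote = {"ethos": set()}  # memorized in isolation, linked to nothing

print(sorted(primed_facts(meaningful, "ethos")))
print(primed_facts(rote, "ethos"))  # nothing else comes to mind
```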

Simple recollection without real comprehension, also referred to as “parroting,” is an example of undesirable learning. For example, with enough cramming, anyone could memorize the questions and answers to a test administered in a language he or she does not understand. This type of mimicry does not indicate genuine understanding and internalization: if the questions were rephrased, given in a different order, or changed in any other way, then regardless of how much cramming that person had done, he or she would not be able to complete the test. This is because the person never actually understood the questions or answers; he or she had merely learned to provide a particular response to a specific stimulus. True learning, on the other hand, is demonstrated by the use of learned material in new and meaningful contexts.

This model of meaningful learning is helpful for our purposes because it provides a framework for how instructors should model their teaching. It is easy to think of tests as simple instruments that encourage rote memorization out of context; however, with proper care, teachers can make tests that are meaningful and relevant to course content and present them to students in ways that encourage meaningful learning. Concerns that testing merely requires parroting can be addressed by designing tests that require more depth than recognition, such as short-answer or essay questions. Test-enhanced learning will help ensure that students have a grasp of disciplinary content, which sets the foundation for meaningful learning, as students will be able to use course content competently during group engagement and during the writing process.

Three Theories of Learning

Three theories of learning will further help us understand the learning process and how testing can be a beneficial practice: the encoding-specificity principle, transfer-appropriate processing, and desirable difficulty. The encoding-specificity principle holds that a retrieval cue (i.e., an external stimulus that induces retrieval) will be effective if it overlaps with features of the original memory trace. Accordingly, the best memory performance is generally found when the processes engaged and the cues given at retrieval are similar to those engaged during encoding (see Fisher & Craik 1977 and Moscovitch & Craik 1976). The following familiar story illustrates the principle: you may be unable to remember the name of a neighbor’s dog until you are watching a television program about show dogs, which reminds you that your neighbor’s dog is a show dog named Tess. In other words, we cannot always predict what type of external stimulus may spark retrieval in the future. As another example, it may be more difficult to recognize a classmate’s or co-worker’s face if you see them in an unfamiliar context, such as on the street or at a park. However, once you have met this person a few times in different settings, you are more likely to recognize them. Where and how information is encoded affects our ability to retrieve it.
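One way to make the encoding-specificity principle concrete is to treat a memory trace and a retrieval cue as sets of features and measure how much of the trace the cue matches. This operationalization, the 0.25 threshold, the function names, and the feature labels are assumptions for illustration only; the principle itself, as stated in the sources cited above, is not a formula.

```python
def cue_overlap(cue, trace):
    """Fraction of the memory trace's features that the retrieval cue shares."""
    if not trace:
        return 0.0
    return len(cue & trace) / len(trace)


def is_retrieved(cue, trace, threshold=0.25):
    """Hypothetical rule: retrieval succeeds when the overlap reaches the threshold."""
    return cue_overlap(cue, trace) >= threshold


# Hypothetical features encoded when you first learned the dog's name.
dog_trace = {"neighbor", "dog", "show dog", "Tess"}

print(is_retrieved({"grocery store", "rain"}, dog_trace))   # no shared features
print(is_retrieved({"television", "show dog"}, dog_trace))  # 1 of 4 features matches
```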

The implication of the encoding-specificity principle is that encoding variability (i.e., encoding under numerous circumstances) should produce better retention, because it increases the number of potential retrieval routes and thereby the probability of a match with whatever cue is presented at retrieval. According to the theory, encoding variability of any sort should have this effect. As an example, I know a professor who brings citrus to class on test days so that the aroma will help put students at ease. The aroma should also increase encoding variability for the students and hypothetically create a connection between the smell of citrus and the content of that class. Years later, the smell of citrus could trigger one of her students’ memories of that exam and the information on it.

For the purposes of analogy, let us consider the mind as a labyrinth, with successful retrieval representing successful navigation of the labyrinth from beginning to end. Every memory trace enters and leaves the mind through a vast network, like a labyrinth. Imagine that as it does so, it leaves a mark on the walls, and the mark gets clearer every time the same route is followed; eventually a clear path is laid through the labyrinth. This “mark” was described physiologically as long-term potentiation (LTP). To follow the analogy, you could be dropped at any point in the vast labyrinth, and by establishing multiple routes that all converge on the same memory, or connected schema, the likelihood of successful retrieval is increased. This is in line with the practice of frequently administering low-stakes tests. Repeated tests that use different questions and target different parts of the same material should promote encoding variability, because every time students take a test there will be differences in mood, activated schemata, and a variety of other factors. This variety of encoding circumstances, compounded by the variation in test questions, will produce multiple retrieval routes, which should improve memory performance.
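The claim that more retrieval routes raise the chance of a match can be worked through with a simple probability calculation. Assume, purely for illustration, that each independently encoded route has the same probability p of matching whatever cue appears later. Retrieval then fails only if every route misses, so the chance of success with k routes is 1 - (1 - p)^k. Real retrieval routes are unlikely to be independent, so this hypothetical function shows only the direction of the effect, not its size.

```python
def p_retrieval(p_match: float, routes: int) -> float:
    """Chance that at least one of `routes` independent routes matches the cue."""
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability")
    # retrieval fails only if every route misses
    return 1.0 - (1.0 - p_match) ** routes


for k in (1, 2, 5, 10):
    print(k, round(p_retrieval(0.2, k), 3))
```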

Transfer-appropriate processing theorizes that successful retrieval depends on the overlap between the cognition engaged during encoding and the cognition engaged during retrieval (see Kolers & Roediger 1984; Morris, Bransford, & Franks 1977; Roediger 1990; Roediger et al. 2002). In terms of students learning through tests, Roediger and Karpicke (2006) argue that retrieval practice through active testing is more effective than passive review because the cognition engaged during testing more closely matches the cognition needed for later retrieval. For example, someone who wanted to learn how to swim could conceivably do so without ever getting in the water, by suspending the body with ropes and practicing the arm strokes and leg motions. This practice and the act of swimming are transfer-appropriate to some degree; however, the training would be more effective if it took place while actually swimming. In the same way, listening to a lecture on writing effective transitions may prepare some students to apply that material to their own writing, but the best way to train for long-term retention and personal reproduction of course material is through active reproduction on tests. Performance on a test requires cognition similar to that involved in remembering the same information in a different context, such as in conversation or while writing. Because the two activities are transfer-appropriate, testing can be seen as a more effective training method for later retrieval than learning through lecture.

In the case of composition instruction, the material that students learn throughout the course will be more readily understood and internalized if there is a strongly embedded schema of related terms and concepts. Frequently administered tests provide retrieval opportunities that increase storage strength and also improve encoding strength. When students retrieve information as the course progresses, they can relate old information to the new schemata they are developing as they internalize new course content.

One theory of learning holds that more challenging retrieval produces greater benefits for long-term retention; Bjork (1992) refers to this principle as “desirable difficulty” (see Bjork 1999; Karpicke & Roediger 2007; McDaniel et al. 2007; Roediger & Karpicke 2006). This theory holds that when retrieval strength is high and information is easily accessible, retrieving that information produces only small gains in storage strength. In contrast, more difficult retrieval, such as remembering in a different environment, with fewer cues, or after a long period of time, produces greater increments in storage strength. To extend the earlier analogy of the three types of writing, imagine that time passes in the mountains of our minds and the stones we chiseled words into are covered by moss and leaves. The words are still etched in stone, but they are difficult to access: they have high storage strength but low retrieval strength. Desirable difficulty theorizes that the greatest gains in storage strength are made when retrieval strength is low. By analogy, every time you clear off the moss and leaves and retrieve the words, they also become more deeply etched in the stone beneath.
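The trade-off described above, small storage gains when retrieval is easy and larger gains when it is hard, can be sketched numerically. Bjork’s framework is qualitative, so the functions below are assumptions chosen only to have the right shape: retrieval strength decays exponentially with time, more slowly when storage strength is high, and a successful retrieval adds to storage strength in proportion to how low retrieval strength had fallen. The function names and the `rate` parameter are hypothetical.

```python
import math


def retrieval_strength(storage: float, days_since_retrieval: float) -> float:
    """Toy forgetting curve: accessibility falls over time, more slowly for well-stored items."""
    return math.exp(-days_since_retrieval / storage)


def storage_gain(storage: float, days_since_retrieval: float, rate: float = 1.0) -> float:
    """Toy desirable-difficulty rule: the harder the (successful) retrieval, the larger the gain."""
    return rate * (1.0 - retrieval_strength(storage, days_since_retrieval))


storage = 2.0  # arbitrary starting storage strength
for days in (0, 1, 3, 6):
    rs = retrieval_strength(storage, days)
    gain = storage_gain(storage, days)
    print(f"after {days} days: retrieval strength {rs:.2f}, storage gain {gain:.2f}")
```

The sketch assumes every retrieval attempt succeeds. In practice, if retrieval strength falls too low the attempt may simply fail, so this toy says nothing about how long to wait between tests.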

The theory of desirable difficulty compels teachers to design tests that require active reproduction from students. Its predictions are confirmed by the differing rates of long-term retention that result from taking recognition versus recall tests. In a recognition test such as multiple choice, the right answer is presented among others, which provides a strong cue for recollection and greatly increases the likelihood of a correct answer. Short-answer questions, by contrast, require test takers to provide the answer in their own words, which requires more effort. It typically takes longer to answer a short-answer question, and the test taker has to produce the answer without the cue for recollection. Although both types of test benefit memory, the more effortful recall produces better long-term retention. Many studies show that recall tests promote better long-term retention than recognition tests (see Jacoby 1978; Butler & Roediger 2007; Glover 1989; Kang, McDermott, & Roediger 2007; McDaniel, Anderson, Derbish, & Morrisette 2007). Further support for desirable difficulty comes from Agarwal et al. (2008), who compared student learning from open-book and closed-book tests. Closed-book tests require more difficult, challenging processing than restudying a passage, and according to the theory of desirable difficulty, difficult processing benefits long-term retention. The study found that open-book tests increase retrieval strength, as evidenced by high initial performance, but produce only small increments in storage strength. In contrast, the more difficult closed-book tests produced greater long-term retention. Conditions that require more difficult and challenging processing may slow initial learning but ultimately enhance long-term retention relative to less challenging conditions that produce rapid initial learning but poor retention.

These three theories of learning help us understand the testing effect, and they converge on the notion that we should administer frequent low-stakes tests. With regard to composition instruction, we can achieve desirable difficulty by administering short-answer and essay questions. Short-answer questions are well suited to ensuring that students retain the key course concepts that will be useful in peer collaboration, class discussion, and the writing process.


Social Constructivist Theories of Learning and Testing

One way to make learning more meaningful and memorable is through social constructivist theories of learning. This section explains learning from a sociocultural perspective, further emphasizes the importance of collaborative conversation in the classroom, and discusses tests as a tool for helping students join that conversation and the academic discourse community. While behaviorists theorize learning as a series of stimulus-response pairs, and cognitivists theorize behavior as a complex formula resulting from each individual’s cognition, social constructivists expand upon these models by focusing on interaction within groups rather than only on individual behavior. This theory holds that language is at its core a social act, and that in order to communicate with various “discourse communities,” we adopt the jargon and communicative patterns appropriate to the community we are a part of (Lee 2). In order to be a member of any discourse community, be it academic, professional, or personal, the individual must learn the conventions and jargon of that particular community; thus any instructional tool, such as tests, that can help students join the academic discourse community will be of benefit.

Social constructivism is distinguished by the belief that language and the mind are inseparable, because any individual needs language in order to think and to encode long-term memories. Language, in Vygotskian terms, is a psychological tool that humans alone among animals use. Further corroborating the key position of language in development, linguist Michael Halliday has argued that “language is the essential condition of knowing, the process by which experience becomes knowledge” (57). It is language, along with other psychological tools (such as mathematical symbols, the alphabet, and scientific diagrams), that allows humans to perform the unique activities that we do, from building rockets to writing sonnets.

In his work The Dialogic Imagination (English translation 1986), Mikhail Bakhtin describes a sociocultural model in which the individual and society interact to influence personal development. Every individual exists in a society with a history of complex interaction between language and power. From our youth we are influenced by social discourses and by, for example, our parents and role models, whose views we interact with and selectively internalize. Regarding our interaction with society, Bakhtin maintains that “not only are the meanings of words and expressions ‘borrowed’ from the speech of others, but each utterance is a link in a very complexly organized chain of other utterances” (337). Within this “chain of utterances,” every individual is involved in a dynamic process of self-discovery and identity creation in relation to their larger discourse community. Bakhtin describes this social enculturation as “ideological becoming,” which he defines as “the process of selectively assimilating the words of others” (341). This process of ideological becoming involves redefining the words of others into our own “internally persuasive discourse.” In Bakhtin’s words: “internally persuasive discourse, as opposed to one that is externally authoritarian is…tightly interwoven with ‘one’s own word’” (346). Bakhtin explains how these dialogic interrelations preexist and shape each individual utterance: “The living utterance, having taken meaning and shape at a particular historical moment in a socially specific environment, cannot fail to brush up against thousands of living dialogic threads, woven by socio-ideological consciousness around the given object of an utterance; it cannot fail to become an active participant in social dialogue” (276). Through our interaction with the external social world, we mediate outside influences through language into ideologies that are personally meaningful.

One model that has been useful for describing collaborative learning is

Vygotsky’s Zone of Proximal Development, or ZPD. While studying children’s learning,

Vygotsky found that each child had an achievement potential that they could realize

unassisted, and one that was higher if they were aided by someone more knowledgeable

(Vygotsky 84). He calls the space in which this learning happens the Zone of Proximal

Development. Wells, writing in Vygotskian Perspectives, describes the ZPD as the “use

of language between novices and more expert others as a tool for mediating

misconceptions and consolidating understandings" (5). He argues that rather than only

viewing the ZPD as existing between a single expert and a less knowledgeable peer, we

should also consider the ZPD as a collaborative model between a group of peers.

According to Wells, effective learning is not unidirectional (as a lecture model assumes); rather, understanding is mutually constructed and reciprocal. Grouping students with their peers changes the ZPD from a unidirectional exchange into a multi-dimensional one, in which every student has something to contribute and learning happens as a group. Through collaboration, concepts that were only vaguely understood before can coalesce into coherent thoughts; individuals can work toward agreement on core meaning and develop their own informed opinions through critical engagement with class material.

In composition and creative writing, if we follow the idea of ideological

becoming, then students, in part, form themselves and their intellectual development

through what they write, and through discussing their writing. This process of ideological

becoming through learning and writing is one of the very reasons that we value higher

education. As educators we seek to guide this process of subject formation and self-

expression by teaching the disciplinary content that we argue will enhance writing

abilities. Based on social constructivist theory, it stands to reason that our students will be

more capable peers (a term coined by Vygotsky) if they know the vocabulary and

conventions of the academic discourse community. Because composition teachers rely on student-directed methods and group work rather than lectures as the primary means of instruction, students need to have done the reading and internalized the course material so that they have the vocabulary and knowledge to engage as capable peers. One way we can ensure students reach this level of understanding is by testing for content, which lays the foundation for students to engage in group work without the teacher.

The practice of administering frequent, low-stakes tests as a method of teaching is

well in line with the theoretical stance of social constructivism. With the rise of social

constructivist theory as a guiding paradigm in composition, there should also come a

recognition that tests can be used to help students gain fluency in the academic discourse

community. Bringing students into the field of academic discourse requires that they

learn certain disciplinary knowledge. In order to participate effectively in this discourse

community, students must internalize genre-specific conventions, communication styles

such as Standard American English, and research guidelines such as MLA or APA.

Testing can help students do this effectively. This is especially true for first-year composition students, who tend to be new to the discourse community of academia. Most students will gradually internalize the conventions of academia on their own, but testing can assist that integration.

Some people view testing as a prescriptive method of instruction that stifles creativity. They might argue that testing encourages convergent thinking, impressing on the student that there is only one right answer, which comes from an authority. Similarly, many would question the purpose of memorizing information in our digital age, when answers are a mere Google search away. What is the significance of remembering? Some would say that we don’t need to memorize details; after all, that’s what computers are for. On this view, what really matters is how things fit together. However, Robert Bjork, a prominent researcher in the field of memory, asks “the

people who criticize memorization—how happy would they be to spell out every letter of

every word they read?” (qtd. in Wolf’s “Want to Remember Everything You’ll Ever

Learn?”). It is an inescapable fact that to participate in new fields we must learn new

things. For example, children can only learn to read whole words through dedicated

practice. Every time we enter a new field we have to go through the same process—we

become children again. Every field has its own language and conventions that must be

internalized. The process of learning requires repetition and verbal encoding of new

concepts into one’s own words, and testing can help with this process.

But let’s return to the question of creativity. Creativity is the ability to view the world through a variety of models (or paradigms). Since testing can increase the amount of memorized information we have to work with, it can enhance rather than detract from the number of ways we can apply what we know. Creativity is not the opposite of memorization; it is the useful application of memorized information. The

human brain is a marvel of associative processing, but in order to make associations, data

must be loaded into memory. We need to internalize disciplinary information through

encoding and repetition in order to utilize it creatively. The goal of the composition

classroom should be to facilitate both learning and memorization of content, and the

application of that knowledge towards creative goals. However, my research leads me to

believe that currently there is a lack of emphasis on memorization in composition

instruction.

In both Bakhtinian and Vygotskian theory, language is at the heart of

development, and the social world is the arena where language is exercised and

developed. Between the individual and the group, new ground can be broken. In collaboration, the exchange among multiple interlocutors improves understanding and retention of class material through retrieval and repetition. Additionally, dynamic

collaborative dialogue can lead to the co-creation of new meaning—a creation that would

not have been possible individually. John-Steiner and Meehan put it this way:

“knowledge therefore is both re-constructed and co-constructed in the course of dialogic

interaction… [members in a dialogue] actively restructure their knowledge both with

each other and within themselves” (35). The separation of memorization and creativity

into distinct categories represents a false dichotomy. Creativity is the application of what

you know to new effect. Because tests enhance knowledge, they can also be used to

enhance creativity.

This chapter has given an overview of human learning from the theoretical

perspectives of brain physiology, cognitivism, and social constructivism. We have seen

that we can increase encoding strength of long-term memories through meaningful

learning and by connecting the things we learn with our pre-existing schemata. Also

important is the role of collaboration and conversation. Though some may argue that

creativity springs from an independent emergence in the mind of a genius, I have argued

that creativity is enhanced through internalization of disciplinary content, and emerges

from collaborative dialogue between groups of capable peers. Teachers can help ensure that students have internalized the course material necessary to engage as a group of capable peers by giving frequently administered, low-stakes tests. An exclusive

emphasis on testing cannot fulfill all of the needs of composition. I argue that tests are an

essential but insufficient method of instruction. Our current practice in composition

instruction, which includes a variety of group activities and collaborative discussions, is

also a necessary, but insufficient method of instruction. By combining the two methods

we can both ensure that students are adequately internalizing course content and

creatively using it to new effect.

CHAPTER 4: THE TESTING EFFECT AND THE SPACING EFFECT

What is the Testing Effect?

Although many people associate tests with assessing or measuring knowledge rather than with learning, research shows that a test can serve a far greater purpose than mere assessment. The studies discussed in this chapter show that tests can also enhance learning and improve long-term retention: the act of retrieving information from memory on a test increases the probability of successful retrieval in the future. This phenomenon of enhanced learning as a result of testing has come to be known as “the testing effect.” The psychologist H. L. Roediger writes that

The testing effect represents a conundrum, a small version of the Heisenberg uncertainty principle in psychology: Just as measuring the position of an electron changes that position, so the act of retrieving information from memory changes the mnemonic representation underlying retrieval, and enhances later retention of the tested information (“The Power of Testing Memory” 182).

This is in line with the physiological description of memory through neural networks

given earlier. Every act of retrieval brings about a physical and structural change to the

network, which results in long-term potentiation (LTP), making future retrieval actions

easier.

The testing effect has considerable implications for composition instruction and the field of education at large. As discussed in Chapter 2, our current practice in

composition instruction uses collaborative, student-directed methods of learning.

However, in order for students to effectively collaborate, they need to have adequately

internalized the disciplinary content of composition, including the conventions of

Standard Academic English, research conventions like those put forth by the MLA, and a

wide variety of terms and definitions (“thesis”, “topic sentence”, “analysis”, etc.). What

the testing effect and the spacing effect show us is that, without retrieval opportunities

spaced throughout the semester, it is unlikely that students will retain this content.

Therefore, this project argues for the use of frequent, spaced, low-stakes tests to help students internalize the knowledge of composition. As argued in Chapter 2, this will make students more capable of group work and gradually improve their writing.

A brief review of this phenomenon and the contexts in which it has been found is

given by Mark McDaniel and colleagues in the article “Testing the Testing Effect in the

Classroom”:

Testing effects are observed with word lists (Hogan & Kintsch, 1971; McDaniel & Masson, 1985), paired associate lists (Allen, Mahler, & Estes, 1969; Carrier & Pashler, 1992), pictures (Wheeler & Roediger, 1992), and prose material (Glover, 1989; Roediger & Karpicke, 2006b). Testing effects surface when the intervening tests are different from the final tests: intervening recall tests improve subsequent recognition (Glover, 1989; Lockhart, 1975; Wenger, Thompson & Bartling, 1980) and intervening recognition tests improve subsequent recall (Runquist, 1983). Taking a test is almost always a more potent learning device than additional study of the target material (see Carrier & Pashler, 1992, for recent experimental tests, and Roediger & Karpicke, 2006a, for a review). (495)

Recent studies have examined the testing effect in middle school (McDaniel 2007, 2011)

and college (Butler 2012) contexts. As we will see in this chapter, the testing effect is

consistently found in diverse studies. The literature reviewed in this section shows that

testing reduces forgetting, especially if administered shortly after learning, and multiple

tests produce a greater effect in slowing forgetting than a single test. The studies

reviewed in this chapter also show that taking a test has a greater positive effect on future

retention than spending an equivalent amount of time restudying the material.

Many of the studies reviewed below test learning and retention of paired associates. These are A-B connections: when presented with A or B as a cue, the test taker has to recall its associate. Paired associates can represent diverse information, such as names to go with faces, a friend’s phone number, translations of words from L1 to L2, or the fact that 8 × 9 = 72. The difficulty of paired associates is in part dictated by how logically they associate. For example, “chair-table” is easier to remember than “chair-donkey,” which is in turn easier than “VFU-734.” At its core, this form of memorization, or paired-associate learning, is identical to the memorization of the key vocabulary terms and definitions that might be required to successfully function and collaborate in a composition class. Any method that increases long-term retention of paired associates would therefore be a beneficial instructional technique in a composition class. While these studies are not measuring skill formation (i.e., development of writing ability over the course of a semester), they are examining the ability to recall key memorized

information similar to the key concepts and terms taught in composition.
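Because paired associates are simply A-B links that can be cued from either side, a short vocabulary drill built on them can be sketched in a few lines. The following is a minimal, hypothetical sketch, not a procedure from any study reviewed here: the term-definition pairs, the function names `make_items` and `score`, and the exact-match scoring rule are all illustrative assumptions.

```python
import random

# Hypothetical composition vocabulary stored as paired associates (A -> B).
PAIRS = {
    "thesis": "the central claim an essay argues",
    "topic sentence": "the sentence stating a paragraph's main idea",
    "analysis": "explaining how the parts of a text work together",
}


def make_items(pairs, seed=0):
    """Turn each A-B pair into a cued-recall item, cued from a random side."""
    rng = random.Random(seed)  # seeded so the same quiz can be regenerated
    items = []
    for term, definition in pairs.items():
        if rng.random() < 0.5:
            items.append((term, definition))  # cue A, student recalls B
        else:
            items.append((definition, term))  # cue B, student recalls A
    return items


def score(items, responses):
    """Proportion of responses that match their targets.

    Exact, case-insensitive matching is a simplifying assumption; grading
    real short answers would need a more forgiving comparison.
    """
    correct = sum(
        1
        for (_cue, target), answer in zip(items, responses)
        if answer.strip().lower() == target.lower()
    )
    return correct / len(items)


if __name__ == "__main__":
    quiz = make_items(PAIRS)
    for cue, _target in quiz:
        print("Cue:", cue)
```

Cueing from either side mirrors the A-B and B-A structure described above; a later quiz on the same pairs could simply call `make_items` with a different seed.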

Studies Reporting a Testing Effect

The first large-scale study of the testing effect was conducted by Arthur Gates and published in 1917. This study compared the effectiveness of active recall (what Gates called “reciting”) with passive review, and found that the active recall required by testing improved retention of the material more than additional study did. Gates tested children in grades 1, 3, 4, 5, 6, and 8, using two types of materials: nonsense syllables and facts taken from prose passages in the book Who’s Who in America. The nonsense syllables were simply three-letter groupings that do not form a word in English, such as DAK, YRK, or CTR. The children studied the materials in two phases, first reading to themselves, then looking away from the materials and recalling (reciting) whatever answers they could, with researchers recording the students’ performance during free recall. Researchers instructed students to read or recite for different amounts of time, and different groups of children, separated by age level, spent 20, 40, 60, 80, or 90% of the time self-testing. At the end of the period Gates administered a test to the children on the material they had learned, and after a delay of 3 to 4 hours he retested them.

Presumably because of their early stage of cognitive development, first graders (children six to seven years old) did not perform very well in the study, and they were not tested on the prose passages because of their limited reading ability. Their performance alongside the other students’ on nonsense syllables can be seen in Figure 4, which plots the proportion of time spent reciting on the X-axis and the proportion of test items recalled on the Y-axis. The increase in performance is more clearly visible on the delayed tests than on the immediate ones. The top two graphs show performance on immediate tests for nonsense syllables (left) and biographical facts (right); the bottom two graphs show performance on delayed tests for the same materials in the same positions. With the prose passages, the optimal amount of recitation seemed to be about 60% of the total learning period, with the rest spent re-reading. Researchers found that the effect leveled off and test scores began to drop at higher ratios of recitation to re-reading. The benefits of recitation do not level off for nonsense syllables, because they are nonsensical and re-reading would not help encode the information. These data suggest that a balance between studying and testing is best.

Figure 4: “Performance on Immediate and Delayed Tests”

Source: “The Power of Testing Memory” 184

A second landmark study showing positive effects of testing was carried out by

H.F. Spitzer (1939). Spitzer's study demonstrated not only that testing improved

retention, but that a shorter delay between initial learning and testing is of greater benefit

than a longer interval between studying and testing. Spitzer and colleagues conducted a

large-scale experiment involving the entire population of sixth-grade students in 91

elementary schools in nine Iowa cities, for a total of 3,605 subjects. Students studied one

of two 600-word articles containing information about either peanuts or bamboo. The

students were then split into eight groups and given a 25-item multiple-choice test on the material over the course of the next 63 days, with each group tested at a different retention interval.

Spitzer also manipulated the number of tests taken by different groups and the delay between studying and testing. After studying the passage, each of the eight groups of subjects was given one, two, or three tests on various schedules across the next 63 days. Some students took a single test 63 days after initial exposure to the material, while others took earlier tests. Groups 1 and 2 took an immediate test; all other groups took their first test after a delay of days or weeks. For example, Group 6 did not take an initial test until day 21. Figure 5 shows the proportion correct on multiple-choice tests taken at various delays. The solid lines show results for repeated tests taken by particular groups. The dashed line (a visual aid connecting each group by the day of its initial test) shows that the longer the first test was delayed, the worse the students performed on it. In all cases, giving a test at some point either slowed or stopped forgetting. Groups 4, 5, and 6 all showed an increase in proportion correct after their first test. The students who took a test sooner after learning the material demonstrated much greater recollection than the students in groups who took their first test after a longer interval. By day 21, forgetting had already reached its peak, and Groups 6, 7, and 8 all showed similar performance.

Figure 5: “Testing Schedule Shows a Forgetting Curve”

Source: “The Power of Testing Memory” 185

This figure shows that the longer the interval between initial exposure and the first test, the worse the subjects performed. Group 2, which had the best recollection after a 63-day delay, was tested immediately after study and then tested again on the same material one week later. This study reveals the importance of spacing tests out in order to improve retention of material. Taken together, the landmark studies of Gates and Spitzer suggest that we administer a series of low-stakes tests throughout the semester, thereby requiring students to actively recall key information a number of times with increasingly spaced intervals between subsequent tests. This notion

of an expanded test schedule will be described in more detail in the next section, “The

Spacing Effect.”
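The expanding schedule that Gates and Spitzer point toward can be made concrete as a simple rule: test once right after a lesson, then let each gap between tests grow by a fixed factor until the end of the term. This is a hypothetical sketch for planning quizzes, not a schedule used in either study; the function name `expanding_schedule`, the doubling factor, and the 105-day (15-week) term in the example are assumptions.

```python
def expanding_schedule(start_day, first_gap, factor, last_day):
    """Return quiz days whose gaps grow by `factor` after each quiz.

    start_day: day of the first (immediate) quiz, e.g. the day of the lesson
    first_gap: days between the first and second quiz
    factor:    multiplier applied to the gap after each quiz (1 = even spacing)
    last_day:  final day of the term; no quiz is scheduled after it
    """
    if first_gap <= 0 or factor < 1:
        raise ValueError("first_gap must be positive and factor at least 1")
    days = [start_day]
    gap = first_gap
    while days[-1] + gap <= last_day:
        days.append(days[-1] + gap)
        gap *= factor  # widen the next interval
    return days


if __name__ == "__main__":
    # Hypothetical 15-week term (days 0-105), gaps doubling from one day.
    print(expanding_schedule(0, 1, 2, 105))
```

With these assumed values the quizzes fall on days 0, 1, 3, 7, 15, 31, and 63. Setting `factor=1` produces evenly spaced quizzes instead, which makes it easy to compare the two kinds of schedule.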

While the Gates and Spitzer studies provide support for the testing and spacing effects, they were performed with elementary school students, a different demographic than we find in composition classes at the university level. Other studies, however, have worked with undergraduates. Endel Tulving, for example, examined the recall ability of undergraduates at the University of Toronto. In Tulving’s (1967) study, three groups of 18 students learned a list of 36 nouns, presented in random order on each study trial. The purpose of this study was to compare retention between groups with different studying and testing schedules. If S stands for a study trial, and T stands for a test trial, the three

groups were compared in the following manner: Group 1 went through a process of

STST, Group 2 followed a process of STTT, and Group 3 did SSST. During a study

session, subjects looked at the word list and tried to memorize it, and for the test

condition subjects verbally free-recalled as many items as they could in any order, which

the experimenter recorded (see Figure 6).

Figure 6: “Study-Test-Study-Test (STST) Most Effective Learning Strategy”

Source: “The Power of Testing Memory” 185

Tulving showed that testing and studying can produce the same amount of learning; however, his subjects were tested only at an immediate retention interval (directly after studying). Later research shows that if long-term retention is measured after a delay, repeated testing actually produces better recall than repeated studying. For

example, Karpicke and Roediger (2006) replicated Tulving’s basic result that learning

curves for the three conditions are similar. However, unlike Tulving, they repeated the

test after a 1-week delay. Their results can be seen on the right side of Figure 6. In the

2006 study, subjects returned one week later, and were given 10 minutes to recall as

many words as they could. Their performance was recorded at the end of each minute.

The comparison between the three conditions reveals a benefit for the STST learning condition when long-term retention is the studied variable. Although the subjects who studied repeatedly had studied the words 15 times a week earlier and those who were tested repeatedly had studied them only 5 times, recall was greater for the STTT condition than for the SSST condition. These results show that, in this study at least, a balanced mixture of studying and testing is the best method to ensure long-term retention. Again, we find a study supporting the notion of incorporating frequent tests into the semester. As instructors, we can approximate the STST condition for our students by spacing a number of tests throughout the semester so that students repeatedly alternate between studying and testing.

Roediger and Karpicke (2006b) are not the only researchers to show a clear benefit for testing over studying as the retention interval increases. Thompson, Wenger, and Bartling (1978) further confirm these findings, and they also demonstrate that selectively studying only the material missed on a previous test is more efficient than restudying all of the material. This study used 40-word lists and three learning conditions: four study trials (SSSS), one study trial followed by three tests (STTT), and a condition that personalized the study schedule for each subject (STrTrTr). The study also included a final test 5 minutes after the learning phase and again 2 days later.

Subjects in the third (STrTrTr) condition studied the word list once, recalled it, studied

only those words they failed to recall, and then recalled the entire list again, and so on for

three more study-test episodes, with the study lists becoming shorter as they performed

better on the tests. Though each study session was personalized, on every test the subjects in this group tried to recall the entire list, not just the items they had restudied.
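The selective-restudy (STrTrTr) procedure described above is essentially a loop: study, test the whole list, then restudy only the items that were missed. The sketch below is a hypothetical illustration of that bookkeeping, not Thompson et al.’s materials or software. The `recall` function stands in for a learner and is an assumption supplied by the caller, and the toy learner in the example never forgets.

```python
def selective_restudy(items, recall, cycles):
    """Simulate study-test cycles in which only missed items are restudied.

    items:  the full study list
    recall: hypothetical learner model; recall(item, times_studied) -> bool
    cycles: number of study-test episodes to run
    Returns one (study_list, recalled_set) pair per episode.
    """
    items = list(items)
    times_studied = {item: 0 for item in items}
    log = []
    study_list = items  # the first session covers the whole list
    for _ in range(cycles):
        for item in study_list:
            times_studied[item] += 1
        # Each test covers the entire list, not just the restudied items.
        recalled = {item for item in items if recall(item, times_studied[item])}
        log.append((list(study_list), recalled))
        # The next study session is personalized: only the missed items.
        study_list = [item for item in items if item not in recalled]
    return log


if __name__ == "__main__":
    # Toy learner: each word needs a fixed number of study exposures.
    needed = {"apple": 1, "bridge": 2, "candle": 3}
    for study, recalled in selective_restudy(needed, lambda w, n: n >= needed[w], 3):
        print("studied:", study, "recalled:", sorted(recalled))
```

With the toy learner, the study lists shrink from three words to two to one (six exposures in total) while every test still covers all three words; restudying the full list on each of the three cycles would cost nine exposures. This shrinking workload is the efficiency the selective-restudy condition is meant to capture.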

The results of Thompson et al. (1978) appear in Figure 7. For both the five-minute and the two-day retention intervals, the group with selective restudying performed best. At a five-minute interval, the SSSS group scored 50%, but it fared far worse after a 48-hour delay, scoring only 22%. The STTT group scored lower (28%) on the initial test than both other groups. However, the STTT group also showed very little forgetting, and 48 hours later the group scored 22% on the test. With a retention interval of 5 minutes, the STTT group had the poorest performance, but with a retention interval of 2 days, the SSSS group had the poorest performance. The percentage forgetting was calculated as follows: [(recall at 5 min − recall at 48 hours) / recall at 5 min] × 100. Applying this formula to each group (see Figure 7) shows that the repeated-study condition resulted in much greater forgetting as time passed. In line with the other studies in this chapter, these results show that massed study helps immediate recall, but performance declines as the retention interval increases. In addition, this study shows that selective restudying combined with repeated testing is the most effective way to ensure long-term retention. Once information has been learned and successfully recalled on a test, it is best for students to spend their time studying the material that they failed to

recall. These findings further support incorporating a number of low-stakes tests throughout the semester in our composition classes. The most efficient means of learning seems to be to personalize one’s study around the answers missed on the previous test, while continuing to test on all items every time. As composition instructors, we can best help our students learn by administering frequent tests and directing each student to personalize their study by restudying what they missed on the last test.

Figure 7: “Proportion Correct in Immediate and Delayed Recall”

Source: “The Power of Testing Memory” 187
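The percentage-forgetting formula given above for Thompson et al. (1978) can be expressed as a one-line function. This is a hypothetical helper (the name `percent_forgetting` is an assumption); it measures forgetting relative to the immediate score rather than as a raw drop in points. Using the SSSS figures reported above, 50% recall at five minutes and 22% at 48 hours, it gives (50 − 22) / 50 × 100 = 56% forgetting.

```python
def percent_forgetting(immediate, delayed):
    """Relative forgetting between an immediate and a delayed test.

    Implements [(recall at 5 min - recall at 48 hours) / recall at 5 min] x 100.
    Both scores must use the same units (percentages or proportions).
    """
    if immediate <= 0:
        raise ValueError("immediate recall must be positive")
    # Multiply before dividing so whole-number percentages stay exact.
    return (immediate - delayed) * 100 / immediate


if __name__ == "__main__":
    # SSSS group as reported in the text: 50% immediate, 22% after 48 hours.
    print(percent_forgetting(50, 22))  # 56.0
```

Because the measure is relative, it must be applied to each group separately; two groups with the same delayed score can show very different percentages of forgetting if they started from different immediate scores.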

Thompson et al.’s finding that massed study improves initial recall but loses effectiveness as the retention interval increases was replicated by Wheeler, Ewers, and Buonanno (2003). In their study, subjects studied a 40-word list in a repeated-study condition (SSSSS) or with one study session followed by four consecutive recall tests (STTTT). Consistent with previous research, on a final free-recall test given 5 minutes or 1 week later, the researchers found an advantage for massed study on the immediate test, but the massed-study group performed poorly with a retention interval of

one week (see Figure 8). Notably, the subjects in the study-only condition were exposed to the material five times as often as the repeated-test subjects, who were re-exposed only to those words they were able to recall after a single study session (about 11 out of 50 words in the experiment). Figure 8 shows the proportion of words recalled on immediate (5-min) and delayed (7-day) tests after repeated studying or repeated testing. The repeated-test condition produced better retention than the repeated-study condition. Most noteworthy is the difference in forgetting between the two groups: though the repeated-study group performed better initially, they forgot at a much greater rate, and after a 7-day delay they had forgotten most of what they had learned. These results confirm the power of testing for long-term retention.

Figure 8: “Word Recall on Immediate and Delayed Tests”

Source: “The Power of Testing Memory” 188

Though there has been a great deal of research on the testing effect in laboratory settings, in these studies the testing intervals, the amount of time, and the conditions of testing are carefully controlled or manipulated. Unlike the lab, a classroom involves great variability in students’ retention intervals, study time, and effort. In the laboratory, long retention intervals are typically 1 or 2 days (e.g., Carrier & Pashler 1992; Hogan & Kintsch 1971; Masson & McDaniel 1981; McDaniel & Masson 1985); in much research, the intervals are on the order of minutes or hours (e.g., Bartlett

1977). However, in a class, the delays between quizzes and exams are typically weeks or

months. According to the article “Testing the Testing Effect in the Classroom,” as of

2007, very few experiments had studied the testing effect at 1-week or longer intervals.

McDaniel cites Roediger & Karpicke 2006, and Wheeler, Ewers, & Buonanno 2003, for

1-week delay, and Butler & Roediger 2007 for a 1-month delay; additionally, Spitzer

(1939) tested students at an interval of 63 days.

In their 2007 study, McDaniel and colleagues tried to create experimental conditions that would test the applicability of the testing effect in a practical setting outside of a laboratory. This study was conducted during six weeks of a web-based Brain and Behavior course at the University of New Mexico, with 35 participants. Each week, all students in the class were assigned approximately 40 pages of textbook reading. All participants completed weekly quizzes, two unit exams, and a final exam that

were constructed for the experiment. Weekly quizzes included 10 items generated from the content of the reading. Each week, participants received their “quiz” in a different format: multiple choice (MC), short answer (SA), or read only (RO). In the RO week, participants simply read facts and clicked a button marked “I have read the above statement.” Participants were allowed 10 minutes to complete each quiz, and immediately after finishing they were given access to feedback. Because the quizzes were online, whether the students used this feedback was up to them. After 3 weeks of quizzes (one of each format), participants were instructed to take the first unit test, with all participants asked the same questions. The same method was repeated for the following three weeks. Several weeks after completing the second unit test, participants were instructed to take the final cumulative exam, which combined material from units one and two.

Students’ performance on the unit exams is compared and summarized below in

Figure 9, which shows performance of quizzed versus not-quizzed items collapsed across

units. These results show that testing, but not additional reading, improved performance

on the unit exams for the material which was targeted during previous tests.

Figure 9: “Student Performance Averaged across Unit Exams”

Source: “Testing the Testing Effect” 508

These findings demonstrate that tests enhance learning and retention even in the face of

the variable conditions found in a college course setting. This experiment and those conducted in social studies classes (Roediger et al. 2010) are, according to the authors, the first to show the effectiveness of low-stakes quizzing in promoting retention of course content on summative assessments used in actual classrooms. This research shows that the benefits of the testing effect can clearly transfer to the classroom.

A common concern about testing is whether students are learning the complete conceptual relations among facts or merely parroting a particular answer to a particular question. To address this concern and assess whether students had a deeper understanding of the material, the researchers reworded questions from the course readings between the weekly quizzes and the unit tests so that a different portion of the same fact was required for the answer. In this study, short-answer (recall) quizzes improved performance more than multiple-choice (recognition) quizzes did on a subsequent test in which the retrieval cues had been altered (i.e., a different question stem was provided than during previous tests). This benefit under changed wording is in line with the theory of desirable difficulty, which posits that more effortful retrieval is more beneficial for long-term retention; recognition is a less demanding retrieval task than recall.

Another issue addressed in the basic memory literature is the relative benefit of cued recall tests (e.g., short answer, essay) over recognition tests (e.g., multiple choice). McDaniel (2007) writes:

Studies with simple laboratory materials (word or paired associate lists) have found that retrieval through recall benefits subsequent test performance more so than retrieval processes associated with recognition (Cooper & Monk, 1976; Darley & Murdock, 1971; Mandler & Rabinowitz, 1981; McDaniel & Masson, 1985; Wenger, Thompson, & Bartling, 1980). (201)

In fact, in McDaniel (2007), multiple-choice quizzes produced results that were only

slightly better than repeated reading without quizzes. An initial test consisting of

multiple-choice questions often fails to produce a testing effect, presumably because such

questions require little or no retrieval (e.g., Kang et al., 2007). In the present study, the greatest testing effect is demonstrated with short-answer questions rather than multiple-choice questions. In a previous study using word lists, McDaniel and Masson (1985)

found that cued recall produced significantly better performance on a subsequent cued recall test than did recognition; importantly, on half of the trials the cues that prompted recall on the final test differed from those provided during earlier study and testing. This pattern prompted McDaniel and Masson to suggest that retrieval through

recall produces enriched, variable encoding of the target information, more so than

retrieval through recognition. McDaniel cites further studies showing positive transfer

between testing and studying when the wording in the question stem differs between

studying and testing (Glover 1989; Lockhart 1975; Wenger, Thompson & Bartling 1980)

and argues that testing on multiple aspects of a question should produce a deeper relational understanding of the material. The findings in this study fit with a larger body

of research, including those studies reviewed in this chapter, showing that recall tests are

more beneficial than recognition tests for subsequent memory performance.

This section has reviewed the results of studies demonstrating a testing effect under multiple learning conditions. While the studies reviewed here are small in number, they consistently suggest that a balance of studying and testing is the best method to ensure long-term retention. The purpose of testing should be to gradually shape production of the desired response so that it can be retrieved out of context, after a long delay. The landmark studies of Gates and Spitzer established the testing effect as a paradigm of learning, and the effect has held up in the later studies reviewed in this chapter. For composition, this research implies that student learning and retention of course material can be facilitated through frequently administered, low-stakes tests.

The Spacing Effect

The spacing effect is the principle that spacing study sessions is better for

retention than massed study. The spacing effect has two components: 1) spaced study is

better than massed study, and 2) the most efficient method to ensure long-term retention

is through increasingly spaced repetitions of the original material. The German psychologist Hermann Ebbinghaus, who used himself as the sole subject, conducted a landmark study of human memory in the late 1800s that laid the empirical groundwork for the spacing effect. This study is widely recognized and extensively cited. He memorized thirteen sets of nonsense syllables, then tested himself at various retention intervals and measured how long it took to forget and then relearn them. These three-letter consonant-vowel-consonant syllables (such as YOP, SEP, and XAP) were chosen to avoid contaminating the experiment with previously learned associations. In experiments of astonishing rigor and tedium, Ebbinghaus practiced and recited from memory 2.5 nonsense syllables a second, then rested briefly and started again. Ebbinghaus trained this way for more than a year. He then repeated the entire set of experiments three years later to further confirm his findings. Finally, in 1885, he

published a monograph called Memory: A Contribution to Experimental Psychology.

Ebbinghaus’s findings and his book established the theoretical foundation for the study of

memory that psychology has relied on since. His results have been replicated in

numerous studies and serve as a foundational precept for our understanding of human

memory and learning.

Ebbinghaus identified some important empirical relationships in memory, such as

the retention and learning curves. He studied the amount of time it took to learn the list

initially and then how long it took to relearn the list, with “learning” defined as the ability

to perfectly recall the list twice. In one study he found that it took 1156 seconds to

initially learn the set, but later it took only 467 seconds to relearn the list. He found that initial forgetting was rapid but that the rate of forgetting slowed over time. This was the first expression of what has been found in virtually all studies of human learning since: the negatively accelerated forgetting curve.
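Ebbinghaus's relearning measure is usually expressed as a savings score: the share of the original learning time that was no longer needed when the list was relearned. The short sketch below applies that arithmetic to the two times quoted above. The function name is our own, and the sketch illustrates the calculation rather than reproducing Ebbinghaus's notation.

```python
def savings_percent(original_seconds: float, relearn_seconds: float) -> float:
    """Savings score: percent of the original learning time saved on relearning."""
    if original_seconds <= 0:
        raise ValueError("original learning time must be positive")
    return 100.0 * (original_seconds - relearn_seconds) / original_seconds


if __name__ == "__main__":
    # Times reported in the text: 1156 s to learn the list, 467 s to relearn it.
    print(f"Savings: {savings_percent(1156, 467):.1f}%")  # prints "Savings: 59.6%"
```

A savings of roughly 60% means that, although the list could no longer be recited perfectly, much of the original learning was still retained.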


Figure 10: “Hypothetical Forgetting Curve 1”

Source: (“Learning by Spaced Repetition”)

Robert Bjork, working with Thomas Landauer (1978) of Bell Labs, published the results of two experiments involving nearly 700 undergraduate students learning names. Each student was given a prearranged deck of cards; cards for initial presentation trials bore the first and last names of fictitious people, and cards for test trials bore first names only. Subjects turned through the cards at a rate of one card every 9 seconds, in time to a signal, studying the names or writing the last-name answers as appropriate. A 30-minute retention interval filled with a distracting lecture followed, and then a final retention test. Landauer and Bjork were looking for the optimal moment to

rehearse something so that it would later be remembered. To determine this, they studied

the effectiveness of an expanding retrieval schedule (i.e. an increase in the retention

interval after every act of retrieval) compared to an equally-spaced retrieval schedule.

Landauer and Bjork found that the expanding-interval schedule produced recall similar to that of equal-interval testing on a final test at the end of the session, and both produced better recall than did initial massed testing. Their results led them to theorize that the best time

to study something is at the moment you are about to forget it: retrieval right on the

threshold of forgetting produces the greatest gains in retention. In their words: “Successful

tests are more effective than repetitions. This could either be because tests induce greater

encoding effort, or because they are more similar to the performance required at eventual recall”

(631). This is in line with the theory of desirable difficulty described earlier. They found

that the expanding retrieval schedule produced a 10% increase in retention over the

equal-interval schedule. In practical terms, this finding suggests that tests should be

administered shortly after learning to ensure initial encoding, and then repeatedly

administered at increasingly spaced intervals.

An implication of the spacing effect is that spaced study will be more effective

than massed study. Massed study means cramming a lot of material into a short amount of time. As we know, we can pass tests by cramming if the test is taken shortly afterward, but knowledge gained this way has very little storage strength. Dempster (1987) conducted a study in which subjects were shown paired-associate English vocabulary words and their definitions three times. One group did immediate massed study (cramming), while the other did spaced practice with other items in between. The spaced condition resulted in much greater retention than the massed schedule. Carpenter and DeLosh (2005), in their article “Application of the Testing and Spacing Effects to Name Learning,” in which subjects learned paired associates of names and faces, likewise found that spaced study resulted in better retention than massed study. Both experiments in that

study showed that final retention was better for the spaced conditions than the massed

conditions, and this held true for different spacing intervals and for both studied and

tested items. All of these studies are in line with a much larger body of literature reporting that retention is better for spaced study than massed study (e.g., Hintzman 1974; Melton 1970; Dempster 1987) and better for spaced than massed testing (e.g., Cull et al. 1996; Cull 2000; Glover 1989; Izawa 1992; Landauer & Eldridge 1967; Modigliani & Hedges 1987). Ruch (1928) published a review of dozens of studies of the spacing effect; for a more recent review, see Cepeda, Pashler, Vul, Wixted, & Rohrer (2006).

H. L. Roediger summarized these findings in the field when he was asked in a

2012 interview “How many times should one get people to retrieve things, and how soon

after learning?”:

F. Mary Pyc and Katherine Rawson at Kent State University showed that for simple things like foreign language vocabulary, retrieving about five to seven times is about right —if you test people a week later you wouldn’t see much difference between having tested people seven times or ten times, but you do see gains going up to the range of five to seven times. After that it just levels off. But most people would only practice once or twice, so the idea of going up to five or seven retrievals seems like too much to many people. Of course, to keep knowledge at your mental fingertips, you would need continued spaced retrieval practice, too. (Kleeman “Professor Roddy Roediger…”)

Five to seven repetitions over the course of a sixteen-week semester should be enough. If teachers administered tests once a week, there would be ample opportunities for five to seven repetitions of each test item, which in turn should promote long-term retention.
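As a rough illustration of how weekly tests could supply five or more retrievals per item, the sketch below puts each item on the quiz for the week it is introduced and then again after expanding gaps. The gap pattern of 1, 2, 3, 4, and 5 weeks, the sixteen-week cutoff, the function names, and the sample items are hypothetical choices for illustration, not values drawn from the studies cited here.

```python
from collections import defaultdict


def quiz_weeks(intro_week: int, gaps: list[int], last_week: int = 16) -> list[int]:
    """Weeks in which one item appears: first when introduced, then after
    each (hypothetical) gap, stopping once the semester ends."""
    weeks = [intro_week]
    for gap in gaps:
        next_week = weeks[-1] + gap
        if next_week > last_week:
            break
        weeks.append(next_week)
    return weeks


def build_semester(items_by_week: dict[int, list[str]], gaps: list[int],
                   last_week: int = 16) -> dict[int, list[str]]:
    """Map each week to the items on that week's quiz, in week order."""
    quizzes: dict[int, list[str]] = defaultdict(list)
    for intro, items in items_by_week.items():
        for item in items:
            for week in quiz_weeks(intro, gaps, last_week):
                quizzes[week].append(item)
    return dict(sorted(quizzes.items()))


if __name__ == "__main__":
    # Hypothetical example: two terms introduced in week 1, one in week 2.
    semester = build_semester({1: ["ethos", "pathos"], 2: ["logos"]},
                              gaps=[1, 2, 3, 4, 5])
    for week, items in semester.items():
        print(week, items)
```

Under this hypothetical gap pattern, an item introduced in week 1 appears on six quizzes (weeks 1, 2, 4, 7, 11, and 16), while items introduced in weeks 2 through 6 appear on five; items introduced later in the semester get fewer retrievals unless the gaps are shortened.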

Spaced repetition relies on the principle that information does not have to be

repeated every day in order to ensure long-term retention. While repeatedly studying the

same information every day would indeed foster long-term retention, it would also be

boring and inefficient; spacing study sessions is more efficient. Figures 11 and 12 conceptualize the learning process through spaced repetition. The figures do not depict direct findings from a study, but they do help illustrate the learning curves described in this chapter. Each is a modified forgetting curve. They show that if no reminder is given after information is first introduced, the likelihood of remembering it drops dramatically within days. The y-axis shows the likelihood of correctly remembering an item of information, from 0 to 100%, and the x-axis shows the time elapsed since the original learning event. A horizontal bar marks a 90% chance of correct recall, or near-perfect memory. A downward-sloping curve represents the average forgetting curve and shows that recall declines rapidly after learning. However, if a reminder is given before the curve drops below the 90% bar, then long-term retention can be maintained. Interestingly, the spacing effect shows that the interval that can pass before the next reminder, while still keeping recall at 90%, grows after each repetition.
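The reminder rule described above (review just before recall falls below the 90% bar, with the safe interval growing after each review) can be sketched numerically. This sketch assumes a simple exponential forgetting curve, R(t) = exp(-t/S), in which a hypothetical "stability" S is multiplied by a fixed factor after each reminder. The model, the function name, and the parameter values are illustrative assumptions, like the figures themselves, and are not fitted to any study.

```python
import math


def review_intervals(initial_stability: float, growth: float,
                     threshold: float = 0.9, reviews: int = 5) -> list[float]:
    """Days to wait before each reminder so that modeled recall never drops
    below `threshold`.

    Assumes a hypothetical exponential forgetting curve R(t) = exp(-t / S),
    where S (in days) is multiplied by `growth` after every reminder.
    """
    intervals = []
    stability = initial_stability
    for _ in range(reviews):
        # Solve exp(-t / S) = threshold for t: the moment recall reaches the bar.
        intervals.append(-stability * math.log(threshold))
        stability *= growth  # each reminder makes the memory more stable
    return intervals


if __name__ == "__main__":
    elapsed = 0.0
    for n, gap in enumerate(review_intervals(10.0, 2.5), start=1):
        elapsed += gap
        print(f"reminder {n}: wait {gap:5.1f} days (day {elapsed:5.1f})")
```

With these made-up parameters the waits grow from about one day to about six weeks, mirroring the widening gaps between reminders in Figures 11 and 12.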


Figure 11: “Hypothetical Forgetting Curve 2”

Source: (“Want to Remember Everything You’ll Ever Learn?”)


Figure 12: “Hypothetical Forgetting Curve 3”

Source: (“Spaced Repetition and the CFA Exam”)

The studies in this section illustrate that spaced testing is more effective for long-term retention than massed practice, and that expanding (increasingly spaced) testing is more efficient than equally spaced testing. The studies in this chapter reveal the benefits of frequently administered, low-stakes tests; these tests would provide structure and support for repetition. If our goal is to

maximize learning, then we should design curriculum based on the numerous studies that

show repetition is the best way to optimize learning. If we agree that long-term retention

of classroom content is a desirable thing, then it becomes clear that we should provide

repetitions of class content at spaced intervals to best ensure long-term retention. Because


spacing is a more efficient means of studying than massed study, we can see a clear

benefit to designing tests that use the spacing effect. The studies reviewed in this section suggest that teachers should repeatedly test on the same items and gradually reduce how often those items appear on subsequent tests as they introduce new items. The studies

support the use of frequent tests in composition to ensure students' long-term retention of

key course content.


CHAPTER 5: TEST-ENHANCED LEARNING IN COMPOSITION

Indirect Benefits of Testing

Chapter 5 offers a number of specific examples of how our current composition practice can be enhanced through testing. The composition textbooks we already use contain testable content; in fact, many rhetoric textbooks contain ready-made tests under the guise of “exercises.” These exercises are structured much like tests and can be administered in class to enhance learning. Devising tests and

administering them effectively is a complicated art, but also a skill that can be developed.

This chapter presents some general guidelines for how to write tests and how to

administer them. The testing effect on its own provides a strong argument for enhancing

our current practice by administering frequent, low-stakes tests. However, in addition to

the testing effect, there are indirect benefits of testing, which this section will review.

The dialogic class that Finkel portrays (described in Chapter 2) assumes that all of the students come to class fully prepared, having done the reading and thought about it, and that they are interested and ready to engage in animated discussion. As Jacobs and Chase write in their book Developing and Using Tests Effectively, “If we do not ask questions on the content of outside readings, then most students will not read the materials” (17). Without proper preparation on the part of the teacher, Finkel’s class is a beautiful fantasy. To turn this fantasy into a reality, it is the teacher's job to create the conditions that lead to optimal learning. In a collaborative model of learning, learners of all skill levels improve as a result of collaborating within the Zone of Proximal Development (ZPD), but only if those learners are cognitively prepared to work within that ZPD; effective collaboration requires capable peers. When students neglect to do the reading or other coursework, it is impossible to maintain an effective ZPD for everyone in the class. Teachers can help

ensure that students are capable peers by requiring them to express course content in their

own words through frequent tests.

Frequent testing requires students to space their study efforts throughout the semester rather than cramming right before an exam, a method that research and common experience show is utterly ineffective for long-term retention. Roediger and Karpicke (2006) write, “To state an obvious point, if

students know they will be tested regularly (say, once a week, or even every class

period), they will study more and will space their studying throughout the semester rather

than concentrating it just before exams (see Bangert-Drowns 1991; Leeming, 2002)” (249). Frequent tests discourage students from cramming for high-stakes exams. We all know that with dedicated cramming the night before a test, many individuals are able to pass their exams. By cramming, students are able to hold information in their short-term memory; however, they cannot perform as well on that same exam a week or a month later. Because of the possibility of cramming, tests should not be used as the exclusive means of assessment, and their results become more reliable when tests are frequent rather than infrequent. Frequent testing in classrooms encourages students to study continuously throughout a course, rather than bunching massive study efforts before a few isolated tests. This process approximates a study-test-study-test (STST) model, the learning model that was most effective in the studies reviewed in Chapter 4.

Frequent quizzing might also reduce test anxiety, a problem that plagues many students. Test anxiety generally comes from how heavily exam grades can be weighted, but frequent tests are a low-stakes means of retrieval. Instructors could even start with an ungraded quiz that serves both as a model of future quizzes and as an assessment of current student knowledge. Roediger and Karpicke (2006), Agarwal et al. (2008), and McDaniel (2007) all report increased student confidence from taking frequent low-stakes tests. In their 2006 study, Roediger and Karpicke found that students self-reported increased levels of confidence as a result of frequent low-stakes quizzes in class. Similarly, McDaniel and colleagues (2011) write that 64% of the subjects (out of 139 eighth-grade science students) reported that tests reduced their anxiety about taking the unit exam and 89% reported that the tests increased their learning (404). The researchers in that study also reported that students seemed disappointed on days when quizzes were not given in class.


Another benefit of testing is that it helps include shy students who avoid joining discussions. Many teachers are uncomfortable calling on students in class, and students who participate in class discussions less than their peers have fewer chances to reproduce class content in their own words. Tests vary the format of participation to accommodate students who do not take part in class discussion or group work as often. Rather than excluding those who are too shy to speak up, tests provide alternative retrieval opportunities.

In addition, frequently administered tests provide a more holistic approach to

assessment, with assessment occurring over the course of the semester rather than

infrequently through high-stakes tests. Frequent tests give instructors feedback that can help them assess student capability and identify misunderstandings. This frequent assessment provides teachers with the information they need to update the course curriculum and maximize student learning. If many students are unclear on a particular topic, test results will make that misunderstanding apparent and allow instructors to modify their teaching accordingly. In the same way, tests can be used at the

beginning of a semester to see what skills students bring to a class. A writing test

administered at the beginning and the end of the course can be instructive for

composition instructors by demonstrating students’ growth over the semester.

Transparency is another indirect benefit of testing. Tests show students exactly

what content the teacher considers important, which makes learning goals more explicit.


Making our expectations for the course clear for students is important because

transparency helps direct student study efforts and tells students what to expect. If students know what questions they will be expected to answer, then they will read with that goal in mind. Corroborating this, Roediger and Karpicke (2006) report that when students know they will be tested, they come to class better prepared. Typically students are

assigned large amounts of reading, and for many students this is problematic because

they do not know what material is important to remember. Obviously they won’t

remember the entire book. Generally students will highlight as they read, but without

guidelines for what material is important, students often find themselves staring at pages

covered in yellow, which undermines the entire purpose of highlighting. With specific

reading questions, the teacher is essentially highlighting the reading for the student, and

demonstrating the importance of that material by re-exposing it through tests. Reading

questions help scaffold students’ learning by emphasizing what they should focus on.

Goal-oriented reading with a study guide and reading questions will help ensure a higher

degree of retention of key concepts.

Many instructors use course management systems such as Moodle or Blackboard, or present content through their own websites. These platforms are well suited to test-based learning because the technology makes tests easier to manage and administer. As a graduate student, I taught a section of English 104: “Accelerated Composition and Rhetoric” at Humboldt State University. Tests can be posted on a regular schedule, for example every Friday, as I did in my class. Moodle already has built-in testing software that can automatically grade many types of test items and provide feedback immediately after testing. It also logs user information, such as class participation and grades, that can be made transparent to the students. In addition, using these web-based platforms to administer tests takes up no valuable class time. Test-enhanced learning does not require any substantial change in our current education system, and it works very efficiently with current web technology.

The capacity of web-based platforms to give instant feedback is truly a boon, because feedback from frequent tests also helps students guide their study efforts. At its simplest, feedback means knowing whether you answered correctly or incorrectly. Feedback can be provided instantly for questions with a simple answer, but for short-answer or essay questions, the instructor will have to give feedback manually. If students test themselves periodically while they are studying, they can use the outcomes of these tests as a guide for future study. McDaniel et al., in “Testing the Testing Effect in the Classroom,” corroborate the importance of feedback:

The results are compelling for feedback effects after missing a short-answer quiz item. Clearly, learning and retention were better when students were given feedback after missing a short-answer question than reading the fact (twice) without being quizzed. (505)

Frequent test results allow students to self-assess where they stand relative to the expectations of the teacher and the course, and this feedback gives students the information they need to adjust their studying accordingly. On the other hand, a lack of feedback can allow errors to persist, as when students answer a question incorrectly on a test but think they answered it correctly. The research in previous chapters has shown that retrieval helps build long-term retention, so if students make an error and do not receive feedback, they may stamp that error into memory. In other words, because retrieval enhances learning, they are likely to continue remembering that error. Agarwal et al., in “Examining the Testing Effect with Open- and Closed-Book Tests,” write:

Prior research on the testing effect has shown that if students make errors of commission on an initial test and do not receive corrective feedback, they may retain those errors on later tests and run the risk of incorporating false information into their general knowledge (see Butler, Marsh, Goode, & Roediger, 2006; Roediger & Marsh, 2005). (862)

Giving feedback is as worthy of care, intelligence, and imagination as making up the test

in the first place.

With high-stakes exams, a missed test can become a tense and emotional situation as students plead for exceptions and makeups. Low-stakes tests, however, spread the grading weight over time, so students can miss the occasional test without serious concern about passing the course. A further benefit of frequent low-stakes tests is that they allow professors to have a very simple makeup policy for tests: never.

As we have seen, there are many reasons to test. In addition to these indirect

benefits, the research reviewed in this project has shown that active retrieval is the best


method to ensure long-term retention. Presumably we attend school in order to learn; if we are interested in learning material, isn't it worth remembering? If this is the case,

then, as teachers, we should try to facilitate long-term retention of course material. I

argue throughout this project that frequent, low-stakes tests are the best way to achieve

this.

Testing and Grades

In order for students to study for tests, they must believe that these tests are

important. Teachers can ensure that students take tests seriously by making tests a

required and graded component of the course. Grades help structure student effort, and

grading tests signals to students that understanding course concepts is important. I know from experience (both as a student and as a teacher) that if an assignment doesn't carry some graded value, it is easy to let it fall by the wayside. Frequently this happens with the intention of going back to it later, only to find that "later" is so full of critical assignments that there is no time to do the ones that are not required. Reading assignments can end up being quickly skimmed rather than thoroughly considered. Writing assignments that carry no grade may be hastily written by students

the night before, or directly before class. If we consider that students have busy lives outside of the classroom, then their neglect of ungraded assignments makes more sense. Students are indeed busy outside of the classroom; they are whole people with complex lives, and many students have to juggle a job, their personal or family

lives, and school all at once. Grades can be seen as a system that allows students to assign

value to assignments and prioritize. Assigning a grade value to tests sends a message to

students: “here are the really essential things to learn and remember from this course.”

Despite the ubiquity of grades in the university, typically there is no grade value

assigned to thoroughly reading and understanding assigned texts. Rather, readings are

assigned, and they may be briefly discussed in class, but time constraints prevent thorough discussion of all assigned material, which means students often are not held accountable for completing assigned readings. Performance in many other aspects of the class depends on an understanding of the assigned texts: class discussions, essay writing, homework assignments, and so on. So we as teachers must ask, “How can we ensure that students have adequately read and internalized class material that will be the focus of discussions?” Professors John-Steiner and Meehan, writing in Vygotskian Perspectives, have considered this issue as well. They argue that “Shallow internalizations leads to a facile combination of ideas. In contrast, working with, through, and beyond what one has internalized and appropriated is part of the dialectic of creative synthesis” (35). To promote adequate internalization, we need to develop ways to ensure that students are keeping up with their reading assignments. Otherwise a student who has

not engaged with the material may end up just hiding in the crowd during class


discussion, and group work will suffer as a result. One method of achieving consistent

completion of assigned reading is through frequently-administered quizzes. As teachers

we can encourage students to study the assigned readings by holding them accountable

during tests.

The Benefits of Tests According to Three HSU Professors

For the purposes of this project, I interviewed many professors at Humboldt State University. All of their feedback has been influential in this project, but for brevity I have narrowed these interviews down to three professors. These were personal interviews, and their reports should by no means be considered authoritative or empirical; this information is anecdotal.

Professor Corey Lewis, an instructor of composition with more than thirteen years

of experience, thinks we’ve moved away from testing as part of a pendulum swing away

from top-down pedagogy to Expressivism and student-centered teaching. Current

composition practices recognize that students need to write and work with writing to

improve, but he argues that with the abandonment of tests we’ve lost what is clearly an

effective instructional technique, stating, “I don’t know anyone who is on a regular

systematic basis testing students on skills and content that directly relate to writing in

composition” (Lewis Interview). He describes having an epiphany while lecturing: as he discussed some key terms for the class and wrote them on the board, he looked around and noticed that of 22 students, perhaps 3 were taking notes. Without taking notes and reviewing them, there is simply no way students could remember a long list of terms like those he had written on the board. Professor Lewis tells me he had assumed that students were trying to learn the terms, but it became clear from observing them that they were not putting forth the effort necessary to memorize them.

Today, it seems clear that many students are not putting forth the diligence

necessary to learn classroom material. In part, this lack of effort may result from many students needing to work while going to school. Today students face tuition that costs a fortune. There are budget cuts, furloughs, and fewer resources available, which means that most students work in addition to going to school. Many students work

full-time in addition to being full-time students. Though this may not be the case for

every student, it is useful for instructors to adopt the mindset that students are not

neglecting their studies because they do not care, but for various and complex personal

reasons. This gives educators the choice of either complaining about the current situation,

or creating the structure that students need to learn. Teachers can help students by

providing a structure for them to succeed.

Professor Lewis told me that our discussions about testing brought the need for it to mind, and he believes that other professors are probably in the same situation: they realize that students aren’t internalizing classroom content, but they are

not sure how best to facilitate that learning. Because current composition and English pedagogy trains us to use class discussions, workshops, and writing conferences rather than tests, many composition instructors have never been trained to use tests as effectively as they use these other pedagogical methods.

Professor Janet Winston, another professor I interviewed for this project, told me that the biggest challenge she faces as a literature teacher at HSU is getting students to actually do the assigned reading. Though this project focuses on composition, not literature, both fields require students to put forth the effort to read and understand assigned texts thoroughly, so Professor Winston’s success in the literature classroom can serve as an example for the composition classroom. According to Professor Winston, many of her students come from lower-middle-class backgrounds: frequently they are the first in their families to go to college, and generally they must hold jobs to support themselves on top of their academic duties. When students are stretched so thin, it is understandable that they would ignore assignments they see as unimportant, namely assignments without a grade value. Professor Winston reasons that we have become so enamored with grades that students regard any assignment without a grade value as unimportant. From this perspective, she argues, the problem was that the assigned reading carried no grade value, which implied to students that reading assignments are not

important. By giving regular quizzes that include questions on the reading assignments, instructors can attach a specific grade value to a thorough understanding of the reading material, which in turn tells students that reading is indeed important.

To fix this problem, Professor Winston began giving weekly quizzes on the assigned reading. She was originally reluctant to use reading quizzes because they seemed too narrow. She didn’t want students to approach literature through convergent thinking and assume that there was only one correct interpretation. Indeed, at first she had trouble formulating open-ended questions that didn’t lock students into a right or wrong response, but over time she learned to write better questions. Today her questions are generally open-ended and call for a paragraph or so of response. Rather than focusing on plot details, they require students to demonstrate an understanding of the broader concepts in the reading and class content. Professor Winston tells me that, based on the quiz results, she can say with confidence that at least 90% of the class is keeping up with the reading, a participation rate that many instructors would find enviable, and she reports that this participation has improved class discussion.

I also interviewed Professor Robert Cliver of the history department. Professor Cliver tells me that history is all about reading and writing: it is about narratives, not just memorizing facts. The facts are indeed important, he says, but understanding and discussing their interrelations is what makes a good historian. Rote memorization is

what machines do; creative synthesis is the mark of a good scholar. These are the same qualities we want to cultivate in students in a composition course.

Professor Cliver told me that when he first started teaching, he didn’t give many quizzes, but student feedback quickly showed him that students wanted more tests. From that feedback and from the results of the quizzes he did give, he realized that tests can give students an incentive to do the work and to come to class. According to Professor Cliver, a single final exam doesn’t help: “it just tests if students are good at taking tests” (Cliver Interview). He came to see the traditional comprehensive end-of-term exam as useless, because relying on a few high-stakes exams encourages procrastination and cramming. For these reasons, Professor Cliver now gives frequent tests at the end of class, which serve as a review of material covered in lectures and in the readings. Though tests could be viewed as a “police measure” that simply ensures students have done the reading, Professor Cliver argues that they are not just busywork, because taking them teaches students which course content deserves emphasis.

In addition to the quizzes, Professor Cliver uses take-home exams, which he reasons help students practice writing. He finds the combination of the two most successful: quizzes help students internalize fact-based course content, and the take-home exams require students to use that knowledge in a creative synthesis in the form of writing. He shared an example question that might appear on a take-home exam: “Describe how the end of WWII affected east Asia.” In

order to answer this question, students must have a firm grounding in the historical facts, such as the Communist Party coming to power in China, the alliance between Japan and the US, and the division of Korea. The facts are necessary, but simply listing them is not enough for good writing. The combination of frequent in-class tests and take-home essay exams gives Professor Cliver’s students multiple opportunities for retrieval, as well as for the creative synthesis of facts.

Building, Administering, and Grading Tests

We don’t want students to just memorize facts; we want them to demonstrate they

understand the principles behind the material and apply that learning to new situations.

To achieve this goal with tests, we must have a systematic structure: something to tie test type and content to course objectives. Bloom (1956) describes six cognitive skills in a hierarchy from simple to complex: (1) knowledge, (2) comprehension, (3) application, (4) analysis, (5) synthesis, and (6) evaluation. Instructors should understand these six categories and know which category any particular test item falls into. Jacobs and Chase’s Developing and Using Tests Effectively lists typical question wordings for each category, which I have abridged and re-presented here. (1) Knowledge questions involve the recall of learned material through activities such as remembering facts, definitions, or principles. Common questions testing knowledge

involve wording such as: “define,” “list,” “state,” “identify,” “label,” “name,” “who,” “when,” “where,” and “what.” (2) Comprehension questions require a more in-depth understanding of learned material. Question wordings typically include: “explain,” “predict,” “interpret,” “infer,” “summarize,” “convert,” “translate,” “give example,” “account for,” or “paraphrase.” (3) Application is the ability to use (transfer) learned material in a new situation or context. Application questions ask students to use concepts to solve a problem. Typical wordings include: “apply,” “solve,” “show,” “make use of,” “modify,” “demonstrate,” and “compute.” (4) Analysis questions ask students to break material down into its component parts so that its organizational structure is understood. Typical wordings include: “differentiate,” “compare and contrast,” “distinguish ___ from ___,” “how does ___ relate to ___,” or “why does ___ work?” (5) Synthesis questions ask students to put parts together to form a new whole. Wordings include: “design,” “construct,” “develop,” “formulate,” “imagine,” “create,” “change,” or “write a poem or short story.” Finally, (6) evaluation questions ask students to judge the value of material for a given purpose using definite criteria. Typical wordings include: “appraise,” “evaluate,” “justify,” “judge,” and “which would be better?” Considering their local situation and class needs, instructors should develop a table of the types of questions they want to ask (e.g., 30% knowledge, 50% comprehension, 20% application). Such a table helps teachers plan and create tests and ensures a balance of question types, so that each test draws on the various strengths of each type.
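For a test of a given length, the percentages in such a table translate into item counts. The following Python sketch is my own illustration, not a procedure from Developing and Using Tests Effectively. The function name `allocate_items` is hypothetical. The sketch assumes percentages that sum to 100 and uses largest-remainder rounding so that the counts always add up to the test length.

```python
from math import floor


def allocate_items(total_items, blueprint):
    """Split total_items across question types by percentage.

    blueprint: dict mapping a question type to its percentage (must sum to 100).
    Uses largest-remainder rounding (an illustrative choice, not from the source).
    """
    if abs(sum(blueprint.values()) - 100) > 1e-9:
        raise ValueError("blueprint percentages must sum to 100")
    raw = {level: total_items * pct / 100 for level, pct in blueprint.items()}
    counts = {level: floor(share) for level, share in raw.items()}
    leftover = total_items - sum(counts.values())
    # Hand any leftover items to the types that lost the most to rounding down.
    by_remainder = sorted(raw, key=lambda level: raw[level] - counts[level], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


blueprint = {"knowledge": 30, "comprehension": 50, "application": 20}
print(allocate_items(20, blueprint))  # {'knowledge': 6, 'comprehension': 10, 'application': 4}
print(allocate_items(12, blueprint))
```

With twenty items the example table divides evenly. With twelve items, rounding down leaves one item over, and that item goes to the category whose share was cut the most.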

When generating tests, we want them to match students’ expectancy. Since we are asking students to spend time studying for these tests, it is important to help them understand what material is important and what they can expect from a test. We can improve student expectancy by clearly designating test dates and the approximate amount of time each test will take. Expectancy is further increased if the instructor ensures that all students take the same test, with the same questions, at the same time. We should also indicate the approximate worth of test questions and the amount of time students should spend on particular tests. According to Developing and Using Tests Effectively, as a rule of thumb, allow about one minute per multiple-choice item and half a minute per true/false item. Short-answer questions requiring a sentence or two take about two minutes to answer, and we should allow ten to fifteen minutes for a short essay and thirty for an essay of two to three pages (72).
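Applied to a specific quiz, this rule of thumb is simple addition. The sketch below assumes a hypothetical quiz of ten multiple-choice items, ten true/false items, three short-answer questions, and one short essay. The item-type labels are my own. The short-essay entry carries the ten-to-fifteen-minute allowance as a low and a high estimate.

```python
# Minutes per item as (low, high), following the rule of thumb cited above.
# The dictionary keys are illustrative labels, not terms from the source.
MINUTES_PER_ITEM = {
    "multiple_choice": (1.0, 1.0),
    "true_false": (0.5, 0.5),
    "short_answer": (2.0, 2.0),
    "short_essay": (10.0, 15.0),
    "long_essay": (30.0, 30.0),  # an essay of two to three pages
}


def estimate_minutes(item_counts):
    """Return (low, high) minutes for a test, given counts of each item type."""
    low = high = 0.0
    for item_type, count in item_counts.items():
        per_low, per_high = MINUTES_PER_ITEM[item_type]
        low += count * per_low
        high += count * per_high
    return low, high


quiz = {"multiple_choice": 10, "true_false": 10, "short_answer": 3, "short_essay": 1}
print(estimate_minutes(quiz))  # (31.0, 36.0)
```

By this estimate, the quiz would need roughly thirty-one to thirty-six minutes, which is worth knowing before announcing the quiz to students.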

Grading tests also raises some important issues. The position of a student’s paper in the stack can influence the score the reader assigns to it. According to Bracht and Hopkins (1968), the first papers read tend to get higher scores than later ones. The reader tends to judge a paper harshly if it is preceded by a well-written paper; if the previous paper is poorly written, the paper is judged generously. Several studies (James, 1927; Shepherd, 1929; Chase, 1968; Marshall and Powers, 1969) show that the quality of handwriting, grammar, and spelling can all affect the scores given to an essay. We can improve the reliability of scoring by

concealing students’ names until after a score has been assigned, which keeps instructors’ achievement expectations for particular students from affecting their judgment of essays. Other techniques include reading only one item across all tests before going to the next item, and reshuffling the stack of papers before each new item. Reshuffling makes it unlikely that any paper will repeatedly suffer from following a good paper or repeatedly benefit from following a poor one.
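The three grading safeguards described above can be sketched as a short routine: concealed names, scoring one item at a time across all papers, and a reshuffle before each item. This is a hypothetical illustration rather than a tool described in the sources. The `score_item` callback stands in for the instructor's own reading of a response. In the toy usage below, a simple answer-key comparison takes its place.

```python
import random


def grade_blind(papers, item_ids, score_item, seed=None):
    """Score one item across every paper before moving to the next item.

    papers: dict mapping a student's name -> {item_id: response}
    score_item: hypothetical callback standing in for the grader's judgment,
        called as score_item(item_id, response) and returning points.
    """
    rng = random.Random(seed)
    names = list(papers)
    rng.shuffle(names)
    # Names are swapped for anonymous codes; score_item receives only the
    # item and the response, never the student's name.
    code_to_name = {f"P{i:03d}": name for i, name in enumerate(names, start=1)}
    totals = {code: 0 for code in code_to_name}
    for item_id in item_ids:
        order = list(code_to_name)
        rng.shuffle(order)  # reshuffle the stack before each new item
        for code in order:
            response = papers[code_to_name[code]][item_id]
            totals[code] += score_item(item_id, response)
    # Names are matched to scores only after every item has been scored.
    return {code_to_name[code]: points for code, points in totals.items()}


# Toy usage: an answer-key comparison stands in for reading each response.
key = {"q1": "thesis", "q2": "ethos"}
papers = {
    "Ana": {"q1": "thesis", "q2": "ethos"},
    "Ben": {"q1": "claim", "q2": "ethos"},
}
totals = grade_blind(papers, ["q1", "q2"], lambda item, resp: int(resp == key[item]), seed=1)
print(sorted(totals.items()))  # [('Ana', 2), ('Ben', 1)]
```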

According to Developing and Using Tests Effectively, instructors often give inadvertent clues on tests that help students guess the correct answer. Because these clues make questions easier, they also undermine students’ learning, according to the theory of desirable difficulty. Specific determiners such as “all” or “always” depict a situation as absolute, which can lead a student to guess that a statement is probably false. Qualifying terms such as “sometimes,” “usually,” or “typically” are uncertain enough to suggest that a statement is more likely to be true. A question worded “The answer is a ______” has an embedded grammatical clue: only answers that begin with a consonant sound would be grammatically correct, so we should revise it to read “The answer is a(n) ______.” By the same token, we should make all blanks the same length so that they do not hint at the length of the answer. We should also avoid “all of the above” and “none of the above,” because these options are too easy as distractors. Mentzer (1982) examined thirty-five files of multiple-choice test items for evidence of bias in the correct answers.

The most frequently occurring bias in that set was the “all of the above” response, which was correct more than 25% of the time. If you do use “all of the above” options, make sure they are not always the correct answer. The correct answer should also be placed randomly rather than in a favored position; in the same study, option C was overused as the correct response and option A was underused. Place the correct answer in each of the alternative positions approximately an equal number of times but in a random order. Finally, avoid vague, indefinite terms denoting degree or amount, as in a question like “T/F: A long time ago trees covered a very large part of present-day Wyoming.” “A long time ago” could mean anything from 1850 to 10 million years ago. Indefinite terms make scoring less reliable and will probably confuse the test taker. Instead, use definite terms that allow for only one correct response.
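Using each answer position about equally often, in random order, is easy to automate when assembling a quiz. This Python sketch is hypothetical. It assumes four-option items labeled A through D and returns only the answer key. The instructor would still place each item's correct option in its assigned slot and fill the remaining slots with distractors.

```python
import random
from collections import Counter


def balanced_answer_key(num_items, letters="ABCD", seed=None):
    """Return one correct-answer letter per item, using each letter about
    equally often (counts differ by at most one) in a random order."""
    rng = random.Random(seed)
    full_rounds, remainder = divmod(num_items, len(letters))
    key = list(letters) * full_rounds
    # When the items don't divide evenly, choose the extra letters at random
    # so that no position is consistently favored across quizzes.
    key += rng.sample(letters, remainder)
    rng.shuffle(key)
    return key


key = balanced_answer_key(22, seed=7)
print("".join(key))
print(Counter(key))  # two letters appear 6 times and two appear 5 times
```

Shuffling a balanced list, rather than drawing each letter independently at random, is the design choice that keeps the counts even. Independent draws would often leave one letter noticeably over- or underused on a short quiz.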

Recognition and Free-Response Questions

It is commonly accepted that critical thinking is a main component, or aim, of writing instruction. Critical thinking is both a method of thought and a complex set of varied skills. Part of critical thinking is the ability to consider multiple positions and reason toward the most likely conclusion. Tests, if used properly, can encourage the development of critical thinking, and some forms of testing, such as essay and short-answer formats, are better at developing critical thinking skills than others.

Short-answer questions are particularly effective for composition instruction and for developing critical thinking. For example, questions such as “What do you think was the most effective form of evidence used in this article?” or “How did the author deploy pathos to advance her argument?” not only require a sophisticated understanding of the material but also require critical evaluation of the author’s argument and a personal response articulated in the disciplinary language of composition.

There are two main types of test questions: recognition and free-response. Essay and short-answer questions are free-response, and they have several advantages: they are best for assessing complex learning outcomes, they are relatively easy to construct, and in most cases they do not permit students to earn a score by guessing or bluffing. Their limitations are that they are difficult and much more time-consuming to grade than recognition questions, their scoring is more subjective, and, because of time constraints, a test consisting primarily of free-response questions can sample only a limited amount of content. Recognition items, such as matching, true/false, and multiple choice, also have benefits and limitations. They require students to select the correct answer from among several options. Because each takes less time to answer, a recognition test can include more questions, and therefore much more content, than a free-response test. Recognition tests are difficult to construct but easy to score. Because recognition items narrow down the range of possible answers, they are susceptible to guessing; however, this same feature makes scoring more objective and reliable.
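The difference in vulnerability to guessing can be made concrete. Assume, for simplicity, that a student guesses blindly on every item. The number of correct answers then follows a binomial distribution. The sketch below is my own illustration, not a calculation from the sources. The threshold of seven correct out of ten is a hypothetical passing score.

```python
from math import comb


def chance_at_least(k, n, options):
    """Probability of getting at least k of n items right by blind guessing,
    when each item has `options` equally likely choices (binomial model)."""
    p = 1 / options
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A hypothetical 10-item quiz where 7 correct counts as passing:
print(f"true/false:      {chance_at_least(7, 10, 2):.4f}")  # 0.1719
print(f"4-option choice: {chance_at_least(7, 10, 4):.4f}")  # 0.0035
```

On these assumptions, a student who knows nothing would reach the threshold on a ten-item true/false quiz about 17% of the time. On a four-option multiple-choice quiz of the same length, the chance drops to about 0.35%.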

Free-response questions require students to organize and express answers in their own words. Limiting the breadth of an essay question allows the answer to be relatively brief and specifically tied to a single objective. A broad question like “What were the conditions that led to the Civil War?” is less effective than a narrower question like “Compare and contrast the role of agriculture in the economies of the North and South at the outbreak of the Civil War.” A narrower question produces a narrower response, which in turn improves the reliability of scoring. Rather than assessing factual content, which other test formats handle much better, the essay test should be used to assess outcomes that require higher-level cognitive functions. Some examples that are

appropriate for essay questions are the following prompts: “Present arguments for and

against _______,” “Illustrate how a principle explains facts,” “Illustrate cause and

effect,” “Describe an application of a rule or principle,” “Evaluate the adequacy,

relevance, or implication of this data,” “Form new inferences from data,” “Organize the

parts of a situation, event, or mechanism and show how they interrelate into a whole,” or

“Sort out the relevant parts as distinct entities from a total situation, event, or

mechanism.”

The “stem” is the heart of a recognition test question. The stem should present the

problem with precision and clarity. Wordy stems should be trimmed of unnecessary information. State the stem positively whenever possible; when a negative is unavoidable, call students’ attention to it (for example, by capitalizing NOT). Include as much of the item material as possible in the stem, so that words or phrases that would otherwise be repeated in every option appear only once. After writing the stem, write one correct or clearly best answer and three or four plausible distractors. Instruct students to choose either the correct answer or the best answer; some questions will have several defensible options but one best answer that experts would agree on.

Writing distractors is probably the most difficult and most important part of building multiple-choice items. Distractors should be designed around common errors that students make or misconceptions they may have. A useful strategy is to first pose the item as a short-answer question, think of the incorrect responses students would be likely to give, and let these be the distractors in the multiple-choice item. The distractors must be incorrect, but they should be plausible enough to attract students who do not know the material well. Avoid writing absurd distractors. While they may be humorous or lighthearted, they increase the likelihood of students guessing the correct response because they narrow the range of plausible answers.
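Several of the item-writing guidelines above are mechanical enough to check automatically. The class below is a hypothetical checker of my own design. It cannot judge whether a distractor is plausible or reflects a common student error, which still requires the instructor's knowledge of the class. It can, however, flag the countable problems:

- the number of distractors
- duplicate options
- “all/none of the above” as an option
- a negative word in the stem that has not been emphasized (the checker assumes emphasis means writing the word in capitals, such as NOT)

```python
from dataclasses import dataclass, field

# Options the guidelines advise against, and negatives that need emphasis.
DISCOURAGED = {"all of the above", "none of the above"}
NEGATIVES = ("not", "except")


@dataclass
class MultipleChoiceItem:
    stem: str
    correct: str
    distractors: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the guideline violations this simple check can detect."""
        issues = []
        if not 3 <= len(self.distractors) <= 4:
            issues.append("write three or four plausible distractors")
        options = [self.correct, *self.distractors]
        normalized = [option.strip().lower() for option in options]
        if len(set(normalized)) != len(normalized):
            issues.append("options must be distinct")
        for option, norm in zip(options, normalized):
            if norm in DISCOURAGED:
                issues.append(f"avoid '{option}' as an option")
        # A negative stem should draw attention to the negative, e.g. "NOT".
        words = [w.strip(".,;:?!") for w in self.stem.split()]
        for negative in NEGATIVES:
            if negative in (w.lower() for w in words) and negative.upper() not in words:
                issues.append(f"emphasize the negative word '{negative}' in the stem")
        return issues


good = MultipleChoiceItem(
    stem="Which rhetorical appeal relies mainly on the speaker's credibility?",
    correct="ethos",
    distractors=["pathos", "logos", "kairos"],
)
bad = MultipleChoiceItem(
    stem="Which of these is not a rhetorical appeal?",
    correct="syntax",
    distractors=["ethos", "all of the above"],
)
print(good.problems())  # []
print(bad.problems())   # flags three problems
```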

In cloze deletion items, such as “There are _____ members of the U.S. House of Representatives and _____ members of the Senate,” students get minimal cues and must

construct the answer. Cloze deletion items should be answered with a single word or phrase, and statements should be worded so that they have only one right answer. For example, “The Battle of Lexington was fought in ______” can be answered in several ways (with a place or a date, for instance), so it should be reworded to “The Battle of Lexington was fought in the year _____.” In multiple-choice tests, by contrast, students need only recognize the correct response among those given, so cloze deletion items place a greater demand on students.
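The cloze guidelines, together with the earlier advice to make every blank the same length, can be built into a small helper for instructors who draft items from their own notes. This is a sketch under stated assumptions. The name `make_cloze` is hypothetical. Each answer is assumed to begin and end with a letter or digit so that the word-boundary match works. No code can check the more important requirement that each blank have only one right answer.

```python
import re

BLANK = "_____"  # every blank has the same length, so it gives no length clue


def make_cloze(sentence, answers):
    """Replace the first occurrence of each answer with a uniform blank.

    Returns (item_text, answer_key). Assumes each answer starts and ends
    with a word character so that the \\b boundaries match.
    """
    item = sentence
    for answer in answers:
        pattern = r"\b" + re.escape(answer) + r"\b"
        item, count = re.subn(pattern, BLANK, item, count=1)
        if count == 0:
            raise ValueError(f"{answer!r} not found in sentence")
    return item, list(answers)


item, key = make_cloze(
    "There are 435 members of the U.S. House of Representatives and 100 members of the Senate.",
    ["435", "100"],
)
print(item)
# There are _____ members of the U.S. House of Representatives and _____ members of the Senate.
```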

Also bear in mind the saying “a picture is worth a thousand words.” Graphical occlusion, or graphic deletion, works like cloze deletion, but instead of a missing phrase it hides part of an image. It can work very well for charts, mind maps, and images with captions. Mind maps, charts, and other diagrams can be effective learning tools, and graphical occlusion allows students to be tested on the information these images contain.

Concluding Discussion

Composition has a great deal of disciplinary content that can and should be tested. For example, we can test students on the various stages of the writing process and the different strategies that can be used in each one. Similarly, there are many discipline-specific terms that students need to know, such as “diction,” “syntax,” “thesis-driven,” “transitions,” “juxtaposition,” and so on. And since we teach several

different genres in composition, we can test students on the conventions of each genre, such as character development and dialogue in personal essays or logical argument and textual evidence in persuasive research writing. And, of course, students can be tested on their knowledge of grammar and punctuation rules and their ability to fix editorial errors. I have argued in this project that tests can increase the knowledge that students have at their disposal. Alongside our current practices in composition (student-centered, dialogic instruction with multiple drafts and active writing time), tests should improve creativity and help students gradually improve their writing. Unfortunately, tests are widely reviled for their excessive role in assessment, which can result in a blanket dismissal of all forms of testing. However, abundant research shows that tests can be very effective learning tools. Studies in cognitive science and neurophysiology both indicate that retrieval practice is crucial for long-term retention, and tests provide retrieval opportunities for students. It is my hope that this project will encourage instructors of all kinds, but especially composition instructors, to integrate frequent tests into their own classes.

WORKS CITED

Abercrombie, M. L. J. The Anatomy of Judgment; an Investigation into the Processes of

Perception and Reasoning. New York: Basic, 1960. Print.

Ambrose, Susan A. How Learning Works: Seven Research-based Principles for Smart

Teaching. San Francisco, CA: Jossey-Bass, 2010. Print.

Aristotle. Aristotle's Psychology: A Treatise on the Principle of Life. S.l.: Hardpress,

2013. Print.

Bacon, Francis. The New Organon. Ed. Lisa Jardine and Michael Silverthorne.

Cambridge: Cambridge UP, 2000. Print.

Bakhtin, Mikhail Mikhaĭlovich. The Dialogic Imagination: Four Essays. N.p.: U of

Texas, 1981. Print.

Ball, Arnetha F., and Sarah Warshauer Freedman. Bakhtinian Perspectives on Language,

Literacy, and Learning. Cambridge, UK: Cambridge UP, 2004. Print.

Bruffee, Kenneth A. "Collaborative Learning and the ‘Conversation of Mankind’."

College English 46.7 (1984): 635-52. Web.

Byrne, John H. "Synaptic Transmission in the Central Nervous System." Neuroscience

Online: An Electronic Textbook for the Neurosciences. The University of Texas

Medical School at Houston, 12 Mar. 2014. Web.

Carpenter, Shana K., and Edward L. DeLosh. "Application of the Testing and Spacing

Effects to Name Learning." Applied Cognitive Psychology 19.5 (2005): 619-36.

Web.

Carter, Michael, C. Miller, and A. Penrose. "Effective Composition Instruction: What

Does the Research Show?" Communication in Science, Technology and

Management 3rd ser. (1998): n. pag. Web.

Cepeda, Nicholas. "Distributed Practice in Verbal Recall Tasks: A Review and

Quantitative Synthesis." Psychological Bulletin 132.3 (2006): 354. Web.

Chase, Clinton I. "The Impact of Some Obvious Variables on Essay Test Scores."

Journal of Educational Measurement 5.4 (1968): 315-318.

Cliver, Robert. "Discussions on Testing." Personal interview. 10 May 2012.

Connors, Robert J., and Andrea A. Lunsford. "Frequency of Formal Errors in Current

College Writing." College Composition and Communication 39.4 (1988): 395-

409. Web.

Cull, William L. "Untangling the Benefits of Multiple Study Opportunities and Repeated

Testing for Cued Recall." Applied Cognitive Psychology 14.3 (2000): 215-35.

Web.

Dempster, Frank N. "The Situation with Respect to the Spacing of Repetitions and

Memory." Journal of Verbal Learning and Verbal Behavior 9.5 (1970): 596-606.

Web.

Dempster, Frank N. "The Spacing Effect: A Case Study in the Failure to Apply the

Results of Psychological Research." American Psychologist 43.8 (1988): 627.

Web.

Dempster, Frank N. "Spacing Effects and Their Implications for Theory and Practice."

Educational Psychology Review 1.4 (1989): 309-30. Web.

Elbow, Peter. Writing Without Teachers. New York: Oxford UP, 1973. Print.

Fernanda, Santos. "Teacher Survey Shows Morale Is at a Low Point." The New York

Times, 7 Mar. 2012. Web.

Finkel, Donald L. Teaching with Your Mouth Shut. Portsmouth, NH: Boynton/Cook,

2000. Print.

Fisher, Ronald P., and Fergus I. Craik. "Interaction between Encoding and Retrieval

Operations in Cued Recall." Journal of Experimental Psychology: Human Learning

and Memory 3.6 (1977): 701.

Fleming, Gerald J., and Meredith Pike-Baky. Rain, Steam, and Speed: Building Fluency

in Adolescent Writers. San Francisco: Jossey-Bass, 2005. Print.

Freire, Paulo. Pedagogy of the Oppressed. New York: Herder & Herder, 1971. Print.

Glover, John A. "The 'Testing' Phenomenon: Not Gone but Nearly Forgotten." Journal of

Educational Psychology 81.3 (1989): 392. Web.

Graff, Gerald, Cathy Birkenstein, and Russel Durst. They Say / I Say: The Moves That

Matter in Academic Writing : With Readings. 2nd ed. New York: W.W. Norton,

2012. Print.

Halliday, Michael. "Towards a Language Based Theory of Learning." Linguistics and

Education 5.2 (1993): 93-116. Web.

Hintzman, Douglas L. "Judgments of Frequency and Recognition Memory in a Multiple-

Trace Memory Model." Psychological Review 95.4 (1988): 528. Web.

Jacobs, L. C., and C. I. Chase. Developing and Using Tests Effectively: A Guide for

Faculty. San Francisco, CA: Jossey-Bass, 1992. Print.

Jacoby, Larry L. "On Interpreting the Effects of Repetition: Solving a Problem versus

Remembering a Solution." Journal of Verbal Learning and Verbal Behavior 17.6

(1978): 649-67.

James, H. W. "The Effect of Handwriting upon Grading." The English Journal 16.3

(1927): 180-185.

Kang, Sean H. K., Kathleen B. McDermott, and Henry L. Roediger III. "Test Format and

Corrective Feedback Modify the Effect of Testing on Long-Term Retention."

European Journal of Cognitive Psychology 19.4-5 (2007): 528-58.

Kastenbaum, Steve. "The High Stakes of Standardized Tests." Schools of Thought. CNN,

26 Mar. 2012. Web.

Keyes, Ralph. The Writer's Book of Hope: Getting from Frustration to Publication. N.p.:

Macmillan, 2003. Print.

Klein, Stephen B. Learning: Principles and Applications. N.p.: Sage Publications, 2011.

Print.

Knight, J. K., and W. B. Wood. "Teaching More by Lecturing Less." Cell Biology

Education 4.4 (2005): 298-310. Web.

Kolers, Paul A., and Henry L. Roediger, III. "Procedures of Mind." Journal of Verbal

Learning and Verbal Behavior 23.4 (1984): 425-49. Web.

Landauer, Thomas K., and Lynn Eldridge. "Effect of Tests without Feedback and

Presentation-Test Interval in Paired-Associate Learning." Journal of Experimental

Psychology 75.3 (1967): 290. Web.

Lee, Carol D., and Peter Smagorinsky. Vygotskian Perspectives on Literacy Research:

Constructing Meaning through Collaborative Inquiry. Cambridge: Cambridge

UP, 2000. Print.

Lewis, Corey. "Discussions on Testing." Personal interview. 4 Apr. 2012.

Marshall, Jon C., and Jerry M. Powers. "Writing Neatness, Composition Errors, and

Essay Grades." Journal of Educational Measurement 6.2 (1969): 97-101.

Marsh, Elizabeth J. "The Memorial Consequences of Multiple-Choice Testing."

Psychonomic Bulletin & Review 14.2 (2007): 194-99. Web.

McDaniel, Mark A., Henry L. Roediger, and Kathleen B. McDermott. "Generalizing

Test-Enhanced Learning from the Laboratory to the Classroom." Psychonomic

Bulletin & Review 14.2 (2007): 200-06. Web.

McDaniel, Mark A. "Testing the Testing Effect in the Classroom." European Journal of

Cognitive Psychology 19.4 (2007): 494-513. Web.

Melton, Arthur W. "The Situation with Respect to the Spacing of Repetitions and

Memory." Journal of Verbal Learning and Verbal Behavior 9.5 (1970): 596-606.

Web.

Mentzer, Thomas L. "Response Biases in Multiple-Choice Test Item Files." Educational

and Psychological Measurement 42.2 (1982): 437-48.

Morris, C. Donald, John D. Bransford, and Jeffery J. Franks. "Levels of Processing

versus Transfer Appropriate Processing." Journal of Verbal Learning and Verbal

Behavior 16.5 (1977): 519-33.

Moscovitch, Morris, and Fergus Craik. "Depth of Processing, Retrieval Cues, and

Uniqueness of Encoding as Factors in Recall." Journal of Verbal Learning and

Verbal Behavior 15.4 (1976): 447-58.

Pellegrino, James W., Naomi Chudowsky, and Robert Glaser, eds. Knowing What

Students Know: The Science and Design of Educational Assessment. N.p.:

National Academies, 2001. Print.

Perry, Andre. "Education Reform Starts With Community Reform." Drandreperry.com.

N.p., 25 Feb. 2012. Web.

Phillips, Cecilia. "The Basics: Ion Channels Underlie Neuron Communication." Whirling

Whips: News and Stories about Neurotoxins. N.p., 12 Mar. 2014. Web.

Robinson, Ken. "Changing Education Paradigms." Youtube.com. RSA Animate, The

Royal Society of Arts, London, 2010. Web.

Roediger, Henry L., David A. Gallo, and Lisa Geraci. "Processing Approaches to

Cognition: The Impetus from the Levels-of-Processing Framework." Memory 10.5-6

(2002): 319-32.

Roediger, Henry L., and Jeffrey D. Karpicke. "The Power of Testing Memory: Basic

Research and Implications for Educational Practice." Perspectives on

Psychological Science 1.3 (2006): 181-210. Web.

Roediger, Henry L., and Jeffrey D. Karpicke. "Test-Enhanced Learning: Taking Memory

Tests Improves Long-Term Retention." Psychological Science 17.3 (2006): 249-

55. Web.

Roediger, Henry L. "Implicit Memory: Retention Without Remembering." American

Psychologist 45.9 (1990): 1043-56. Web.

Runquist, Willard N. "Some Effects of Remembering on Forgetting." Memory & Cognition

11.6 (1983): 641-50.

Ruch, Theodore C. "Factors Influencing the Relative Economy of Massed and

Distributed Practice in Learning." Psychological Review 35.1 (1928): 19. Web.

Sapolsky, Robert M. Biology and Human Behavior: The Neurological Origins of

Individuality. Chantilly, VA: Teaching, 2005. Print.

Shepherd, Everett M. "The Effect of the Quality of Penmanship on Grades." The Journal

of Educational Research (1929): 102-105.

Tate, Gary, Amy Rupiper, and Kurt Schick. A Guide to Composition Pedagogies. New

York: Oxford UP, 2001. Print.

Vygotsky, L. S. Mind in Society: The Development of Higher Psychological

Processes. Trans. Michael Cole and Vera John-Steiner. Cambridge, MA: Harvard

UP, 1978. Print.

William, Cull L., John J. Shaughnessy, and Eugene B. Zechmeister. "Expanding

Understanding of the Expanding-Pattern-of-Retrieval Mnemonic: Toward

Confidence in Applicability." Journal of Experimental Psychology: Applied 2.4

(1996): 365. Web.

Winston, Janet. "Discussions on Testing." Personal interview. Feb. 2012.