THE TESTING EFFECT: APPLICATIONS IN COMPOSITION PEDAGOGY
By
Adam Danger Channel
A Project Presented to
The Faculty of Humboldt State University
In Partial Fulfillment of the Requirements for the Degree
Master of Arts in English: Teaching Writing
Committee Membership
Dr. Corey Lewis, Committee Chair
Dr. Suzanne Scott, Committee Member
Dr. Nikola Hobbel, Graduate Coordinator
May 2014
ABSTRACT
THE TESTING EFFECT: APPLICATIONS IN COMPOSITION PEDAGOGY
Adam Channel
This project advocates for the use of frequently-administered, low-stakes tests to
enhance student learning of the disciplinary content of composition. Though there is
widespread disdain for the role of standardized tests in education today, not all forms of
testing are the same, and some forms of testing can be very effective teaching tools. Tests
should ideally be locally generated with relevance to class content, frequently
administered, and low-stakes, with feedback provided shortly after testing. This project
lays the groundwork for how testing can dovetail into the student-centered dialogic
classroom, a common practice in composition today. Theories of human learning
(including the testing effect and the spacing effect) show that active retrieval is the best
way to ensure long-term retention and understanding. Frequent tests provide active-
retrieval opportunities for students, which should enhance learning and retention. The
project concludes with how to build tests specifically for the instruction of first-year
college composition.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
TABLE OF FIGURES
CHAPTER 1: INTRODUCING TEST-ENHANCED LEARNING
Retrieval-Based Learning
Composition History and the Move Away from Testing
The Banking Model of Education
Critiques of Standardized Tests and High-Stakes Assessment
CHAPTER 2: THE STUDENT-CENTERED COMPOSITION CLASSROOM
Moving towards a Dialogic Classroom
The Writing Process in a Dialogic Classroom
Building Writing Fluency with Journal Writing
The Dialogic Classroom and the Academic Discourse Community
CHAPTER 3: HUMAN LEARNING
The Physiology of Learning and Memory
Cognitive Definitions of Learning and Memory
Three Theories of Learning
Social Constructivist Theories of Learning and Testing
CHAPTER 4: THE TESTING EFFECT AND THE SPACING EFFECT
What is the Testing Effect?
Studies Reporting a Testing Effect
The Spacing Effect
CHAPTER 5: TEST-ENHANCED LEARNING IN COMPOSITION
Indirect Benefits of Testing
Testing and Grades
The Benefits of Tests According to Three HSU Professors
Building, Administering, and Grading Tests
Recognition and Free-Response Questions
WORKS CITED
TABLE OF FIGURES
Figure 1: “Neuron”
Figure 2: “Neuronal Communication”
Figure 3: “Stage Theory of Memory”
Figure 4: “Performance on Immediate and Delayed Tests”
Figure 5: “Testing Schedule Shows a Forgetting Curve”
Figure 6: “Study-Test-Study-Test (STST) Most Effective Learning Strategy”
Figure 7: “Proportion Correct in Immediate and Delayed Recall”
Figure 8: “Word Recall on Immediate and Delayed Tests”
Figure 9: “Student Performance Averaged across Unit Exams”
Figure 10: “Hypothetical Forgetting Curve 1”
Figure 11: “Hypothetical Forgetting Curve 2”
Figure 12: “Hypothetical Forgetting Curve 3”
CHAPTER 1: INTRODUCING TEST-ENHANCED LEARNING
Retrieval-Based Learning
This project argues that student learning will be enhanced if teachers frequently
administer low-stakes tests. This chapter will first define what a “test” is. It will proceed
to give a brief history of testing in composition, and finally it will explain the
current lack of testing seen in composition practice today. Primarily, this chapter argues
that the lack of testing is due to three widely held misconceptions: (1) tests promote the
“banking model of education,” (2) tests do not encourage critical thinking, and (3) all
forms of tests are subject to the same deficiencies as standardized tests.
What we know about the cognition of learning has considerably advanced in
recent decades. Today we have a research-based theory of learning that is grounded in
physiological and empirical data gathered from brain-imaging and cognitive studies.
Despite this, there remains a schism between the laboratory and the field, a gap between
theory and practice. Recent advances in cognitive science and studies on memory
and learning do not seem to have had a significant impact on composition pedagogy. This
thesis argues that two theories of learning in particular are of the utmost importance but
remain widely unknown: the testing effect and the spacing effect. Put simply, these
theories hold that long-term retention is improved through repeated testing over time.
Composition practice today relies on a varied set of skills and knowledge; students’
learning and retention of this course content can be enhanced through the introduction of
frequently-administered low-stakes tests.
It is important to define what is meant by the word “test” in this project,
especially because for many people the term carries negative connotations. “Test” or
“testing” makes many people think of summative high-stakes assessment or top-down
administered standardized tests. Both of these types of testing have serious deficiencies
that will be elaborated on later in this chapter. Testing, however, can come in many
different forms. For the purposes of understanding how tests operate cognitively, in this
project the terms “tests” or “testing” are defined as “an induced act of retrieval”—any
sort of material or question which necessitates a “retrieval” action on the part of the
reader. Retrieval is the process of accessing information stored in the memory and
articulating it, generally in response to an inquiry. You are asked a question, and you
provide a written answer: that is the meaning of a test in this project.
In order for us to understand how taking a test can enhance learning, it is
important to distinguish between two types of learning: passive and active. Cognitive
research would describe reading a textbook or reviewing class notes as “passive
learning,” because reading is an input-only activity (Roediger and Karpicke 181; Knight
and Wood 298). Learning through a lecture is similarly defined as passive learning for
the same reason. However, if students were to convert their notes and annotations into flash
cards that they could test themselves on (with the possibility of failure to recall), that
would be considered “active learning” because it would require an act of retrieval on the
part of the learner. As another example, when we read the statement, “Black holes are the
remnants of the gravitational collapse of a star,” our brain connects that to existing
knowledge structures and schemata. We can read that same statement numerous times
and recognize it each time; this type of learning is passive because there is no demand
made on the brain to reproduce the information. On the other hand, active learning is
demonstrated when that same statement is turned into a question by occluding keywords,
like “_________ are the remnants of the gravitational collapse of a star” or “Black holes
are the remnants of _______.” In this case the brain must fill in the blank with the correct
information, necessitating an act of retrieval.
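The conversion from statement to question described above is mechanical enough to sketch in a few lines of code. What follows is only a minimal illustration, assuming Python; the helper names make_cloze and check_answer are hypothetical and do not come from any of the sources cited in this project. The sketch occludes a chosen keyword to produce the question, then checks a typed response against that keyword.

```python
import re


def make_cloze(statement: str, keyword: str, blank: str = "_________") -> str:
    """Replace the first occurrence of keyword in statement with a blank.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(re.escape(keyword), flags=re.IGNORECASE)
    if not pattern.search(statement):
        raise ValueError(f"keyword {keyword!r} not found in statement")
    # A function is used as the replacement so the blank is inserted literally.
    return pattern.sub(lambda _match: blank, statement, count=1)


def check_answer(response: str, keyword: str) -> bool:
    """Compare a response to the keyword, ignoring case and outer whitespace."""
    return response.strip().lower() == keyword.strip().lower()


statement = "Black holes are the remnants of the gravitational collapse of a star."

# Occlude the subject of the sentence.
question_one = make_cloze(statement, "Black holes")

# Occlude the end of the sentence instead.
question_two = make_cloze(statement, "the gravitational collapse of a star")

print(question_one)
print(question_two)
print(check_answer("  black holes ", "Black holes"))
```

The matching here is deliberately strict apart from case and whitespace; a teacher writing real test items would still need to decide by hand which keywords are worth occluding and which alternative answers deserve credit.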
This act of retrieval will be successful if students remember the relevant
information (like the black hole example), but, as we all know, memory is not perfect.
There is a clear difference in difficulty between the two types of learning. Passive
learning only requires recognition and comprehension; active learning, on the other hand,
requires retrieval with an increased possibility of failure. The act of retrieval required to
answer the fill-in-the-blank questions (also called “cloze deletion”) will produce better
long-term retention than passive reading of the same statement. H. L. Roediger, a veteran
scholar of learning and a professor at Washington University in St. Louis,
describes it this way: “We are much more likely to remember something again if we
actively retrieve it than if we are passively exposed to it in restudying” (“Advice from
Cognitive Psychologist…”). This result has been found in many studies and is referred to
in the literature as “the testing effect” (these claims will be referenced and further
substantiated in Chapters 3 and 4).
Like all fields, composition has a set of specialized terms. One reason why
retrieval practice is important is that when students reach into memory and recall a term
or phrase from the class lectures or course readings, then it is no longer just the term that
was on the board or in the reading; it has moved toward becoming their own term. Once
students have internalized that term or phrase, they will be able to reproduce it in their
own writing and cognition. In composition, these terms help us communicate more
specifically about writing (comma splices, topic sentences, thesis, dependent clause, etc.).
Chapters 3 and 4 show that our ability to remember terms like these generally depends on
the number of times we have retrieved them. Peer review is an integral part of modern
composition practice, and internalizing these specialized terms will help students improve
their peers’ writing, in addition to their own. Since it is necessary—or at least
beneficial—to know these terms when discussing writing, a pedagogical method that is
effective in helping students remember them would be of great benefit in composition.
This project argues that teachers can best facilitate students’ successful learning and long-
term retention of class content in composition by giving retrieval opportunities to
students through frequently administered low-stakes tests.
Composition History and the Move Away from Testing
For most of human history, story tellers in oral cultures would verbally repeat the
extensive myths and epics of their people. This level of memorization required great skill
and tenacity, and could only be reinforced through continual testing and retrieval. In
classical times, memoria, or memorization, was one of the five canons of rhetoric.
In his essay on memory, Aristotle wrote, “Exercise in repeatedly
recalling a thing strengthens the memory” (202). Scholars have known about the power of
testing, and have written about it, for centuries. Francis Bacon wrote in The New Organon,
published in 1620, "If you read a piece of text through twenty times, you will not learn it
by heart so easily as if you read it ten times while attempting to recite from time to time
and consulting the text when your memory fails" (143). In The Principles of Psychology
(1890), William James also argued for the power of testing through active recitation,
writing:
A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. I mean that in learning (by heart, for example), when we almost know the piece, it pays better to wait and recollect by an effort from within, than to look at the book again. If we recover the words in the former way, we shall probably know them the next time; if in the latter way, we shall very likely need the book once more (646).
These famous authors and others have written about the power of retrieval. Retrieval
practice has long been an acknowledged part of learning. The understanding that retrieval
is an essential part of learning should be a guiding principle in our teaching practice.
This project argues for retrieval practice through the use of frequent tests; however,
the role of testing in education is currently a fiercely debated subject. The passage of the
2001 No Child Left Behind Act and the use of high-stakes standardized testing have
pushed many people, including teachers, to reject testing. This rejection seems especially
apparent among teachers in composition programs, many of whom follow Expressivist
practices. With the Expressivist movement of the 1960s and 1970s, there came a rejection of
testing and drilling in composition practice. Christopher Burnham, in his chapter in A
Guide to Composition Pedagogies, describes the divergence of two composition
pedagogies: Expressivism and Current-Traditional Rhetoric (CTR). He describes
Expressivism as “The movement [that] originated […] as a set of values and practices
opposing current-traditional rhetoric” (Tate 21). Current-Traditional Rhetoric (CTR), a
school of composition instruction developed in the early nineteenth century, relied
heavily on a prescriptive notion that there is a syntactically and stylistically correct and
incorrect way to write. The goal of CTR was to teach students how to conform to those
standards through drilling and regular testing. The Expressivist movement in the 1960s and 1970s
took a sharp turn away from testing and focused on “writers writing” rather than “writer’s
writing”—or process rather than product.
Burnham describes two figureheads in the Expressivist movement, Donald
Murray and Ken Macrorie, as opponents of rules and directive feedback. He writes of
Murray's A Writer Teaches Writing (1968) that “Murray's use of non-directive feedback
from both teacher and students turns the responsibility for writing back to the student”
(23). Expressivism is characterized by a focus on language as a tool for personal rather
than social expression. The Expressivists believed that just getting students to write and
write and write and to do so uninhibited by top-down rules was the best teaching practice.
The movement recognized that good writing needs more than good mechanics and
syntax, so the focus of instruction shifted away from the then-prevalent practices of
drilling and testing on rules of grammar and style. Composition instruction today owes a
great deal to Expressivism, and many of our current practices stem directly from the
Expressivist school of thought. However, with the current lack of testing in composition,
this project poses the question: "Did we throw the baby out with the bathwater?"
Though some Expressivists would disagree, this project argues that there is value
in teaching rules of writing. This disciplinary content is valuable because it allows us to
engage in meta-discourse about the process and products of writing. Teaching the
disciplinary content of composition through test-enhanced learning isn’t guaranteed to
improve students’ writing abilities, but I argue that there should be positive transfer
between the two activities. For example, if students can identify and correct a run-on
sentence on a test, then they are more likely to be able to find run-on sentences and
prevent them in their own writing. As another example, composition classes teach that
writing is a process with different stages, and each of those stages has certain activities
that a writer can use. During pre-writing we can use free-writing, brainstorming
activities, mind mapping, and outlining to help our writing. We teach these activities, but
if students don’t have retrieval opportunities (i.e., if they do not practice outside class or
on tests) it is unlikely they will remember all of these activities. Teachers use retrieval
practice for multiplication tables, or learning a new language, but the idea that you can
use it for more complex ideas is not widely appreciated. For example, when students
learn words and verb conjugations for another language, Spanish for instance, they often
use flash cards with the English word on one side and the Spanish equivalent on the other,
and teachers say, “Practice until you really know it. Practice until it’s completely
automatic.” In composition practice we want the same type of automaticity with writing
terms and practices. This project argues that this type of automaticity will be produced
with frequently-administered low-stakes tests.
The Banking Model of Education
In the 1970s, influential author Paulo Freire critiqued the common practice of
lecturing, arguing that contemporary academic systems teach “the banking model of
education.” In the banking model, the teacher is the exclusive authority who stands at the
front of the class while students sit in desks, all facing forward, posited as empty vessels
waiting to be filled by the teacher with knowledge. Freire critiques such authoritative
models of education, writing:
Instead of communicating, the teacher issues communiques and makes deposits which the students patiently receive, memorize, and repeat. This is the "banking" concept of education, in which the scope of action allowed to students extends only as far as receiving, filing, and storing the deposits (58).
Freire argues that this model does not foster critical thinking skills in students and
conditions students to unquestioningly trust teachers and classroom content.
Unfortunately, many consider teaching through tests to be a variation on the banking
model, arguing that it leads to convergence on a single answer given by an authority
figure, or they argue that teaching through tests does not lead to the development of
critical thinking. The banking model of education positions students as the passive
recipients of education, and is therefore not structured in a manner that encourages the
development of critical thinking skills. This history of valid critiques of the banking
model of education has unfortunately led many to dismiss all forms of testing as falling
within the banking model. This blanket dismissal of all forms of testing is problematic
because it discards forms of testing that can improve critical thinking and enhance
student authority.
Our educational systems today, and the role of tests as a teaching tool, are
influenced by history. Gordon Wells of the University of Toronto, writing in the
anthology Vygotskian Perspectives on Literacy Research, explains that universal public
education through mandatory attendance at school is a “historically and culturally
localized activity system that owes more to models of industrial mass production than to
that of development through assisted participation in social activity” (Lee 59). In his
widely recognized 2010 TED talk, “Changing Education Paradigms” (with over 13
million views to date on YouTube and TED.com), Ken Robinson draws a parallel between
the school and the factory-line—both have hyper-specialization, ringing bells, and
separate facilities. In this model, testing is seen primarily as a means of assessment. Like
a quality control stamp in a factory, tests are commonly used as a means to categorize
and rank students by their performance. The factory and the banking models of
education, which many students today have experienced, position students as the
inheritors of knowledge, rather than the co-producers of understanding.
However, teaching through tests is not necessarily a banking model practice. If
done correctly, test-enhanced learning can encourage creativity and facilitate the student-
directed classroom that modern composition instruction relies on. Teaching with tests
may seem to be opposed to the development of critical thinking skills; perhaps this in part
explains the current lack of testing in composition. Indeed, tests should not be considered
the end-all solution, and instead should be seen as a supplement to a wide variety of other
instructional methods. Frequent tests will enhance learning, which will increase student
authority. When used alongside a variety of other teaching practices, this should enhance
the development of critical thinking.
Critiques of Standardized Tests and High-Stakes Assessment
In the abundant critiques of testing today, it appears that many critics attribute the
problems of standardized tests to all forms of testing. There are many different forms
and uses of tests, however, and some of them are free of the myriad disadvantages of top-down
standardized tests. The type of testing this project advocates avoids these problems
because every test is designed by the local teacher specifically for his or her students and
curriculum. In addition to being locally generated, tests should also be frequently
administered. This section will consider some of the most significant deficiencies of
standardized tests and the changes that standardized testing has brought to the education
system. Such a consideration is necessary because it will show us by example what not to
do, and help explain the current lack of testing seen in composition.
Federally administered standardized tests have been a compulsory part of K-12
education since the passage of the federal Elementary and Secondary Education Act of
1965. The 2001 revision, No Child Left Behind, increased the importance of
standardized-test results in determining allocation of federal funding for schools, and by
extension, the importance of receiving federal funding has driven schools to pressure
teachers to emphasize these standardized tests in their classrooms. According to Diane
Ravitch, former U.S. Assistant Secretary of Education, “So now we have schools being
closed and people getting bonuses all around the student test scores. It’s made testing,
somehow, the central activity of American public schools today, which is just so wrong”
(Kastenbaum “The high stakes of standardized testing”). Ravitch is a former political
figure and frequently seen pundit on the subject of testing and education in the media
today. She is representative of the many people who feel that standardized tests are an
encroachment on the rights of local teachers.
Focusing instruction around top-down imposed standardized tests limits teachers’
choices for class materials and methods of instruction and encourages “teaching to the
test.” Because schools are held accountable for their students’ performance on
standardized tests, if the students perform poorly, then schools blame their teachers.
In an article titled “Education Reform Starts with Community Reform,” Andre Perry
offers a succinct critique of standardized tests that is worth quoting in full:
We currently use standardized tests well beyond what they were designed to do, which is measure a few areas of academic achievement. Achievement tests were not designed for the purposes of promoting or grading students, evaluating teachers or evaluating schools. In fact, connecting these social functions to achievement test data corrupts what the tests are measuring. In statistics this is called Campbell's Law. In other words, what does a score measure after it has been connected to a teacher's pay or job status? In education talk, this is called teaching to the test, hiring to the test, and getting paid to the test (“Education Reform”).
Many of us are critical of standardized tests because not only do top-down imposed tests
infringe on local school boards’ ability to design curriculum, they also modify the very
nature of public schooling.
In addition, according to a 2012 New York Times article, the increased emphasis
on standardized test scores has played a large contributing role in the 20-year low in
teacher morale:
The slump in the economy, coupled with the acrimonious discourse over how much weight test results and seniority should be given in determining a teacher’s worth, have conspired to bring morale among the nation’s teachers to its lowest point in more than 20 years, according to a survey of teachers, parents and students released on Wednesday (Santos “Teacher Survey Shows Morale Is at a Low Point”).
The article argues that this low morale is due to unprecedented economic hardship for
teachers. Tied in with this is the heated debate on the role of testing in education and how
students’ test results reflect on their teachers’ performance. Given these facts, low teacher
morale does not seem so surprising. It is perhaps, then, also not so surprising that there is
a widespread disdain for testing among the general populace and teachers today.
This chapter has introduced the wider context of testing and its contentious role in
education today. Though many teachers see testing as another example of the banking
model, there is no intrinsic deficiency in the practice of testing. By analyzing the role of
standardized tests in education today, we can see that tests are frequently misused and too
often they play too large a role in assessment. Though tests can be a useful form of
assessment, their greater purpose is as a means of retrieval practice. Learning through
retrieval has been an accepted practice since antiquity. Rather than using a few
high-stakes exams as an exclusive means of assessment, the best practice is to administer
low-stakes tests frequently to provide numerous retrieval opportunities for students.
This will encourage frequent studying rather than cramming on the part of students. This
process of frequent studying and frequent testing will reinforce and enhance student
learning of course content.
CHAPTER 2: THE STUDENT-CENTERED COMPOSITION CLASSROOM
Moving towards a Dialogic Classroom
“What kind of learning do we want?” This is a question that every teacher should
thoughtfully consider. To be more specific, this chapter is guided by the question “What
kind of learning do we want in a first-year college composition course?” A review of the
practices and ideologies of modern composition will help us understand how testing can
dovetail into that practice. Many composition courses use a student-centered model of
instruction where the classroom is a community in which students collaboratively work
and learn. Though classes in other disciplines are frequently lecture based, in composition
the dialogic class generates peer engagement, which helps to establish a spirit of
collaborative inquiry, which should foster creativity. Writing abilities are developed
gradually through successive drafts of major essay assignments with peer review between
drafts, and through in-class writing exercises. The focus of student-centered instruction
includes the use of peer work and collaborative learning. It is important to realize that, as
John-Steiner and Meehan, writing in the anthology Vygotskian Perspectives, conclude,
“Social interaction and mutual support lead to creativity in a multi-directional dynamic
exchange” (Lee 40). This creativity is fostered in the composition classroom, which uses
collaborative learning methods such as peer workshops, group and partner work, and
group discussion.
Collaborative engagement is already supported in scholarly literature as an
effective learning method. For example, Kenneth Bruffee, writing in “Collaborative Learning and
the ‘Conversation of Mankind,’” draws on the studies of Dr. M. L. J. Abercrombie to show the
benefits of collaborative learning. Abercrombie's Anatomy of Judgment (1964)
synthesized ten years of observation and research on medical interns studying diagnosis.
Typically, the interns worked as individuals, but Abercrombie asked the interns to all
examine the patient together, discuss the symptoms as a group, and arrive at a consensus
on which all could agree. After observing the interns diagnose both individually and as a
group, Abercrombie concluded that the collaborative model resulted in a more accurate
diagnosis, and faster acquisition of diagnostic skills on the part of the interns. Bruffee argues
that this same model of collaboration can also extend to the writing classroom, there too
bringing about better learning and, in this case, better writing.
In addition, Bruffee argues that writing is a conversation, not only with those
around us, but also with ourselves. In our daily lives, "we internalize conversation as
thought; and then by writing, we re-immerse conversation in its external, social medium"
(88). Many forms of thinking are acts of conversation in which our thoughts are revealed
with either the written or the spoken word. The conversation in our minds is enhanced by
input from outside conversations and from collaboration; following this maxim, composition
practice today favors group work and discussion over lecture as a means of instruction.
One of the great advantages of these forms of collaborative learning is that they are an
active form of learning as opposed to passive forms of learning. As noted earlier, active
forms of learning, such as when students take a test or participate in a discussion, are
more effective than passive forms. Using frequent tests in composition, then, can be seen
to accord with the current theory and practice in the field.
Discussion with the class at large, and in small groups, leads to better learning
than lecturing. This is the position espoused by Donald Finkel in his book, Teaching with
Your Mouth Shut; he quotes a summary of research by the National Council on
Education, the NCE, as follows:
Research clearly favors discussion over the lecture as an instructional method when the variables studied are retention of information after a course is over, transfer of knowledge to novel situations, development of skill in thinking or problem solving, or achievement in affective outcomes, such as motivation for additional learning [...]—in other words, the kinds of learning we most care about (qtd. in Finkel 3).
The NCE’s research provides a persuasive argument in favor of a dialogic classroom—a
classroom in which dialogue is the primary means of instruction rather than lecture. In
the dialogic composition classroom, readings are discussed rather than lectured on, and
peer-review serves as a means of engaging in conversations about writing strategies.
Gordon Wells, of the University of Toronto, in his chapter in the anthology Vygotskian
Perspectives, describes a dialogic class as one where “the classroom is seen as a
collaborative community. The community works towards shared goals of achievement
and its success is dependent on the group rather than individuals” (Lee 65). In this
collaborative community, Wells argues that “the teacher should be involved as a co-
inquirer with the students” rather than an exclusive authority in the class. The goal in
such a class is to create what Wells calls “an ethos of collaborative inquiry” (ibid).
The notion of decentralizing power challenges long-entrenched beliefs about
teaching; our cultural expectation of a teacher is somewhat akin to a performer: a brilliant
lecturer who captivates an audience of students. However, modern composition
recognizes that this cultural construct is limiting. The authority of the professor, in large
part, is derived from their extensive study and knowledge. Decentralizing authority in the
classroom requires that students have some understanding of course material. The
dialogic class provides more active recall opportunities for students than a lecture-based
class, which is a great step in the right direction, but the teacher can even further enhance
student learning by providing a structure for recall of the key terms used in class through
frequently administering low-stakes tests. Frequent tests engage active learning strategies
with students, and since students need to have an understanding of disciplinary terms in
order to participate effectively in these discussions, testing can be used to enhance
student understanding, which will in turn increase student authority.
Creating a successful dialogic classroom requires the teacher to take certain steps
to facilitate collaboration. For example, the first day of class is extremely important—this
is when first impressions are formed, so it is important to make these impressions
positive. An excellent way to immediately foster the spirit of collaborative inquiry is by
setting ground rules for discussion as a group. The instructor can solicit feedback from
the group, write suggestions on the board, and then, through a discussion of the purpose
and objectives of the class, the group as a whole can come to an agreement on which
ground rules are most important. Some instructors even take the process a step further
and involve students in syllabus formation. I know one composition instructor who
includes her students in the decision-making process for how heavily assignments are
weighted and what reading material the class should cover. A student’s
involvement in the formulation of the rules will undoubtedly also provide motivation and
incentive to abide by these rules. In these dialogic classrooms then, students are not only
active learners, but also active participants in creating classroom rules. For example, one
method to improve the diversity of tests, increase student engagement with tests, and
encourage students to thoughtfully consider the test questions, would be having students
devise their own test questions as homework. Instructors can have each student write four
or five multiple-choice questions as a homework assignment. Writing items will help
students learn the material and get involved in the test-making process. The instructor
could select among this set of questions and, if appropriate, present them to the class as a
type of student-generated exam. In that way students would remain exposed to a variety
of authorities and the class would move closer towards student-centered practice while
also providing frequent retrieval opportunities of course content.
In his book Teaching with Your Mouth Shut, Donald Finkel shares some of his
teaching practices that create a dialogic classroom. Though he teaches literature, not
composition, there are many qualities of his dialogic class that instructors can directly
apply in their composition courses. Finkel’s literature class contains twenty-five students
and a teacher who meet around a single round table. There is no clear locus of attention in
the class—no lectern or podium—and the teacher sits at a different seat every day. A
typical day begins with a student soliciting the other students’ discussion questions that
they have brought to class and writing them on a whiteboard. The class then decides in
what order they would like to discuss the different questions. Once the decision is
reached, the facilitating student sits down and the discussion begins. The benefit of
having a student solicit questions rather than the teacher is that it helps direct focus away
from the teacher and on to the questions the class will be discussing. Like frequent tests,
these strategies require students to be active, not passive, learners.
Finkel argues that there are four additional advantages of this process of
beginning the class by soliciting questions. The first is that it prevents the first student
willing to talk from taking the reins for the rest of the class. This is a crucial step towards
creating a space for individuals with different communicative styles—from shy students
who only chime in after lingering pauses to boisterous students who do not tolerate the
slightest conversational gap. The second advantage is that it allows students to hear a
number of questions about texts before discussing any one of them. Some students’
questions will invariably be well-formed and others will not, which leads to the third
advantage: it emphasizes the importance of bringing thoughtful questions to class.
Hearing the multiple interpretations of the reading brought in by different students is a
good practice to develop divergent thinking—the ability to see multiple solutions to a
problem. Hearing a single interpretation, on the other hand, given to the students by the
authority of the teacher, may encourage the development of convergent thinking—
convergence on a single possible answer. Additionally, when students know that their
discussion questions will be evaluated by their peers, it provides motivation to write
well-thought-out and well-formed questions to bring to class. The fourth advantage Finkel
describes is that the ritual of writing questions on the board helps transition students from
the hustle and bustle of wherever they were before class into an atmosphere of thoughtful
inquiry.
With all of this focus on the students, one may ask what the teacher’s role is in
this dialogic class. If the class functions seamlessly without direction from the teacher,
then is there no need for one? Finkel answers this question thus: “‘Teaching with your
mouth shut’ does not entail teacher passivity; it requires different kinds of activities from
teachers” (17). Finkel further argues that “Good teaching is the creating of those
circumstances that lead to significant learning in others” (8). To create those
circumstances during class discussion, the teacher should serve more like a facilitator
than a director. A facilitator coaxes students towards learning and has an idea of how that
will happen, but is not overly attached to a single course of action so long as learning is
taking place. A facilitator is open to “teachable moments” in class, and is willing to cede
authority when it is appropriate. A director, on the other hand, is at the top of a clear
hierarchy and seeks to impose his or her own particular order. A facilitator can
benefit the class in several ways. If class discussion begins to die down or becomes
unfocused, the teacher can recommend that the class discuss a different topic; if an
insightful comment is ignored, the teacher can function as a spotlight to redirect
attention towards it. Additionally, the teacher is involved in the conversation and can ask
his or her own well-formed questions, which can serve as a model for students. Finally, at
the end of the day, the instructor can summarize the results of the discussion and
emphasize some key points that were made.
The dialogic classroom de-centers focus from the instructor and creates a student-
centered class. Since group interaction is a focus in composition, students frequently
interact with their peers without the teacher being directly involved. For these
interactions to be successful, students must have some competency with the disciplinary
content of composition. As teachers, we want to prepare our students for peer review and
group discussion by teaching them the terms that allow them to critically evaluate and
discuss writing. The dialogic classroom puts students “at the wheel,” so to speak, which
requires disciplinary fluency from students. To facilitate the most productive
conversations in composition, teachers want to ensure that students have a solid
understanding of course content, and are able to reproduce it in their own words. Students
may initially be exposed to terms of critical analysis through a mini-lecture, or through
readings, or class exercises, and ideally they will continue to use those terms during
group discussion and peer review. However, in addition to these teaching techniques, the
teacher can provide frequently-administered, low-stakes tests to help ensure students’
long-term retention of course content. As can be seen, then, the practice of teaching
through frequent low-stakes tests accords with current theory and practice in
composition. It is an active learning strategy that will enhance student knowledge and
authority and support the student-centered classroom.
The Writing Process in a Dialogic Classroom
Lad Tobin, writing in A Guide to Composition Pedagogies, describes two modern
approaches to writing instruction and defines them as “process” and “post-process”
writing pedagogy. Process-based pedagogy urges instructors to devote much class time to
peer review of student works-in-progress. The writing process is also explicitly taught,
and the methods that successful writers use for invention, focus, and organization are
taught as models for students to follow. Much time is devoted to in-class writing
exercises to help students practice free-form writing on demand. The process-oriented
class is guided by an emphasis on pre-writing strategies and revision after feedback.
Many instructors teach the process by requiring multiple drafts and conducting peer
review between successive drafts, which models the process as recursive. Tobin
distinguishes the post-process model of instruction, which differs in some key aspects.
Post-process teachers typically assign more reading and devote class time to group
discussion rather than peer review. Readings are analyzed in class to identify the effective
characteristics of each piece, and class time is also devoted to the teaching of rhetorical
conventions and genre analysis in a post-process model. Regardless of whether one
focuses on process or post-process pedagogies, we can, in addition to testing students on
grammatical rules and writing vocabulary, also use tests to ensure students understand the
different stages of the writing process and the kinds of rhetorical moves writers use in
different genres.
In addition to being dialogic, the modern composition classroom tends to include
a large amount of active writing time, and most composition instructors teach the writing
process as a recursive process. Whereas the linear model of (1) Outline, (2) Draft, (3)
Revise, (4) Proofread, (5) Publish, has historically been the prevalent model taught in
schools, it is today accepted that this linear process does not reflect the reality of writing
for most people. The linear model encourages a “once-and-done” method of writing.
Rather than this linear model, composition instructors model the writing process as
recursive by requiring multiple drafts of a single paper. Requiring multiple drafts ensures
that students actually engage in the writing process rather than wait for “inspiration,” or
the night before the deadline.
Requiring student papers to be submitted in multiple drafts encourages
incremental improvement over time and helps students manage their time and prevent
procrastination. Ralph Keyes, author of The Writer’s Book of Hope (2003), advises us to
strive towards consistency in our writing practice: “Serious writers write, inspired or not.
Over time they discover that routine is a better friend to them than inspiration” (49). I
argue that this concept can be extended to testing—just as writers benefit from schedule
and routine, so too will students’ learning be enhanced through frequent tests. Over the
course of a semester, as students learn more, and work with feedback on their writing,
their writing will gradually improve. The essay becomes a living thing that grows over
time. The linear model of writing leads to the common reality of students writing their
essays during all-night sessions the evening before they are due. If an instructor does not
require further submission, then students take that to mean that no further revision is
necessary.
Finkel's multi-centered or dialogic form of discussion is a natural outgrowth of
the peer-review workshop famously known as the Iowa Writers' Workshop. The
workshop is a common instructional approach in composition, and, like the dialogic
classroom, it relies on the teacher as a facilitator and on student-generated knowledge.
Classrooms structured around a community of equivalent peers benefit students' learning
because they allow for students to have conversations about writing. Like other
specializations, composition has specialized terms and knowledge, and facility with these
terms is critical if workshops are to be effective. By testing on these terms and the writing
process itself, we can ensure that our students get the greatest possible benefit from our
workshops.
This external conversation also benefits our internal conversations and enhances
our cognition. Mastery of jargon allows meta-cognition, “the process of reflecting on and
directing one’s own thinking” (National Research Council 78), a key skill which
enhances learning. The language which students practice during peer review also allows
them to engage in meta-cognition about their own writing. Peer review between
successive drafts helps students improve their writing gradually. The practice of peer
review acknowledges that fellow students, as well as teachers, can be sources of knowledge
in a classroom. Students are only in composition classes for a semester or two, and
presumably we want them to grow past needing the teacher. That will only happen if
students are allowed to retrieve and reproduce class material by expressing it in their own
words. Conducting peer-review and maintaining a student-centered class should help
produce better writers and encourage creative and critical thinking. Frequently
administered tests, given alongside writing assignments over the course of the semester,
will supplement the peer-review process by enhancing student learning and retention of
new terms and writing techniques, making students more capable at peer-review.
Building Writing Fluency with Journal Writing
Gerald Fleming and Meredith Pike-Baky together authored Rain, Steam, and
Speed, a book devoted to improving writing skills through journal writing. In the book
they define fluency as “the ease with which one communicates in each of the language
skills” (14). For an orator, fluency is the ability to deliver a good speech, whereas for a
reader, fluency is the ability to read steadily and understand what is read. Writing fluency
is described in the book as “practiced, prolific writing [that] keeps language and
perceptions flowing past the fidgets, self-distractions, and bogeys that the mind
occasionally throws out when it doesn’t care to work” (14). Practicing sustained writing
is an important step to improving written communication skills; it is something like
endurance training with words. The training metaphor helps us envision writing as a
developmental process. In the same way that an athlete does strength training to build
muscles, it seems obvious that practice with sustained, focused writing is a necessary step
to improve writing fluency. In addition to journals, students can be directed to do in-class
writing through short-answer or essay questions on frequently administered tests. In this
way, the same training metaphor applies to tests, which shows that testing dovetails with
our current practice.
In-class writing through prompts and journal writing is a valuable method of
instruction—this is in line with Expressivist theory. Expressivist pedagogy maintains that
all students have interesting things to say and only need the teacher to “get out of the
way” so to speak, so they can write with their own authentic voice. For the expressivist,
top-down rules serve as blocks to progress—an Expressivist might say that teachers tell
students that they cannot write with every stroke of the red pen, and then teachers
complain, “Students don’t want to write. How can we motivate them?” Lad Tobin, writing
in A Guide to Composition Pedagogies, argues,
Children want to write. They want to write the first day they attend school. This is no accident. Before they went to school they marked up walls, pavements, newspapers with crayons, chalk, pens or pencils . . . anything that makes a mark. The child’s mark says, "I am." "No, you aren't," say most school approaches to the teaching of writing (Tate 19).
Expressivist pedagogy is about rekindling this passion seen in children and getting
students excited about writing.
Expressivists maintain that teaching writing is not about teaching new strategies
or rules, but facilitating the unbound and unrestricted production of copious amounts of
text. Peter Elbow, a leading figure of the Expressivist movement, writes in his book Writing
Without Teachers,
I try for two things: (1) to help you actually generate words better--more freely, lucidly, and powerfully: not make judgments about words but generate them better; (2) to help you improve your ability to make your own judgment about which parts of your own writing to keep and which parts to throw away (vii-viii).
Tobin summarizes Elbow’s suggestions for how to accomplish these goals as follows:
“Elbow suggests that writers free-write (write non-stop without worrying about
correctness, form, logic, etc.); play with words and ideas; form writing groups; and rely
less on doubting and more on believing, less on criticism, more on imagination” (Tate 3).
This “believing game,” as Elbow calls it, begins first with students believing in
themselves, and with teachers encouraging that belief.
Journal writing and timed writing exercises are common practices in modern
composition. The seminar-based class is a great place for students to develop critical
thinking skills and come up with innovative ideas to write about. In-class free-writes and
journal writing, in which students simply practice writing without any assessment, make
students comfortable with the process of writing. Frequent, repetitive writing of the kind
done in journals can also be done in class with directed writing prompts and short- or
long-answer questions on quizzes. In this way, frequently used tests can be seen as a
way to formalize or supplement some of our current practices in composition, such as
the use of journals or in-class writing.
The Dialogic Classroom and the Academic Discourse Community
The student-centered dialogic classroom requires that students have enough
knowledge to work together without the single authority of a professor. In order to
collaborate, students need a common vocabulary and basic understanding of course
content. This project argues that teaching through frequently-administered low-stakes
tests can help ensure fluency with these concepts through active retrieval. In a similar
manner, I contend that composition instructors need to teach the conventions of Standard
American English (SAE) and can use tests to do this.
The explicit instruction of SAE has been the subject of much debate in
composition. Expressivist theorists argue that we should have students focus solely on
writing without constraints and rules. Though some expressivist pedagogues argue that
we should focus exclusively on the generation of content, I argue that we do students a
disservice by neglecting to teach the rules and conventions of academia and SAE. In our
own composition program at Humboldt State University, students need to submit a
writing portfolio, which is assessed to determine if students pass the course or not.
Adherence to style guidelines like MLA and to the conventions of SAE is a necessary
component for passing the portfolio. Additionally, students will no doubt be expected to
write fluently in SAE in the rest of the academic world and in the business world outside the
composition course. Part of the goal of a first-year composition course therefore should
be to integrate students into this aspect of the larger academic community as well.
Teachers should view fluency with SAE as a valuable ingredient of good writing, though
not the only one.
For many students, mastery of these academic arts can be challenging. The
authors of They Say, I Say, Gerald Graff and Cathy Birkenstein, consider academic
writing as a process that can be learned in stages. The book serves to, as the title page
describes, "demystify the moves that matter in academic writing." Each chapter discusses
different rhetorical strategies that authors can use and offers templates that utilize those
strategies. Writing templates are a great way to translate your own internally persuasive
dialogue, developed through practice with journal writing or free-writes, into academic
language. The templates in this book help students learn how to make logical transitions
and make their sentences work together in cohesion to convey a larger idea. Many first-
year college composition classes now use sentence templates as models. Students’
development as writers is assisted by these templates, which show them how to turn their
own language into academic writing. They Say, I Say is only one popular example, but
there are many composition textbooks containing sentence templates that could easily be
adapted for use in tests.
Connors and Lunsford (1988) published a study of the historical frequency of
errors made in freshman papers from colleges around the United States. They compared
these results with a review of the most common errors from 1917 to 1988. The study
found that the frequency of errors has remained consistent historically at an average of
2.2 errors per 100 words. The study reports that the most common contemporary errors
(at publication) include spelling, missing inflections, apostrophes and commas, and
misused homophones. Though we can use this list of frequent errors, and the many others
like it, as guidelines, good grammar instruction is effective only when it is individualized
and adapted to context. For example, it is standard practice that if a teacher sees that a
large portion of the class has difficulties with comma splices, then he or she runs a mini-
lesson on sentence boundary rules. After the lecture, students then workshop to search for
errors related to that rule during peer review. Later the teacher could administer tests with
questions on sentence boundaries and comma splices and with incorrect examples that
need correction. This additional use of tests could enhance student learning and dovetail
easily with current teaching practices.
Like spoken language, writing requires continuous adjustment to the customs,
constraints and expectations of different discourse communities. The language that is
effective in the classroom is not the same as the language that is used in the courtroom,
the home, or the sports bar, and will not be effective in those contexts. As speakers, we
intuitively switch between these various registers and modify our speech to meet the
needs of the situation. In writing, these conventions and formal expectations are
described as genres. Critical analysis of different genres requires a specialized language
(e.g. audience, purpose, use of logos, ethos, pathos, etc.). Internalization of that language
is a key step for adequate performance within a given genre.
Michael Carter and colleagues, in a white paper published by North Carolina
State University, argue that students learn best from genre models when instruction
includes explicit analysis of the features of a genre. They write that “students may learn
these genres through repeated exposure and trial and error, but explicit instruction can
help them negotiate a variety of genres much more quickly and effectively” (6). They
conclude that “there is little evidence to suggest that students will notice relevant features
and apply them to their own writing situations without such intervention” (9). To support the
successful transfer of skills between different genres, composition courses focus on
analyzing the conventions of these diverse discourses and replicating them in written
assignments. However, as Carter and his colleagues note, this may not be enough. Each
genre is made up of multiple components, including audience and purpose.
Fluency in one genre does not necessarily translate to fluency in another genre. Therefore
it is important to teach students that there are different expectations for writing and
thinking across different genres and disciplinary and professional cultures. To aid a
student’s transfer of skills from one writing context to another, many composition
instructors have adopted a genre-based approach to writing instruction. If we want to
prepare students as best we can to meet the diverse writing needs of the university in the
short course of a college semester, then we should focus instruction on genres that tend to
cross disciplinary boundaries—such as the research paper, the summary of assigned
reading material, the professional letter, or the persuasive essay. Time spent on
instruction of these types of writing will be well spent because these skills can easily be
adapted for transfer across disciplines, and, of course, the conventions of these genres can
be explicitly taught through lecture and student retention can be reinforced through
frequent tests.
Some degree of fluency with SAE is a basic expectation for published works,
along with the ability to conform to research style guidelines (MLA, APA, etc.) and the
conventions of different genres of academic writing (a formal research paper, an
annotated bibliography, a business letter, etc.). A student-centered classroom can help
foster creative exchange and frequent writing will help build familiarity with the writing
process. However, these ingredients are not enough. To become capable of effective
peer-review and to meet the needs of the university, students also have to internalize a
great deal of disciplinary knowledge. The research in the following chapters will show
that our ability to learn and retain the disciplinary knowledge described relies in large
part on our active retrieval and usage of these terms. Learning and retention of the arts of
logic, rhetoric, and grammar, and an understanding of the various stages of the writing
process can all be facilitated through the administration of frequent, low-stakes tests, and,
as I’ve argued in this chapter, the use of such tests accords well with current composition
theory and practice and can easily be adopted in today’s composition classroom.
CHAPTER 3: HUMAN LEARNING
The Physiology of Learning and Memory
Any conversation about effective teaching must include a consideration of how
students learn. This chapter provides an interdisciplinary overview of human learning
with an up-to-date description of how we understand learning to occur from a
neurological perspective, as well as from cognitive and social-constructivist approaches.
Knowing how the brain captures, retains, and retrieves information will help teachers to
design assignments and activities that are instructionally effective. The focus of this
chapter will be on those aspects of human learning that help us understand why retrieval
practice (which tests facilitate) is essential for learning. Neurology studies the physical
structure of the brain and nervous system, and cognitive neuroscience studies our thought
processes. The brain is the center of learning and memory, so it seems obvious that
educators, who are primarily concerned with learning and memory, would want to stay
abreast of whatever discoveries have been made in this burgeoning field. Of course,
wanting to keep up with the deluge of new research that is constantly being generated
and having the time to do so are two very different things. The sheer volume of research available can
be overwhelming, so this chapter will give an overview of some of this new research in
an accessible format.
Neuronal communication is the basis for learning and memory. The brain is made
up of billions of cells called neurons. Neurons can be split into two distinct parts: the cell
body and the axon. In Figure 1, the cell body is on the far left, with multiple
dendrites protruding from it. The axon is connected to the cell body at the axon hillock
and ends at the axon terminals that web out into a network of up to 10,000 other neurons.
The axon terminals connect to other neurons’ dendrites. Axon terminals send out
neurotransmitters and the dendrites listen for them, so the dendrites could be considered
the ears of the neuron and the axon terminals the mouth.
Figure 1: “Neuron”
Source: (Sapolsky 10)
In order to communicate, neurons expend a great deal of energy redistributing
ions to maintain what is called a “resting potential” and to produce an “action potential.”
The contrast in electrical activity between these two states is what allows neurons to
communicate. This is similar
to binary logic—like a light switch that is either on or off, a neuron is either
communicating or not communicating. When a neuron has something to say, so to speak,
it goes into action potential, sending out an electrical signal to the axon terminals, which
is then relayed to the neural network.
There are two methods that neurons use to trigger an action potential: temporal
summation and spatial summation ("Synaptic Transmission in the Central Nervous
System"). Temporal summation occurs when the same input is triggered over and over.
Spatial summation occurs when numerous dendrites are stimulated at once. Either
method can build up enough electrical charge at the axon hillock to trigger an
action potential. When signals are received by the dendrites, channels open and
ions begin to move, causing a change in the electrical state of the neuron. When an action
potential reaches the axon terminals, the neuron sends out a flood of chemical messengers
called neurotransmitters. These neurotransmitters cross what is called a synapse, the
junction between the axon of the sending neuron and the dendrite of the receiving neuron,
and are picked up by the dendrites of the surrounding neurons. So long as a neuron is in
action potential, it will continue to send out neurotransmitters across its synapses. If the
original event that sparked the first action potential is strong enough, then a chain
reaction is sustained.
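The difference between temporal and spatial summation can be illustrated with a small toy model. The sketch below is only an illustration of the idea described above, not a physiological simulation: the threshold, the decay rate, and the input strengths are arbitrary values chosen for the example, and the function name is hypothetical.

```python
# Toy model of summation at the axon hillock.
# All numbers are illustrative assumptions, not measured values.
THRESHOLD = 1.0  # hypothetical level of charge needed to trigger an action potential
DECAY = 0.8      # hypothetical fraction of charge that remains after each time step


def fires(inputs_per_step):
    """Return True if the accumulated charge ever reaches THRESHOLD.

    inputs_per_step is a list of lists. Each inner list holds the input
    strengths arriving on different dendrites during one time step.
    """
    charge = 0.0
    for step_inputs in inputs_per_step:
        # Charge left over from earlier steps fades, then new input is added.
        charge = charge * DECAY + sum(step_inputs)
        if charge >= THRESHOLD:
            return True
    return False


single_weak_input = [[0.4]]                # one weak input, one time
temporal = [[0.4], [0.4], [0.4], [0.4]]    # the same weak input repeated in quick succession
spatial = [[0.4, 0.4, 0.4]]                # several dendrites stimulated at once
spread_out = [[0.4], [], [], [], [0.4]]    # the same input repeated after a long gap

print(fires(single_weak_input))  # False
print(fires(temporal))           # True
print(fires(spatial))            # True
print(fires(spread_out))         # False
```

In this sketch a single weak input never triggers firing, but the same input repeated rapidly (temporal summation) or delivered on several dendrites at once (spatial summation) does.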
Figure 2: “Neuronal Communication”
Source: ("The Basics: Ion Channels Underlie Neuron Communication")
The neurons which sustain these chain reactions are referred to as a neural
network. A single neuron can connect to 10,000 neurons through the axon terminals, and
can receive transmissions from up to 10,000 neurons through the dendrites in the cell
body. Thus, neural networks are capable of an enormous degree of complexity. Neural
networks could be thought of as the “screenshots” of a particular moment of cognition or
perception (Sapolsky 10). Patterns of neural activity are thought to correspond to
particular mental states or mental representations. Under this model, learning can broadly
be defined as changes in connectivity, either via changes in
potentiation at the synapse or via the strengthening or pruning of connections in a neural
network. Neural networks are constantly assembled and disassembled in our brains as we
learn and forget.
Neural networks can be made long-lasting through repeated stimulation. Every
time a neuron fires an action potential, it undergoes a physical change that makes it
more excitable in a given network, so that less excitation is required to induce a
later action potential. With repeated stimulation the neuron becomes hypersensitive,
and a weaker stimulus in the future will activate the associated neural network. When less
excitation is required, it is easier to recall something. The state of the
synapse becoming hyper-responsive or potentiated for long periods is called Long-Term
Potentiation (LTP). Increasing the strength of synaptic communication through LTP is
the physiological basis for learning and memory as we understand it today (Sapolsky 15).
Unless they undergo the physical process of LTP, neural networks will gradually
dissipate. However, networks that do undergo LTP decay very little over time, and, with
sufficient cues, can be retrieved many years later (“Remembering and Forgetting”).
Forgetting occurs in long-term memory when the formerly strengthened synaptic
connections among the neurons in a neural network become weakened, or when the
activation of a new network is superimposed over an older one, thus causing interference
in the older memory (ibid).
Earlier I described two ways to initiate action potential: spatial and temporal
summation. You’ll remember that action potential is a necessary step towards LTP,
which is the process that makes memories stick. Considering these two forms of
summation from a teacher’s perspective, spatial summation may be caused by the
student’s interest, motivation, and comprehension. We can influence these variables to
some degree, but they are always dependent on the student. Activating temporal
summation, on the other hand, is entirely within a teacher’s control, by administering
tests. If tests are given frequently to students so that temporal summation is regularly
stimulated through repetition, then LTP can be achieved. This framework from
neuroscience helps us understand that test-enhanced learning can physically change the
structure of the neural networks associated with course material, making it more likely
that students will permanently retain course information. Test-enhanced learning can be
seen as a way of providing retrieval opportunities for students in the classroom. Therefore,
frequently-administered low-stakes tests which are spaced out over time can be a
beneficial teaching practice that will help students retain course information.
Cognitive Definitions of Learning and Memory
Writing in the textbook Learning and Memory: An Integrated Approach, John
Anderson explains that “Learning refers to the process of adaptation of behavior to
experience, and memory refers to the permanent records that underlie this adaptation”
(6). Learning is a process, and memory is a record. Memory is divided into three types:
sensory, short-term, and long-term. A dominant paradigm for describing these different
types of memory is called “stage theory” and comes to us from Atkinson and Shiffrin
(1968). The relationship between these three stages is shown in the flow chart below,
which begins with an external stimulus activating sensory memory. There it is either
forgotten, or it goes through initial processing and enters short-term memory (STM). With
repetition it will stay in short-term memory, and with elaboration and coding it will enter
long-term memory (LTM); otherwise it will be forgotten. After a short duration that memory
will leave STM and can only be brought back from LTM through retrieval. The image also
shows that our response to a situation can only be aided by information in our short-term
memory; information cannot jump immediately from LTM into current use.
Figure 3: “Stage Theory of Memory”
Source: (“Three Stages of Memory”)
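The flow described above can also be sketched as a small set of transition rules. The Python below is a minimal illustration, not a formal statement of Atkinson and Shiffrin's model: the names `Stage` and `next_stage` and the flags `attended`, `rehearsed`, `encoded`, and `retrieved` are hypothetical labels I have chosen for the steps named in the text ("initial processing," repetition, "elaboration and coding," and retrieval).

```python
from enum import Enum


class Stage(Enum):
    SENSORY = "sensory memory"
    SHORT_TERM = "short-term memory"
    LONG_TERM = "long-term memory"
    FORGOTTEN = "forgotten"


def next_stage(stage, attended=False, rehearsed=False, encoded=False, retrieved=False):
    """Return where a memory goes next under the simplified stage model."""
    if stage is Stage.SENSORY:
        # Initial processing moves the stimulus onward; otherwise it is lost.
        return Stage.SHORT_TERM if attended else Stage.FORGOTTEN
    if stage is Stage.SHORT_TERM:
        if encoded:
            return Stage.LONG_TERM   # elaboration and coding
        if rehearsed:
            return Stage.SHORT_TERM  # repetition keeps it active
        return Stage.FORGOTTEN
    if stage is Stage.LONG_TERM:
        # Long-term memories re-enter short-term memory only through retrieval.
        return Stage.SHORT_TERM if retrieved else Stage.LONG_TERM
    return Stage.FORGOTTEN


print(next_stage(Stage.SENSORY, attended=True))     # Stage.SHORT_TERM
print(next_stage(Stage.SHORT_TERM, rehearsed=True)) # Stage.SHORT_TERM
print(next_stage(Stage.SHORT_TERM, encoded=True))   # Stage.LONG_TERM
print(next_stage(Stage.LONG_TERM, retrieved=True))  # Stage.SHORT_TERM
```

Note that the sketch has no direct path from long-term memory to a response: as in the figure, stored information must first be retrieved into short-term memory.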
The following analogy will help illustrate these three types of memory. Imagine
three types of writing: one is drawn directly on the surface of water, the next is written on
wet sand at the beach, and the third is chiseled into the stone of a mountain. When we
draw directly on water, the surface will immediately change and erase whatever we drew.
The message will disappear in a moment. This is similar to sensory memory. We take in
vast amounts of information every moment through our senses, and a great deal of it we
don’t pay attention to after the moment passes—sensory memory is soon forgotten, so
these “recordings” have no retrieval strength and no storage strength. Short-term memory
is like drawing in the wet sand at the beach. The record is clear and accessible but it will
fade over time when the tide washes it away—it has high retrieval strength, but low
storage strength. Finally, long-term memories that have undergone significant encoding
and retrieval are deeply etched in our mind, like words chiseled in stone.
Bjork and Bjork (1992) distinguish two qualities of long-term memory which
together determine the likelihood of successful retrieval (e.g. answering a test question or
remembering relevant information when cued): (1) retrieval strength and (2) storage
strength. Retrieval strength is the accessibility of a memory at a given moment. Storage
strength is how deeply a memory is embedded in the mind. Imagine you just learned a
stranger’s name. If you clearly heard it and repeated it, then at that moment it is very
fresh in your mind, so it has high retrieval strength. However, if you do not meet that
person again or have a chance to retrieve their name again, then it is likely that you will
not remember their name because the storage strength is weak. On the other hand,
consider the name of a close relative who died long ago. As time passes, it is likely you
will not think about them as frequently, and so their name will have low retrieval
strength. However, their name is deeply embedded and you are unlikely to forget it even
though you have not used it recently because it has high storage strength.
As described earlier with the physiological description of memory, the storage
strength of memories is improved through repeated retrieval actions, which bring about
LTP. The retrieval strength of memories is contingent on the retrieval cues that bring that
knowledge to the forefront (such as being asked the question on a test, or a sensory
experience that triggers remembering). A third quality we have not yet considered is the
encoding strength of memories, which is best understood through schema theory.
The term “schema” was first used by Piaget in 1926. R. C. Anderson, a respected
educational psychologist, expanded the meaning and developed schema theory. This
learning theory views organized knowledge as an elaborate network of abstract mental
structures which represent one’s understanding of the world. Schemata are prior
knowledge linkages, and they influence the amount and proficiency of our learning.
Schemata can be added to, and, as an individual gains experience, schemata develop to
include more variables and more specificity. Each schema is embedded in other schemata
and itself contains subschemata. Schemata change moment by moment as information is
received. They may also be reorganized when incoming data reveals a need to restructure
the concept. Schema theory shows us that the encoding strength of memories should be
increased when meaningful connections are made between various schemata, and when
information is retrieved in new situations and transferred to new circumstances.
How students organize knowledge influences how they learn and apply what they
know. As humans, when we are paying attention, we naturally make connections between
new knowledge and existing schemata in our minds. When those connections form
knowledge structures that are accurately and meaningfully organized, we are better able
to retrieve and apply that knowledge effectively and efficiently. In contrast, when
knowledge is connected in inaccurate or random ways, we can fail to retrieve or apply it
appropriately. In this way, our prior knowledge can help or hinder learning (Ambrose, et
al. 4). Teachers work to help students create meaningful connections and connect new
knowledge with prior learning; teachers also strive to ensure that students create accurate
and efficient schemata, interconnecting disciplinary content and writing practice. Teachers
can enhance these practices and can help structure student learning through tests.
Writing in How Learning Works: Seven Research-Based Principles for Smart
Teaching, Ambrose et al. define learning as the process that leads to change resulting
from experience and increased potential performance for future learning (3). Learning is
the result of how students interpret and respond to their experiences, and therefore
learning can bring about changes in knowledge, behaviors, beliefs, and attitudes. Herbert
Simon, Nobel Laureate and one of the founders of the field of cognitive science, argues
that “Learning results from what the student does and thinks and only from what the
student does and thinks. The teacher can advance learning only by influencing what the
student does to learn” (qtd in Ambrose et al. 3). Learning is a process, not a product, and
because it happens within each student, instructors can only infer that learning has taken
place from students’ products or performance—learning is not something that an
instructor can do to students, but rather it is a process that students themselves do.
David Ausubel (1968) coined the term “meaningful learning.” In Ausubel’s view,
to learn meaningfully, students must relate new knowledge (concepts and propositions) to
what they already know. Under this model new knowledge must be internalized in
relation to what is already understood. Undesirable learning, on the other hand, consists
of repetition of an item without full understanding of its meaning or how it connects to
other knowledge. According to Ausubel, when meaningful learning occurs, disparate
facts are understood in relation to each other and therefore recollection of any single fact
will prime the mind for recollection of the related facts. This is similar to schema theory
described earlier. In practical terms in the classroom, meaningful learning occurs when
learners construct their knowledge in their own words. It requires that teachers give
students the opportunity to engage in personally meaningful written and verbal
expression.
Simple recollection without real comprehension, also referred to as “parroting,” is
an example of undesirable learning. For example, with enough cramming any person
could memorize the questions and answers to a test administered in a language he or she
does not understand. This type of mimicry does not indicate legitimate understanding and
internalization, because if the questions were re-phrased, or given in a different order, or
the parameters of the test changed in any other way, then regardless of the amount of
cramming one has done, he or she wouldn’t be able to complete the test. This is because
that person never actually understood the questions or answers—rather he or she had just
learned to provide a particular response to a specific stimulus. True learning, on the other
hand, is demonstrated by the usage of learned material in new and meaningful contexts.
This model of meaningful learning is helpful for our purposes because it provides
a framework for how instructors should model their teaching. It is easy to think of tests as
simple things that encourage rote memorization out of context; however, with proper
care, teachers can make tests that are meaningful and relevant to course content and
present them to students in ways that encourage meaningful learning. Concerns that
testing merely requires parroting can be addressed by designing tests that require more
depth than just recognition, like short answer or essay responses. Test-enhanced learning
will ensure students have a grasp of disciplinary content; this will set the foundation for
meaningful learning as students will be able to competently use course content during
group engagement, and during the writing process.
Three Theories of Learning
Three theories of learning will further help us understand the learning process and
how testing can be a beneficial practice: the encoding-specificity principle, transfer-
appropriate processing, and desirable difficulty. The encoding-specificity principle holds
that a retrieval cue (i.e. an external stimulus that induces retrieval) will be effective if it
overlaps with features in the original memory trace. The encoding-specificity principle
theorizes that the best memory performance is generally found when the processes
engaged and cues given at retrieval are similar to those engaged in during encoding (see
Fisher & Craik 1977 and Moscovitch & Craik 1976). The following familiar story
illustrates the principle: you may be unable to remember the name of a neighbor’s dog
until the moment when you are watching a television program about show-dogs, which
reminds you that your neighbor’s dog is a show-dog named Tess. In other words, we
cannot always predict what type of external stimulus may spark retrieval in the future. As
another example, it may be more difficult to remember a classmate or a co-worker’s face
if you see them in an unfamiliar context like walking on the street or at a park. However,
once you meet this person a few times in different settings, it is more likely you will
remember them. Where and how information is encoded affects our ability to retrieve it.
The implication of the encoding-specificity principle is that encoding variability
(i.e. encoding under numerous circumstances) should produce better retention because it
increases the number of potential retrieval routes, thereby increasing the probability of a
match with whatever cue is presented at retrieval. According to the theory, encoding
variability of any sort should also increase the probability of successful retrieval. As an
example, I know a professor who brings citrus to class on test days for the aroma to help
put students at ease. The aroma should also increase encoding variability for the students
and hypothetically create a connection between the smell of citrus and the content of that
class. Years later, the smell of citrus could trigger for one of her students
memories of that exam and the information on it. For the purposes of analogy, let us
consider the mind as a labyrinth, with successful retrieval representing successfully
navigating the labyrinth from beginning to end. Every memory trace enters and leaves the
mind through a vast network, like a labyrinth. Imagine that as it does so it leaves a mark
behind on the walls, and the mark gets clearer every time the same route is followed;
eventually a clear path is laid through the labyrinth. This “mark” was described
physiologically as long-term potentiation (LTP). To follow the analogy, you could be
dropped at any point in the vast labyrinth; if multiple routes have been established that all
converge on the same memory, or connected schema, the likelihood of successful
retrieval is increased. This is in line with the practice of frequently administering
low-stakes tests. Repeated tests which use different questions and test on different parts
of the question stem should promote encoding variability, because every time students
take a test there will be differences in mood, activated schema, and a variety of other
external factors. This variety of encoding circumstances, when compounded by the
variation in test questions, will produce multiple retrieval routes, which should improve
memory performance.
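The reasoning behind encoding variability can be made concrete with a little arithmetic. The sketch below is a simplification under assumptions of my own: it treats each retrieval route as having the same independent chance (`p_match`, a hypothetical value) of matching whatever cue happens to be present, which is not something the theory itself specifies.

```python
def chance_of_match(routes, p_match=0.5):
    """Probability that at least one of several retrieval routes matches a cue.

    Assumes, purely for illustration, that every route matches independently
    with the same probability p_match.
    """
    return 1 - (1 - p_match) ** routes


for routes in (1, 2, 3):
    print(routes, chance_of_match(routes))
# 1 0.5
# 2 0.75
# 3 0.875
```

Under these assumptions, each additional route raises the chance that some cue will lead back to the memory, which is the intuition behind varying test questions and testing circumstances.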
Transfer-appropriate processing theorizes that successful retrieval is dependent on
the overlap between the cognition engaged in during encoding, and the cognition engaged
in during retrieval (see Kolers & Roediger 1984; Morris, Bransford, & Franks 1977;
Roediger 1990; Roediger et al. 2002). In terms of student learning through tests, Roediger and
Karpicke (2006) argue that retrieval practice through active testing is more effective than
passive review because the cognition engaged in during testing more closely matches the
necessary cognition for later retrieval than passive learning does. For example, if
someone wanted to learn how to swim, they could conceivably do so without ever getting
in the water by suspending the body with ropes and practicing the arm strokes and foot
motions. This practice and the act of swimming are transfer-appropriate to some degree;
however, the training would be more effective if it took place while actually swimming.
In the same way, listening to a lecture on writing effective transitions may prepare some
students to apply that material to their own writing, but the best way to train for
long-term retention and personal reproduction of course material is through active
reproduction on tests. Performance on a test requires cognition similar to remembering
that same information in a different context, such as in conversation or while writing.
Because the two actions are transfer-appropriate, testing can be seen as a more effective
training method for later retrieval than learning through lecture.
In the case of composition instruction, the material that students learn through the
course will be more readily understood and internalized if there is a strongly embedded
schema of related terms and concepts. Frequently-administered tests provide retrieval
opportunities which increase storage strength, and also improve encoding strength. When
students retrieve information as the course progresses, it allows them to relate old
information to the new schemata that they are developing through internalization of the
new course content.
One theory of learning holds that more challenging retrieval produces greater
benefits for long-term retention; Bjork (1992) refers to this principle as “desirable
difficulty” (see Bjork 1999; Karpicke & Roediger 2007; McDaniel, et al. 2007; Roediger
& Karpicke 2006). This theory holds that when retrieval strength is high and information
is easily accessible, the retrieval of that information produces small gains in storage
strength. In contrast, more difficult retrieval actions, such as remembering in a different
environment, or with fewer cues, or after a long period of time, all produce greater
increments in storage strength. To follow the analogy used earlier with the three types of
writing—imagine that time passes in the mountains of our minds and the stones we
chiseled words into are covered by moss and leaves. The words are still there etched in
stone, but they are difficult to access—they have high storage strength but low retrieval
strength. Desirable difficulty theorizes that the greatest gains in storage strength will be
made when retrieval strength is low. By analogy, that means that every time you clear off
the moss and leaves and retrieve the words, they also become more deeply etched in the
stone beneath.
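The relationship between retrieval strength and storage strength described above can be sketched as a toy model. The Python below is a hedged illustration only: the class `Memory`, the functions `wait` and `retrieve`, the linear formula, and the constants `fade` and `gain` are my own assumptions for the sake of example, not the formulation given by Bjork and Bjork.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    storage: float    # how deeply the memory is embedded
    retrieval: float  # how accessible it is right now, from 0.0 to 1.0


def wait(memory, fade=0.5):
    """Time passes: accessibility fades, but what is stored remains."""
    return Memory(memory.storage, memory.retrieval * fade)


def retrieve(memory, gain=1.0):
    """A successful retrieval: the lower the retrieval strength beforehand,
    the larger the increase in storage strength (an assumed linear rule)."""
    increment = gain * (1.0 - memory.retrieval)
    return Memory(memory.storage + increment, 1.0)


fresh = Memory(storage=1.0, retrieval=1.0)
easy = retrieve(fresh)              # retrieved immediately, while still fresh
hard = retrieve(wait(wait(fresh)))  # retrieved after the "moss and leaves" gather

print(easy.storage)  # 1.0
print(hard.storage)  # 1.75
```

In this sketch, retrieving a memory that is still fresh adds nothing to storage strength, while retrieving it after accessibility has faded produces a larger gain, which mirrors the prediction of desirable difficulty.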
The theory of desirable difficulty compels teachers to form tests which require
active reproduction from students. The predictions of desirable difficulty are confirmed
by differing rates of long-term retention as a result of taking recognition or recall tests. In
a recognition test like multiple choice, the right answer is presented among others, which
provides a strong cue for recollection and greatly increases the likelihood of giving a
correct answer. Alternatively, short-answer questions require the taker to provide the
answer in their own words, which requires more effort. This means it typically takes
longer to answer a short answer question, and it means that the taker has to produce the
answer without having the cue for recollection. Although both tests benefit memory, the
more effortful recall produces better long-term retention. Many studies show that recall
tests promote better long-term retention than recognition tests (see Jacoby 1978; Butler &
Roediger 2007; Glover 1989; Kang, McDermott, & Roediger 2007; McDaniel, Anderson,
Derbish, & Morrisette 2007). Further support for desirable difficulty comes from
Agarwal et al. (2008) who studied student learning comparing open-book and closed-
book tests. Closed-book tests require more difficult, challenging processing than
restudying a passage, yet difficult processing benefits long-term retention according to
the theory of desirable difficulty. The study found that open-book tests increase retrieval
strength, as evidenced by high initial performance, but produce small increments in
storage strength. In contrast the more difficult closed-book tests produced greater long-
term retention. Conditions that require more difficult and challenging processing may
slow initial learning but ultimately enhance long-term retention relative to less-
challenging learning conditions that produce rapid initial learning but poor retention.
These three theories of learning help us understand the testing effect and converge
on the notion that we should administer frequent low-stakes tests. With regard to
composition instruction, we can achieve desirable difficulty by administering
short-answer questions and essay questions. Short-answer questions are well suited to ensuring that
students are retaining key course concepts that will be useful in peer collaboration, class
discussion, and the writing process.
Social Constructivist Theories of Learning and Testing
One way to make learning more meaningful and memorable is with the use of
social constructivist theories of learning. This section will explain learning from a socio-
cultural perspective and further emphasize the importance of collaborative conversation
in the classroom, and tests will be discussed as a tool for helping students join that
conversation and the academic discourse community. While behaviorists theorize
learning as a series of stimulus-response pairs, and cognitivists theorize behavior as a
complex formula resulting from each individual’s cognition, social constructivists expand
upon these models by focusing on interaction between groups rather than just focusing on
individual behavior. This theory holds that language is at its core a social act, and in
order to communicate with various “discourse communities,” we adopt jargon and
communicative patterns that are appropriate to the community we are a part of (Lee 2). In
order to be a member of any discourse community—be it academic, professional, or
personal—the individual must learn the conventions and jargon of that particular
community; thus any instructional tool—like tests—that can help students join the
academic discourse community will be of benefit.
Social constructivism is distinguished by the belief that language and the mind are
inseparable, because any individual needs language in order to think and encode their
long-term memories. Language, in Vygotskian terms, is a psychological tool that humans
utilize uniquely among all other animals. Further corroborating the key position of
language to development, linguist Michael Halliday has argued that “language is the
essential condition of knowing, the process by which experience becomes knowledge”
(57). It is language, and other psychological tools (such as mathematical symbols, the
alphabet, and scientific diagrams), that allow humans to perform the unique activities that
we do—from building rockets to writing sonnets.
In his work The Dialogic Imagination (English translation 1986), Mikhail Bakhtin
describes a socio-cultural model in which the individual and society interact to influence
personal development. Every individual exists in a society with a history of complex
interaction between language and power. From our youth we are influenced by social
discourses, and by our parents and role models, for example, whose views we interact
with and selectively internalize. Through our interaction with society, Bakhtin maintains
that “not only are the meanings of words and expressions ‘borrowed’ from the speech of
others, but each utterance is a link in a very complexly organized chain of other
utterances” (337). Within this “chain of utterances,” every individual is involved in a
dynamic process of self-discovery and creation of identity in relation to their larger
discourse community. Bakhtin describes this social enculturation as “ideological
becoming,” which he defines as “the process of selectively assimilating the words of others” (341). This
process of ideological becoming involves redefining the words of others into our own
“internally persuasive discourse.” In Bakhtin’s words: “internally persuasive discourse,
as opposed to one that is externally authoritarian is…tightly interwoven with ‘one’s own
word’” (346). Bakhtin explains how these dialogic interrelations preexist and shape each
individual utterance: "The living utterance, having taken meaning and shape at a
particular historical moment in a socially specific environment, cannot fail to brush up
against thousands of living dialogic threads, woven by socio-ideological consciousness
around the given object of an utterance; it cannot fail to become an active participant in
social dialogue" (276). Through our interaction with the external social world, we
mediate outside influences through language into ideologies that are personally
meaningful.
One model that has been useful for describing collaborative learning is
Vygotsky’s Zone of Proximal Development, or ZPD. While studying children’s learning,
Vygotsky found that each child had an achievement potential that they could realize
unassisted, and one that was higher if they were aided by someone more knowledgeable
(Vygotsky 84). He calls the space in which this learning happens the Zone of Proximal
Development. Wells, writing in Vygotskian Perspectives, describes the ZPD as the “use
of language between novices and more expert others as a tool for mediating
misconceptions and consolidating understandings” (5). He argues that rather than only
viewing the ZPD as existing between a single expert and a less knowledgeable peer, we
should also consider the ZPD as a collaborative model between a group of peers.
According to Wells, effective learning is not unidirectional (as is assumed by a lecture
model), but rather understanding is both mutually constructed and reciprocal. By
grouping students with their peers, the ZPD is changed from a unidirectional exchange
into a multi-dimensional one, where every student has something to contribute, and
learning happens as a group. Through collaboration, concepts that were only vaguely
understood before can coalesce into coherent thoughts; individuals can work to an
agreement on core meaning, and develop their own informed opinions through critical
engagement with class material.
In composition and creative writing, if we follow the idea of ideological
becoming, then students, in part, form themselves and their intellectual development
through what they write, and through discussing their writing. This process of ideological
becoming through learning and writing is one of the very reasons that we value higher
education. As educators we seek to guide this process of subject formation and self-
expression by teaching the disciplinary content that we argue will enhance writing
abilities. Based on social constructivist theory, it stands to reason that our students will be
more capable peers (a term coined by Vygotsky) if they know the vocabulary and
conventions of the academic discourse community. Because composition teachers don’t
use lectures as a primary means of instruction, but instead use student-directed methods
and group work, students need to have done the reading and internalized the
course material so they have the vocabulary and knowledge to engage as capable peers.
One way we can ensure students have this level of understanding is by testing for content.
This will lay the foundation for students to engage in group work without the teacher.
The practice of administering frequent, low-stakes tests as a method of teaching is
well in line with the theoretical stance of social constructivism. With the rise of social
constructivist theory as a guiding paradigm in composition, there should also come a
recognition that tests can be used to help students gain fluency in the academic discourse
community. Bringing students into the field of academic discourse requires that they
learn certain disciplinary knowledge. In order to participate effectively in this discourse
community, students must internalize genre-specific conventions, communication styles
such as Standard American English, and research guidelines such as MLA or APA.
Testing can help students to do this effectively. This is especially true for first-year
composition students who tend to be new to the discourse community of academia. The
conventions of academia will gradually be internalized by most students, but their
integration can be assisted via testing.
Some people view testing as a prescriptive method of instruction that stifles
creativity. They might argue that testing encourages convergent thinking,
impressing on the student that there is only one right answer which comes from an
authority. Similarly, many would question the purpose of memorizing information in our
digital age when answers are a mere Google search away. What is the significance of
remembering? Some would argue that we don’t need to memorize details; after all, that’s
what computers are for. They might argue that what really matters is how things fit
together. However, Robert Bjork, a prominent researcher in the field of memory, asks “the
people who criticize memorization—how happy would they be to spell out every letter of
every word they read?” (qtd in Wolf’s “Want to Remember Everything You’ll Ever
Learn?”). It is an inescapable fact that to participate in new fields we must learn new
things. For example, children can only learn to read whole words through dedicated
practice. Every time we enter a new field we have to go through the same process—we
become children again. Every field has its own language and conventions that must be
internalized. The process of learning requires repetition and verbal encoding of new
concepts into one’s own words, and testing can help with this process.
But let’s return to the question of creativity. Creativity is the ability to view the
world through a variety of models (or paradigms), and since testing can increase the
amount of memorized information we have to work with, it can enhance rather than
detract from the number of ways we can apply what we know. Creativity is not the
opposite of memorization; it is the useful application of memorized information. The
human brain is a marvel of associative processing, but in order to make associations, data
must be loaded into memory. We need to internalize disciplinary information through
encoding and repetition in order to utilize it creatively. The goal of the composition
classroom should be to facilitate both learning and memorization of content, and the
application of that knowledge towards creative goals. However, my research leads me to
believe that currently there is a lack of emphasis on memorization in composition
instruction.
In both Bakhtinian and Vygotskian theory, language is at the heart of
development, and the social world is the arena where language is exercised and
developed. Between the individual and the group, new ground can be broken. When
engaged in collaboration, the exchange of multiple interlocutors improves understanding
and retention of class material through retrieval and repetition. Additionally, dynamic
collaborative dialogue can lead to the co-creation of new meaning—a creation that would
not have been possible individually. John-Steiner and Meehan put it this way:
“knowledge therefore is both re-constructed and co-constructed in the course of dialogic
interaction… [members in a dialogue] actively restructure their knowledge both with
each other and within themselves” (35). The separation of memorization and creativity
into distinct categories represents a false dichotomy. Creativity is the application of what
you know to new effect. Because tests enhance knowledge, they can also be used to
enhance creativity.
This chapter has given an overview of human learning from the theoretical
perspectives of brain physiology, cognitivism, and social constructivism. We have seen
that we can increase encoding strength of long-term memories through meaningful
learning and by connecting the things we learn with our pre-existing schemata. Also
important is the role of collaboration and conversation. Though some may argue that
creativity springs from an independent emergence in the mind of a genius, I have argued
that creativity is enhanced through internalization of disciplinary content, and emerges
from collaborative dialogue between groups of capable peers. Teachers can help ensure
that students have internalized the course material necessary to engage as a group of
capable peers through giving frequently-administered low-stakes tests. An exclusive
emphasis on testing cannot fulfill all of the needs of composition. I argue that tests are an
essential but insufficient method of instruction. Our current practice in composition
instruction, which includes a variety of group activities and collaborative discussions, is
also a necessary, but insufficient method of instruction. By combining the two methods
we can both ensure that students are adequately internalizing course content and
creatively using it to new effect.
CHAPTER 4: THE TESTING EFFECT AND THE SPACING EFFECT
What is the Testing Effect?
Although many people associate tests with the assessment or measuring of
knowledge rather than with learning, research shows that a test can serve a far greater
purpose than mere assessment. The studies discussed in this chapter show that tests can
also enhance learning and improve long-term retention, and they show that the act of
retrieving information from memory on tests increases the probability of successful
retrieval in the future. This phenomenon of enhanced learning as a result of testing has
come to be known as “the testing effect.” H. L. Roediger writes that
The testing effect represents a conundrum, a small version of the Heisenberg uncertainty principle in psychology: Just as measuring the position of an electron changes that position, so the act of retrieving information from memory changes the mnemonic representation underlying retrieval, and enhances later retention of the tested information (“The Power of Testing Memory” 182).
This is in line with the physiological description of memory through neural networks
given earlier. Every act of retrieval brings about a physical and structural change to the
network, which results in long-term potentiation (LTP), making future retrieval actions
easier.
The testing effect has considerable implications for composition instruction and
the field of education at large. As discussed in chapter 2, our current practice in
composition instruction uses collaborative, student-directed methods of learning.
However, in order for students to effectively collaborate, they need to have adequately
internalized the disciplinary content of composition, including the conventions of
Standard Academic English, research conventions like those put forth by the MLA, and a
wide variety of terms and definitions (“thesis”, “topic sentence”, “analysis”, etc.). What
the testing effect and the spacing effect show us is that, without retrieval opportunities
spaced throughout the semester, it is unlikely that students will retain this content.
Therefore, this project argues for the use of frequently-spaced low-stakes tests to help
students internalize the knowledge of composition—as argued in Chapter 2, this will
make students more capable of group work and gradually improve their writing.
A brief review of this phenomenon and the contexts in which it has been found is
given by Mark McDaniel and colleagues in the article “Testing the Testing Effect in the
Classroom”:
Testing effects are observed with word lists (Hogan & Kintsch, 1971; McDaniel & Masson, 1985), paired associate lists (Allen, Mahler, & Estes, 1969; Carrier & Pashler, 1992), pictures (Wheeler & Roediger, 1992), and prose material (Glover, 1989; Roediger & Karpicke, 2006b). Testing effects surface when the intervening tests are different from the final tests: intervening recall tests improve subsequent recognition (Glover, 1989; Lockhart, 1975; Wenger, Thompson & Bartling, 1980) and intervening recognition tests improve subsequent recall (Runquist, 1983). Taking a test is almost always a more potent learning device than additional study of the target material (see Carrier & Pashler, 1992, for recent experimental tests, and Roediger & Karpicke, 2006a, for a review). (495)
Recent studies have examined the testing effect in middle school (McDaniel 2007, 2011)
and college (Butler 2012) contexts. As we will see in this chapter, the testing effect is
consistently found in diverse studies. The literature reviewed in this section shows that
testing reduces forgetting, especially if administered shortly after learning, and multiple
tests produce a greater effect in slowing forgetting than a single test. The studies
reviewed in this chapter also show that taking a test has a greater positive effect on future
retention than spending an equivalent amount of time restudying the material.
Many of the studies reviewed below test learning and retention of paired
associates; these are A-B connections, and when presented with the cue of A or B, the
test taker would have to recall its associate. Paired-associates can represent diverse
information, such as names to go with faces, or a phone number for a friend, or translations
of words from L1 to L2, or that 8x9 = 72. The difficulty of paired associates is in part
dictated by how logically they associate. For example, “chair-table” is easier to remember
than “chair-donkey,” which is in turn easier than “VFU-734.” At its core, this form of
memorization, or paired-associate learning, is identical to the memorization of the key
vocabulary terms and definitions that might be required to successfully function and
collaborate in a composition class. Any method that increases long-term retention of
paired associates would be a beneficial instructional technique in a composition class.
While these studies are not measuring skill formation (i.e. development of writing ability
over the course of a semester), they are examining the ability to recall key memorized
information similar to the key concepts and terms taught in composition.
Studies Reporting a Testing Effect
The first large scale study of the testing effect was conducted by Arthur Gates and
published in 1917. This study compared the effectiveness of active recall (what Gates calls
“reciting”) to passive review, and found that the active recall required by testing
improved retention of the concepts over study. Gates tested children in grades 1, 3, 4, 5,
6, and 8, using two types of materials, nonsense syllables and facts taken from prose
passages in the book Who’s Who in America. The nonsense syllables were simply
three-letter groupings that do not form a word in English, such as DAK, YRK, or CTR. The
children studied the materials in two phases, first reading to themselves, then looking
away from the materials and recalling (reciting) whatever answers they could, with
researchers recording the students’ performance during free-recall. Researchers instructed
students to read or recite for different amounts of time, and different groups of children,
separated by age level, spent 20, 40, 60, 80, or 90% of the time self-testing. At the end of
the period Gates administered a test to the children on the material they had learned, and
after a delay of 3 to 4 hours he retested them.
Presumably because of their early level of cognitive development, first graders,
children six to seven years old, were not able to perform very well in the study and they
were not tested on the prose passage because of poor reading abilities. Their performance
alongside the other students’ on nonsense syllables can be seen in Figure 4, which shows
the proportion of test items recalled on the X-axis and proportion of time reciting on the
Y-axis. The increase in performance can be seen more clearly on the delayed tests than on
the immediate tests. The top two graphs show performance on immediate tests of nonsense
syllables (left) and biographical facts (right). The bottom two graphs show performance on
delayed tests of the same materials, in the same positions as the top two. With the
prose passages, the optimal amount of recitation seemed to be about 60% of the total
learning period, with the rest spent re-reading. Researchers found that the effect leveled
off and test scores began to drop at higher ratios of recitation to re-reading. The benefits of
recitation do not level out for nonsense syllables, because they are nonsensical and
reading would not help encode the information. This data suggests that a balance between
studying and testing is best.
Figure 4: “Performance on Immediate and Delayed Tests”
Source: (“The Power of Testing Memory” 184)
A second landmark study showing positive effects of testing was carried out by
H.F. Spitzer (1939). Spitzer's study demonstrated not only that testing improved
retention, but that a shorter delay between initial learning and testing is of greater benefit
than a longer interval between studying and testing. Spitzer and colleagues conducted a
large-scale experiment involving the entire population of sixth-grade students in 91
elementary schools in nine Iowa cities, for a total of 3,605 subjects. Students studied one
of two 600-word articles containing information about either peanuts or bamboo. The
students were then split into eight groups and given a 25-item multiple-choice test on the
material over the course of the next 63 days—each group tested with a different retention
interval.
Spitzer also manipulated the number of tests taken by different groups, and the
delay between studying and testing. After studying the passage, each of the eight groups
of subjects was given one, two, or three tests on various schedules across the next 63
days. Some students took a single test 63 days after initial exposure to the material, while
others took earlier tests. Group 1 and Group 2 took an immediate test. All other groups
took their first test after a delay of days or weeks. For example, Group 6 did not take an
initial test until day 21. Figure 5 shows the proportion correct on multiple-choice tests
taken at various delays. The solid lines show results for repeated tests for particular
groups. The dashed line (a visual aid connecting each group by the day of their initial
test) shows that the longer the first test was delayed, the worse was the students’
performance on that test. In all cases, giving a test at some point either slowed or stopped
forgetting. Groups 4, 5, and 6 all showed an increase in proportion correct after their first
test. The students who took a test sooner after learning the material demonstrated much
greater recollection than the students in those groups who took their first test after a
longer interval. By day 21, forgetting had already reached its peak, and Groups 6, 7, and
8 all show similar performance to each other.
Figure 5: “Testing Schedule Shows a Forgetting Curve”
Source: “The Power of Testing Memory” 185
This figure shows that the longer the interval between initial exposure and the
first test, the worse the subjects’ performance was. The students who took a test shortly
after learning the material demonstrated much greater recollection than the students in
those groups who took their first test after a longer retention interval. Group 2, which had
the best recollection after a 63-day delay, was tested immediately after study, and then
tested again on the same material one week later. This study reveals the importance of
spacing tests out in order to improve retention of material. The landmark studies of Gates
and Spitzer together seem to suggest that we administer a series of low-stakes tests
throughout the semester, thereby requiring students to actively recall key information a
number of times with increasingly spaced intervals between subsequent tests. This notion
of an expanded test schedule will be described in more detail in the next section, “The
Spacing Effect.”
While the Gates and Spitzer studies provide support for the testing and spacing
effects, they were performed with elementary school students, a different demographic
than we find in composition classes at the university level. Other studies, however, have
worked with undergraduates. Endel Tulving, for example, examined the recall ability of
undergraduates at the University of Toronto. Tulving (1967) had subjects, three groups of
18 students, learn a list of 36 nouns presented in a random order on each study trial. The
purpose of this study was to compare retention between groups with various studying and
testing schedules. If S stands for a study trial, and T stands for a test trial, the three
groups were compared in the following manner: Group 1 went through a process of
STST, Group 2 followed a process of STTT, and Group 3 did SSST. During a study
session, subjects looked at the word list and tried to memorize it, and for the test
condition subjects verbally free-recalled as many items as they could in any order, which
the experimenter recorded (see Figure 6).
Figure 6: “Study-Test-Study-Test (STST) Most Effective Learning Strategy”
Source: “The Power of Testing Memory” 185
Tulving showed that testing and studying can produce the same amount of
learning; however the subjects were tested with an immediate retention interval (tested
directly after studying). Later research shows that if long-term retention is measured after
a delay, repeated testing actually produces better recall than repeated studying. For
example, Karpicke and Roediger (2006) replicated Tulving’s basic result that learning
curves for the three conditions are similar. However, unlike Tulving, they repeated the
test after a 1-week delay. Their results can be seen on the right side of Figure 6. In the
2006 study, subjects returned one week later, and were given 10 minutes to recall as
many words as they could. Their performance was recorded at the end of each minute.
The comparison between the three conditions reveals a positive benefit for the STST
learning condition when long-term retention is the studied variable. Despite the fact that
the subjects who studied repeatedly had studied the words 15 times a week earlier and
those who were tested repeatedly had only studied them 5 times, the recall was greater for
the STTT condition than the SSST group. These results show that, in this study at least, a
balanced mixture of studying and testing is the best method to ensure long-term retention.
Again, we find a study supporting the notion of incorporating frequent tests into the
semester. As instructors we can reproduce the STST condition for our students by having
a number of tests in the class spaced throughout the semester so students are engaged in
the process of repetitive studying and testing.
Roediger and Karpicke (2006b) are not the only researchers to show a clear
benefit for testing over studying as the retention interval increases. Thompson, Wenger,
and Bartling (1978) further confirm these findings. However, they demonstrate that
selectively studying only the material missed on a previous test is more efficient than
restudying all of the material in general. This study also used 40-word lists, but used
different learning conditions, a four-study trial (SSSS), a three-test trial (STTT), and a
condition which personalized the study schedule for each subject (STrTrTr). The study
also included a final test 5 minutes after the learning phase and again 2 days later.
Subjects in the third (STrTrTr) condition studied the word list once, recalled it, studied
only those words they failed to recall, and then recalled the entire list again, and so on for
three more study-test episodes, with the study lists becoming shorter as they performed
better on the tests. Though each study session was personalized, during each test the
subjects in this group recalled the entire list of items on each test trial, not just the items
they had restudied.
The results of Thompson et al. (1978) are printed below in Figure 7. For both the
five-minute and the two-day retention intervals, the group with selective restudying
performed best. At a five-minute interval, the SSSS group scored 50%, but fared far
worse after a 48-hour delay, scoring only 22%. The STTT group scored less (28%) on the
initial test than both other groups. However, the STTT group also showed very little
forgetting and 48 hours later the group scored 22% on the test. With a retention interval
of 5 minutes, the STTT group had the poorest performance, but with a retention interval
of 2 days, the SSSS group had the poorest performance. The percentage forgetting was
calculated as follows: [(recall at 5 min – recall at 48 hours)/recall at 5 min] x 100.
Applying this formula to each group (see Figure 7) will show that the repeated study
condition resulted in much greater forgetting as time passed. In line with the other studies
in this chapter, these results show that massed-study helps immediate recall, but
performance declines as the retention interval increases. In addition, this study shows that
the combination of selective restudying and repeated testing is the most effective means of ensuring
long-term retention. Once information has been learned and successfully recalled on a
test, it is best for students to spend their time studying the material that they failed to
recall. These findings further support incorporating a number of low-stakes tests
throughout the semester in our composition classes. Additionally this study shows that
the most efficient means of learning seems to be to personalize your study based on those
answers you missed on the previous test, but to continue testing on all items every time.
As composition instructors, we can best help our students learn by administering frequent
tests and directing each student to personalize his or her study, and restudy what they
missed on the last test.
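The percentage-forgetting formula given above can be applied directly to the recall figures quoted in this section. The short Python sketch below is only an illustration: the function name is my own, and the input values are the SSSS and STTT percentages exactly as they are quoted in the preceding paragraph, not figures taken independently from Thompson et al.

```python
def percent_forgetting(recall_5_min: float, recall_48_hr: float) -> float:
    """Share of the initially recalled material that is lost by the delayed test.

    Implements [(recall at 5 min - recall at 48 hours) / recall at 5 min] x 100.
    """
    if recall_5_min <= 0:
        raise ValueError("recall at 5 minutes must be positive")
    return (recall_5_min - recall_48_hr) / recall_5_min * 100


# Recall percentages as quoted in the paragraph above: (5 minutes, 48 hours).
quoted_recall = {"SSSS": (50, 22), "STTT": (28, 22)}

for condition, (immediate, delayed) in quoted_recall.items():
    lost = percent_forgetting(immediate, delayed)
    print(f"{condition}: {lost:.1f}% forgotten")
```

With the quoted figures, the repeated-study group loses more than half of what it could initially recall, while the repeated-test group loses roughly a fifth, which is the contrast the paragraph describes.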
Figure 7: “Proportion Correct in Immediate and Delayed Recall”
Source: “The Power of Testing Memory” 187
Thompson et al.’s finding that massed study improves initial recall but loses
effectiveness as the retention interval increases was replicated by Wheeler,
Ewers, and Buonanno (2003). In their study, subjects studied a 40-word list in a repeated
study condition (SSSSS) or with one study session followed by four consecutive recall
tests (STTTT). Consistent with previous research, in a final free-recall test given 5
minutes later or 1 week later, the researchers found an advantage for massed study on an
immediate test, but the massed-study group performed poorly with a retention interval of
one week (see Figure 8). By comparison, the subjects in the study-only condition were
re-exposed to the material 5 times more than repeated-test condition subjects, who were
only re-exposed to those words that they were able to recall after only one study session
(about 11 out of 50 words in the experiment). Figure 8 shows the proportion of words
recalled on immediate (5-min) and delayed (7-day) tests after repeated studying or
repeated testing. The repeated test condition produced better retention than the repeated
study condition. What is most noteworthy about this is the comparative decrease in
forgetting between the two groups. Though the repeated study group performed better
initially, they had a much greater rate of forgetting, and after a 7-day delay, had forgotten
most of what they had learned. These results confirm the power of testing for long-term
retention.
Figure 8: “Word Recall on Immediate and Delayed Tests”
Source: “The Power of Testing Memory” 188
Though there has been a great deal of research on the testing effect in a laboratory
setting, in these studies testing intervals and the amount of time and conditions of testing
are carefully controlled or manipulated. In a class, unlike in the lab, there is great
variability between students’ retention intervals and across students’ study time and
effort. In the laboratory, long retention intervals are typically 1 or 2 days (e.g. Carrier &
Pashler 1992; Hogan & Kintsch 1971; Masson & McDaniel 1981; McDaniel & Masson
1985); in much research, the intervals are on the order of minutes or hours (e.g. Bartlett
1977). However, in a class, the delays between quizzes and exams are typically weeks or
months. According to the article “Testing the Testing Effect in the Classroom,” as of
2007, very few experiments had studied the testing effect at 1-week or longer intervals.
McDaniel cites Roediger & Karpicke 2006, and Wheeler, Ewers, & Buonanno 2003, for
1-week delay, and Butler & Roediger 2007 for a 1-month delay; additionally, Spitzer
(1939) tested students at an interval of 63 days.
In their 2007 study, McDaniel and colleagues tried to create experimental
conditions which would test the applicability of the testing effect in a practical setting
outside of a laboratory. This study was conducted during six weeks of a web-based Brain
and Behavior course at the University of New Mexico, with 35 participants. Each week
all students in the class were assigned approximately 40 pages of textbook reading in the
course. All participants completed weekly quizzes, two unit exams, and a final exam that
were constructed for the experiment. Weekly quizzes included 10 items that were
generated from the content of the reading. Each week, participants received their “quiz”
in a different test format (multiple choice (MC), short answer (SA), or read only (RO)).
On the week when the participants received the RO condition, they simply read facts and
clicked a button marked “I have read the above statement.” Participants were allowed 10
minutes to complete each quiz; immediately after finishing they were provided access to
feedback. Because the quizzes were online, whether or not the students used this
feedback was dependent on their own volition. After 3 weeks of quizzes (one of each
format) participants were instructed to take the first unit test, with all participants asked
the same questions. The same method was repeated for the following three weeks.
Several weeks after completing the second unit test, participants were instructed to take
the final cumulative exam, which combined material from units one and two.
Students’ performance on the unit exams is compared and summarized below in
Figure 9, which shows performance of quizzed versus not-quizzed items collapsed across
units. These results show that testing, but not additional reading, improved performance
on the unit exams for the material which was targeted during previous tests.
Figure 9: “Student Performance Averaged across Unit Exams”
Source: (“Testing the Testing Effect” 508)
These findings demonstrate that tests enhance learning and retention even in the face of
the variable conditions found in a college course setting. This experiment and those
experiments conducted in social studies classes (Roediger, et al. 2010) are, according to
the authors, the first to show the effectiveness of low-stakes quizzing in promoting
retention of course content on summative assessments used in actual classrooms. The
present research shows that the benefits of the testing effect can clearly transfer to the
classroom.
A common concern about testing is whether students are learning the complete
conceptual relations among facts or whether they are parroting a particular answer to a
particular question. To address this concern and assess whether students had a deeper
understanding of the material, questions from the course readings alternated between the
weekly quizzes and unit tests so that an alternative portion of the fact was required for the
answer. In the present study, short-answer and multiple-choice quizzes improved
performance more than recognition quizzes did on a subsequent test in which the retrieval
cues had been altered (i.e. a different question stem was provided than during previous
tests). The increase in performance that this study reports for the altered wordings
is in line with the theory of desirable difficulty, which posits that effortful
retrieval is more beneficial for long-term retention: recognition is a less
demanding retrieval task than recall, and so it yields a smaller benefit.
Another issue addressed in the basic memory literature is the relative benefit of
cued recall tests (e.g. short answer, essay) over recognition (e.g. multiple choice) tests.
McDaniel (2007) writes
Studies with simple laboratory materials (word or paired associate lists) have found that retrieval through recall benefits subsequent test performance more so than retrieval processes associated with recognition (Cooper & Monk, 1976; Darley & Murdock, 1971; Mandler & Rabinowitz, 1981; McDaniel & Masson, 1985; Wenger, Thompson, & Bartling, 1980). (201)
In fact, in McDaniel (2007), multiple-choice quizzes produced results that were only
slightly better than repeated reading without quizzes. An initial test consisting of
multiple-choice questions often fails to produce a testing effect, presumably because such
questions require little or no retrieval (e.g., Kang et al., 2007). In the present study, the
greatest testing effect is demonstrated with short answer questions rather than multiple
choice questions. In a previous study using word lists, McDaniel and Masson (1985)
found that cued recall produced significantly better performance on a subsequent cued
recall test than did recognition, but importantly half of the time the cues that prompted
recall on the final test were different than those that were provided for earlier study and
testing. This pattern prompted McDaniel and Masson to suggest that retrieval through
recall produces enriched, variable encoding of the target information, more so than
retrieval through recognition. McDaniel cites further studies showing positive transfer
between testing and studying when the wording in the question stem differs between
studying and testing (Glover 1989; Lockhart 1975; Wenger, Thompson & Bartling 1980)
and argues that testing on multiple aspects of a question should produce a deeper
relational understanding of the question. The findings in this study fit with a larger body
of research, including those studies reviewed in this chapter, showing that recall tests are
more beneficial than recognition tests for subsequent memory performance.
This section has reviewed the results of studies demonstrating a testing effect in
multiple learning conditions. While the studies reviewed here are small in number, they
are uniform in suggesting that a balance of studying and testing appears to be the best
method to ensure long-term retention. The purpose of testing should be to gradually
shape production of the desired response so that it can be retrieved out of context, after a
long delay. The landmark studies of Gates and Spitzer established the testing effect as a
paradigm of learning, which has remained consistent in the later studies reviewed in this
chapter. The implications of this research for composition are that student learning and
retention of course material can be facilitated through frequently-administered low-stakes
tests.
The Spacing Effect
The spacing effect is the principle that spacing study sessions is better for
retention than massed study. The spacing effect has two components: 1) spaced study is
better than massed study, and 2) the most efficient method to ensure long-term retention
is through increasingly spaced repetitions of the original material. German psychologist
Hermann Ebbinghaus, who used himself as the sole subject, conducted a landmark study
on human memory in the late 1800s which laid the empirical underpinnings for the
spacing effect. This study is widely recognized and extensively cited. He memorized
thirteen sets of nonsense syllables, and then tested himself at various retention intervals
and measured how long it took to forget and then relearn them. These nonsensical
three-letter consonant-vowel-consonant sets (such as YOP, SEP, and XAP) were
chosen to avoid contaminating the experiment through prior learned associations. In
experiments of astonishing rigor and tedium, Ebbinghaus practiced and recited from
memory 2.5 nonsense syllables a second, then rested for a bit and started again.
Ebbinghaus trained this way for more than a year. He then repeated the entire set of
experiments three years later to further confirm his findings. Finally, in 1885, he
published a monograph called Memory: A Contribution to Experimental Psychology.
Ebbinghaus’s findings and his book established the theoretical foundation for the study of
memory that psychology has relied on since. His results have been replicated in
numerous studies and serve as a foundational precept for our understanding of human
memory and learning.
Ebbinghaus identified some important empirical relationships in memory, such as
the retention and learning curves. He studied the amount of time it took to learn the list
initially and then how long it took to relearn the list, with “learning” defined as the ability
to perfectly recall the list twice. In one study he found that it took 1156 seconds to
initially learn the set, but later it took only 467 seconds to relearn the list. He found initial
forgetting was rapid but the rate of forgetting slows down over time. This was the first
expression of what has been found in virtually all studies of human learning since: the
negatively accelerated learning curve.
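The relearning figures quoted above can be turned into a single number. One common way to do so is to express the time saved on relearning as a percentage of the original learning time. The Python sketch below is a minimal illustration of that arithmetic; the function name and the framing as a "savings" percentage are my own additions rather than wording taken from this chapter's sources.

```python
def savings_percent(initial_seconds: float, relearn_seconds: float) -> float:
    """Time saved on relearning, as a percentage of the original learning time."""
    if initial_seconds <= 0:
        raise ValueError("initial learning time must be positive")
    return (initial_seconds - relearn_seconds) / initial_seconds * 100


# Figures quoted above: 1156 seconds to learn the set, 467 seconds to relearn it.
saved = savings_percent(1156, 467)
print(f"{saved:.1f}% of the original learning time was saved on relearning")
```

On these figures, relearning took roughly 40% of the original time, so a substantial share of the initial learning had survived even though the list could no longer be recited perfectly.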
Figure 10: “Hypothetical Forgetting Curve 1”
Source: (“Learning by Spaced Repetition”)
Robert Bjork, working with Thomas Landauer (1978) of Bell Labs, published the
results of two experiments involving nearly 700 undergraduate students learning names.
Each student was given a rearranged deck of cards bearing either the first and last names
of fictitious people (for initial presentation trials) or first names only (for test trials).
Subjects turned through the cards at a rate of one every 9 seconds, in time to a signal,
studying the names and writing last-name answers as appropriate. Next came a 30-minute
retention interval filled with a distracting lecture, followed
by a final retention test. Landauer and Bjork were looking for the optimal moment to
rehearse something so that it would later be remembered. To determine this, they studied
the effectiveness of an expanding retrieval schedule (i.e. an increase in the retention
interval after every act of retrieval) compared to an equally-spaced retrieval schedule.
Landauer and Bjork found that the expanding-interval schedule produced recall similar to that of
equal-interval testing on a final test at the end of the session, and both produced better
recall than did initial massed testing. Their results led them to theorize that the best time
to study something is at the moment you are about to forget it: retrieval right on the
threshold of forgetting produces the greatest gains in retention. In their words: “Successful
tests are more effective than repetitions. This could either be because tests induce greater
encoding effort, or because they are more similar to the performance required at eventual recall”
(631). This is in line with the theory of desirable difficulty described earlier. They found
that the expanding retrieval schedule produced a 10% increase in retention over the
equal-interval schedule. In practical terms, this finding suggests that tests should be
administered shortly after learning to ensure initial encoding, and then repeatedly
administered at increasingly spaced intervals.
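The difference between the two schedules can be made concrete with a small computation. The following is a minimal sketch, not a procedure taken from Landauer and Bjork: the function names, the one-day first gap, the doubling factor, and the six-day equal gap are all assumptions chosen only for illustration.

```python
def expanding_schedule(first_gap, factor, reviews):
    """Return review times (days after initial learning) for an expanding
    schedule, where each gap is `factor` times the previous gap."""
    times, gap, now = [], first_gap, 0
    for _ in range(reviews):
        now += gap
        times.append(now)
        gap *= factor
    return times


def equal_schedule(gap, reviews):
    """Return review times (days after initial learning) for an equally
    spaced schedule with a constant gap."""
    return [gap * (i + 1) for i in range(reviews)]


# Hypothetical values, chosen only to show the contrast in spacing.
print(expanding_schedule(first_gap=1, factor=2, reviews=5))  # [1, 3, 7, 15, 31]
print(equal_schedule(gap=6, reviews=5))                      # [6, 12, 18, 24, 30]
```

Both schedules cover about a month with five retrievals, but the expanding schedule places the first retrieval shortly after learning and then lengthens each interval.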
An implication of the spacing effect is that spaced study will be more effective
than massed study. Massed study is cramming a lot of material in a short amount of time.
As we know, we can pass tests by cramming if the test is taken shortly after cramming,
but that kind of knowledge has very little storage strength. Dempster (1987) conducted a
study in which subjects were shown paired-associate English vocabulary words and
their definitions three times. One control group did immediate massed study (cramming),
while the other group did spaced practice with other items in between. The second
condition resulted in much greater retention than the massed schedule. Carpenter and
DeLosh (2005) also found in their article “Application of the Testing and Spacing Effects
to Name Learning,” where subjects learned paired-associates of names and faces, that
spaced study resulted in better retention than massed study. Both experiments in that
study showed that final retention was better for the spaced conditions than the massed
conditions, and this held true for different spacing intervals and for both studied and
tested items. All of these studies are in line with a much larger body of literature which
reports that retention is better for spaced study than massed study (e.g. Hintzman 1974;
Melton 1970; Dempster 1987) and better for spaced than massed testing (e.g. Cull et al.
1996; Cull 2000; Glover, 1989; Izawa, 1992; Landauer & Eldridge, 1967; Modigliani &
Hedges, 1987). Ruch (1928) published a review of dozens of studies of the spacing
effect; for a more recent review, see Cepeda, Pashler, Vul, Wixted, & Rohrer (2006).
H.L. Roediger summarized these findings in the field when he was asked in a
2012 interview “How many times should one get people to retrieve things, and how soon
after learning?”:
F. Mary Pyc and Katherine Rawson at Kent State University showed that for simple things like foreign language vocabulary, retrieving about five to seven times is about right —if you test people a week later you wouldn’t see much difference between having tested people seven times or ten times, but you do see gains going up to the range of five to seven times. After that it just levels off. But most people would only practice once or twice, so the idea of going up to five or seven retrievals seems like too much to many people. Of course, to keep knowledge at your mental fingertips, you would need continued spaced retrieval practice, too. (Kleeman “Professor Roddy Roediger…”)
Five to seven repetitions over the course of a sixteen-week semester should be enough. If
teachers administered tests once a week, this would provide ample opportunities for five
to seven repetitions of each test item, which in turn would induce long-term potentiation.
Spaced repetition relies on the principle that information does not have to be
repeated every day in order to ensure long-term retention. While repeatedly studying the
same information every day would indeed foster long-term retention, it would also be
boring and inefficient. Spacing study is a more efficient way of studying. Figures 11 and
12 conceptualize the learning process through spaced repetition. The figures do not depict
direct findings from a study, but they do help illustrate the learning curves described in
this chapter. Each is a modified graph of a forgetting curve. It shows that, from the time when information is first
introduced, if no reminder is given, then the likelihood of remembering it drops dramatically
within days. The likelihood of correctly remembering an item of information is
expressed on the Y axis in terms of 0 to 100%. Time elapsed since the original learning
event is represented on the X axis. In the image, a horizontal bar extends from the 90%
chance of correct recall, which is near-perfect memory. A negatively sloping curve represents the
average forgetting curve and shows that forgetting increases rapidly over time. However, if
a reminder is given before the curve drops below the horizontal bar, which
represents a 90% likelihood of remembering, then long-term retention can be maintained.
Interestingly, the spacing effect shows us that the period of time needed between future
reminders to maintain a memory stability of 90% will increase after each subsequent
repetition.
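The relationship described above can be approximated numerically. The sketch below is illustrative only: it assumes an exponential forgetting curve and assumes that each reminder multiplies the stability of the memory by a fixed factor. Neither assumption comes from the figures or from the studies cited in this chapter, and the names retention, time_to_threshold, and reminder_intervals are hypothetical.

```python
import math


def retention(t, stability):
    """Assumed forgetting curve: probability of recall after t days."""
    return math.exp(-t / stability)


def time_to_threshold(stability, threshold=0.9):
    """Days until the assumed curve falls to the recall threshold."""
    return -stability * math.log(threshold)


def reminder_intervals(initial_stability, growth, reviews, threshold=0.9):
    """Gap before each reminder if every reminder is given exactly when
    recall reaches the threshold and multiplies stability by `growth`."""
    intervals, stability = [], initial_stability
    for _ in range(reviews):
        intervals.append(time_to_threshold(stability, threshold))
        stability *= growth
    return intervals


# Hypothetical starting stability (10 days) and growth factor (2).
for gap in reminder_intervals(initial_stability=10, growth=2, reviews=4):
    print(round(gap, 1))
```

Under these assumptions, each gap is longer than the one before it, which matches the widening intervals between reminders described in the text.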
Figure 11: “Hypothetical Forgetting Curve 2”
Source: (“Want to Remember Everything You’ll Ever Learn?”)
Figure 12: “Hypothetical Forgetting Curve 3”
Source: (“Spaced Repetition and the CFA Exam”)
The studies in this section illustrate that spaced testing is more effective for long-
term retention than massed practice, and that it is more efficient than equally-spaced
testing. The studies in this chapter reveal the benefits of frequently administered
low-stakes tests; these tests would provide structure and support for repetition. If our goal is to
maximize learning, then we should design curriculum based on the numerous studies that
show repetition is the best way to optimize learning. If we agree that long-term retention
of classroom content is a desirable thing, then it becomes clear that we should provide
repetitions of class content at spaced intervals to best ensure long-term retention. Because
spacing is a more efficient means of studying than massed study, we can see a clear
benefit to designing tests that use the spacing effect. The studies reviewed in this section
show that teachers should repeatedly test on the same items, and gradually reduce the
frequency of those items on subsequent tests as they introduce new items. The studies
support the use of frequent tests in composition to ensure students' long-term retention of
key course content.
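One way an instructor might plan such repetition across weekly tests is sketched below. This is a minimal illustration under stated assumptions, not a schedule drawn from the studies reviewed: the function name plan_semester, the week offsets (0, 1, 3, 6, 10), and the sample test items are all hypothetical.

```python
from collections import defaultdict


def plan_semester(items_by_week, offsets=(0, 1, 3, 6, 10), weeks=16):
    """Map each week's test to the items it should include.

    items_by_week: week an item is introduced -> list of items.
    offsets: weeks after introduction at which an item is tested again;
             the widening gaps reduce how often older items appear.
    """
    plan = defaultdict(list)
    for intro_week, items in items_by_week.items():
        for offset in offsets:
            week = intro_week + offset
            if week <= weeks:  # tests that would fall after the semester are dropped
                plan[week].extend(items)
    return dict(sorted(plan.items()))


# Hypothetical course content for the first two weeks.
items = {1: ["thesis statement"], 2: ["ethos, pathos, logos"]}
for week, test_items in plan_semester(items).items():
    print(week, test_items)
```

With these offsets, an item introduced in week 1 appears on the tests in weeks 1, 2, 4, 7, and 11. Items introduced late in the semester receive fewer repetitions, so an instructor may wish to introduce key content early.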
CHAPTER 5: TEST-ENHANCED LEARNING IN COMPOSITION
Indirect Benefits of Testing
Chapter 5 offers a number of specific examples of how our current composition
practice can be enhanced through testing. The composition textbooks which we already
use contain testable content—in fact many rhetoric textbooks contain ready-made tests
under the guise of “exercises.” These kinds of “exercises,” as we call them, are structured
similarly to tests and can be administered in class to enhance learning. Devising tests and
administering them effectively is a complicated art, but also a skill that can be developed.
This chapter presents some general guidelines for how to form tests and how to
administer them. The testing effect on its own provides a strong argument for enhancing
our current practice by administering frequent, low-stakes tests. However, in addition to
the testing effect, there are indirect benefits of testing, which this section will review.
The dialogic class that Finkel portrays (described in Chapter 2) assumes that all of
the students have come to class totally prepared: that they have done the reading,
thought about it, and are interested and ready to engage in an animated
discussion about it. According to authors Jacobs and Chase, writing in their book
Developing and Using Tests Effectively, “If we do not ask questions on the content of
outside readings, then most students will not read the materials” (17). Without proper
preparation on the part of the teacher, Finkel’s class is a beautiful fantasy. To turn this
fantasy into a reality, it is the teacher's job to create the conditions that lead to optimal
learning. In a collaborative model of learning, learners of all skill levels improve as a
result of collaborating within the Zone of Proximal Development, but only if those
learners are cognitively capable of being in that ZPD; effective collaboration requires
capable peers. When students neglect to do the reading or other coursework, then it is
impossible to maintain an effective ZPD for everyone in the class. Teachers can help
ensure that students are capable peers by requiring them to express course content in their
own words through frequent tests.
Frequent testing requires students to space their study efforts throughout the
semester rather than concentrating them on cramming right before an exam—a method
which research and common experience show is utterly ineffective in facilitating
long-term retention. Roediger and Karpicke (2006) write, “To state an obvious point, if
students know they will be tested regularly (say, once a week, or even every class
period), they will study more and will space their studying throughout the semester rather
than concentrating it just before exams (see Bangert-Drowns 1991; Leeming, 2002)”
(249). Frequent tests will prevent students from cramming for high-stakes exams. We all
know that with dedicated cramming the night before a test many individuals are able to
pass their exams. By cramming, students are able to effectively hold data in their
short-term memory; however, they cannot perform as well on that same exam a week later or a
month later. Because of the possibility of cramming, tests should not be used as an
exclusive means of assessment; the results will become more reliable if tests are frequent
rather than infrequent. Frequent testing in classrooms encourages students to study
continuously throughout a course, rather than bunching massive study efforts before a
few isolated tests. This process simulates an STST model, the learning model which was
most effective in the studies reviewed in Chapter 4.
Frequent quizzing might also reduce test anxiety, a trouble that plagues many
students. Test anxiety generally comes from how heavily exam grades can be weighted,
but frequent testing is a low-stakes means of retrieval. Instructors could even start with an
ungraded quiz that can serve both as a model of future quizzes and as an assessment of current
student knowledge. Roediger and Karpicke (2006), Agarwal et al. (2008), and McDaniel
(2007) all report increased student confidence from taking frequent low-stakes tests. In
their 2006 study, Roediger and Karpicke found that students self-reported increased
levels of confidence as a result of frequent low-stakes quizzes in class. Similarly,
McDaniel and colleagues (2011) write that 64% of the subjects (out of 139 eighth-grade
science students) reported that tests reduced their anxiety about taking the unit exam and
89% reported that the tests increased learning (404). The researchers in that study also
reported that they observed disappointment on the part of the students on days when the
quizzes were not included in class.
Another benefit of testing is that it helps include shy students who avoid joining
discussion. Many teachers are uncomfortable about calling on students in class, and
because there are students who do not participate in class discussions as much as their
peers, those students have fewer chances to reproduce class content in their own words.
Providing tests can vary the format of presentation to accommodate those students who
do not participate in class discussion or group work as often. Rather than excluding those
who are too shy to speak up, tests provide alternative retrieval opportunities.
In addition, frequently administered tests provide a more holistic approach to
assessment, with assessment occurring over the course of the semester rather than
infrequently through high-stakes tests. Frequent tests give feedback to an instructor which
can help assess student capability and identify misunderstandings. This frequent
assessment provides teachers with the information they need to update the course
curriculum and maximize student learning. If many students are unclear on a particular
topic, then test results would make that misunderstanding apparent and would allow
instructors to modify their teaching accordingly. In the same way, tests can be used at the
beginning of a semester to see what skills students bring to a class. A writing test
administered at the beginning and the end of the course can be instructive for
composition instructors by demonstrating students’ growth over the semester.
Transparency is another indirect benefit of testing. Tests show students exactly
what content the teacher considers important, which makes learning goals more explicit.
Making our expectations for the course clear for students is important because
transparency helps direct student study efforts and generates expectancy. If students
know what questions they will be expected to answer, then they will read with that goal
in mind. Corroborating this, Roediger and Karpicke (2006) report that when students
know they will be tested, they will come to class more prepared. Typically students are
assigned large amounts of reading, and for many students this is problematic because
they do not know what material is important to remember. Obviously they won’t
remember the entire book. Generally students will highlight as they read, but without
guidelines for what material is important, students often find themselves staring at pages
covered in yellow, which undermines the entire purpose of highlighting. With specific
reading questions, the teacher is essentially highlighting the reading for the student, and
demonstrating the importance of that material by re-exposing it through tests. Reading
questions help scaffold students’ learning by emphasizing what they should focus on.
Goal-oriented reading with a study guide and reading questions will help ensure a higher
degree of retention of key concepts.
Many instructors utilize web-based instruction such as Moodle or Blackboard, or
present content through their own websites. These mediums are well suited to integrate
test-based learning because the technology allows for easier managing and conducting of
tests. As a graduate student, I taught a section of English 104: “Accelerated Composition
and Rhetoric” at Humboldt State University. Tests could be posted on a regular
schedule—for example, every Friday, as I did in my class. Moodle already has testing
software built in that can automatically grade many types of test items and provide
feedback immediately after testing. It also logs user information that can be made
transparent to the students, such as class participation and grades. In addition, utilizing
these web-based mediums for test administration does not require any valuable class
time. Test-enhanced learning does not require any substantial change in our current
education system, and it works very efficiently with our current web technology.
The capacity for web-based mediums to give instant feedback is truly a boon,
because feedback from frequent tests will also help students guide their study efforts.
Feedback of course means knowing if you answered correctly or incorrectly. Feedback
can be instantly provided for questions with a simple answer, but for short-answer or
essay questions, the instructor will have to give feedback manually. If students test
themselves periodically while they are studying, they may use the outcome of these tests
as a guide for future study. McDaniel et al., in “Testing the Testing Effect in the
Classroom,” corroborate the importance of feedback:
The results are compelling for feedback effects after missing a short-answer quiz item. Clearly, learning and retention were better when students were given feedback after missing a short-answer question than reading the fact (twice) without being quizzed. (505)
Frequent test results allow students to self-assess where they are in comparison with the
expectations of the teacher and the course, and this feedback gives students the
information they need to adjust their studying accordingly. On the other hand, a lack of
feedback can result in continuation of errors—that is, when students answer a question
incorrectly on a test, but think they answered it correctly. The research in previous
chapters has shown that retrieval helps build long-term retention, so, if students respond
and make an error but do not receive feedback, they may stamp that error into
memory. In other words, because retrieval enhances learning, it is likely they will
continue remembering that error. Agarwal et al., in “Examining the Testing
Effect with Open- and Closed-book Tests,” write:
Prior research on the testing effect has shown that if students make errors of commission on an initial test and do not receive corrective feedback, they may retain those errors on later tests and run the risk of incorporating false information into their general knowledge (see Butler, Marsh, Goode, & Roediger, 2006; Roediger & Marsh, 2005). (862)
Giving feedback is as worthy of care, intelligence, and imagination as making up the test
in the first place.
With high-stakes exams, it can become a very tragic and emotional situation as
students plead for exceptions and makeups. However, low-stakes tests spread out the
grading weight over time such that students can miss the occasional test and not have
serious concern for passing the course. A further benefit of frequent low-stakes tests is
that they allow professors to have a very simple makeup policy for tests: never.
As we have seen, there are many reasons to test. In addition to these indirect
benefits, the research reviewed in this project has shown that active retrieval is the best
method to ensure long-term retention. Presumably we attend school in order to learn; if
we are interested in learning material, isn’t it worth remembering it? If this is the case,
then, as teachers, we should try to facilitate long-term retention of course material. I
argue throughout this project that frequent, low-stakes tests are the best way to achieve
this.
Testing and Grades
In order for students to study for tests, they must believe that these tests are
important. Teachers can ensure that students take tests seriously by making tests a
required and graded component of the course. Grades help structure student effort, and
grading tests impresses on students that understanding course concepts is important. I know
from experience (both as a student and as a teacher) that if an assignment doesn't have
some graded value, it is easy to let it fall by the wayside and neglect to do it. Frequently
this is done with the intention of going back to it later only to find that "later" is so full of
critical assignments that there is no time to do the ones that are not required. Reading
assignments can end up being quickly skimmed rather than thoroughly considered.
Writing assignments that are not worth a graded value may be hastily written by students
the night before, or directly prior to class. If we consider that students have busy lives
outside of the classroom, then students neglecting assignments without a grade value
makes more sense. Students are indeed busy outside of the classroom; they are whole
people with complex lives—many students have to juggle a job, their personal or family
lives, and school all at once. Grades can be seen as a system that allows students to assign
value to assignments and prioritize. Assigning a grade value to tests sends a message to
students: “here are the really essential things to learn and remember from this course.”
Despite the ubiquity of grades in the university, typically there is no grade value
assigned to thoroughly reading and understanding assigned texts. Rather, readings are
assigned, and they may be briefly discussed in class, but time constraints prevent
thoroughly discussing all assigned material, which means students may often not be held
accountable for completing assigned readings. Performance in many other aspects of the
class is contingent upon an understanding of the assigned texts—class discussions, essay
writing, homework assignments, and so on. So we as teachers must ask, “How can we
ensure that students have adequately read and internalized class material that will be the
focus of discussions?” Professors John-Steiner and Meehan, writing in Vygotskian
Perspectives, have considered this issue as well. They argue that “Shallow
internalizations leads to a facile combination of ideas. In contrast, working with, through,
and beyond what one has internalized and appropriated is part of the dialectic of creative
synthesis” (35). To ensure adequate internalization, we need to develop ways to ensure
that students are keeping up with their reading assignments. Otherwise a student who has
not engaged with the material may end up just hiding in the crowd during class
discussion, and group work will suffer as a result. One method of achieving consistent
completion of assigned reading is through frequently-administered quizzes. As teachers
we can encourage students to study the assigned readings by holding them accountable
during tests.
The Benefits of Tests According to Three HSU Professors
For the purposes of this project, I conducted interviews with many professors at
Humboldt State University. I received feedback from many professors, all of which has
been influential in this project. For purposes of brevity, I have narrowed down these
interviews to three professors. These were personal interviews and by no means should
these reports be considered authoritative or empirical. This is anecdotal information.
Professor Corey Lewis, an instructor of composition with more than thirteen years
of experience, thinks we’ve moved away from testing as part of a pendulum swing away
from top-down pedagogy to Expressivism and student-centered teaching. Current
composition practices recognize that students need to write and work with writing to
improve, but he argues that with the abandonment of tests we’ve lost what is clearly an
effective instructional technique, stating, “I don’t know anyone who is on a regular
systematic basis testing students on skills and content that directly relate to writing in
composition” (Lewis Interview). He describes having an epiphany while lecturing in
class; while discussing some key terms for the class and writing them up on the board, he
looked around the class and noticed that out of 22 students maybe 3 of them were taking
notes. Without taking notes and reviewing them, there is just no way students could
remember a long list of terms like those he had written on the board. Professor Lewis
tells me he assumed that students were trying to learn the terms, but it became clear from
observing students that they were not putting forth the effort necessary to memorize.
Today, it seems clear that many students are not putting forth the diligence
necessary to learn classroom material. In part this lack of effort may be a result of many
students needing to work while going to school. Today students are faced with tuition
that costs a fortune. There are budget cuts, furloughs, and fewer resources available,
which means that most students work in addition to going to school. Many students work
full-time in addition to being full-time students. Though this may not be the case for
every student, it is useful for instructors to adopt the mindset that students are not
neglecting their studies because they do not care, but for various and complex personal
reasons. This gives educators the choice of either complaining about the current situation,
or creating the structure that students need to learn. Teachers can help students by
providing a structure for them to succeed.
Professor Lewis told me that it was our discussions about testing that brought the
need for testing to mind, and he believes that other professors are probably in the same
situation. They realize that students aren’t internalizing classroom content, but they are
not sure how best to facilitate that learning. Because most current composition and
English methods of instruction train us to use class discussions, workshops, and writing
conferences rather than tests, many composition instructors have not been trained to
effectively use tests in the same way they have been taught to use other pedagogical
methods.
Professor Janet Winston, another professor whom I interviewed for this project,
told me that the biggest challenge she faces as a literature teacher at HSU is getting
students to actually do the assigned reading. Though the focus of this project is teaching
composition, not literature, the two fields both necessitate that students put forth the
effort to thoroughly read and understand assigned readings. Professor Winston’s success in a literature
classroom can be used as an example for the composition classroom. According to
Professor Winston, many of her students come from a lower-middle-class background;
frequently they are the first in their family to go to college, and generally they have to
hold jobs to support themselves on top of their academic duties. When students are
stretched so thin, it is understandable that they would ignore assignments that they see as
unimportant—namely, assignments without a grade value. Professor Winston reasons
that we’ve become so enamored with grades that if an assignment doesn’t have a grade
value then students regard it as unimportant. From this perspective, she argues that the
lack of a grade value for the assigned reading was the problem. The lack of a grade value
assigned to reading comprehension can seem to imply that reading assignments are not
important. By utilizing regular quizzes which include questions related to reading
assignments, instructors can assign a specific grade value to a thorough understanding of
the reading material, which in turn tells students that reading is indeed important.
To fix this problem, Professor Winston began giving reading quizzes once a week
that covered topics in the assigned reading. She was reluctant to use reading quizzes
originally because they seemed too narrow-minded. She didn’t want students to think
about literature from a convergent thinking approach and assume that there was only one
correct interpretation. Indeed, Professor Winston found that she had trouble at first
formulating open-ended questions that didn’t lock students into a right or wrong
response. However, over time she learned how to formulate better questions. Today her
questions are generally open-ended and require a paragraph or so of response. Rather
than focusing on plot details, they instead require a demonstration of understanding of the
broader concepts in the reading and class content. Professor Winston tells me that based
on the quiz results she can say with confidence that at least 90% of the class is keeping up
with the reading—a participation rate that many instructors would find enviable. Winston
reports that this participation rate has resulted in enhanced class discussion.
I also interviewed Professor Robert Cliver, in the history department. Professor
Cliver tells me that history is all about reading and writing—it’s about narratives, not just
memorizing facts. He reports that indeed the facts are important, but understanding their
interrelations and discussing them is what makes a good historian. Rote memorization is
what machines do; creative synthesis is the mark of a good scholar. These are the same
sort of qualities that we want to impress on students in a composition course as well.
Professor Cliver told me that when he first started, he didn’t give many quizzes. He
quickly learned from student feedback, however, that students wanted more tests. He
realized from student feedback and the results of giving quizzes that tests can help
incentivize doing the work and encourage students to come to class. According to
Professor Cliver, a single final exam doesn’t help—“it just tests if students are good at
taking tests” (Cliver Interview). He came to learn that the traditional comprehensive end-
of-term exam is useless because having few high-stakes exams encourages cramming and
putting off work. For these reasons, Professor Cliver now gives frequent tests at the end
of class, which serve as a review for material covered during lectures and in the readings.
Though tests could be viewed as a “police measure” to simply ensure that students have
done the reading, Professor Cliver argues that tests are not just busy work, because the
activity teaches students what course content should be emphasized.
In addition to the quizzes, Professor Cliver also utilizes take-home exams. He
reasons that they help students practice writing. The combination of these two activities is
what he finds most successful. Quizzes help students internalize fact-based course
content, and the take-home exams require students to use that knowledge for a
creative synthesis in the form of writing. He shared an example question that might
appear on a take-home exam: “Describe how the end of WWII affected East Asia.” In
order to answer this question, students must have a firm grounding of the historical facts,
such as the Communist party coming to power, the alliance of Japan and the US, and the
division of Korea. The facts are necessary, but simply listing these facts is not enough for
good writing. The combination of frequent in-class tests and take-home essay exams
gives Cliver's students multiple opportunities for retrieval, as well as for the creative
synthesis of facts.
Building, Administering, and Grading Tests
We don’t want students to just memorize facts; we want them to demonstrate they
understand the principles behind the material and apply that learning to new situations.
To achieve this goal with tests, we must have a systematic structure—something to tie the
test type and content to course objectives. Bloom (1956) describes six cognitive skills in
a hierarchy from simple to complex: (1) knowledge, (2) comprehension, (3) application,
(4) analysis, (5) synthesis, and (6) evaluation. Instructors should understand these six
categories, and clearly understand which one of these categories any particular test item
falls into. Developing and Using Tests Effectively provides a list of the kind of question
wording each of these types may use, which I have abridged and re-presented here. (1)
Knowledge questions involve the recall of learned material through activities such as
remembering facts, definitions, or principles. Common questions testing knowledge
involve wording such as: “define,” “list,” “state,” “identify,” “label,” “name,” “who,”
“when,” “where,” and “what.” (2) Comprehension questions require a more in-depth
understanding of learned material. Question wordings typically include: “explain,”
“predict,” “interpret,” “infer,” “summarize,” “convert,” “translate,” “give example,”
“account for,” or “paraphrase.” (3) Application is the ability to use (transfer) learned
material in a new situation or context. Application questions ask students to use concepts
to solve a problem. Typical question wordings include: “apply,” “solve,” “show,”
“make use of,” “modify,” “demonstrate,” “compute.” (4) Analysis questions ask students
to break down material into component parts so that the organizational structure is
understood. Typical question wordings include: “differentiate,” “compare and contrast,”
“distinguish ___ from ____,” “how does ___ relate to ____,” or “why does ____ work?”
(5) Synthesis asks students to put parts together to form a new whole. Question wordings
include: “design,” “construct,” “develop,” “formulate,” “imagine,” “create,” “change,” or
“write a poem or short story.” Finally, (6) evaluation questions ask students to judge the
value of material for a given purpose using definite criteria. Typical question wordings
include: “appraise,” “evaluate,” “justify,” “judge,” or “which would be better?” Considering their
local situation and their class needs, instructors should develop a table of the types of
questions they want to ask (e.g. 30% knowledge, 50% comprehension, 20% application).
This will help teachers plan and create tests, and ensure a balance of question types.
A mix of question types draws on the distinct strengths of each.
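As a rough illustration of such a table, the percentages can be converted into item counts for a test of a given length. The sketch below is in Python. The function name `allocate_items` and the 20-item test length are hypothetical choices made for illustration; the percentages are the example figures given above, and nothing here is prescribed by Developing and Using Tests Effectively.

```python
def allocate_items(total_items, blueprint):
    """Split total_items across question types by whole-number percentages."""
    if sum(blueprint.values()) != 100:
        raise ValueError("blueprint percentages must sum to 100")
    # Start from the rounded-down share for each question type.
    counts = {kind: total_items * pct // 100 for kind, pct in blueprint.items()}
    # Give any leftover items to the types with the largest remainders.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(
        blueprint,
        key=lambda kind: (total_items * blueprint[kind]) % 100,
        reverse=True,
    )
    for kind in by_remainder[:leftover]:
        counts[kind] += 1
    return counts


# Example percentages from the text; the test length of 20 is an assumption.
blueprint = {"knowledge": 30, "comprehension": 50, "application": 20}
print(allocate_items(20, blueprint))
# {'knowledge': 6, 'comprehension': 10, 'application': 4}
```

An instructor could rerun the same calculation with different percentages for each unit, which keeps the balance of question types tied to that unit's objectives.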
When generating tests, we want tests to match the students’ expectancy. Since we
are asking students to spend time studying for these tests, it is important to help students
understand what material is important, and what they can expect from a test. We can
improve student expectancy by clearly designating test dates and the approximate amount
of time the test will take. Expectancy will further be increased if the instructor ensures
that all students take the same test with the same questions at the same time. We should
also indicate the approximate worth of test questions and the amount of time that students
should spend on particular tests. According to Developing and Using Tests Effectively, as
a rule of thumb, allow about one minute per multiple-choice item and half a minute
for each true/false item. Short answer questions requiring a sentence or two will take
about two minutes to answer, and we should allow ten to fifteen minutes for a short essay
and thirty for an essay requiring two to three pages (72).
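The rule of thumb above can be applied as simple arithmetic when planning a test. The following Python sketch is only an illustration: the per-item minutes come from the figures just cited, while the function name `estimate_minutes`, the item-type labels, and the sample quiz are hypothetical.

```python
# Minutes per item as (low, high) estimates, taken from the rule of thumb above.
MINUTES_PER_ITEM = {
    "multiple_choice": (1.0, 1.0),
    "true_false": (0.5, 0.5),
    "short_answer": (2.0, 2.0),
    "short_essay": (10.0, 15.0),
    "long_essay": (30.0, 30.0),  # an essay of two to three pages
}


def estimate_minutes(item_counts):
    """Return (low, high) estimates of total test time in minutes."""
    low = sum(MINUTES_PER_ITEM[kind][0] * n for kind, n in item_counts.items())
    high = sum(MINUTES_PER_ITEM[kind][1] * n for kind, n in item_counts.items())
    return low, high


# A hypothetical quiz: 10 multiple-choice, 6 true/false,
# 3 short-answer items, and 1 short essay.
quiz = {"multiple_choice": 10, "true_false": 6, "short_answer": 3, "short_essay": 1}
print(estimate_minutes(quiz))  # (29.0, 34.0)
```

In this example the quiz would need roughly 29 to 34 minutes, which an instructor could compare against the class time available before announcing the test.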
There are also some important things to understand regarding grading tests. The
location of the student’s paper in the stack can have an influence on the score assigned to
it by the reader. According to Bracht and Hopkins (1968), the first papers read tend to get
higher scores than later ones. The reader tends to judge a paper harshly if it is preceded
by a well-written paper; if the previous paper is poorly written, the essay is judged
generously. Several studies show that the quality of handwriting, grammar, and spelling
(James, 1927; Shepherd, 1929; Chase, 1968; Marshall and Powers, 1969) can all have an
impact on the scores given to an essay. We can improve reliability of scoring by
concealing students’ names until after a score has been assigned. This keeps instructors’
achievement expectations for their students from affecting their judgment of essays.
Other techniques include reading only one item across all tests before going to the next
item, then reshuffling the stack of papers before starting the next item. Reshuffling makes
it unlikely that any paper will repeatedly suffer from following a good paper or reap the
advantage of repeatedly following a poor one.
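For instructors who keep scores electronically, the item-by-item reading order described above could be sketched as follows. This is a minimal Python illustration under stated assumptions: the anonymous paper codes stand in for concealed student names, and the function name `grading_schedule` is hypothetical.

```python
import random


def grading_schedule(paper_codes, item_count, seed=None):
    """Return one independently shuffled reading order of papers per test item."""
    rng = random.Random(seed)
    schedule = []
    for item in range(1, item_count + 1):
        order = list(paper_codes)  # copy so the original list is not changed
        rng.shuffle(order)         # a fresh shuffle before each item is read
        schedule.append((item, order))
    return schedule


# Hypothetical anonymous codes used in place of student names.
papers = ["P01", "P02", "P03", "P04", "P05"]
for item, order in grading_schedule(papers, item_count=3):
    print(f"Item {item}: read in order {order}")
```

Each item is read across every paper before the next item begins, and each pass uses a new random order, so no paper keeps the same neighbor from one item to the next except by chance.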
According to Developing and Using Tests Effectively, instructors often give
inadvertent clues on tests which help students guess the correct answer. Because these
clues make questions easier, they also undermine the students’ learning process
(according to the theory of desirable difficulty). Specific determiners (e.g. “all” or
“always”) depict a situation as absolute, which can lead a student to guess that a
true/false statement is probably false. Qualifying terms such as “sometimes,”
“usually,” or “typically” are hedged enough to suggest that the statement is more likely
to be true. A question wording like “The answer is a ______” has an embedded
grammatical clue. In this case only answers that begin with a consonant would be
grammatically correct, so we should modify the original to state “The answer is a(n) ______.” By
the same token, we should make all blanks the same length so that they do not provide a
clue to the length of the answer. We should also avoid using “all of the above” or
“none of the above” as options, because they are too easy for students to guess or rule out. Mentzer (1982) examined
thirty-five files of multiple choice test items for evidence of biases in the correct answers.
The most frequently occurring bias in that set was the “all of the above” response, which
was correct more than 25% of the time. If you do use “all of the above” as an option, make
sure it is not always the correct answer. Furthermore, the correct answer should
be placed randomly, rather than under a favored letter. For example, option ‘C’ is
overused as the correct response according to the same study, and option ‘A’ is
underused. Place the correct answer in each of the alternative positions approximately
an equal number of times but in a random order. Furthermore, avoid vague indefinite
terms denoting degree or amount, such as in a question like “T/F: A long time ago trees
covered a very large part of present-day Wyoming.” “A long time ago” could mean
anything from 1850 to 10 million years ago. Indefinite terms will make scoring less
reliable and probably confuse the test taker. Instead, use definite terms which allow for
only one correct response.
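The advice to place the correct answer in each position about equally often, but in random order, can be followed by building the answer key before writing the items. The Python sketch below is one way to do this and is offered only as an illustration: the function name `balanced_key`, the four-option format, and the 20-question length are assumptions.

```python
import random
from collections import Counter


def balanced_key(question_count, options="ABCD", seed=None):
    """Return a random answer key that uses each option about equally often."""
    rng = random.Random(seed)
    full_rounds, extra = divmod(question_count, len(options))
    # Every option appears full_rounds times; any remaining slots go to
    # distinct, randomly chosen options so counts differ by at most one.
    positions = list(options) * full_rounds + rng.sample(list(options), extra)
    rng.shuffle(positions)  # randomize the order across the test
    return positions


key = balanced_key(20)
print(key)
print(Counter(key))  # each of A, B, C, D appears five times
```

The instructor would then write each item so that its correct answer falls in the position the key assigns, which avoids an unintended preference for any one letter.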
Recognition and Free-Response Questions
It is commonly accepted that critical thinking is a main component or aim of
writing instruction. Critical thinking is both a method of thought and a
complex set of varied skills. Part of critical thinking is the ability to consider multiple
positions and reason towards the most likely conclusion. Tests, if used properly, can
encourage the development of critical thinking. Some forms of testing, such as essay or
short-answer formats, are better at developing critical thinking skills than others.
Short-answer questions are particularly effective for composition instruction and
developing critical thinking. For example, questions such as “What do you think was the
most effective form of evidence used in this article?” or “How did the author deploy
pathos to advance her argument?” not only require a sophisticated understanding of the
material, they also require critical evaluation of the author’s argument, and personal
articulation using the disciplinary language of composition.
There are two main types of test questions: recognition and free-response. Essay
questions and short answer questions are free-response. There are several advantages of
using free-response questions. They are best for assessing complex learning outcomes,
they are relatively easy to construct, and in most cases they do not permit students to
get a score by guessing or bluffing. However, they also have limitations: they are
difficult to score and much more time-consuming to grade than recognition questions,
their scoring is more subjective, and a test consisting primarily of free-response
questions limits the sampling of content due to time constraints. Recognition tests, such
as matching, cloze deletion, T/F, or multiple choice, also have benefits and limitations.
Recognition-type items require students to select the correct answer from among several
options. Because each item takes less time to answer, recognition tests allow for more
questions, which broadens the coverage of content. Recognition items are difficult to
construct, but easy to score. Because recognition items narrow down the range of possible
answers, they are susceptible to guessing. However, this also makes scoring more
objective and reliable.
Free-response requires students to organize and express answers in their own
words. Limiting the breadth of the essay question allows the answer to be relatively brief
and specifically tied to a single objective. A broad question like “What were the
conditions that led to the Civil War?” is less effective than a narrower question like “Compare
and contrast the role of agriculture in the economies of the North and South at the
outbreak of the Civil War.” A narrower question will produce a narrower response. This
will in turn improve the reliability of scoring. Rather than assessing factual content, a
method much better suited to other test formats, the essay test should be used for
assessing outcomes that require higher-level cognitive functions. Some examples that are
appropriate for essay questions are the following prompts: “Present arguments for and
against _______,” “Illustrate how a principle explains facts,” “Illustrate cause and
effect,” “Describe an application of a rule or principle,” “Evaluate the adequacy,
relevance, or implication of this data,” “Form new inferences from data,” “Organize the
parts of a situation, event, or mechanism and show how they interrelate into a whole,” or
“Sort out the relevant parts as distinct entities from a total situation, event, or
mechanism.”
The “stem” is the heart of a recognition test question. The stem should present the
problem with precision and clarity. Wordy problems need to be modified to reduce
unnecessary information. State the question stem positively whenever possible; if a
negative is unavoidable, call the students’ attention to it. After writing the stem, write
one correct or clearly best answer, and three or four plausible distractors. Include as
much of the item material as possible in the stem, so that words or phrases that could be
stated once in the stem are not repeated in each distractor. Instruct students to choose either the
correct answer or the best answer. Some questions will have multiple possibilities but one
best answer that experts would agree on.
Writing distractors is probably the most difficult and most important part of
building multiple-choice items. Distractors should be designed around common errors
that students make or misconceptions they may have. A useful strategy in designing
distractors is to first phrase an item as a completion or short-answer question.
Think of the incorrect responses that students would be likely to give to the question and
let these be the distractors in the multiple-choice item. The distractors must be incorrect,
but they should have enough plausibility to attract students who do not know the material
very well. Avoid writing absurd distractors. While they may be humorous or
lighthearted, they increase the likelihood of students guessing the correct response because
they narrow the range of possible answers.
In cloze deletion, such as “There are _____ members of the U.S. House of
Representatives and _____ members of the Senate,” students get minimal cues and must
construct the answer. Cloze deletion items should be answered with a single word or
phrase, and statements should be worded so that they have only one right answer. For
example, “The battle of Lexington was fought in ______” can be answered in several
ways, and it should be reworded to “The battle of Lexington was fought in the year
_____.” In multiple-choice tests, by contrast, students need only recognize the correct
response among the responses given; the demand on students is therefore
greater for cloze deletion items.
Also bear in mind the saying “a picture is worth a thousand words.” Graphical
occlusion can work very well for charts, mind maps, or images with captions. Graphical
occlusion works like cloze deletion, but instead of a missing phrase it uses a missing image
component. Mind maps, charts, and other diagrams can be effective learning tools, and
graphical occlusion allows the information in these images to be tested.
Concluding Discussion
There is much disciplinary content that can and should be tested in
composition. For example, we can test students on the various stages of the writing
process and the different strategies that can be used in each one. Similarly, there are
many discipline-specific terms that students need to know, such as “diction,” “syntax,”
“thesis-driven,” “transitions,” “juxtaposition,” and so on. And, since we teach several
different genres in composition, the conventions of each genre can be tested, such as the
use of character development and dialogue in personal essays, and logical arguments and
textual evidence in persuasive research writing. And, of course, students can be tested on
their knowledge of grammar and punctuation rules and their ability to fix editorial errors.
I have argued in this project that tests can increase the knowledge that students have at
their disposal. Tests, alongside our current practices in composition (student-centered,
dialogic, multiple drafts, and active writing time) should improve creativity, and help
students gradually improve their writing. Unfortunately, tests are widely reviled for their
excessive role in assessment, which can result in a blanket dismissal of all forms of
testing. However, there is abundant research which shows that tests can be very effective
learning tools. Studies in cognitive science and neurophysiology both show that retrieval
practice is a necessary condition for long-term retention, and tests can provide retrieval
opportunities for students. It is my hope that this project will encourage instructors of all
kinds, but especially composition instructors, to integrate frequent tests into their own
classes.
WORKS CITED
Abercrombie, M. L. J. The Anatomy of Judgment; an Investigation into the Processes of
Perception and Reasoning. New York: Basic, 1960. Print.
Ambrose, Susan A. How Learning Works: Seven Research-based Principles for Smart
Teaching. San Francisco, CA: Jossey-Bass, 2010. Print.
Aristotle. Aristotle's Psychology: A Treatise on the Principle of Life. S.l.: Hardpress,
2013. Print.
Bacon, Francis. The New Organon. Ed. Lisa Jardine and Michael Silverthorne.
Cambridge: Cambridge UP, 2000. Print.
Bakhtin, Mikhail Mikhaĭlovich. The Dialogic Imagination: Four Essays. N.p.: U of
Texas, 1981. Print.
Ball, Arnetha F., and Sarah Warshauer Freedman. Bakhtinian Perspectives on Language,
Literacy, and Learning. Cambridge, UK: Cambridge UP, 2004. Print.
Bruffee, Kenneth A. "Collaborative Learning and the ‘Conversation of Mankind’."
College English 46.7 (1984): 635-52. Web.
Byrne, John H. "Synaptic Transmission in the Central Nervous System." Neuroscience
Online: An Electronic Textbook for the Neurosciences. The University of Texas
Medical School at Houston, 12 Mar. 2014. Web.
Carpenter, Shana K., and Edward L. DeLosh. "Application of the Testing and Spacing
Effects to Name Learning." Applied Cognitive Psychology 19.5 (2005): 619-36.
Web.
Carter, Michael, C. Miller, and A. Penrose. "Effective Composition Instruction: What
Does the Research Show?" Communication in Science, Technology and
Management 3rd ser. (1998): n. pag. Web.
Cepeda, Nicholas. "Distributed Practice in Verbal Recall Tasks: A Review and
Quantitative Synthesis." Psychological Bulletin 132.3 (2006): 354. Web.
Chase, Clinton I. "The Impact of Some Obvious Variables on Essay Test Scores."
Journal of Educational Measurement 5.4 (1968): 315-318.
Cliver, Robert. "Discussions on Testing." Personal interview. 10 May. 2012.
Connors, Robert J., and Andrea A. Lunsford. "Frequency of Formal Errors in Current
College Writing." College Composition and Communication 39.4 (1988): 395-
409. Web.
Cull, William L. "Untangling the Benefits of Multiple Study Opportunities and Repeated
Testing for Cued Recall." Applied Cognitive Psychology 14.3 (2000): 215-35.
Web.
Dempster, Frank N. "The Situation with Respect to the Spacing of Repetitions and
Memory." Journal of Verbal Learning and Verbal Behavior 9.5 (1970): 596-606.
Web.
Dempster, Frank N. "The Spacing Effect: A Case Study in the Failure to Apply the
Results of Psychological Research." American Psychologist 43.8 (1988): 627.
Web.
Dempster, Frank N. "Spacing Effects and Their Implications for Theory and Practice."
Educational Psychology Review 1.4 (1989): 309-30. Web.
Donald, Morris C., John D. Bransford, and Jeffery J. Franks. "Levels of Processing
Versus Transfer Appropriate Processing." Journal of Verbal Learning and Verbal
Behavior 16.5 (1977): 519-33. Web.
Elbow, Peter. Writing Without Teachers. New York: Oxford UP, 1973. Print.
Fernanda, Santos. "Teacher Survey Shows Morale Is at a Low Point." The New York
Times, 7 Mar. 2012. Web.
Finkel, Donald L. Teaching with Your Mouth Shut. Portsmouth, NH: Boynton/Cook,
2000. Print.
Fisher, Ronald P., and Fergus I. Craik. "Interaction between encoding and retrieval
operations in cued recall." Journal of Experimental Psychology: Human Learning
and Memory 3.6 (1977): 701.
Fleming, Gerald J., and Meredith Pike-Baky. Rain, Steam, and Speed: Building Fluency
in Adolescent Writers. San Francisco: Jossey-Bass, 2005. Print.
Freire, Paulo. Pedagogy of the Oppressed. New York: Herder & Herder, 1971. Print.
Glover, John A. "The 'Testing' Phenomenon: Not Gone but Nearly Forgotten." Journal of
Educational Psychology 81.3 (1989): 392. Web.
Graff, Gerald, Cathy Birkenstein, and Russel Durst. They Say / I Say: The Moves That
Matter in Academic Writing: With Readings. 2nd ed. New York: W.W. Norton,
2012. Print.
Halliday, Michael. "Towards a Language Based Theory of Learning." Linguistics and
Education 5.2 (1993): 93-116. Web.
Hintzman, Douglas L. "Judgments of Frequency and Recognition Memory in a Multiple-
Trace Memory Model." Psychological Review 95.4 (1988): 528. Web.
Jacobs, L. C., and C. I. Chase. Developing and Using Tests Effectively: A Guide for
Faculty. San Francisco, CA: Jossey-Bass, 1992. Print.
Jacoby, Larry L. "On interpreting the effects of repetition: Solving a problem versus
remembering a solution." Journal of verbal learning and verbal behavior 17.6
(1978): 649-667.
James, H. W. "The Effect of Handwriting upon Grading." The English Journal 16.3
(1927): 180-185.
Kang, Sean HK, Kathleen B. McDermott, and Henry L. Roediger III. "Test format and
corrective feedback modify the effect of testing on long-term retention."
European Journal of Cognitive Psychology 19.4-5 (2007): 528-558.
Kastenbaum, Steve. "The High Stakes of Standardized Tests." Schools of Thought. CNN,
26 Mar. 2012. Web.
Keyes, Ralph. The Writer's Book of Hope: Getting from Frustration to Publication. N.p.:
Macmillan, 2003. Print.
Klein, Stephen B. Learning: Principles and Applications. N.p.: Sage Publications, 2011.
Print.
Knight, J. K., and W. B. Wood. "Teaching More by Lecturing Less." Cell Biology
Education 4.4 (2005): 298-310. Web.
Kolers, Paul A., and Henry L. Roediger, III. "Procedures of Mind." Journal of Verbal
Learning and Verbal Behavior 23.4 (1984): 425-49. Web.
Landauer, Thomas K., and Lynn Eldridge. "Effect of Tests without Feedback and
Presentation-Test Interval in Paired-Associate Learning." Journal of Experimental
Psychology 75.3 (1967): 290. Web.
Lee, Carol D., and Peter Smagorinsky. Vygotskian Perspectives on Literacy Research:
Constructing Meaning through Collaborative Inquiry. Cambridge: Cambridge
UP, 2000. Print.
Lewis, Corey. "Discussions on Testing." Personal interview. 4 Apr. 2012.
Marshall, Jon C., and Jerry M. Powers. "Writing Neatness, Composition Errors, and
Essay Grades." Journal of Educational Measurement 6.2 (1969): 97-101.
Marsh, Elizabeth J. "The Memorial Consequences of Multiple-Choice Testing."
Psychonomic Bulletin & Review 14.2 (2007): 194-99. Web.
McDaniel, Mark A., Henry L. Roediger, and Kathleen B. McDermott. "Generalizing
Test-Enhanced Learning from the Laboratory to the Classroom." Psychonomic
Bulletin & Review 14.2 (2007): 200-06. Web.
McDaniel, Mark A. "Testing the Testing Effect in the Classroom." European Journal of
Cognitive Psychology 19.4 (2007): 494-513. Web.
Melton, Arthur W. "The Situation with Respect to the Spacing of Repetitions and
Memory." Journal of Verbal Learning and Verbal Behavior 9.5 (1970): 596-606.
Web.
Mentzer, Thomas L. "Response biases in multiple-choice test item files." Educational
and Psychological Measurement 42.2 (1982): 437-448.
Morris, C. Donald, John D. Bransford, and Jeffery J. Franks. "Levels of processing
versus transfer appropriate processing." Journal of verbal learning and verbal
behavior 16.5 (1977): 519-533.
Moscovitch, Morris, and Fergus Craik. "Depth of processing, retrieval cues, and
uniqueness of encoding as factors in recall." Journal of Verbal Learning and
Verbal Behavior 15.4 (1976): 447-458.
Pellegrino, James W., Naomi Chudowsky, and Robert Glaser, eds. Knowing What
Students Know: The Science and Design of Educational Assessment. N.p.:
National Academies, 2001. Print.
Perry, Andre. "Education Reform Starts With Community Reform." Drandreperry.com.
N.p., 25 Feb. 2012. Web.
Phillips, Cecilia. "The Basics: Ion Channels Underlie Neuron Communication." Whirling
Whips: News and Stories about Neurotoxins. N.p., 12 Mar. 2014. Web.
Robinson, Ken. "Changing Education Paradigms." Youtube.com. RSA Animate, The
Royal Society of Arts, London, 2010. Web.
Roediger, Henry L., David A. Gallo, and Lisa Geraci. "Processing approaches to
cognition: The impetus from the levels-of-processing framework." Memory 10.5-
6 (2002): 319-332.
Roediger, Henry L., and Jeffrey D. Karpicke. "The Power of Testing Memory: Basic
Research and Implications for Educational Practice." Perspectives on
Psychological Science 1.3 (2006): 181-210. Web.
Roediger, Henry L., and Jeffrey D. Karpicke. "Test-Enhanced Learning: Taking Memory
Tests Improves Long-Term Retention." Psychological Science 17.3 (2006): 249-
55. Web.
Roediger, Henry L. "Implicit Memory: Retention Without Remembering." American
Psychologist 45.9 (1990): 10-43. Web.
Runquist, Willard N. "Some effects of remembering on forgetting." Memory & Cognition
11.6 (1983): 641-650.
Ruch, Theodore C. "Factors Influencing the Relative Economy of Massed and
Distributed Practice in Learning." Psychological Review 35.1 (1928): 19. Web.
Sapolsky, Robert M. Biology and Human Behavior: The Neurological Origins of
Individuality. Chantilly, VA: Teaching, 2005. Print.
Shepherd, Everett M. "The Effect of the Quality of Penmanship on Grades." The Journal
of Educational Research (1929): 102-105.
Tate, Gary, Amy Rupiper, and Kurt Schick. A Guide to Composition Pedagogies. New
York: Oxford UP, 2001. Print.
Vygotsky, Semenovitch L. Mind in Society: The Development of Higher Psychological
Processes. Trans. Michael Cole and Vera John-Steiner. Cambridge, MA: Harvard
UP, 1978. Print.
William, Cull L., John J. Shaughnessy, and Eugene B. Zechmeister. "Expanding
Understanding of the Expanding-Pattern-of-Retrieval Mnemonic: Toward
Confidence in Applicability." Journal of Experimental Psychology: Applied 2.4
(1996): 365. Web.
Winston, Janet. "Discussions on Testing." Personal interview. Feb. 2012.