Assessing the subsequent effect of a formative evaluation on a program
J. Lynne Brown a,*, Nancy Ellen Kiernan b
a Penn State University, Department of Food Science, 203B Borland, University Park, PA, USA
b Penn State University, College of Agricultural Sciences, 401 Agricultural Administration Building, University Park, PA, USA
* Corresponding author. Tel.: +1-814-863-3973; fax: +1-814-863-6132. E-mail address: [email protected] (J.L. Brown).
Received 30 June 1999; received in revised form 1 September 2000; accepted 31 October 2000
Abstract
The literature on formative evaluation focuses on its conceptual framework, methodology and use. Permeating this work is a consensus
that a program will be strengthened as a result of a formative evaluation, although little empirical evidence exists in the literature to
demonstrate the subsequent effects of a formative evaluation on a program. This study begins to fill that gap. To do this, we outline the
initial program and formative evaluation, present key findings of the formative evaluation, describe how these findings influenced the final
program and summative evaluation, and then compare the findings to those of the formative. The study demonstrates that formative
evaluation can strengthen the implementation and some impacts of a program, i.e. knowledge and some behaviors. The findings also suggest
that when researchers are faced with negative feedback about program components in a formative evaluation, they need to exercise care in
interpreting and using this feedback. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Formative evaluation; Summative evaluation; Impact; Assessing feedback
1. Introduction
Formative evaluation commands a formidable place in
the evaluation literature. Highly regarded, the process was
used to improve educational films in the 1920s (Cambre,
1981). Academic areas as diverse as agricultural safety
(Witte, Peterson, Vallabhan, Stephenson, Plugge, Givens
et al., 1992/93) and cardiovascular disease (Jacobs, Luep-
ker, Mittelmark, Folsom, Pirie, Mascioli et al., 1986) draw
on the process today, using findings to improve a program;
among educators in particular, it is `almost universally
embraced' (Weston, 1986, p. 5). Surprisingly, the sub-
sequent effect of using the findings of formative evaluation has not received systematic attention. This paper addresses
that gap.
The literature focuses attention on three aspects of forma-
tive evaluation, the first of which is its conceptualization. Over time, researchers clarified the concept. They distin-
guished it from other forms of evaluation especially summa-
tive, the fundamental difference being the rationale and use
of the data (Baker & Alkin, 1973; Markle, 1989; Patton,
1994; Chambers, 1994; Weston, 1986); labeled it formative
evaluation (Scriven, 1967) and accepted that designation
(Rossi & Freeman, 1982; Patton, 1982; Fitz-Gibbon &
Morris, 1978); debated its frequency and timing in the
program cycle (Markle, 1979; Thiagarajan, 1991; Russell
& Blake, 1988; Chambers, 1994); scrutinized its overlap
with process evaluation (Patton, 1982; Stufflebeam, 1983;
Scheirer & Rezmovic, 1983; Dehar, Casswell & Duignan,
1993; Scheirer, 1994; Chen, 1996); and expanded its epis-
temological framework, linking it to developmental
programs (Patton, 1996). As the conceptual framework
evolved, the perceived value of formative evaluation has
only increased.
Second, the literature focuses on methods and design
strategies to conduct formative evaluation. That focus
appears first in handbooks or articles describing methods and design strategies for either an entire program (Rossi & Freeman, 1982; Patton, 1978; Fitz-Gibbon & Morris, 1978)
or a segment of a program such as the materials (Weston,
1986; Bertrand, 1978), instruction (Tessmer, 1993), electro-
nic delivery like television (Baggaley, 1986), or interactive
technology (Flagg, 1990; Chen & Brown, 1994). The focus
on methods and strategies appears second in case studies
which illuminate a particular method or strategy tailored to
the exigencies of a particular situation such as a community
(Jacobs et al., 1986; Johnson, Osganian, Budman, Lytle,
Barrera, Bonura et al., 1994; McGraw, Stone, Osganian,
Elder, Johnson, Parcel et al., 1994; McGraw, McKinley,
McClements, Lasater, Assaf & Carleton, 1989) or worksite
(Kishchuk, Peters, Towers, Sylvestre, Bourgault & Richard,
1994). Over time, the focus on methods and strategies illu-
minated critical decisions needed to design a valid forma-
tive evaluation. The decisions include: (1) who should
participate: experts (Geis, 1987), learners from the targeted audience (Weston, 1986; Russell & Blake, 1988), learners with different aptitudes (Wager, 1983), instructors representative of those in the field (Weston, 1987; Peterson & Bickman, 1988), or drop-outs from a program (Rossi & Freeman, 1982); (2) how many to include and in what form: one or a group (Wager, 1983; Dick, 1980); (3) type of data to collect: qualitative or quantitative (Dennis,
Fetterman & Sechrest, 1994; Peterson & Bickman, 1988;
Flay, 1986); (4) data collection techniques (Weston, 1986;
Tessmer, 1993) and (5) similarity of pilot sessions relative
to actual learning situations (Rossi & Freeman, 1982;
Weston, 1986). Not surprisingly, the conviction permeating
the literature on methods and strategies is that formative
evaluation will lead to a stronger, more effective program.
Third, attention in the literature dwells on the immediate
use of formative evaluation findings. Academic areas such
as nutrition (Cardinal & Sachs, 1995), cancer prevention for
agricultural workers (Parrott, Steiner & Goldenhar, 1996),
and child health (Seidel, 1993) have evaluated a program in
its formative stage. In case studies such as these, researchers
hail the evaluation process, describing the immediate effects
of the evaluation, i.e., the problems identified and/or
changes to be made in a modified version of the program
(Potter et al., 1990; Finnegan, Rooney, Viswanath, Elmer,
Graves, Baxter et al., 1992; Kishchuk et al., 1994; Iszler,
Crockett, Lytle, Elmer, Finnegan, Luepker et al., 1995).
These researchers are not consistent when reporting the
immediate effects of a formative evaluation. Some do not
include data; some do not outline the problems the process
identified; and some do not describe the changes they made.
What is consistent however, is the message from these
researchers: formative evaluation led them to make changes
that should lead to a stronger program.
In summary, much has been written about formative
evaluation: its conceptual framework, its methods, and
its use. Throughout this literature, there is strong consensus
on the value of formative evaluation, some calling its value
`obvious' (Baggaley, 1986, p. 34) and `no longer ques-
tioned' (Chen & Brown, 1994, p. 192). Many educators
contend, however, that formative evaluation is not used
enough (Flagg, 1990; Kaufman, 1980; Geis, 1987; Foshee,
McLeroy, Sumner & Bibeau, 1986). Indeed, some evalua-
tions reflect no previous attempt at formative evaluation
(Foshee et al., 1988; Glanz, Sorensen & Farmer, 1996;
Pelletier, 1996; Schneider, Ituarte & Stokols, 1993; Wilk-
inson, Schuler & Skjolaas, 1993; Hill, May, Coppolo &
Jenkins, 1993). Other formative evaluations are limited:
using few people, non-representative samples, or selected
materials (Tessmer, 1993).
Part of the explanation for limited use of formative
evaluation may lie in the lack of empirical evidence in the
literature demonstrating its subsequent effect. Few research-
ers take the next step and demonstrate this by comparing
data from the initial program with data from the final
program to show whether the changes resulted in an
improvement in program implementation and impacts.
Reviewing over 60 years of work in formative evaluation,
scholars (Flagg, 1990; Dick, 1980; Dick & Carey, 1985;
Weston, 1986) found that the `evidence is supportive but
meager' (Geis, 1987, p. 6). Furthermore, most evidence
(Baker & Alkin, 1973; Baghdadi, 1981; Kandaswamy,
Stolovitch & Thiagarajan, 1976; Nathenson & Henderson,
1980; Scanlon, 1981; Wager, 1983; Montague, Ellis &
Wulfeck, 1983; Cambre, 1981) relates to only a component
of a program, the educational materials, not to an entire
program. Some landmark studies examine the impact of
an entire program in its formative stage such as the use of
negative income tax strategies as a substitute for welfare
(Kershaw & Fair, 1976; Rossi & Lyall, 1976; Robins, Spie-
gelman, Weiner & Bell, 1980; Hausman & Wise, 1985) and
the Department of Labor's LIFE effort to decrease arrest
rates among released felons with increased employment
(Lenihan, 1976), but only a few, such as those reported by
Fairweather and Tornatzky (1977), actually document that
the changes made as a result of a formative evaluation
resulted in a change in the impact of the final program.
Given that researchers hail formative evaluation as impor-
tant, the lack of evidence about its subsequent effect points
to a surprising gap in the literature.
The purpose of this paper is to examine the subsequent
effect of a formative evaluation to see whether the changes
resulting from it improved the final program sufficiently to distinguish between the impact of two program delivery methods. To do this, we: (a) outline the initial program and its formative evaluation, (b) present the key findings of the formative evaluation, (c) describe how the formative findings influenced the design of the revised program and its
evaluation, and then (d) compare the results of the initial and
revised program, something rarely done in the formative
evaluation literature. In doing this, we provide a compre-
hensive look at the implementation of both a formative and
summative evaluation. In conclusion, we identify issues that
evaluators wishing to improve the design of a formative
evaluation need to consider. In addition, we identify
problems we encountered in attempting to assess the effec-
tiveness of a formative evaluation.
2. Stage one: The initial program
2.1. Background
Combining federal, state and local funding, land grant
colleges support educational health promotion programs
for individuals and communities offered by county-based
co-operative extension family living agents. Prior to our
study, agents reported poor attendance at evening and
weekend meetings but rarely offered daytime programs at
worksites. Instead, agents used correspondence lessons to
reach people unwilling to attend meetings. However, group
interaction is more likely to facilitate changes in behavior
(Glanz & Seewald-Klein, 1986), possibly through the
support offered by sharing experiences. While it was easier
for agents to mail correspondence lessons (an impersonal
delivery method), we postulated that using a group meeting
to motivate the use of each lesson in a series before it was
distributed was more likely to promote change in food/
health behaviors. To test this hypothesis, we designed a
two-stage impact study to evaluate two methods of deliver-
ing lessons biweekly at worksites: distribution alone vs
distribution in conjunction with a half-hour group meeting.
Agents delivering the program would work with new
delivery sites, new clientele, new content, and new delivery
methods. Because of this unfamiliarity and because this
program had to fit into worksite environments with differing
work-shift patterns, lunch patterns, physical settings,
personnel departments, and required advertising, we
conducted a formative evaluation of the initial program
impact and its implementation. We included participants
and instructors in the evaluation, using a variety of qualita-
tive and quantitative methods.
2.2. Target health problem and audience
Four print lessons in the initial program addressed
prevention of osteoporosis, a recently proclaimed public
health problem (National Institutes of Health, 1985) most
often affecting white, elderly women. Prevention requires
lifelong adequate calcium intake and exercise. According to
NHANES II data, 75% of American women fail to consume
the recommended daily amount of calcium (Walden, 1989).
We targeted working women, ages 21–45, with at least one child at home because these women are building bone mass, which peaks at age 35–45. Mothers can also provide
nutrition activities (Gillespie & Achterberg, 1989) that
teach children how to protect bone health.
2.3. Lesson content and organization
The lessons, based on the Health Belief Model (Janz &
Becker, 1984), encouraged participants to eat calcium-rich
foods and to walk for exercise by focusing on personal
susceptibility, disease severity, benefits of prevention, and overcoming barriers to health-protecting actions. Because many in the target audience disliked drinking fluid milk
(based on an initial survey) or could have reactions to
milk (Houts, 1988), each lesson introduced a different
calcium-rich food (non-fat dry milk, plain yogurt, canned
salmon or tofu) and menu ideas. Each also included scien-
tific background on the lifestyle–osteoporosis link, a self-
assessment worksheet, a featured food fact sheet, sugges-
tions for involving children in food preparation, and
calcium-rich recipes. Rightwriter (1990) analysis indicated
a 12th grade reading level.
2.4. Delivery method
We tested two bi-weekly delivery methods for the
lessons. Group-delivery (G), based on the discussion–deci-
sion methods of Lewin (1943), was a 30 min motivational
session in which participants discussed adopting a behavior
suggested in each lesson (i.e. trying recipes, walking for
exercise, involving children in food preparation, and eating
calcium-rich foods, not supplements). Participants could
taste a recipe using the featured calcium-rich food and
vote by raised hands on their willingness to individually
adopt the suggested behavior. An agent served as facilita-
tor/motivator and distributed a lesson at the end of each
session. The other method, impersonal-delivery (I),
consisted of either the agent or a company contact person
simply distributing the required lesson to participants
according to schedule.
2.5. Staff training
To ensure consistency, all agents received guidelines for
recruitment of worksites and participants, a program content
review, a printed program delivery script and instructions
for instrument administration.
2.6. Recruitment
Seven agents representing three rural and four urban/
suburban counties interviewed personnel managers at busi-
nesses within their county and recruited 48 worksites where
women comprised over 30% of the work force. Once work-
sites were randomly assigned to a delivery method (G or I),
agents systematically recruited participants within a month.
3. Stage one: Formative evaluation
We delineate the data collection methods, the evaluation
design, and analyses.
3.1. Evaluating program implementation
Our goals were to assess: (a) participant characteristics
relative to the prescribed target audience; (b) participant
attention to, and use of, the lessons; (c) participant reaction
to advertising, lesson content and structure, delivery method, and time between lessons; and (d) agent reaction
to delivering the program and its content.
To address goal (a), we included demographic questions
in the first questionnaire administered. To address (b), we
designed a response sheet for each lesson which asked parti-
cipants how completely they had read the lesson, how easy
it was to read, and how useful it was, and whether they
completed the worksheet, tried suggestions or recipes, and
shared lesson materials. To address (c), we developed focus
group questions for participants, and, for (d), questions for
agents attending a debriefing.
We conducted four focus groups among participants
within a month of the intervention, two each for group-
delivery and impersonal-delivery. Each focus group derived
from a purposeful sample of thirty participants composed of
two-thirds completers and one-third non-completers. The
agent telephoned all selected and those who chose to attend
became the sample. We held the debrie®ng with all agents
within a month also. Data consisted of tape recordings and
written notes.
3.2. Evaluating program impact
Our goal was to examine changes in knowledge, attitudes,
and behaviors (KAB) needed to prevent osteoporosis using
appropriate scales, changes in calcium intake using a food
frequency questionnaire (FFQ), and changes in exercise
pattern using specific questions. Our hypothesis was that persons in group-delivery would exhibit greater changes in attitude
and behavior scores, calcium intake, and exercise pattern
than those in impersonal-delivery. We anticipated similar
changes in knowledge for both delivery methods because
the same lessons were used; the meeting focused primarily
on motivation.
To assess changes, we developed the KAB scales using
nutrition expert and target audience reviews and internal
consistency testing with 65 of our target audience prior to
use in Stage One. The final formative instrument contained a 20-item knowledge scale (KR-20 = 0.80), a 22-item attitude scale (α = 0.78), and a 16-item behavior scale (α = 0.75), all addressing concepts in the lessons.
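For reference, the internal consistency coefficients reported here are the standard Kuder-Richardson formula 20 (for the dichotomously scored knowledge items) and Cronbach's alpha (for the Likert-scored attitude and behavior items). In generic notation, for a k-item scale with item difficulties p_i (and q_i = 1 - p_i), item variances σ_i², and total-score variance σ_X², these are

\[
\mathrm{KR\text{-}20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right),
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right).
\]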
We used a modified version of the Block food
frequency questionnaire (Brown & Griebler, 1993) that
included the four foods featured in the osteoporosis
lessons to assess calorie and calcium intake. To examine
exercise behavior directly, we asked participants if they
exercised regularly within the last several months each
time they completed the KAB scales; after the lessons
we also asked if this exercise pattern was new, and, if
new, if it was due to the lessons.
3.3. Formative evaluation design
We employed a pre-test (T0), 8 week intervention, post-
test (T1) design to compare group-delivery (G) and imper-
sonal-delivery (I) (Fig. 1). We arranged the 48 worksites in
four blocks reflecting business types (white collar, educa-
tional/municipal, health care and blue collar) and assigned
them randomly to either delivery. Although eleven work-
sites withdrew prior to the intervention, primarily due to
company changes, the proportion of business types in
each delivery method was unaffected.
Participants completed pre KAB and FFQ instruments at
a meeting 1 week prior to receiving lesson one; the last
lesson included post KAB and FFQ instruments, which
participants returned at an optional post program meeting
1 week later or by mail according to Dillman (1978).
Question order in each KAB scale differed at each measure-
ment to diminish recall bias. Each lesson included the
response sheet that participants returned prior to receiving
the next lesson.

Fig. 1. Model of formative and summative evaluation design.
3.4. Formative evaluation data analyses
We used χ² analysis to compare categorical implementation data and ANOVA to compare continuous implementation data, between lessons and between delivery methods, from response sheets
returned. We examined tape recording transcripts and focus
group and debriefing notes for repeated themes (Krippen-
dorff, 1980).
Data from those completing both KAB instruments were
analyzed and scale scores determined allowing only one
missing value. Each individual's knowledge score was the
sum of correct answers. Each attitude and behavior state-
ment required a response on a 5-point Likert scale. Each
individual's attitude and behavior scale score was the mean
of all their responses to those questions.
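A minimal sketch of this scoring rule, using hypothetical item names and synthetic responses rather than the study's actual data files, is given below in Python.

import numpy as np
import pandas as pd

# Synthetic respondent-by-item data: 20 knowledge items scored 0/1 and
# 22 attitude items on a 5-point Likert scale (hypothetical layout).
rng = np.random.default_rng(0)
knowledge_items = pd.DataFrame(rng.integers(0, 2, (6, 20)),
                               columns=[f"k{i}" for i in range(1, 21)])
attitude_items = pd.DataFrame(rng.integers(1, 6, (6, 22)).astype(float),
                              columns=[f"a{i}" for i in range(1, 23)])

def scale_score(items: pd.DataFrame, how: str) -> pd.Series:
    """Score a scale, allowing at most one missing item per respondent."""
    valid = items.isna().sum(axis=1) <= 1
    score = items.sum(axis=1) if how == "sum" else items.mean(axis=1)
    return score.where(valid)  # NaN where more than one item is missing

knowledge_score = scale_score(knowledge_items, "sum")   # sum of correct answers
attitude_score = scale_score(attitude_items, "mean")    # mean of Likert responses
print(knowledge_score.tolist(), attitude_score.round(2).tolist())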
Data from those completing both FFQs were coded,
entered, and analyzed using FFQ software (Health Habits
and History Questionnaire, 1989). Because nutrient value
distributions were not normal, the data were transformed using natural logarithms prior to statistical analysis (SAS Proprietary
software, 1989).
Non-directional t-tests for independent samples were
used to test significance of continuous and ordinal data
(mean age, education, KAB scores and calcium) between
delivery methods (G vs I) at each time point (T0, T1). Cate-
gorical demographic and exercise data were compared using
χ² analysis. ANOVA for repeated measures and ANCOVA were used to test the significance of mean KAB scores and calcium intake of matched individuals across time. The covariates of mean income and employment status were used in testing changes in KAB scores. Significance was assumed at p ≤ 0.05.
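The original analyses were run in SAS; purely as an illustrative sketch, with synthetic data and hypothetical variable names, the log transformation and a non-directional two-sample t-test between the group-delivery (G) and impersonal-delivery (I) arms at a single time point could look like this in Python.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic, right-skewed calcium intakes (mg/day) for the two delivery arms at T0.
calcium_G = rng.lognormal(mean=6.6, sigma=0.4, size=120)
calcium_I = rng.lognormal(mean=6.6, sigma=0.4, size=130)

# Natural-log transform to reduce skewness before testing, as described above.
log_G, log_I = np.log(calcium_G), np.log(calcium_I)

# Non-directional (two-sided) t-test for independent samples.
t_stat, p_value = stats.ttest_ind(log_G, log_I)
print(f"t = {t_stat:.2f}, p = {p_value:.3f} (significant if p <= 0.05)")

# The repeated-measures ANOVA/ANCOVA on matched individuals across T0 and T1,
# with income and employment status as covariates, would be fitted on the
# paired data (e.g. with a mixed or repeated-measures model); omitted here.

The categorical comparisons (for example, exercise pattern by delivery method) would analogously use a χ² test of independence on the corresponding contingency table.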
4. Key findings from the formative evaluation of the initial program
We outline program implementation findings for goals (a)–(d) and impact findings.
4.1. Program implementation
4.1.1. Goal (a): Target audience
Ultimately, 275/489 (56%) women completed post
questionnaires that met analysis criteria. Completers
and non-completers did not differ in demographic
characteristics (data not shown). When comparing deliv-
ery methods, completers differed significantly only in
two factors: percent employed full time (91.6% in G
vs 81.6% in I) and percent of families with incomes
over $35,000 (57.7% vs 42.4%).
4.1.2. Goal (b): Participants' attention to and use of
lessons
The proportion of response sheets returned dropped over the four lessons: method G dropped from 81% of initial registrants for lesson one to 41% for lesson four, and method I from 95 to 67%. Otherwise, the two delivery methods did not differ significantly in attention to, and use of, the lessons.
The percentage of respondents who reported reading all lesson materials fell from 85% for lesson one to 62–64% for lesson four (Fig. 2). Regardless of delivery method, respondents rated all lessons, on a scale of 1–5, fairly easy to read (1.4 ± 0.6, where 1 = easy to read) and useful (hovering at 2.1 ± 0.8, where 1 = very useful). About 70% reported completing
worksheet one, 28% worksheet two, 80% worksheet three,
and 50% worksheet four.
The response sheets assessed whether participants tried
recipes, involved children in food preparation, and shared
lesson materials, and revealed no significant differences
between delivery methods. Although 37% of method G
tried lesson one recipes compared to 20% in method I, there-
after, percentages were lower and similar between delivery
groups. Those involving children varied from 11% for
lesson one to 2% for lesson four and those sharing recipes
with friends between 16 and 22% (Fig. 3).
4.1.3. Goal (c): Participant reactions
Fifty women (27 from G and 23 from I) participated in the
focus groups. Participants from both delivery methods were
more likely to remember personal contacts and paycheck
flyers than other advertisements. They recommended changes in lesson format, recipes, worksheets, and the calcium-rich foods featured. Many found the lesson booklet cumbersome, the menus unhelpful, the worksheets in two lessons too long, and some featured foods difficult to adopt. Some participants wanted the emphasis placed on drinking milk. They suggested including menus and microwave instructions in the recipes. With some exceptions, women reported it was difficult to involve children in food preparation or that
their children were grown.
However, some feedback was unique to a delivery
method. Group-delivery participants wanted more
lecture, more question and answer time, and less moti-
vational discussion. They could not recall voting to try
a behavior (critical to the discussion–decision method)
but liked the food tasting activity. Impersonal-delivery
participants also wanted question and answer time and
reminders to complete each lesson, but disagreed about
the period between lessons.
Participants from both delivery methods revealed that
they had limited time to try recipes and had not yet put
learned health-promoting actions into practice. They
disliked the long KAB questionnaire and completing the second FFQ, only 2 months after the initial one, when they had not yet initiated changes in eating habits.

Fig. 2. Percent reading all the lessons in each evaluation.
4.1.4. Goal (d): Agent reactions
All agents participated in both delivery methods. They
reported that the advertising materials did not clearly
define the target audience and that in-person appeals
and an enthusiastic site contact improved recruitment.
Despite managing shifts, they preferred the interaction
and participant interest in group-delivery and the oppor-
tunity for daytime programs. But agents using group-
delivery resisted being motivators and asked to provide
lectures, perceiving that participants wanted prescriptive
advice. They felt the recipes needed improvement.
Agents echoed the lack of emphasis on drinking milk,
a political issue in counties with a dairy industry.
4.2. Program impact
As hoped, changes over time for KAB were significant. As expected, the hypothesis that changes in knowl-
edge would not differ by delivery method was supported.
Unexpectedly, the hypothesis that those in group-deliv-
ery would show greater gains in attitude, behavior,
calcium intake, and exercise pattern than those in imper-
sonal-delivery was not supported. For the KAB
measures, time by delivery method interaction was not
significant (Figs. 4–6). Group delivery did not affect knowledge, attitude, or behavior scores any more than impersonal delivery. Changes in calcium intake and exercise pattern were not significantly different between delivery groups (data not shown).

Fig. 3. Percent reporting sharing lesson recipe with friends.
Fig. 4. Change in knowledge score over time.
Fig. 5. Change in attitude score over time.
Fig. 6. Change in behavior score over time.
5. Stage two: The revised program and summative evaluation
The changes made in stage two in the program content,
recruitment, delivery method, and evaluation design and
instruments for the summative evaluation are shown in
Table 1. Almost all reflect key findings of the stage one
formative evaluation.
5.1. Revised program lesson content and recruitment
We changed the lesson content to address the concerns
outlined above. We asked six county agents, representing
three rural and three suburban counties, to recruit four work-
sites each, a total of 24. We clarified the target audience in
advertising materials and directed agents toward in-person
recruiting. We lowered the lessons' reading level to accommodate participants from more blue collar worksites, where mothers ages 21–45 were a significant part of the work force, to ensure enrolling more working women with young children.
5.2. Revised program delivery method
The initial program implementation and impact data
indicated the group delivery method did not affect atti-
tudes and behaviors, possibly because agents were
uncomfortable and did not conduct the meeting accord-
ing to directions. To rectify this, using Pelz (1959), six
agents designed four new 30 min meeting scripts that
included two to three main points, retained the food
tasting (with new recipes), and eliminated the motiva-
tional discussion. A suggested action was still promoted
at the end of the meeting and a group vote taken on
adoption. Agents were trained to use these scripts and
distributed the lessons biweekly.
5.3. Summative evaluation design
Participants' comments and the poor formative comple-
tion rate led us to use a pre (T0), immediate post (T1), and 4
month post (T2) summative evaluation design (Fig. 1). We
asked participants to complete the KAB instrument at all
time points, but the FFQ only at T0 and T2, a 6 month
interval, expecting the T2 measure would detect changes
which initial program participants claimed took time to
implement.
To improve our ability to detect changes, we compared
three intervention groups: two experimental (group-delivery and impersonal-delivery) and one control. The
controls received four correspondence lessons addressing
cancer prevention, identical in design to the osteoporosis
lessons the experimental groups received. The osteoporosis
and cancer lessons differed only in diet–disease context, beneficial nutrients and foods, and emphasis on exercise in the osteoporosis lessons. In sum, those in group-delivery received the modified group meeting and osteoporosis lessons; those in impersonal-delivery only the osteoporosis lessons, and the controls only the cancer lessons.

Table 1
Major changes in educational program, evaluation design and evaluation instruments prompted by results of the formative evaluation

Program lesson content
• Layout of each lesson. From: booklet. To: folder with pull-out fact sheets.
• Calcium-rich foods. From: emphasize four non-traditional foods. To: emphasize fluid milk and four non-traditional foods.
• Worksheets. From: lesson 1, 7-day exercise diary; lesson 4, long contract to make one behavior change. To: lesson 1, 3-day exercise diary; lesson 4, short contract to make one behavior change.
• Fact sheet on food activities for children. From: suggestions to involve children in food activities. To: retain and give added emphasis in lessons and group meeting.
• Recipes. From: six per lesson with conventional instructions. To: keep four most popular, but add microwave instructions and menu suggestions; emphasize testing on weekends.
• Reading level. From: 12th grade. To: 8th grade.

Program recruitment
• Recruitment of worksites. From: work force must have a high percentage of working women. To: target blue collar worksites; work force must have a high percentage of working mothers.
• Advertising for target audience. From: print material and in-person recruitment. To: emphasis on in-person recruitment; clarify target audience in all recruitment material.

Program delivery
• Delivery method, group. From: motivational discussion about overcoming barriers to suggested behaviors, ending with a group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements], plus food tasting. To: lecture stressing 2–3 main points of the lesson, followed by a pep talk about the suggested behavior and a group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements], plus food tasting with revised recipes.
• Delivery method, impersonal. From: pass out lessons on schedule. To: pass out lessons on schedule (unchanged).

Evaluation design
• Intervention design. From: comparison of two delivery methods; pre and post measures (T0 and T1) of KAB and FFQ; response sheet in each lesson, no incentive to return. To: comparison of two delivery methods with a control; pre, post and 4-month post measures (KAB at T0, T1 and T2; FFQ at T0 and T2); response sheet in each lesson, with an incentive provided for return.

Evaluation instruments
• Impact instrument scales (KAB questionnaire). From: 20 knowledge questions (KR-20 = 0.80), 22 attitude questions (α = 0.78), 16 behavior questions (α = 0.75). To: 14 knowledge questions (KR-20 = 0.725), 16 attitude questions (α = 0.80), 14 behavior questions (α = 0.80).
• Response sheets (try any suggestion for child activity; try any recipe). From: responses yes, no. To: for both questions, add the response "no, but plan to".
We divided the 24 worksites into five blocks reflecting relative pay scale and type of worker. These were assigned purposefully to the three intervention groups such that there was an equal representation of all five blocks in the two
experimental groups while the controls lacked representa-
tion from one of two lower pay blocks. Three companies
withdrew prior to recruitment.
All participants completed pre-test instruments at a meet-
ing 1 week prior to receiving the first lesson. The post-test KAB instrument, distributed with the last lesson, was collected at a concluding meeting 2 weeks later. Three months later, the final instruments were distributed to all participants by the agent or by mail using a modified Dillman method.
5.4. Evaluating revised program implementation
To assess demographics, we included questions in the
pre-test instrument of all three intervention groups. To
assess attention to, and use of the lessons, we included a
response sheet in each lesson for the two experimental
groups only. We added a third possible response (no, but I
plan to) to questions about children's activities or recipes to
capture behavioral intention.
5.5. Evaluating revised program impact
As in the formative evaluation, we hypothesized that
those in group-delivery would exhibit greater changes
than those in impersonal-delivery in attitude and behavior
scores, exercise pattern, and calcium intake. In addition, we
hypothesized that: (a) both experimental groups would exhi-
bit greater changes in knowledge than controls and (b) those
in impersonal-delivery would exhibit greater changes than
controls in attitude and behavior scores, exercise pattern,
and calcium intake.
Based on formative participant comments, we shortened
the KAB scales to reduce participant burden, hoping to
improve completion. Using the formative instruments
completed by the 677 women registrants for stage one
(mean age 43.34 ± 11.58), we used internal consistency testing to eliminate less discriminating items, producing the scales in Table 1. Items retained in the KAB instrument (76% of the original) represented all content areas in the formative, improving the αs for two scales. However, no new questions were added. Neither the question about exercise regularity nor the FFQ was changed for the summative evaluation.
5.6. Summative evaluation data analyses
In contrast to the formative, implementation data of those
completing all four response sheets were used, providing a
more accurate assessment of the responses of those complet-
ing the program.
Impact analysis methods were similar to those used in the
formative analyses with these modifications: (a) we used only data of those completing all three KAB or FFQ instruments; (b) we allowed up to two missing answers on the knowledge scale; (c) we tested the significance of continuous and ordinal data among the three delivery groups at three time points (T0, T1, T2); and (d) age served as the covariate for ANOVA and ANCOVA. We determined statistically significant differences among values at time points using pair-wise tests of differences between least-squares means. A Bonferroni adjustment was used to control the overall error rate. Significance was assumed at p ≤ 0.05.
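As a worked example, a Bonferroni adjustment divides the overall error rate by the number of comparisons made; assuming the three delivery groups are compared pairwise at a given time point (m = 3), the per-comparison threshold would be roughly

\[
\alpha_{\text{per comparison}} = \frac{\alpha_{\text{overall}}}{m} = \frac{0.05}{3} \approx 0.017 .
\]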
Finally, we compared categorical and continuous demo-
graphic characteristics (mean age and education) between
the formative and the summative evaluation completers
using χ² analysis and non-directional t-tests.
6. Summative evaluation findings and comparison with the formative
First, we examine the implementation findings. Then we examine impacts over time, comparing the results to the control and looking at differences between the two delivery methods. In each instance we compare the summative findings with those of the formative.
6.1. Program implementation
6.1.1. Goal (a): Target audience
Completion rates were better and participant demo-
graphics were closer to those desired in the summative
compared to the formative. In the summative, 70% of initial
registrants completed all three KAB measures. Almost 90%
completed the KAB instruments at T0 and T1, in contrast to
56% in the formative. Eighty percent completed both FFQ
instruments in the summative compared to 51% in the
formative. Table 2 lists the demographics of completers in both evaluations; the two groups were similar in family income, race, marital status, awareness of relatives with osteoporosis, initial exercise pattern, and calcium intake per 1000 calories. Those completing the summative evaluation were significantly more likely than those in the formative
however, to be younger, have only a high school education,
work full time, and have at least one child at home.
In the summative, the two experimental groups and
controls differed significantly only in age (data not shown). Those in group-delivery had a mean age of 39.1 ± 9.95 compared to 40.6 ± 10.10 in impersonal-delivery and 40.9 ± 9.61 in the control.
6.1.2. Goal (b): Participants' attention to, and use of,
lessons
In the summative evaluation, 78% of group-delivery and
63% of impersonal-delivery registrants returned all four
response sheets, providing the sample for analysis.
However, formative and summative return rates are not
comparable because we did not restrict the formative sample to those completing all four.
In both evaluations, we asked participants if they read all,
parts of, skimmed, or did not read each lesson. Similar to the
formative, the percentage reading the whole lesson declined to 65–70% by lesson four. Unlike the formative, where there was no difference between delivery groups in the percent reading the lessons, in the summative significantly more in group-delivery skimmed, and fewer read, lessons one and two than in impersonal-delivery (Fig. 2).
Completion rates of the worksheets did not differ between
evaluations with one exception. More completed worksheet
two, a revised exercise diary, in the summative than in the
formative (40 vs 28%, respectively).
In both evaluations, respondents were asked how easy
to read and how useful each lesson was. Respondents in
both evaluations provided nearly identical ratings of ease
of reading regardless of delivery method, suggesting the
lower reading level of the summative materials made it
easier for the less educated participants. Respondents in
both evaluations provided nearly identical ratings of
perceived usefulness for lessons one and two; however,
in the summative, group-delivery participants rated
lessons three and four, highlighting canned salmon and
tofu respectively, significantly more useful than did
impersonal-delivery participants, whereas in the forma-
tive, the ratings were identical.
Questions about materials use revealed that the summa-
tive results differed from the formative, regardless of the
delivery method, in that:
• 3–10% tried a recipe (data not shown) compared to 10–37% in the formative. Yet, in the summative, two thirds or more in both delivery groups indicated they planned to try a recipe, an option not available in the formative.
• 11–21% involved children in food preparation compared to 2–11% in the formative. Additionally, in contrast to the formative, the summative exposed differences among delivery groups, in that the percent sharing a recipe with friends was significantly greater in group-delivery than impersonal-delivery for lessons two (20 and 6%, respectively) and four (24 and 10%), while no significant differences between delivery methods were evident in the formative (Fig. 3). Lessons two and four featured less familiar calcium-rich foods.
• Consistently more in group-delivery shared other lesson materials with friends than in impersonal-delivery (16–22% vs 9–11%). This distinction was not seen in the formative, where about 15% in both delivery methods shared materials (data not shown).
Table 2
Demographic characteristics of those completing each evaluation phase (SD = standard deviation; yr = years). Each entry gives the formative value, then the summative value.

Mean age ± SD (a): N = 275, 43.02 ± 11.12; N = 247, 40.15 ± 9.88
Family income: N = 255; N = 232
  $10,000–19,900: 41 (16.1%); 45 (19.4%)
  $20,000–34,900: 92 (36.1%); 83 (35.8%)
  $35,000–50,000+: 122 (47.8%); 104 (44.8%)
Employment status (a): N = 272; N = 245
  Percent full time: 235 (86.4%); 232 (94.7%)
Mean educational level ± SD, yr (b): N = 268, 13.40 ± 1.97; N = 246, 12.70 ± 1.54
Race: N = 272; N = 246
  Percent white: 260 (95.6%); 239 (97.2%)
Marital status: N = 272; N = 246
  Married: 200 (73.5%); 175 (71.1%)
  Single: 40 (14.7%); 27 (11.0%)
  Other: 32 (11.7%); 44 (17.9%)
Percent with at least one child at home (b): N = 274, 129 (47.1%); N = 247, 163 (66.0%)
Relatives with osteoporosis: N = 246; N = 247
  Yes: 49 (19.9%); 35 (14.2%)
  No: 167 (67.9%); 166 (67.2%)
  Don't know: 30 (12.2%); 46 (18.6%)
Exercise regularly in past 6 months?: N = 269; N = 246
  Yes: 94 (34.9%); 90 (36.6%)
  No: 175 (65.1%); 156 (63.4%)
Dietary intake (FFQ): N = 248; N = 244
  Calories (mean ± SD): 1623.8 ± 584.4; 1798.0 ± 711.2
  Calcium in mg: 805.4 ± 415.5; 895.3 ± 549.2
  Calcium in mg/1000 calories: 497.1 ± 178.9; 490.2 ± 206.3
(a) p < 0.01; (b) p < 0.001.
6.2. Program impact
6.2.1. Change in knowledge
As expected, our hypothesis that both group-delivery and
impersonal-delivery of osteoporosis lessons would lead to
greater knowledge gain than in controls was supported
(Fig. 4). Indeed, the gain in knowledge in group-delivery
was significantly greater than that in impersonal-delivery and both were significantly greater than that of the
controls at both T1 and T2. This was not seen in the formative
evaluation.
6.2.2. Change in attitude
Our hypothesis that those in the experimental groups
would show greater gains in attitude than controls was not
supported in the summative evaluation (Fig. 5). Our hypoth-
esis that gains in attitude would differ between delivery
methods was not supported either. The initial mean attitude
scores in the summative were significantly higher than those in the formative (3.9–4.0 vs 3.0–3.1, respectively), perhaps due to increased media focus on osteoporosis or to differences in participants, and could have limited our ability to improve attitudes and thus to detect significant changes. We did not detect the influence of any previous worksite educa-
tional activities, however.
6.2.3. Change in behavior
Our hypothesis that experimental groups would show
greater gains in behavior than controls was partially
supported in the summative (Fig. 6). Gains in behavior for
group-delivery were significantly greater than those of the control at T2. A possible explanation is that administering the behavior scale 4 months after the intervention (T2) identified further changes in participant behavior in group-delivery that were not seen in controls, even when controlling for age. Our
hypothesis that those in group-delivery would show greater
gains in food-related behavior than those in the impersonal-
delivery was not supported in the summative evaluation.
6.2.4. Changes in calcium intake and exercise pattern
Our hypothesis that those in experimental groups would
show greater gains in calcium intake than controls was not
supported in the summative. Our hypothesis that those in
group-delivery would show greater gains than those in
impersonal-delivery was also not supported. The summative
delivery methods did not have a greater differential impact
than those of the formative.
Our hypothesis that experimental groups would report
greater change in exercise patterns than controls was
partially supported (data not shown). In the summative
evaluation, the number at T2 who reported that exercising
regularly was a new pattern was significantly greater in
group-delivery than in controls. Our hypothesis that exer-
cise patterns would differ between delivery methods was not
supported. The summative delivery methods did not have a
greater impact than those of the formative although the
difference was in the right direction.
7. Discussion
The purpose of this study was to test whether the changes
made in a program as a result of a formative evaluation
strengthened the implementation and impact of the program
in the summative. We found that implementation improved
but only certain measures of impact improved enough to
distinguish the effects of each delivery method. As a result
we suggest the following for evaluators to consider in
designing a formative evaluation and in attempting to assess
its effectiveness.
7.1. Improving the design of a formative evaluation
7.1.1. Interpreting (focus group) feedback from participants
As a result of participants' comments in the formative
focus groups, we extensively altered the lessons for the
summative and this did lead to greater use of a formerly
underused worksheet, continued ease of reading for parti-
cipants with lower educational levels, and more involve-
ment of children in food preparation.
Focus group participants wanted less motivational discus-
sion and more lecture in the meetings. In response, we
increased lecture time in the group-delivery method. The
influence of the group-delivery method was underscored because group-delivery participants were more likely to skim than to read the lessons. Even with less reliance on reading, group-delivery participants found two lessons significantly more useful and were more likely to share
lesson materials with others than those in impersonal-deliv-
ery. The agent presentation in group-delivery was clearly
critical to sell the lessons. In this case, following participant
recommendations appeared to improve impact on knowl-
edge gained, but not in most areas of behavior change.
When the focus group participants indicated they did not
like the group-delivery motivational discussion–decision
session, they also indicated a preferred alternative. Given
the extensive literature on assessing participant perspectives
(Basch, 1987), we assumed we needed to alter the delivery
method to one they liked (especially since the agents echoed
this preference) and ultimately emphasized lecture over
motivation in the summative. In hindsight, we should
have asked the focus group participants how to change the
motivational aspects of the session with its emphasis on
behavior modification, rather than abandon this based on
their negative feedback.
A researcher using focus groups should not assume that, if participants express dislike of a particular delivery method and suggest another, one should drop the original method without considering the downstream effects. Be
prepared to probe to learn why it was disliked and how it
might be modified, especially if participants are suggesting
a more passive path to learning. By making the assumption
participants were right, we limited focus group inquiry and
the type of information collected for program planning in
the summative.
We revised the recipes, believing these could be used as
the context to show participants how to use calcium-rich
foods and as a device to facilitate behavior change. In the
focus groups, we listened openly to complaints about the
recipes and asked participants how to improve them. Some
participants indicated they came to the program for the
recipes but disliked those provided. Despite these partici-
pant-guided revisions, reported use of recipes did not
improve in the summative, although a large number indi-
cated they planned to.
In the formative, we assumed that when a component of the program, i.e. the recipes, was poorly received by participants, it merely needed improvement, and thus we specifically asked participants for suggestions. In hindsight, the question we should have asked first, given the complaints, was one that tested our assumption that participants valued recipes, i.e. should recipes be included in the lessons at all? And, if the answer was no, we should have been ready to discuss with focus group participants alternative devices to motivate behavior change.
Researchers designing formative evaluations need to be
alert for such a methodological inconsistency: why did participants give us suggestions to improve the recipes when, in fact, many had not used these (10–30% in the formative)? This might be explained by the inclination of
people to provide answers to questions when they are asked.
People have a tendency to tell more than they can know
(Nisbett & Wilson, 1977).
In summary, researchers using focus groups must be
prepared to probe assertions by participants that some
component of a program is unsatisfactory. Probing should
investigate both how that component might be improved
and retained as well as what might be substituted and
why. In particular, asking a question about the fundamental
usefulness of some component of a program in a formative
evaluation may not be easy for researchers as they may have
considerable ownership and resources invested in that
component. However, when faced with negative feedback
about a component of the initial program, researchers
should investigate both options to ensure sufficient data for
an informed decision about summative activities.
7.1.2. Assessing barriers to changing behaviors
Because participants came to this program on calcium-
rich foods, we assumed that they would be open to change
and that testing the suggested foods at family meals would
be acceptable. Some focus group participants indicated it
was difficult to introduce these foods because of family member aversion to change. On hearing this, we assumed that altered recipes would be sufficient to overcome opposi-
tion and explored this in all focus groups. By not making a
conscious effort to uncover social barriers to changing food
choices for families, we limited the information we gained
to plan the summative. In hindsight, we should have taken a
proactive stance once these unanticipated barriers surfaced
in the initial focus group, and posed questions to later focus
groups based on the comments offered in previous ones.
7.1.3. Interpreting feedback from program instructors
As a result of the formative, we allowed instructors to
change the presentation in the group delivery method,
believing they would have greater ownership and thus
impact, if they designed a presentation with which they
were more comfortable. The remedy the instructors devel-
oped, a lecture rather than a motivational discussion, led to
an emphasis on knowledge rather than on behavior change
in the presentation. This may partially explain why the two
delivery methods did not differ significantly in final behavior scale scores. And it may also explain the significantly
greater gain in knowledge in the group- than the impersonal-
delivery.
The formative evaluation provided feedback about the
presentation. We assumed that when a component of the
program, i.e. the presentation, was unacceptable to instructors, it should be changed, and we asked for suggestions.
The acceptable change did not lead to greater impact.
Although instructor suggestions carry great face validity,
researchers need to be wary because instructors may provide
suggestions that shift the aims of the program. In hindsight,
what we should have done was to propose to the instructors
alternative presentation methods that retained an emphasis
on behavior change. If none were acceptable, we should
have queried our assumption that agents were the appropri-
ate instructors for the group presentation.
7.1.4. Incorporating a control group
We included impact measures in the formative to gain a
quantitative estimate of the effects of the initial program but
we did not include a control group because we expected the
contrast between group- and impersonal-delivery to be
robust. Due to instructor difficulties with the stage one
group-delivery, this contrast did not materialize. Although
we had evidence that participants were learning from the
print materials in both delivery methods, we could not
demonstrate that the change seen was better than with no
intervention. Hence we suggest that evaluators include a
control group when testing the impact of a new program
offered via different delivery methods in a formative
evaluation.
7.1.5. Watching for serendipitous effects of formative
evaluation
Including impact measures led to unexpected participant
feedback about the instruments and their mode of adminis-
tration. This feedback proved invaluable in improving
program implementation in the summative, underscoring
the usefulness of impact measures in formative evaluations.
In hindsight, we recommend including these measures in the
formative to assess impact as well as to obtain feedback
about the instruments and their implementation.
7.2. Assessing the effectiveness of the formative evaluation
The need exists to demonstrate the subsequent effects of
formative evaluation in order to improve formative evalua-
tion design. Our study provides some guidance.
7.2.1. Altering the evaluation design
As a result of the formative, we added a control group and a
3-month post-intervention measurement in the summative.
Without either of these, we would not have been able to
observe some significant differences between the two delivery methods in the summative. Three months after the intervention, group delivery produced significantly greater changes in behavior scores and exercise pattern than seen in the controls, while the impersonal-delivery method did not. This design change also revealed that the group method led to significantly greater knowledge gain than the impersonal method and both gains were greater than that of controls. Clearly our
target audience was not likely to adopt behavior changes based
on receiving just the printed materials.
However beneficial these changes in evaluation design were in illuminating important findings about the impact of the final program, implementing them only in stage two prevented a rigorous comparison between formative and summative evaluations, which might have allowed us to see more clearly the effect of the formative evaluation.
Lack of the T2 measure in the formative meant we could not be sure what effect the initial program had 3 months later, and lack of a control group meant we could not fully interpret the formative impact data. In the future, we recommend that, if researchers want to assess the effectiveness of a formative evaluation, the design elements of the formative and summative steps be parallel. These design details may
seem more appropriate for a summative evaluation but they
are necessary to see the effects of the formative.
7.2.2. Altering the evaluation instruments
As a result of the formative, we shortened the KAB ques-
tionnaire somewhat to address participant complaints in the
summative. We believe this contributed to more participants
completing all the evaluation instruments in stage two. Thus
implementation improved in the summative. However, this
improvement carried a price. Consequently, we were not able to conduct the most rigorous comparison of the formative KAB results with those of the summative.
8. Conclusions
The literature on formative evaluation focuses on its
conceptual framework, its methodology and use. Permeat-
ing this work is a consensus that a program will be strength-
ened as a result of a formative evaluation although little
empirical evidence exists in the literature to demonstrate
the subsequent effects of formative evaluation on a program.
This study begins to fill that gap.
This study demonstrates that formative evaluation can
strengthen the implementation and impact of an educational
intervention designed to compare the impact of two program
delivery methods. The modifications made as a result of the formative appear to have significantly improved knowledge gained but resulted in only modest improvements in behaviors in the final program.
Our retrospective analysis of our experience supports the
inclusion of a control group and impact measures at several
time points in a formative evaluation of a program imple-
mented in a new environment for agency personnel. The
findings also suggest that when researchers are faced with
negative feedback about the components of the program in a
formative evaluation, they need to exercise care in interpret-
ing feedback and in revising those components.
In addition, the need to gather evidence of the subsequent
effect of using data from formative evaluations is critical.
Otherwise we cannot begin to examine whether the methods
and processes we take for granted in the formative evalua-
tion are valid and appropriate. However, evaluators must
carefully plan the evaluations, making sure that instruments and evaluation design are parallel in order to carry out these comparisons. We offer our findings as a stimulus to consider such studies.
References

Baggaley, J. (1986). Formative evaluation of educational television. Canadian Journal of Educational Communication, 15 (1), 29–43.
Baghdadi, A. A. (1981). A comparison between two formative evaluation methods. Dissertation Abstracts International, 41 (8), 3387A.
Baker, E. L., & Alkin, M. C. (1973). Formative evaluation of instructional development. AV Communication Review, 21 (4), 389–418.
Basch, C. E. (1987). Focus group interview: An underutilized research technique for improving theory and practice in health education. Health Education Quarterly, 14 (4), 411–448.
Bertrand, J. (1978). Communications pretesting (Media Monograph No. Six). Chicago: University of Chicago, Community and Family Study Center.
Brown, J. L., & Griebler, R. (1993). Reliability of a short and long version of the Block food frequency form for assessing changes in calcium intake. Journal of the American Dietetic Association, 93 (7), 784–789.
Cambre, M. (1981). Historical overview of formative evaluation of instructional media products. Educational Communication & Technology Journal, 29 (1), 3–25.
Cardinal, B. J., & Sachs, M. L. (1995). Prospective analysis of stage-of-exercise movement following mail-delivered, self-instructional exercise packets. American Journal of Health Promotion, 9 (6), 430–432.
Chambers, F. (1994). Removing confusion about formative and summative evaluation: Purpose versus time. Evaluation and Program Planning, 17 (1), 9–12.
Chen, H. T. (1996). A comprehensive typology for program evaluation. Evaluation Practice, 17 (2), 121–130.
Chen, C. H., & Brown, S. W. (1994). The impact of feedback during interactive video instruction. International Journal of Instructional Media, 21 (3), 191–197.
Dehar, M. A., Casswell, S., & Duignan, P. (1993). Formative and process evaluation of health promotion and disease prevention programs. Evaluation Review Journal, 17 (2), 204–220.
Dick, W. (1980). Formative evaluation in instructional development. Journal of Instructional Development, 3 (3), 2–6.
Dick, W., & Carey, L. (1985). The systematic design of instruction (2nd ed.). Glenview, IL: Scott, Foresman.
Dillman, D. A. (1978). Mail and telephone surveys: The total design method. New York: John Wiley and Sons.
Dennis, M. L., Fetterman, D. M., & Sechrest, L. (1994). Integrating qualitative and quantitative evaluation methods in substance abuse research. Evaluation and Program Planning, 17 (4), 419–427.
Fairweather, G., & Tornatzky, L. G. (1977). Experimental methods for social policy research. New York: Pergamon.
Finnegan Jr, J. R., Rooney, B., Viswanath, K., Elmer, P., Graves, K., Baxter, J., Hertog, J., Mullis, R., & Potter, J. (1992). Process evaluation of a home-based program to reduce diet-related cancer risk. The 'WIN at Home Series'. Health Education Quarterly, 19 (2), 233–248.
Fitz-Gibbon, C. T., & Morris, L. L. (1978). How to design a program evaluation. Beverly Hills, CA: Sage Publications.
Flagg, B. N. (1990). Formative evaluation for educational technologies. Hillsdale, NJ: Lawrence Erlbaum Associates.
Flay, B. R. (1986). Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventative Medicine, 15, 451–474.
Foshee, V., McLeroy, K. R., Sumner, S. K., & Bibeau, D. L. (1986). Evaluation of worksite weight loss programs: A review of data and issues. Journal of Nutrition Education, 18 (1), S38–S43.
Geis, G. L. (1987). Formative evaluation: Developmental testing and expert review. Performance & Instruction, May/June, 1–8.
Gillespie, A., & Achterberg, C. (1989). Comparison of family interaction patterns related to food and nutrition. Journal of the American Dietetic Association, 89 (4), 509–512.
Glanz, K., & Seewald-Klein, T. (1986). Nutrition at the worksite: An overview. Journal of Nutrition Education, 18 (1), S1–S12.
Glanz, K., Sorensen, G., & Farmer, A. (1996). The health impact of worksite nutrition and cholesterol intervention programs. American Journal of Health Promotion, 10 (6), 453–470.
Hausman, J. A., & Wise, D. A. (1985). Social experimentation. Chicago: The University of Chicago Press.
Hill, M., May, J., Coppolo, D., & Jenkins, P. (1993). Long term effective-
ness of a respiratory awareness program for farmers. National Institute
for Farm Safety, Inc. NIFS Paper No. 93-3. Columbia, MO. NIFS
Summer Meeting, Coeur d'Alene, Idaho.
Health Habits And History Questionnaire: Diet History And Other Risk
Factors (1989). Personal computer system packet. Version 2.2.
Washington, D.C.: National Cancer Institute, Division of Cancer
Prevention and Control, National Institutes of Health.
Houts, S. A. (1988). Lactose intolerance. Food Technology, 42 (3), 110±
113.
Iszler, J., Crockett, S., Lytle, L., Elmer, P., Finnegan, J., Luepker, R., &
Laing, B. (1995). Formative evaluation for planning a nutrition inter-
vention: Results from focus groups. Journal of Nutrition Education, 27
(3), 127±132.
Jacobs Jr, D. R., Luepker, R. V., Mittelmark, M. B., Folsom, A. R., Pirie, P.
L., Mascioli, S. R., Hannan, P. J., Pechacek, T. F., Bracht, N. F., Carlaw,
R. W., Kline, F. G., & Blackburn, H. (1986). Community-wide
prevention strategies: Evaluation design of the Minnesota heart health
program. Journal of Chronic Disease, 39 (10), 775±788.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade
later. Health Education Quarterly, 11 (1), 1±47.
Johnson, C. C., Osganian, S. K., Budman, S. B., Lytle, L. A., Barrera, E. P.,
Bonura, S. R., Wu, M. C., & Nader, P. R. (1994). CATCH: Family
process evaluation in a multicenter trial. Health Education Quarterly,
Supplement, 2, S91±S106.
Kandaswamy, S., Stolovitch, H. D., & Thiagarajan, S. (1976). Learner
veri®cation and revision: An experimental comparison of two methods.
AV Communication Review, 24 (3), 316±328.
Kaufman, R. (1980). A formative evaluation of formative evaluation:
The state of the art concept. Journal of Instructional Development, 3
(3), 1±2.
Kershaw, D., & Fair, J. (1976). The New Jersey income maintenance
experiment, New York: Academic Press.
Kishchuk, N., Peters, C., Towers, A. M., Sylvestre, M., Bourgault, C., &
Richard, L. (1994). Formative and effectiveness evaluation of a work-
site program promoting healthy alcohol consumption. American Jour-
nal of Health Promotion, 8 (5), 353±362.
Krippendorff, K. (1980). Content analysis: An introduction to its methodol-
ogy, Sage: Beverly Hills, CA.
Lenihan, K. (1976). Opening the second gate, Washington, DC: U.S.
Government Printing Services.
Lewin, K. (1943). Forces behind food habits and methods of change. In
The Problem of Changing Food Habits. National Research Council
Bulletin 108. (pp. 35±65). Washington, D.C.: National Academy of
Sciences.
Markle, S. M. (1979). Evaluating instructional programs: How much is
enough? NSPI Journal, Feb, 22±24.
Markle, S. M. (1989). The ancient history of formative evaluation. Perfor-
mance and Instruction, Aug, 27±29.
McGraw, S. A., McKinley, S. A., McClements, L., Lasater, T. M., Assaf,
A., & Carleton, R. A. (1989). Methods in program evaluation: The
process evaluation system of the Pawtucket Heart Health Program.
Evaluation Review, 13 (5), 459±483.
McGraw, S. A., Stone, E. J., Osganian, S. K., Elder, J. P., Johnson, C. C.,
Parcel, G. S., Webber, L. S., & Luepker, R. V. (1994). Design of
process evaluation within the child and adolescent trial for cardio-
vascular health (CATCH). Health Education Quarterly, Supplement,
2, S5±S26.
Montague, W. E., Ellis, J. A., & Wulfeck, W. H. (1983). Instructional
quality inventory: A formative evaluation tool for instructional devel-
opment. Performance and Instruction Journal, 22 (5), 11±14.
Nathenson, M. B., & Henderson, E. S. (1980). Using student feedback to
improve learning materials, London: Croom Helm.
National Institutes of Health (1985). Surgeon general's report on nutrition
and health. U.S. Department of Health and Human Services, Public
Health Service (Chapter 7, pp. 311±343). Washington, DC: U.S.
Government Printing Service.
Nisbett, R. E. & Wilson, T. D. (1977). Tellimg more than we can know:
Verbal reports on mental processes. Psychological Review, 84(3), May.
Parrott, R., Steiner, C., & Godenhar, L. (1996). Georgia's harvesting
healthy habits: A formative evaluation. The Journal of Rural Health,
12 (4), 291±300.
Patton, M. Q. (1978). UtilizationÐfocused evaluation, Beverly Hills, CA:
Sage.
Patton, M. Q. (1982). Practical evaluation, Beverly Hills, CA: Sage.
Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15
(3), 311±319.
Patton, M. Q. (1996). A world larger than formative and summative.
Evaluation Practice, 17 (2), 131±144.
Pelletier, K. R. (1996). A review and analysis of the health and cost-effec-
tive outcome studies of comprehensive health promotion and disease
prevention programs at the worksite: 1993±1995 update. American
Journal of Health Promotion, 10 (5), 380±388.
Pelz, E. B. (1959). Some factors in group decision. In E. E. Macoby, T. M.
Newcomb & E. L. Hartley, Readings in social psychology (3rd ed).
(pp. 212±219). New York: Holt, Rinehart and Winston, Inc.
Peterson, K. A., & Bickman, L. (1988). Program personnel: The missing
ingredient in describing the program environment. In J. Kendon,
Conrad Roberts-Gray & Cynthia Roberts-Gray, Evaluating program
environments, San Francisco, CA: Jossey-Bass, Inc.
Potter, J. D., Graves, K. L., Finnegan, J. R., Mullis, R. M., Baxter, J. S.,
Crockett, S., Elmer, P. J., Gloeb, B. D., Hall, N. J., Hertog, J., Pirie, P.,
Richardson, S. L., Rooney, B., Slavin, J., Snyder, M. P., Splett, P., &
J.L. Brown, N.E. Kiernan / Evaluation and Program Planning 24 (2001) 129±143142
Viswanath, K. (1990). The cancer and diet intervention project: a
community-based intervention to reduce nutrition-related risk of
cancer. Health Education Research, 5 (4), 489±503.
Rightwriter (1990). Version 3.1. Sarasota, FL: RightSoft, Inc.
Robins, P. K., Spiegelman, R. G., Weiner, S., & Bell, J. G. (1980). A
guaranteed annual income: Evidence from a social experiment, New
York: Academic Press.
Rossi, P. H., & Lyall, K. (1976). Reforming public welfare, New York:
Russell Sage.
Rossi, P. H., & Freeman, H. E. (1982). Evaluation: A systematic approach
(p. 69). Beverly Hills, CA: Sage Publications.
Russell, J. D., & Blake, B. L. (1988). Formative and summative evaluation
of instructional products and learners. Educational Technology, 28 (9),
22±28.
SAS Proprietary Software Release 6.09. (1989). Cary, N.C.: SAS Institute,
Inc.
Scanlon, E. (1981). Evaluating the effectiveness of distance learning: A
case study. In F. Percival & H. Ellington, Aspects of educational tech-
nology: Vol. XV: Distance learning and evaluation (pp. 164±171).
London: Kogan Page.
Scheirer, M. A. (1994). Designing and using process evaluation. In J. S.
Wholey, H. Hatry & K. Newcomer, Handbook of practical program
evaluation (pp. 40±68). San Francisco: Jossey-Bass.
Scheirer, M. A., & Rezmovic, E. L. (1983). Measuring the degree of
program implementation. Evaluation Review, 7 (5), 599±633.
Schneider, M. L., Ituarte, P., & Stokols, D. (1993). Evaluation of a commu-
nity bicycle helmet promotion campaign: What works and why. Amer-
ican Journal of Health Promotion, 7 (4), 281±287.
Scriven, M. (1967). The methodology of evaluation. In R. Tyler, R. Gagne
& M. Scriven, Perspectives of curriculum evaluation (pp. 39±83).
Chicago: Rand McNally.
Seidel, R. E. (1993). Notes from the ®eld in communication for child survi-
val, Washington, DC: USAID.
Stuf̄ ebeam, D. L. (1983). The CIPP model for program evaluation. In G.
Madaus, M. Scriven & D. Stuf̄ ebeam, Evaluation models: Viewpoints
on educational and human services evaluationBoston: Kluwer-Nijhoff.
Tessmer, M. (1993). Planning and conducting formative evaluations,
London: Kogan Page.
Thiagarajan, S. (1991). Formative evaluation in performance technology.
Performance Improvement Quarterly, 4 (2), 22±34.
Wager, J. C. (1983). One-to-one and small group formative evaluation: An
examination of two basic formative evaluation procedures. Perfor-
mance and Instruction, 22 (5), 5±7.
Walden, O. (1989). The relationship of dietary and supplemental calcium
intake to bone loss and osteoporosis. Journal of the American Dietetic
Association, 89 (3), 397±400.
Weston, C. B. (1986). Formative evaluation of instructional materials: An
overview of approaches. Canadian Journal of Educational Communi-
cation, 15 (1), 5±17.
Weston, C. B. (1987). The importance of involving experts and learners in
formative evaluation. Canadian Journal of Educational Communica-
tions, 16 (1), 45±58.
Wilkinson, T. L., Schuler, R. T., & Skjolaas, C. A. (1993). The effect of
safety training and experience of youth tractor operators. National Insti-
tute for Farm Safety, Inc. NIFS Paper No. 93±6. Columbia, MO. NIFS
Summer Meeting, Coeur d'Alene, Idaho.
Witte, K., Peterson, T. R., Vallabhan, S., Stephenson, M. T., Plugge, C. D.,
Givens, V. K., Todd, J. D., Bechtold, M. G., Hyde, M. K., & Jarrett, R.
(1992/3). Preventing tractor-related injuries and deaths in rural popula-
tions: Using a persuasive health message framework in formative
evaluation research. International Quarterly of Community Health
Education, 13 (3), 219±251.
J.L. Brown, N.E. Kiernan / Evaluation and Program Planning 24 (2001) 129±143 143