
Assessing the subsequent effect of a formative evaluation on a program

J. Lynne Brown a,*, Nancy Ellen Kiernan b

a Penn State University, Department of Food Science, 203B Borland, University Park, PA, USA
b Penn State University, College of Agricultural Sciences, 401 Agricultural Administration Building, University Park, PA, USA

Received 30 June 1999; received in revised form 1 September 2000; accepted 31 October 2000

Evaluation and Program Planning 24 (2001) 129–143

* Corresponding author. Tel.: +1-814-863-3973; fax: +1-814-863-6132. E-mail address: [email protected] (J.L. Brown).

Abstract

The literature on formative evaluation focuses on its conceptual framework, methodology and use. Permeating this work is a consensus that a program will be strengthened as a result of a formative evaluation, although little empirical evidence exists in the literature to demonstrate the subsequent effects of a formative evaluation on a program. This study begins to fill that gap. To do this, we outline the initial program and formative evaluation, present key findings of the formative evaluation, describe how these findings influenced the final program and summative evaluation, and then compare the findings to those of the formative. The study demonstrates that formative evaluation can strengthen the implementation and some impacts of a program, i.e. knowledge and some behaviors. The findings also suggest that when researchers are faced with negative feedback about program components in a formative evaluation, they need to exercise care in interpreting and using this feedback. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Formative evaluation; Summative evaluation; Impact; Assessing feedback

1. Introduction

Formative evaluation commands a formidable place in the evaluation literature. Highly regarded, the process was used to improve educational films in the 1920s (Cambre, 1981). Academic areas as diverse as agricultural safety (Witte, Peterson, Vallabhan, Stephenson, Plugge, Givens et al., 1992/93) and cardiovascular disease (Jacobs, Luepker, Mittelmark, Folsom, Pirie, Mascioli et al., 1986) draw on the process today, using findings to improve a program; among educators in particular, it is 'almost universally embraced' (Weston, 1986, p. 5). Surprisingly, the subsequent effect of using the findings of formative evaluation has not received systematic attention. This paper addresses that gap.

The literature focuses attention on three aspects of formative evaluation, the first of which is its conceptualization. Over time, researchers clarified the concept. They distinguished it from other forms of evaluation, especially summative, the fundamental difference being the rationale and use of the data (Baker & Alkin, 1973; Markle, 1989; Patton, 1994; Chambers, 1994; Weston, 1986); labeled it formative evaluation (Scriven, 1967) and accepted that designation (Rossi & Freeman, 1982; Patton, 1982; Fitz-Gibbon & Morris, 1978); debated its frequency and timing in the program cycle (Markle, 1979; Thiagarajan, 1991; Russell & Blake, 1988; Chambers, 1994); scrutinized its overlap with process evaluation (Patton, 1982; Stufflebeam, 1983; Scheirer & Rezmovic, 1983; Dehar, Casswell & Duignan, 1993; Scheirer, 1994; Chen, 1996); and expanded its epistemological framework, linking it to developmental programs (Patton, 1996). As the conceptual framework evolved, the perceived value of formative evaluation has only increased.

Second, the literature focuses on methods and design strategies to conduct formative evaluation. That focus appears first in handbooks or articles describing methods and design strategies for either an entire program (Rossi & Freeman, 1982; Patton, 1978; Fitz-Gibbon & Morris, 1978) or a segment of a program such as the materials (Weston, 1986; Bertrand, 1978), instruction (Tessmer, 1993), electronic delivery like television (Baggaley, 1986), or interactive technology (Flagg, 1990; Chen & Brown, 1994). The focus on methods and strategies appears second in case studies which illuminate a particular method or strategy tailored to the exigencies of a particular situation, such as a community (Jacobs et al., 1986; Johnson, Osganian, Budman, Lytle, Barrera, Bonura et al., 1994; McGraw, Stone, Osganian, Elder, Johnson, Parcel et al., 1994; McGraw, McKinley, McClements, Lasater, Assaf & Carleton, 1989) or worksite (Kishchuk, Peters, Towers, Sylvestre, Bourgault & Richard, 1994).


Over time, the focus on methods and strategies illuminated critical decisions needed to design a valid formative evaluation. The decisions include: (1) who should participate: experts (Geis, 1987), learners from the targeted audience (Weston, 1986; Russell & Blake, 1988), learners with different aptitudes (Wager, 1983), instructors representative of those in the field (Weston, 1987; Peterson & Bickman, 1988), or dropouts from a program (Rossi & Freeman, 1982); (2) how many to include and in what form, one or a group (Wager, 1983; Dick, 1980); (3) the type of data to collect, qualitative or quantitative (Dennis, Fetterman & Sechrest, 1994; Peterson & Bickman, 1988; Flay, 1986); (4) data collection techniques (Weston, 1986; Tessmer, 1993); and (5) the similarity of pilot sessions relative to actual learning situations (Rossi & Freeman, 1982; Weston, 1986). Not surprisingly, the conviction permeating the literature on methods and strategies is that formative evaluation will lead to a stronger, more effective program.

Third, attention in the literature dwells on the immediate use of formative evaluation findings. Academic areas such as nutrition (Cardinal & Sachs, 1995), cancer prevention for agricultural workers (Parrott, Steiner & Goldenhar, 1996), and child health (Seidel, 1993) have evaluated a program in its formative stage. In case studies such as these, researchers hail the evaluation process, describing the immediate effects of the evaluation, i.e., the problems identified and/or changes to be made in a modified version of the program (Potter et al., 1990; Finnegan, Rooney, Viswanath, Elmer, Graves, Baxter et al., 1992; Kishchuk et al., 1994; Iszler, Crockett, Lytle, Elmer, Finnegan, Luepker et al., 1995). These researchers are not consistent when reporting the immediate effects of a formative evaluation. Some do not include data; some do not outline the problems the process identified; and some do not describe the changes they made. What is consistent, however, is the message from these researchers: formative evaluation led them to make changes that should lead to a stronger program.

In summary, much has been written about formative evaluation: its conceptual framework, its methods, and its use. Throughout this literature, there is strong consensus on the value of formative evaluation, some calling its value 'obvious' (Baggaley, 1986, p. 34) and 'no longer questioned' (Chen & Brown, 1994, p. 192). Many educators contend, however, that formative evaluation is not used enough (Flagg, 1990; Kaufman, 1980; Geis, 1987; Foshee, McLeroy, Sumner & Bibeau, 1986). Indeed, some evaluations reflect no previous attempt at formative evaluation (Foshee et al., 1988; Glanz, Sorensen & Farmer, 1996; Pelletier, 1996; Schneider, Ituarte & Stokols, 1993; Wilkinson, Schuler & Skjolaas, 1993; Hill, May, Coppolo & Jenkins, 1993). Other formative evaluations are limited: using few people, non-representative samples, or selected materials (Tessmer, 1993).

Part of the explanation for limited use of formative evaluation may lie in the lack of empirical evidence in the literature demonstrating its subsequent effect. Few researchers take the next step and demonstrate this by comparing data from the initial program with data from the final program to show whether the changes resulted in an improvement in program implementation and impacts. Reviewing over 60 years of work in formative evaluation, scholars (Flagg, 1990; Dick, 1980; Dick & Carey, 1985; Weston, 1986) found that the 'evidence is supportive but meager' (Geis, 1987, p. 6). Furthermore, most evidence (Baker & Alkin, 1973; Baghdadi, 1981; Kandaswamy, Stolovitch & Thiagarajan, 1976; Nathenson & Henderson, 1980; Scanlon, 1981; Wager, 1983; Montague, Ellis & Wulfeck, 1983; Cambre, 1981) relates to only a component of a program, the educational materials, not to an entire program. Some landmark studies examine the impact of an entire program in its formative stage, such as the use of negative income tax strategies as a substitute for welfare (Kershaw & Fair, 1976; Rossi & Lyall, 1976; Robins, Spiegelman, Weiner & Bell, 1980; Hausman & Wise, 1985) and the Department of Labor's LIFE effort to decrease arrest rates among released felons with increased employment (Lenihan, 1976), but only a few, such as those reported by Fairweather and Tornatzky (1977), actually document that the changes made as a result of a formative evaluation resulted in a change in the impact of the final program. Given that researchers hail formative evaluation as important, the lack of evidence about its subsequent effect points to a surprising gap in the literature.

The purpose of this paper is to examine the subsequent effect of a formative evaluation to see whether the changes resulting from it improved the final program sufficiently to distinguish between the impact of two program delivery methods. To do this, we: (a) outline the initial program and its formative evaluation, (b) present the key findings of the formative evaluation, (c) describe how the formative findings influenced the design of the revised program and its evaluation, and then (d) compare the results of the initial and revised programs, something rarely done in the formative evaluation literature. In doing this, we provide a comprehensive look at the implementation of both a formative and summative evaluation. In conclusion, we identify issues that evaluators wishing to improve the design of a formative evaluation need to consider. In addition, we identify problems we encountered in attempting to assess the effectiveness of a formative evaluation.

2. Stage one: The initial program

2.1. Background

Combining federal, state and local funding, land grant colleges support educational health promotion programs for individuals and communities offered by county-based co-operative extension family living agents. Prior to our study, agents reported poor attendance at evening and weekend meetings but rarely offered daytime programs at worksites. Instead, agents used correspondence lessons to reach people unwilling to attend meetings. However, group interaction is more likely to facilitate changes in behavior (Glanz & Seewald-Klein, 1986), possibly through the support offered by sharing experiences. While it was easier for agents to mail correspondence lessons (an impersonal delivery method), we postulated that using a group meeting to motivate the use of each lesson in a series before it was distributed was more likely to promote change in food/health behaviors. To test this hypothesis, we designed a two-stage impact study to evaluate two methods of delivering lessons biweekly at worksites: distribution alone vs distribution in conjunction with a half-hour group meeting.

Agents delivering the program would work with new delivery sites, new clientele, new content, and new delivery methods. Because of this unfamiliarity, and because this program had to fit into worksite environments with differing work-shift patterns, lunch patterns, physical settings, personnel departments, and required advertising, we conducted a formative evaluation of the initial program's impact and implementation. We included participants and instructors in the evaluation, using a variety of qualitative and quantitative methods.

2.2. Target health problem and audience

Four print lessons in the initial program addressed prevention of osteoporosis, a recently proclaimed public health problem (National Institutes of Health, 1985) most often affecting white, elderly women. Prevention requires lifelong adequate calcium intake and exercise. According to NHANES II data, 75% of American women fail to consume the recommended daily amount of calcium (Walden, 1989).

We targeted working women, ages 21–45, with at least one child at home because these women are building bone mass, which peaks at age 35–45. Mothers can also provide nutrition activities (Gillespie & Achterberg, 1989) that teach children how to protect bone health.

2.3. Lesson content and organization

The lessons, based on the Health Belief Model (Janz & Becker, 1984), encouraged participants to eat calcium-rich foods and to walk for exercise by focusing on personal susceptibility, disease severity, benefits of prevention, and overcoming barriers to health-protecting actions. Because many in the target audience disliked drinking fluid milk (based on an initial survey) or could have reactions to milk (Houts, 1988), each lesson introduced a different calcium-rich food (non-fat dry milk, plain yogurt, canned salmon or tofu) and menu ideas. Each also included scientific background on the lifestyle–osteoporosis link, a self-assessment worksheet, a featured food fact sheet, suggestions for involving children in food preparation, and calcium-rich recipes. Rightwriter (1990) analysis indicated a 12th grade reading level.

2.4. Delivery method

We tested two biweekly delivery methods for the lessons. Group-delivery (G), based on the discussion–decision methods of Lewin (1943), was a 30 min motivational session in which participants discussed adopting a behavior suggested in each lesson (i.e. trying recipes, walking for exercise, involving children in food preparation, and eating calcium-rich foods, not supplements). Participants could taste a recipe using the featured calcium-rich food and vote by raised hands on their willingness to individually adopt the suggested behavior. An agent served as facilitator/motivator and distributed a lesson at the end of each session. The other method, impersonal-delivery (I), consisted of either the agent or a company contact person simply distributing the required lesson to participants according to schedule.

2.5. Staff training

To insure consistency, all agents received guidelines for recruitment of worksites and participants, a program content review, a printed program delivery script, and instructions for instrument administration.

2.6. Recruitment

Seven agents representing three rural and four urban/suburban counties interviewed personnel managers at businesses within their county and recruited 48 worksites where women comprised over 30% of the work force. Once worksites were randomly assigned to a delivery method (G or I), agents systematically recruited participants within a month.

3. Stage one: Formative evaluation

We delineate the data collection methods, the evaluation design, and the analyses.

3.1. Evaluating program implementation

Our goals were to assess: (a) participant characteristics relative to the prescribed target audience; (b) participant attention to, and use of, the lessons; (c) participant reaction to advertising, lesson content and structure, delivery method, and time between lessons; and (d) agent reaction to delivering the program and its content.

To address goal (a), we included demographic questions in the first questionnaire administered. To address (b), we designed a response sheet for each lesson which asked participants how completely they had read the lesson, how easy it was to read, how useful it was, and whether they completed the worksheet, tried suggestions or recipes, and shared lesson materials. To address (c), we developed focus group questions for participants, and, for (d), questions for agents attending a debriefing.

We conducted four focus groups among participants within a month of the intervention, two each for group-delivery and impersonal-delivery. Each focus group derived from a purposeful sample of thirty participants composed of two-thirds completers and one-third non-completers. The agent telephoned all those selected, and those who chose to attend became the sample. We also held the debriefing with all agents within a month. Data consisted of tape recordings and written notes.

3.2. Evaluating program impact

Our goal was to examine changes in knowledge, attitudes, and behaviors (KAB) needed to prevent osteoporosis using appropriate scales, changes in calcium intake using a food frequency questionnaire (FFQ), and changes in exercise pattern using specific questions. Our hypothesis was that persons in group-delivery would exhibit greater changes in attitude and behavior scores, calcium intake, and exercise pattern than those in impersonal-delivery. We anticipated similar changes in knowledge for both delivery methods because the same lessons were used; the meeting focused primarily on motivation.

To assess changes, we developed the KAB scales using nutrition expert and target audience reviews and internal consistency testing with 65 of our target audience prior to use in Stage One. The final formative instrument contained a 20-item knowledge scale (KR-20 = 0.80); a 22-item attitude scale (α = 0.78); and a 16-item behavior scale (α = 0.75), all addressing concepts in the lessons.
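
The reliability coefficients above can be reproduced with a few lines of code. Below is a minimal sketch, assuming complete item responses coded 0/1 for knowledge and 1-5 for attitude and behavior; the function names and data layout are our own illustration, not part of the original analysis.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x k_items) response matrix."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale totals
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def kr20(items):
        """KR-20, the special case of alpha for dichotomous (0/1) items."""
        k = items.shape[1]
        p = items.mean(axis=0)                     # proportion answering correctly
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)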

We used a modified version of the Block food frequency questionnaire (Brown & Griebler, 1993) that included the four foods featured in the osteoporosis lessons to assess calorie and calcium intake. To examine exercise behavior directly, we asked participants if they exercised regularly within the last several months each time they completed the KAB scales; after the lessons we also asked if this exercise pattern was new and, if new, whether it was due to the lessons.
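
Since calcium intake is later reported per 1000 calories (Table 2), the normalization is worth making explicit. A one-line sketch, using the formative group means from Table 2 as illustrative inputs:

    def calcium_density(calcium_mg, calories):
        """Calcium intake normalized per 1000 kcal."""
        return calcium_mg / calories * 1000.0

    print(calcium_density(805.4, 1623.8))  # ~496 mg/1000 kcal

Table 2 reports 497.1 for the formative completers, presumably because the density is computed per respondent and then averaged, which differs slightly from dividing the group means as done here.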

3.3. Formative evaluation design

We employed a pre-test (T0), 8-week intervention, post-test (T1) design to compare group-delivery (G) and impersonal-delivery (I) (Fig. 1). We arranged the 48 worksites in four blocks reflecting business types (white collar, educational/municipal, health care and blue collar) and assigned them randomly to either delivery method. Although eleven worksites withdrew prior to the intervention, primarily due to company changes, the proportion of business types in each delivery method was unaffected.
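
A blocked assignment like the one just described can be sketched in code. The example below is a hypothetical illustration of randomizing worksites to the two delivery methods within business-type blocks so that each method receives a similar mix of business types; the site names and seed are invented.

    import random
    from collections import defaultdict

    def assign_blocked(worksites, arms=("G", "I"), seed=1):
        """Randomize worksites to delivery methods within blocks.
        `worksites` is a list of (name, business_type_block) pairs."""
        rng = random.Random(seed)
        by_block = defaultdict(list)
        for name, block in worksites:
            by_block[block].append(name)
        assignment = {}
        for names in by_block.values():
            rng.shuffle(names)                          # random order within the block
            for i, name in enumerate(names):
                assignment[name] = arms[i % len(arms)]  # alternate arms for balance
        return assignment

    sites = [("Acme Tool", "blue collar"), ("Mill Co", "blue collar"),
             ("Valley Hospital", "health care"), ("County Schools", "educational/municipal")]
    print(assign_blocked(sites))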

Participants completed pre KAB and FFQ instruments at a meeting 1 week prior to receiving lesson one; the last lesson included post KAB and FFQ instruments, which participants returned at an optional post-program meeting 1 week later or by mail according to Dillman (1978). Question order in each KAB scale differed at each measurement to diminish recall bias. Each lesson included the response sheet that participants returned prior to receiving the next lesson.

Fig. 1. Model of formative and summative evaluation design.

3.4. Formative evaluation data analyses

We used χ2 analysis to compare categorical implementation data, and ANOVA to compare continuous implementation data, between lessons and between delivery methods, from the response sheets returned. We examined tape recording transcripts and focus group and debriefing notes for repeated themes (Krippendorff, 1980).

Data from those completing both KAB instruments were analyzed and scale scores determined, allowing only one missing value. Each individual's knowledge score was the sum of correct answers. Each attitude and behavior statement required a response on a 5-point Likert scale. Each individual's attitude and behavior scale score was the mean of all their responses to those questions.
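
In code, these scoring rules reduce to a sum (knowledge) or a mean (attitude and behavior) with a one-missing-value tolerance. A minimal sketch, assuming missing answers are coded as NaN; the function names are ours:

    import numpy as np

    def knowledge_score(answers, max_missing=1):
        """Sum of correct answers (1 = correct, 0 = incorrect, NaN = missing);
        the score is invalid if more than `max_missing` answers are missing."""
        answers = np.asarray(answers, dtype=float)
        if np.isnan(answers).sum() > max_missing:
            return np.nan
        return np.nansum(answers)

    def likert_score(responses, max_missing=1):
        """Mean of 5-point Likert responses under the same missing-value rule."""
        responses = np.asarray(responses, dtype=float)
        if np.isnan(responses).sum() > max_missing:
            return np.nan
        return np.nanmean(responses)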

Data from those completing both FFQs were coded, entered, and analyzed using FFQ software (Health Habits and History Questionnaire, 1989). Because nutrient value distributions were not normal, the data were transformed using the natural logarithm prior to statistical analysis (SAS Proprietary Software, 1989).

Non-directional t-tests for independent samples were used to test the significance of continuous and ordinal data (mean age, education, KAB scores and calcium) between delivery methods (G vs I) at each time point (T0, T1). Categorical demographic and exercise data were compared using χ2 analysis. ANOVA for repeated measures and ANCOVA were used to test the significance of mean KAB scores and calcium intake of matched individuals across time. The covariates of mean income and employment status were used in testing changes in KAB scores. Significance was assumed at p ≤ 0.05.
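
As an illustration of the core comparison, the sketch below runs a non-directional (two-sided) t-test on log-transformed calcium intakes for the two delivery methods. The data here are synthetic stand-ins; SciPy's ttest_ind is two-sided by default, matching the tests described above.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    calcium_g = rng.lognormal(mean=6.6, sigma=0.5, size=120)  # illustrative intakes, mg
    calcium_i = rng.lognormal(mean=6.7, sigma=0.5, size=130)

    # Nutrient intakes are right-skewed, so take natural logs before testing.
    t_stat, p_value = stats.ttest_ind(np.log(calcium_g), np.log(calcium_i))
    print(t_stat, p_value, p_value <= 0.05)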

4. Key findings from the formative evaluation of the initial program

We outline program implementation findings for goals (a)–(d) and impact findings.

4.1. Program implementation

4.1.1. Goal (a): Target audience

Ultimately, 275/489 (56%) women completed post questionnaires that met analysis criteria. Completers and non-completers did not differ in demographic characteristics (data not shown). When comparing delivery methods, completers differed significantly in only two factors: percent employed full time (91.6% in G vs 81.6% in I) and percent of families with incomes over $35,000 (57.7% vs 42.4%).

4.1.2. Goal (b): Participant's attention to and use of lessons

Response sheets returned dropped over the four lessons: method G dropped from 81% of initial registrants for lesson one to 41% for lesson four, and method I from 95 to 67%. Otherwise, the two delivery methods did not differ significantly in attention to, and use of, lessons.

Respondents who reported reading all lesson materials fell from 85% for lesson one to 62–64% for lesson four (Fig. 2). Regardless of delivery method, respondents rated all lessons, on a scale of 1–5, fairly easy to read (1.4 ± 0.6, where 1 = easy to read) and useful (hovering at 2.1 ± 0.8, where 1 = very useful). About 70% reported completing worksheet one, 28% worksheet two, 80% worksheet three, and 50% worksheet four.

The response sheets assessed whether participants tried recipes, involved children in food preparation, and shared lesson materials, and revealed no significant differences between delivery methods. Although 37% of method G tried lesson one recipes compared to 20% in method I, thereafter percentages were lower and similar between delivery groups. Those involving children varied from 11% for lesson one to 2% for lesson four, and those sharing recipes with friends between 16 and 22% (Fig. 3).

4.1.3. Goal (c): Participant reactions

Fifty women (27 from G and 23 from I) participated in the focus groups. Participants from both delivery methods were more likely to remember personal contacts and paycheck flyers than other advertisements. They recommended changes in lesson format, recipes, worksheets, and the calcium-rich foods featured. Many found the lesson booklet cumbersome, the menus unhelpful, the worksheets in two lessons too long, and some featured foods difficult to adopt. Some participants wanted the lessons to emphasize drinking milk. They suggested including menus and microwave instructions in the recipes. With some exceptions, women reported it was difficult to involve children in food preparation or that their children were grown.

However, some feedback was unique to a delivery method. Group-delivery participants wanted more lecture, more question-and-answer time, and less motivational discussion. They could not recall voting to try a behavior (critical to the discussion–decision method) but liked the food tasting activity. Impersonal-delivery participants also wanted question-and-answer time and reminders to complete each lesson, but disagreed about the period between lessons.

Participants from both delivery methods revealed that they had limited time to try recipes and had not yet put learned health-promoting actions into practice. They disliked the long KAB questionnaire and completing the second FFQ, only 2 months after the initial one, when they had not yet initiated changes in eating habits.

Fig. 2. Percent reading all of the lesson in each evaluation.

4.1.4. Goal (d): Agent reactions

All agents participated in both delivery methods. They reported that the advertising materials did not clearly define the target audience and that in-person appeals and an enthusiastic site contact improved recruitment. Despite managing shifts, they preferred the interaction and participant interest in group-delivery and the opportunity for daytime programs. But agents using group-delivery resisted being motivators and asked to provide lectures, perceiving that participants wanted prescriptive advice. They felt the recipes needed improvement. Agents echoed the lack of emphasis on drinking milk, a political issue in counties with a dairy industry.

4.2. Program impact

As hoped, changes over time for KAB were significant. As expected, the hypothesis that changes in knowledge would not differ by delivery method was supported. Unexpectedly, the hypothesis that those in group-delivery would show greater gains in attitude, behavior, calcium intake, and exercise pattern than those in impersonal-delivery was not supported. For the KAB measures, the time by delivery method interaction was not significant (Figs. 4–6). Group delivery did not affect knowledge, attitude, or behavior scores any more than impersonal delivery. Changes in calcium intake and exercise pattern were not significantly different between delivery groups (data not shown).

Fig. 3. Percent reporting sharing lesson recipe with friends.

Fig. 4. Change in knowledge score over time.

5. Stage two: The revised program and summative evaluation

The changes made in stage two in the program content, recruitment, delivery method, and evaluation design and instruments for the summative evaluation are shown in Table 1. Almost all reflect key findings of the stage one formative evaluation.

5.1. Revised program lesson content and recruitment

We changed the lesson content to address the concerns outlined above. We asked six county agents, representing three rural and three suburban counties, to recruit four worksites each, a total of 24. We clarified the target audience in advertising materials and directed agents toward in-person recruiting. To insure enrolling more working women with young children, we lowered the lessons' reading level to accommodate participants from more blue collar worksites where mothers, ages 21–45, were a significant part of the work force.

5.2. Revised program delivery method

The initial program implementation and impact data indicated the group delivery method did not affect attitudes and behaviors, possibly because agents were uncomfortable and did not conduct the meeting according to directions. To rectify this, using Pelz (1959), six agents designed four new 30 min meeting scripts that included two to three main points, retained the food tasting (with new recipes), and eliminated the motivational discussion. A suggested action was still promoted at the end of the meeting and a group vote taken on adoption. Agents were trained to use these scripts and distributed the lessons biweekly.

Fig. 5. Change in attitude score over time.

Fig. 6. Change in behavior score over time.

5.3. Summative evaluation design

Participants' comments and the poor formative completion rate led us to use a pre (T0), immediate post (T1), and 4-month post (T2) summative evaluation design (Fig. 1). We asked participants to complete the KAB instrument at all time points, but the FFQ only at T0 and T2, a 6-month interval, expecting the T2 measure would detect changes which initial program participants claimed took time to implement.

To improve our ability to detect changes, we compared three intervention groups (two experimental, group-delivery and impersonal-delivery, and one control). The controls received four correspondence lessons addressing cancer prevention, identical in design to the osteoporosis lessons the experimental groups received. The osteoporosis and cancer lessons differed only in diet–disease context, beneficial nutrients and foods, and the emphasis on exercise in the osteoporosis lessons. In sum, those in group-delivery received the modified group meeting and osteoporosis lessons; those in impersonal-delivery only the osteoporosis lessons; and the controls only the cancer lessons.

Table 1
Major changes in the educational program, evaluation design and evaluation instruments prompted by results of the formative evaluation

Program lesson content
• Layout of each lesson. From: booklet. To: folder with pull-out fact sheets.
• Calcium-rich foods. From: emphasize four non-traditional foods. To: emphasize fluid milk and four non-traditional foods.
• Worksheets. From: lesson 1, 7-day exercise diary; lesson 4, long contract to make one behavior change. To: lesson 1, 3-day exercise diary; lesson 4, short contract to make one behavior change.
• Fact sheet (food activities for children). From: suggestions to involve children in food activities. To: retain and give added emphasis in lessons and group meeting.
• Recipes. From: six per lesson with conventional instructions. To: keep the four most popular, but add microwave instructions and menu suggestions; emphasize testing on weekends.
• Reading level. From: 12th grade. To: 8th grade.

Program recruitment
• Recruitment of worksites. From: work force must have a high percentage of working women. To: target blue collar worksites; work force must have a high percentage of working mothers.
• Advertising for target audience. From: print material and in-person recruitment. To: emphasis on in-person recruitment; clarify target audience in all recruitment material.

Program delivery
• Delivery method. From: group, motivational discussion about overcoming barriers to suggested behaviors, ending with a group vote on trying the behavior (try recipes, start walking program, involve kids in kitchen, use foods not supplements), plus food tasting; impersonal, pass out lessons on schedule. To: group, lecture stressing 2–3 main points of the lesson, followed by a pep talk about the suggested behavior and a group vote on trying the behavior (same behaviors), plus food tasting with revised recipes; impersonal, pass out lessons on schedule (unchanged).

Evaluation design
• Intervention design. From: comparison of two delivery methods; pre-post measures (T0 and T1) of KAB and FFQ; response sheet in each lesson, no incentive to return. To: comparison of two delivery methods with a control; pre, post and 4-month post measures (KAB at T0, T1 and T2; FFQ at T0 and T2); response sheet in each lesson, with an incentive for return.

Evaluation instruments
• Impact instrument scales (KAB questionnaire). From: 20 knowledge questions (KR-20 = 0.80); 22 attitude questions (α = 0.78); 16 behavior questions (α = 0.75). To: 14 knowledge questions (KR-20 = 0.725); 16 attitude questions (α = 0.80); 14 behavior questions (α = 0.80).
• Response sheets. From: "Try any suggestion for child activity" and "Try any recipe," with responses yes/no. To: for both questions, add the response "no, but plan to."

We divided the 24 worksites into five blocks reflecting relative pay scale and type of worker. These were assigned purposefully to the three intervention groups such that there was equal representation of all five blocks in the two experimental groups, while the controls lacked representation from one of the two lower pay blocks. Three companies withdrew prior to recruitment.

All participants completed pre-test instruments at a meeting 1 week prior to receiving the first lesson. The post-test KAB instrument, distributed with the last lesson, was collected at a concluding meeting 2 weeks later. Three months later, the final instruments were distributed to all participants by the agent or by mail using a modified Dillman method.

5.4. Evaluating revised program implementation

To assess demographics, we included questions in the pre-test instrument of all three intervention groups. To assess attention to, and use of, the lessons, we included a response sheet in each lesson for the two experimental groups only. We added a third possible response (no, but I plan to) to questions about children's activities or recipes to capture behavioral intention.

5.5. Evaluating revised program impact

As in the formative evaluation, we hypothesized that those in group-delivery would exhibit greater changes than those in impersonal-delivery in attitude and behavior scores, exercise pattern, and calcium intake. In addition, we hypothesized that: (a) both experimental groups would exhibit greater changes in knowledge than controls and (b) those in impersonal-delivery would exhibit greater changes than controls in attitude and behavior scores, exercise pattern, and calcium intake.

Based on formative participant comments, we shortened the KAB scales to reduce participant burden, hoping to improve completion. Using the formative instruments completed by the 677 women registrants for stage one (mean age 43.34 ± 11.58), we used internal consistency testing to eliminate less discriminating items, producing the scales in Table 1. Items retained in the KAB instrument (76% of the original) represented all content areas in the formative, improving the α values for two scales. However, no new questions were added. Neither the question about exercise regularity nor the FFQ was changed for the summative evaluation.
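
Eliminating less discriminating items by internal consistency testing is typically done by recomputing alpha with each item deleted and dropping items whose removal raises it. The sketch below, which reuses the cronbach_alpha helper from the Section 3.2 sketch, is our illustration of that screening step, not the authors' exact procedure:

    import numpy as np

    def alpha_if_deleted(items):
        """Alpha recomputed with each item removed; items whose removal
        raises alpha are candidates to drop as less discriminating."""
        k = items.shape[1]
        return np.array([cronbach_alpha(np.delete(items, j, axis=1))
                         for j in range(k)])

    def prune_items(items):
        """Drop items one at a time until no deletion raises alpha further."""
        while items.shape[1] > 2:
            gains = alpha_if_deleted(items) - cronbach_alpha(items)
            j = int(np.argmax(gains))
            if gains[j] <= 0:
                break
            items = np.delete(items, j, axis=1)
        return items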

5.6. Summative evaluation data analyses

In contrast to the formative, implementation data from those completing all four response sheets were used, providing a more accurate assessment of the responses of those completing the program.

Impact analysis methods were similar to those used in the formative analyses, with these modifications: (a) we used only data from those completing all three KAB or FFQ instruments; (b) we allowed up to two missing answers on the knowledge scale; (c) we tested the significance of continuous and ordinal data among the three delivery groups at three time points (T0, T1, T2); and (d) age served as the covariate for ANOVA and ANCOVA. We determined statistically significant differences among values at time points using pair-wise tests of differences between least-squares means. A Bonferroni adjustment was used to control the overall error rate. Significance was assumed at p ≤ 0.05.
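
The paper's pairwise tests compare least-squares (covariate-adjusted) means from ANCOVA; as a simpler illustration of the multiple-comparison logic, the sketch below applies Bonferroni-adjusted pairwise t-tests to raw group scores. The group labels and data are hypothetical.

    from itertools import combinations
    import numpy as np
    from scipy import stats

    def pairwise_bonferroni(groups, alpha=0.05):
        """Pairwise t-tests with Bonferroni-adjusted p-values, controlling
        the overall error rate across all pairwise comparisons."""
        pairs = list(combinations(groups, 2))
        m = len(pairs)                              # number of comparisons
        results = {}
        for a, b in pairs:
            t, p = stats.ttest_ind(groups[a], groups[b])
            p_adj = min(p * m, 1.0)
            results[(a, b)] = (t, p_adj, p_adj <= alpha)
        return results

    rng = np.random.default_rng(1)
    scores = {"G": rng.normal(14, 3, 90),           # illustrative knowledge scores
              "I": rng.normal(13, 3, 80),
              "control": rng.normal(11, 3, 75)}
    print(pairwise_bonferroni(scores))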

Finally, we compared categorical and continuous demographic characteristics (mean age and education) between the formative and summative evaluation completers using χ2 analysis and non-directional t-tests.

6. Summative evaluation findings and comparison with the formative

First, we examine the implementation findings. Then we examine impacts over time, comparing the results to the control and looking at differences between the two delivery methods. In each instance we compare the summative findings with those of the formative.

6.1. Program implementation

6.1.1. Goal (a): Target audience

Completion rates were better and participant demographics were closer to those desired in the summative compared to the formative. In the summative, 70% of initial registrants completed all three KAB measures. Almost 90% completed the KAB instruments at T0 and T1, in contrast to 56% in the formative. Eighty percent completed both FFQ instruments in the summative compared to 51% in the formative. Table 2 lists the demographics of completers in both evaluations, finding them similar in family income, race, marital status, awareness of relatives with osteoporosis, initial exercise pattern, and calcium intake per 1000 calories. Those completing the summative evaluation were significantly more likely than those in the formative, however, to be younger, have only a high school education, work full time, and have at least one child at home.

In the summative, the two experimental groups and controls differed significantly only in age (data not shown). Those in group-delivery had a mean age of 39.1 ± 9.95, compared to 40.6 ± 10.10 in impersonal-delivery and 40.9 ± 9.61 in the control.

6.1.2. Goal (b): Participant's attention to, and use of, lessons

In the summative evaluation, 78% of group-delivery and 63% of impersonal-delivery registrants returned all four response sheets, providing the sample for analysis. However, formative and summative return rates are not comparable because we did not restrict the formative sample to those completing all four.

In both evaluations, we asked participants if they read all of, parts of, skimmed, or did not read each lesson. Similar to the formative, those reading the whole lesson declined to 65–70% by lesson four. Unlike the formative, where there was no difference between delivery groups in the percent reading the lessons, in the summative significantly more in group-delivery skimmed, and fewer read, lessons one and two than in impersonal-delivery (Fig. 2).

Completion rates for the worksheets did not differ between evaluations, with one exception: more completed worksheet two, a revised exercise diary, in the summative than in the formative (40 vs 28%, respectively).

In both evaluations, respondents were asked how easy to read and how useful each lesson was. Respondents in both evaluations provided nearly identical ratings of ease of reading regardless of delivery method, suggesting the lower reading level of the summative materials made them easier for the less educated participants. Respondents in both evaluations provided nearly identical ratings of perceived usefulness for lessons one and two; however, in the summative, group-delivery participants rated lessons three and four, highlighting canned salmon and tofu respectively, significantly more useful than did impersonal-delivery participants, whereas in the formative the ratings were identical.

Questions about materials use revealed that the summative results differed from the formative, regardless of the delivery method, in that:

• 3–10% tried a recipe (data not shown) compared to 10–37% in the formative. Yet, in the summative, two-thirds or more in both delivery groups indicated they planned to try a recipe, an option not available in the formative.

• 11–21% involved children in food preparation compared to 2–11% in the formative. Additionally, in contrast to the formative, the summative exposed differences among delivery groups, in that the percent sharing a recipe with friends was significantly greater in group-delivery than impersonal-delivery for lessons two (20 and 6%, respectively) and four (24 and 10%), while no significant differences between delivery methods were evident in the formative (Fig. 3). Lessons two and four featured less familiar calcium-rich foods.

• Consistently more in group-delivery shared other lesson materials with friends than in impersonal-delivery (16–22% vs 9–11%). This distinction was not seen in the formative, where about 15% in both delivery methods shared materials (data not shown).

Table 2
Demographic characteristics of those completing each evaluation phase (SD = standard deviation; yr. = years)

Variable | Formative | Summative
Mean age ± SD (a) | N = 275, 43.02 ± 11.12 | N = 247, 40.15 ± 9.88
Family income | N = 255 | N = 232
  $19,900 or less | 41 (16.1%) | 45 (19.4%)
  $20,000-34,900 | 92 (36.1%) | 83 (35.8%)
  $35,000 or more | 122 (47.8%) | 104 (44.8%)
Employment status (a) | N = 272 | N = 245
  % full time | 235 (86.4%) | 232 (94.7%)
Mean educational level ± SD, yr. (b) | N = 268, 13.40 ± 1.97 | N = 246, 12.70 ± 1.54
Race | N = 272 | N = 246
  % white | 260 (95.6%) | 239 (97.2%)
Marital status | N = 272 | N = 246
  Married | 200 (73.5%) | 175 (71.1%)
  Single | 40 (14.7%) | 27 (11.0%)
  Other | 32 (11.7%) | 44 (17.9%)
Percent with at least one child at home (b) | N = 274, 129 (47.1%) | N = 247, 163 (66.0%)
Relatives with osteoporosis | N = 246 | N = 247
  Yes | 49 (19.9%) | 35 (14.2%)
  No | 167 (67.9%) | 166 (67.2%)
  Don't know | 30 (12.2%) | 46 (18.6%)
Exercise regularly in past 6 months? | N = 269 | N = 246
  Yes | 94 (34.9%) | 90 (36.6%)
  No | 175 (65.1%) | 156 (63.4%)
Dietary intake (FFQ) | N = 248 | N = 244
  Calories (mean ± SD) | 1623.8 ± 584.4 | 1798.0 ± 711.2
  Calcium (mg) | 805.4 ± 415.5 | 895.3 ± 549.2
  Calcium (mg/1000 calories) | 497.1 ± 178.9 | 490.2 ± 206.3

(a) p < 0.01. (b) p < 0.001.

6.2. Program impact

6.2.1. Change in knowledge

As expected, our hypothesis that both group-delivery and impersonal-delivery of osteoporosis lessons would lead to greater knowledge gain than in controls was supported (Fig. 4). Indeed, the gain in knowledge in group-delivery was significantly greater than that in impersonal-delivery, and both were significantly greater than that of the controls at both T1 and T2. This was not seen in the formative evaluation.

6.2.2. Change in attitude

Our hypothesis that those in the experimental groups would show greater gains in attitude than controls was not supported in the summative evaluation (Fig. 5). Our hypothesis that gains in attitude would differ between delivery methods was not supported either. The initial mean attitude scores in the summative were significantly higher than those in the formative (3.9–4.0 vs 3.0–3.1, respectively), perhaps due to increased media focus on osteoporosis or to differences in participants, and could have limited our ability to improve attitudes and thus to detect significant changes. We did not detect the influence of any previous worksite educational activities, however.

6.2.3. Change in behavior

Our hypothesis that experimental groups would show greater gains in behavior than controls was partially supported in the summative (Fig. 6). Gains in behavior for group-delivery were significantly greater than the control at T2. Administering the behavior scale 4 months after the intervention (T2) identified further changes in participant behavior in group-delivery that were not seen in controls, even when controlling for age, a possible explanation. Our hypothesis that those in group-delivery would show greater gains in food-related behavior than those in impersonal-delivery was not supported in the summative evaluation.

6.2.4. Changes in calcium intake and exercise pattern

Our hypothesis that those in experimental groups would show greater gains in calcium intake than controls was not supported in the summative. Our hypothesis that those in group-delivery would show greater gains than those in impersonal-delivery was also not supported. The summative delivery methods did not have a greater differential impact than those of the formative.

Our hypothesis that experimental groups would report greater change in exercise patterns than controls was partially supported (data not shown). In the summative evaluation, the number at T2 who reported that exercising regularly was a new pattern was significantly greater in group-delivery than in controls. Our hypothesis that exercise patterns would differ between delivery methods was not supported. The summative delivery methods did not have a greater impact than those of the formative, although the difference was in the right direction.

7. Discussion

The purpose of this study was to test whether the changes made in a program as a result of a formative evaluation strengthened the implementation and impact of the program in the summative. We found that implementation improved, but only certain measures of impact improved enough to distinguish the effects of each delivery method. As a result, we suggest the following for evaluators to consider in designing a formative evaluation and in attempting to assess its effectiveness.

7.1. Improving the design of a formative evaluation

7.1.1. Interpreting (focus group) feedback from participants

As a result of participants' comments in the formative focus groups, we extensively altered the lessons for the summative, and this did lead to greater use of a formerly underused worksheet, continued ease of reading for participants with lower educational levels, and more involvement of children in food preparation.

Focus group participants wanted less motivational discussion and more lecture in the meetings. In response, we increased lecture time in the group-delivery method. The influence of the group-delivery method was underscored because group-delivery participants were more likely to skim than to read the lessons. Even with less reliance on reading, group-delivery participants found two lessons significantly more useful and were more likely to share lesson materials with others than those in impersonal-delivery. The agent presentation in group-delivery was clearly critical to selling the lessons. In this case, following participant recommendations appeared to improve impact on knowledge gained, but not in most areas of behavior change.

When the focus group participants indicated they did not like the group-delivery motivational discussion-decision session, they also indicated a preferred alternative. Given the extensive literature on assessing participant perspectives (Basch, 1987), we assumed we needed to alter the delivery method to one they liked (especially since the agents echoed this preference) and ultimately emphasized lecture over motivation in the summative. In hindsight, we should have asked the focus group participants how to change the motivational aspects of the session, with its emphasis on behavior modification, rather than abandon it based on their negative feedback.

A researcher using focus groups should not assume that if participants express dislike of a particular delivery method and suggest another, one should drop the original method without considering the downstream effects. Be prepared to probe to learn why it was disliked and how it might be modified, especially if participants are suggesting a more passive path to learning. By making the assumption that participants were right, we limited focus group inquiry and the type of information collected for program planning in the summative.

We revised the recipes, believing these could be used as the context to show participants how to use calcium-rich foods and as a device to facilitate behavior change. In the focus groups, we listened openly to complaints about the recipes and asked participants how to improve them. Some participants indicated they came to the program for the recipes but disliked those provided. Despite these participant-guided revisions, reported use of recipes did not improve in the summative, although a large number indicated they planned to.

In the formative, we assumed that when a component of the program, i.e., the recipes, was poorly received by participants, it merely needed improvement, and thus we specifically asked participants for suggestions. In hindsight, the question we should have asked first, given the complaints, was one that tested our assumption that participants valued recipes, i.e. should recipes be included in the lessons at all? And, if the answer was no, we should have been ready to discuss with focus group participants alternative devices to motivate behavior change.

Researchers designing formative evaluations need to be alert for such a methodological inconsistency: why did participants give us suggestions to improve the recipes when, in fact, many had not used them (10–30% in the formative)? This might be explained by the inclination of people to provide answers to questions when they are asked. People have a tendency to tell more than they can know (Nisbett & Wilson, 1977).

In summary, researchers using focus groups must be prepared to probe assertions by participants that some component of a program is unsatisfactory. Probing should investigate both how that component might be improved and retained, as well as what might be substituted and why. In particular, asking a question about the fundamental usefulness of some component of a program in a formative evaluation may not be easy for researchers, as they may have considerable ownership and resources invested in that component. However, when faced with negative feedback about a component of the initial program, researchers should investigate both options to insure sufficient data for an informed decision about summative activities.

7.1.2. Assessing barriers to changing behaviors

Because participants came to this program on calcium-rich foods, we assumed that they would be open to change and that testing the suggested foods at family meals would be acceptable. Some focus group participants indicated it was difficult to introduce these foods because of family members' aversion to change. On hearing this, we assumed that altered recipes would be sufficient to overcome opposition and explored this in all focus groups. By not making a conscious effort to uncover social barriers to changing food choices for families, we limited the information we gained to plan the summative. In hindsight, we should have taken a proactive stance once these unanticipated barriers surfaced in the initial focus group, and posed questions to later focus groups based on the comments offered in previous ones.

7.1.3. Interpreting feedback from program instructors

As a result of the formative, we allowed instructors to change the presentation in the group-delivery method, believing they would have greater ownership, and thus impact, if they designed a presentation with which they were more comfortable. The remedy the instructors developed, a lecture rather than a motivational discussion, led to an emphasis on knowledge rather than on behavior change in the presentation. This may partially explain why the two delivery methods did not differ significantly in final behavior scale scores. And it may also explain the significantly greater gain in knowledge in the group- than the impersonal-delivery.

The formative evaluation provided feedback about the presentation. We assumed that when a component of the program, i.e., the presentation, was unacceptable to instructors, it should be changed, and we asked for suggestions. The acceptable change did not lead to greater impact. Although instructor suggestions carry great face validity, researchers need to be wary because instructors may provide suggestions that shift the aims of the program. In hindsight, what we should have done was to propose to the instructors alternative presentation methods that retained an emphasis on behavior change. If none were acceptable, we should have queried our assumption that agents were the appropriate instructors for the group presentation.

7.1.4. Incorporating a control group

We included impact measures in the formative to gain a quantitative estimate of the effects of the initial program, but we did not include a control group because we expected the contrast between group- and impersonal-delivery to be robust. Due to instructor difficulties with the stage one group-delivery, this contrast did not materialize. Although we had evidence that participants were learning from the print materials in both delivery methods, we could not demonstrate that the change seen was better than with no intervention. Hence we suggest that evaluators include a control group when a formative evaluation tests the impact of a new program offered via different delivery methods.
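To illustrate why the missing control arm mattered, consider the following minimal sketch. It is our construction, not part of the original study: the mean_gain helper and all scores are hypothetical, and the point is only that a raw pre/post gain cannot be separated from testing effects or secular change without a control group.

    # A minimal sketch, not the authors' analysis: interpreting pre/post
    # gains with and without a control arm. All scores are hypothetical.

    def mean_gain(pre, post):
        """Average change from pretest to post-test."""
        return sum(b - a for a, b in zip(pre, post)) / len(pre)

    # Hypothetical knowledge scores for the two delivery arms and a control.
    arms = {
        "group":      ([10, 12, 11], [16, 17, 15]),
        "impersonal": ([11, 10, 12], [14, 13, 15]),
        "control":    ([10, 11, 12], [11, 12, 12]),
    }

    # Without the control row, the raw gains below could reflect testing
    # effects or secular change; subtracting the control gain yields a
    # program-attributable estimate.
    control_gain = mean_gain(*arms["control"])
    for name in ("group", "impersonal"):
        raw = mean_gain(*arms[name])
        print(f"{name}: raw gain {raw:.2f}, net of control {raw - control_gain:.2f}")

With the hypothetical numbers above, both arms show positive raw gains, but only the control-adjusted figures indicate how much of that change the program itself produced.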

7.1.5. Watching for serendipitous effects of formative evaluation

Including impact measures led to unexpected participant feedback about the instruments and their mode of administration. This feedback proved invaluable in improving program implementation in the summative, underscoring the usefulness of impact measures in formative evaluations. In hindsight, we recommend including these measures in the formative both to assess impact and to obtain feedback about the instruments and their implementation.

7.2. Assessing the effectiveness of the formative evaluation

The need exists to demonstrate the subsequent effects of formative evaluation in order to improve formative evaluation design. Our study provides some guidance.

7.2.1. Altering the evaluation design

As a result of the formative, we added a control group and a 3-month post-intervention measurement in the summative. Without either of these, we would not have been able to observe some significant differences between the two delivery methods in the summative. Three months after the intervention, group delivery produced significantly greater changes in behavior scores and exercise pattern than seen in the controls, while the impersonal-delivery method did not. This design change also revealed that the group method led to significantly greater knowledge gain than the impersonal method, and that both gains were greater than that of the controls. Clearly, our target audience was not likely to adopt behavior changes based on receiving just the printed materials.

However beneficial these changes in evaluation design were in illuminating important findings about the impact of the final program, implementing them only in stage two prevented a rigorous comparison between the formative and summative evaluations, which might have allowed us to see more clearly the effect of the formative evaluation. Lack of the T2 measure in the formative meant we could not be sure what effect the initial program had 3 months later, and lack of a control group meant we could not fully interpret the formative impact data. In future, we recommend that researchers who want to assess the effectiveness of a formative evaluation keep the design elements of the formative and summative steps parallel. These design details may seem more appropriate for a summative evaluation, but they are necessary to see the effects of the formative.
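As a concrete illustration of this parallelism check, the sketch below is our construction, not part of the study protocol: the EvaluationDesign record, its fields, and the is_parallel helper are all hypothetical names, used only to show how a stage's arms and measurement occasions can be compared directly.

    # A minimal sketch, assuming each evaluation stage is summarized by its
    # arms and measurement occasions; all names here are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class EvaluationDesign:
        arms: frozenset        # e.g. {"group", "impersonal", "control"}
        timepoints: frozenset  # e.g. {"pre", "post", "3-month"}

    def is_parallel(a, b):
        """True when two stages share the same arms and time points."""
        return a.arms == b.arms and a.timepoints == b.timepoints

    formative = EvaluationDesign(frozenset({"group", "impersonal"}),
                                 frozenset({"pre", "post"}))
    summative = EvaluationDesign(frozenset({"group", "impersonal", "control"}),
                                 frozenset({"pre", "post", "3-month"}))

    # As in our study: the control arm and 3-month measure added only in
    # the summative block a rigorous formative-to-summative comparison.
    print(is_parallel(formative, summative))  # False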

7.2.2. Altering the evaluation instruments

As a result of the formative, we shortened the KAB questionnaire somewhat to address participant complaints in the summative. We believe this contributed to more participants completing all the evaluation instruments in stage two; thus implementation improved in the summative. However, this improvement carried a price: we were subsequently unable to conduct the most rigorous comparison of the formative KAB results with those of the summative.

8. Conclusions

The literature on formative evaluation focuses on its conceptual framework, its methodology and use. Permeating this work is a consensus that a program will be strengthened as a result of a formative evaluation, although little empirical evidence exists in the literature to demonstrate the subsequent effects of formative evaluation on a program. This study begins to fill that gap.

This study demonstrates that formative evaluation can strengthen the implementation and impact of an educational intervention designed to compare two program delivery methods. The modifications made as a result of the formative appear to have significantly improved knowledge gain but produced only modest improvements in behaviors in the final program.

Our retrospective analysis supports the inclusion of a control group and impact measures at several time points in a formative evaluation of a program implemented in a new environment for agency personnel. The findings also suggest that when researchers are faced with negative feedback about the components of the program in a formative evaluation, they need to exercise care in interpreting that feedback and in revising those components.

In addition, gathering evidence of the subsequent effect of using data from formative evaluations is critical. Otherwise we cannot begin to examine whether the methods and processes we take for granted in formative evaluation are valid and appropriate. To carry out these comparisons, however, evaluators must plan the evaluations carefully, making sure that instruments and evaluation design are parallel. We offer our findings as a stimulus to consider such studies.

References

Baggaley, J. (1986). Formative evaluation of educational television. Canadian Journal of Educational Communication, 15(1), 29–43.
Baghdadi, A. A. (1981). A comparison between two formative evaluation methods. Dissertation Abstracts International, 41(8), 3387A.
Baker, E. L., & Alkin, M. C. (1973). Formative evaluation of instructional development. AV Communication Review, 21(4), 389–418.
Basch, C. E. (1987). Focus group interview: An underutilized research technique for improving theory and practice in health education. Health Education Quarterly, 14(4), 411–448.
Bertrand, J. (1978). Communications pretesting (Media Monograph No. 6). Chicago: University of Chicago, Community and Family Study Center.
Brown, J. L., & Griebler, R. (1993). Reliability of a short and long version of the Block food frequency form for assessing changes in calcium intake. Journal of the American Dietetic Association, 93(7), 784–789.
Cambre, M. (1981). Historical overview of formative evaluation of instructional media products. Educational Communication & Technology Journal, 29(1), 3–25.
Cardinal, B. J., & Sachs, M. L. (1995). Prospective analysis of stage-of-exercise movement following mail-delivered, self-instructional exercise packets. American Journal of Health Promotion, 9(6), 430–432.
Chambers, F. (1994). Removing confusion about formative and summative evaluation: Purpose versus time. Evaluation and Program Planning, 17(1), 9–12.
Chen, H. T. (1996). A comprehensive typology for program evaluation. Evaluation Practice, 17(2), 121–130.
Chen, C. H., & Brown, S. W. (1994). The impact of feedback during interactive video instruction. International Journal of Instructional Media, 21(3), 191–197.


Dehar, M. A., Casswell, S., & Duignan, P. (1993). Formative and process evaluation of health promotion and disease prevention programs. Evaluation Review Journal, 17(2), 204–220.
Dennis, M. L., Fetterman, D. M., & Sechrest, L. (1994). Integrating qualitative and quantitative evaluation methods in substance abuse research. Evaluation and Program Planning, 17(4), 419–427.
Dick, W. (1980). Formative evaluation in instructional development. Journal of Instructional Development, 3(3), 2–6.
Dick, W., & Carey, L. (1985). The systematic design of instruction (2nd ed.). Glenview, IL: Scott, Foresman.
Dillman, D. A. (1978). Mail and telephone surveys: The total design method. New York: John Wiley and Sons.
Fairweather, G., & Tornatzky, L. G. (1977). Experimental methods for social policy research. New York: Pergamon.
Finnegan Jr., J. R., Rooney, B., Viswanath, K., Elmer, P., Graves, K., Baxter, J., Hertog, J., Mullis, R., & Potter, J. (1992). Process evaluation of a home-based program to reduce diet-related cancer risk: The 'WIN at Home Series'. Health Education Quarterly, 19(2), 233–248.
Fitz-Gibbon, C. T., & Morris, L. L. (1978). How to design a program evaluation. Beverly Hills, CA: Sage Publications.
Flagg, B. N. (1990). Formative evaluation for educational technologies. Hillsdale, NJ: Lawrence Erlbaum Associates.
Flay, B. R. (1986). Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventive Medicine, 15, 451–474.
Foshee, V., McLeroy, K. R., Sumner, S. K., & Bibeau, D. L. (1986). Evaluation of worksite weight loss programs: A review of data and issues. Journal of Nutrition Education, 18(1), S38–S43.
Geis, G. L. (1987). Formative evaluation: Developmental testing and expert review. Performance & Instruction, May/June, 1–8.
Gillespie, A., & Achterberg, C. (1989). Comparison of family interaction patterns related to food and nutrition. Journal of the American Dietetic Association, 89(4), 509–512.
Glanz, K., & Seewald-Klein, T. (1986). Nutrition at the worksite: An overview. Journal of Nutrition Education, 18(1), S1–S12.
Glanz, K., Sorensen, G., & Farmer, A. (1996). The health impact of worksite nutrition and cholesterol intervention programs. American Journal of Health Promotion, 10(6), 453–470.
Hausman, J. A., & Wise, D. A. (1985). Social experimentation. Chicago: The University of Chicago Press.
Health Habits and History Questionnaire: Diet History and Other Risk Factors (1989). Personal computer system packet, Version 2.2. Washington, DC: National Cancer Institute, Division of Cancer Prevention and Control, National Institutes of Health.
Hill, M., May, J., Coppolo, D., & Jenkins, P. (1993). Long term effectiveness of a respiratory awareness program for farmers. National Institute for Farm Safety, Inc., NIFS Paper No. 93-3, Columbia, MO. NIFS Summer Meeting, Coeur d'Alene, Idaho.
Houts, S. A. (1988). Lactose intolerance. Food Technology, 42(3), 110–113.
Iszler, J., Crockett, S., Lytle, L., Elmer, P., Finnegan, J., Luepker, R., & Laing, B. (1995). Formative evaluation for planning a nutrition intervention: Results from focus groups. Journal of Nutrition Education, 27(3), 127–132.
Jacobs Jr., D. R., Luepker, R. V., Mittelmark, M. B., Folsom, A. R., Pirie, P. L., Mascioli, S. R., Hannan, P. J., Pechacek, T. F., Bracht, N. F., Carlaw, R. W., Kline, F. G., & Blackburn, H. (1986). Community-wide prevention strategies: Evaluation design of the Minnesota Heart Health Program. Journal of Chronic Diseases, 39(10), 775–788.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11(1), 1–47.
Johnson, C. C., Osganian, S. K., Budman, S. B., Lytle, L. A., Barrera, E. P., Bonura, S. R., Wu, M. C., & Nader, P. R. (1994). CATCH: Family process evaluation in a multicenter trial. Health Education Quarterly, Supplement 2, S91–S106.
Kandaswamy, S., Stolovitch, H. D., & Thiagarajan, S. (1976). Learner verification and revision: An experimental comparison of two methods. AV Communication Review, 24(3), 316–328.
Kaufman, R. (1980). A formative evaluation of formative evaluation: The state of the art concept. Journal of Instructional Development, 3(3), 1–2.
Kershaw, D., & Fair, J. (1976). The New Jersey income maintenance experiment. New York: Academic Press.
Kishchuk, N., Peters, C., Towers, A. M., Sylvestre, M., Bourgault, C., & Richard, L. (1994). Formative and effectiveness evaluation of a worksite program promoting healthy alcohol consumption. American Journal of Health Promotion, 8(5), 353–362.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage.
Lenihan, K. (1976). Opening the second gate. Washington, DC: U.S. Government Printing Services.
Lewin, K. (1943). Forces behind food habits and methods of change. In The problem of changing food habits (National Research Council Bulletin 108, pp. 35–65). Washington, DC: National Academy of Sciences.
Markle, S. M. (1979). Evaluating instructional programs: How much is enough? NSPI Journal, Feb., 22–24.
Markle, S. M. (1989). The ancient history of formative evaluation. Performance and Instruction, Aug., 27–29.
McGraw, S. A., McKinley, S. A., McClements, L., Lasater, T. M., Assaf, A., & Carleton, R. A. (1989). Methods in program evaluation: The process evaluation system of the Pawtucket Heart Health Program. Evaluation Review, 13(5), 459–483.
McGraw, S. A., Stone, E. J., Osganian, S. K., Elder, J. P., Johnson, C. C., Parcel, G. S., Webber, L. S., & Luepker, R. V. (1994). Design of process evaluation within the Child and Adolescent Trial for Cardiovascular Health (CATCH). Health Education Quarterly, Supplement 2, S5–S26.
Montague, W. E., Ellis, J. A., & Wulfeck, W. H. (1983). Instructional quality inventory: A formative evaluation tool for instructional development. Performance and Instruction Journal, 22(5), 11–14.
Nathenson, M. B., & Henderson, E. S. (1980). Using student feedback to improve learning materials. London: Croom Helm.
National Institutes of Health (1985). Surgeon General's report on nutrition and health. U.S. Department of Health and Human Services, Public Health Service (Chapter 7, pp. 311–343). Washington, DC: U.S. Government Printing Service.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259.
Parrott, R., Steiner, C., & Godenhar, L. (1996). Georgia's harvesting healthy habits: A formative evaluation. The Journal of Rural Health, 12(4), 291–300.
Patton, M. Q. (1978). Utilization-focused evaluation. Beverly Hills, CA: Sage.
Patton, M. Q. (1982). Practical evaluation. Beverly Hills, CA: Sage.
Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15(3), 311–319.
Patton, M. Q. (1996). A world larger than formative and summative. Evaluation Practice, 17(2), 131–144.
Pelletier, K. R. (1996). A review and analysis of the health and cost-effective outcome studies of comprehensive health promotion and disease prevention programs at the worksite: 1993–1995 update. American Journal of Health Promotion, 10(5), 380–388.
Pelz, E. B. (1959). Some factors in group decision. In E. E. Macoby, T. M. Newcomb & E. L. Hartley (Eds.), Readings in social psychology (3rd ed., pp. 212–219). New York: Holt, Rinehart and Winston, Inc.
Peterson, K. A., & Bickman, L. (1988). Program personnel: The missing ingredient in describing the program environment. In K. J. Conrad & C. Roberts-Gray (Eds.), Evaluating program environments. San Francisco, CA: Jossey-Bass, Inc.

Potter, J. D., Graves, K. L., Finnegan, J. R., Mullis, R. M., Baxter, J. S., Crockett, S., Elmer, P. J., Gloeb, B. D., Hall, N. J., Hertog, J., Pirie, P., Richardson, S. L., Rooney, B., Slavin, J., Snyder, M. P., Splett, P., & Viswanath, K. (1990). The cancer and diet intervention project: A community-based intervention to reduce nutrition-related risk of cancer. Health Education Research, 5(4), 489–503.

Rightwriter, Version 3.1 (1990). Sarasota, FL: RightSoft, Inc.
Robins, P. K., Spiegelman, R. G., Weiner, S., & Bell, J. G. (1980). A guaranteed annual income: Evidence from a social experiment. New York: Academic Press.
Rossi, P. H., & Lyall, K. (1976). Reforming public welfare. New York: Russell Sage.
Rossi, P. H., & Freeman, H. E. (1982). Evaluation: A systematic approach (p. 69). Beverly Hills, CA: Sage Publications.
Russell, J. D., & Blake, B. L. (1988). Formative and summative evaluation of instructional products and learners. Educational Technology, 28(9), 22–28.
SAS Proprietary Software, Release 6.09 (1989). Cary, NC: SAS Institute, Inc.
Scanlon, E. (1981). Evaluating the effectiveness of distance learning: A case study. In F. Percival & H. Ellington (Eds.), Aspects of educational technology: Vol. XV. Distance learning and evaluation (pp. 164–171). London: Kogan Page.
Scheirer, M. A. (1994). Designing and using process evaluation. In J. S. Wholey, H. Hatry & K. Newcomer (Eds.), Handbook of practical program evaluation (pp. 40–68). San Francisco: Jossey-Bass.
Scheirer, M. A., & Rezmovic, E. L. (1983). Measuring the degree of program implementation. Evaluation Review, 7(5), 599–633.
Schneider, M. L., Ituarte, P., & Stokols, D. (1993). Evaluation of a community bicycle helmet promotion campaign: What works and why. American Journal of Health Promotion, 7(4), 281–287.
Scriven, M. (1967). The methodology of evaluation. In R. Tyler, R. Gagne & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39–83). Chicago: Rand McNally.
Seidel, R. E. (1993). Notes from the field in communication for child survival. Washington, DC: USAID.
Stufflebeam, D. L. (1983). The CIPP model for program evaluation. In G. Madaus, M. Scriven & D. Stufflebeam (Eds.), Evaluation models: Viewpoints on educational and human services evaluation. Boston: Kluwer-Nijhoff.
Tessmer, M. (1993). Planning and conducting formative evaluations. London: Kogan Page.
Thiagarajan, S. (1991). Formative evaluation in performance technology. Performance Improvement Quarterly, 4(2), 22–34.
Wager, J. C. (1983). One-to-one and small group formative evaluation: An examination of two basic formative evaluation procedures. Performance and Instruction, 22(5), 5–7.
Walden, O. (1989). The relationship of dietary and supplemental calcium intake to bone loss and osteoporosis. Journal of the American Dietetic Association, 89(3), 397–400.
Weston, C. B. (1986). Formative evaluation of instructional materials: An overview of approaches. Canadian Journal of Educational Communication, 15(1), 5–17.
Weston, C. B. (1987). The importance of involving experts and learners in formative evaluation. Canadian Journal of Educational Communications, 16(1), 45–58.
Wilkinson, T. L., Schuler, R. T., & Skjolaas, C. A. (1993). The effect of safety training and experience of youth tractor operators. National Institute for Farm Safety, Inc., NIFS Paper No. 93-6, Columbia, MO. NIFS Summer Meeting, Coeur d'Alene, Idaho.
Witte, K., Peterson, T. R., Vallabhan, S., Stephenson, M. T., Plugge, C. D., Givens, V. K., Todd, J. D., Bechtold, M. G., Hyde, M. K., & Jarrett, R. (1992/93). Preventing tractor-related injuries and deaths in rural populations: Using a persuasive health message framework in formative evaluation research. International Quarterly of Community Health Education, 13(3), 219–251.
