
DOCUMENT RESUME

ED 366 511                                                    SE 054 163

AUTHOR          Stevens, Floraline; And Others
TITLE           User-Friendly Handbook for Project Evaluation: Science, Mathematics, Engineering, and Technology Education.
INSTITUTION     National Science Foundation, Washington, DC. Directorate for Education and Human Resources.
REPORT NO       NSF-93-152
PUB DATE        31 Aug 93
NOTE            111p.
AVAILABLE FROM  National Science Foundation, 4201 Wilson Boulevard, Arlington, VA 22230.
PUB TYPE        Reports - Research/Technical (143)
EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     Classroom Research; *Educational Research; Elementary Secondary Education; Engineering Education; *Evaluation Methods; Mathematics Education; *Program Evaluation; Science Education; Technology Education
IDENTIFIERS     Science Education Research

ABSTRACT
This handbook was developed to provide principal investigators and project evaluators with a basic understanding of selected approaches to evaluation. It is aimed at people who need to learn more about both what evaluation can do and how to do an evaluation, rather than those who already have a solid base of experience in the field. It builds on firmly established principles, blending technical knowledge and common sense to meet the special needs of National Science Foundation programs and projects. Chapters include: (1) Evaluation Prototypes; (2) The Evaluation Process--An Overview; (3) Design, Data Collection, and Data Analysis; (4) Reporting; (5) Examples; (6) Selecting an Evaluator; (7) Glossary; and (8) Annotated Bibliography. (PR)

***********************************************************************

Reproductions supplied by EDRS are the best that can be made from the original document.

***********************************************************************


Directorate for Education and Human Resources


Division of Research, Evaluation and Dissemination

User-Friendly Handbook for Project Evaluation:
Science, Mathematics, Engineering and Technology Education


Co-Authors:

Floraline Stevens

Frances Lawrenz

Laure Sharp


Edited by:

Joy Frechtling


The Foundation provides awards for research in the sciences and engineering. The awardee is wholly responsible for the conduct of such research and preparation of the results for publication. The Foundation, therefore, does not assume responsibility for the research findings or their interpretation.

The Foundation welcomes proposals from all qualified scientists and engineers, and strongly encourages women, minorities, and persons with disabilities to compete fully in any of the research and related programs described here.

In accordance with federal statutes, regulations, and NSF policies, no person on grounds of race, color, age, sex, national origin, or disability shall be excluded from participation in, denied the benefits of, or be subject to discrimination under any program or activity receiving financial assistance from the National Science Foundation.

Facilitation Awards for Scientists and Engineers with Disabilities (FASED) provide funding for special assistance or equipment to enable persons with disabilities (investigators and other staff, including student research assistants) to work on an NSF project. See the program announcement or contact the program coordinator at (703) 306-1633.

Privacy Act and Public Burden

Information requested on NSF application materials is solicited under the authority of the National Science Foundation Act of 1950, as amended. It will be used in connection with the selection of qualified proposals and may be used and disclosed to qualified reviewers and staff assistants as part of the review process and to other government agencies. See Systems of Records, NSF-50, "Principal Investigator/Proposal File and Associated Records," and NSF-51, "Reviewer/Proposals File and Associated Records," 56 Federal Register 54907 (Oct. 23, 1991). Submission of the information is voluntary. Failure to provide full and complete information, however, may reduce the possibility of your receiving an award.

The public reporting burden for this collection of information is estimated to average 120 hours per response, including the time for reviewing instructions. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Herman G. Fleming, Reports Clearance Officer, Division of CPO, NSF, Arlington, VA 22230; and the Office of Management and Budget, Paperwork Reduction Project (3145-0058), Washington, DC 20503.

The National Science Foundation has TDD (Telephonic Device for the Deaf) capability, which enables individuals with hearing impairment to communicate with the Foundation about NSF programs, employment, or general information. This number is (703) 306-0090.


ACKNOWLEDGMENTS

Appreciation is expressed to the Project Directors, NSF staff, and Advisory Committee members who reviewed drafts of this document. Specifically, we would like to thank Jack Bookman, Cha Guzman, Kathleen Martin, Mildred Murray-Ward, Lynette Padmore, and Sharon Reynolds, who reviewed the Handbook from a project perspective; NSF staff members Susan Gross, Larry Enochs, Conrad Katzenmeyer, and Larry Suter, who reviewed it from a Foundation perspective; and members of the Advisory Committee for Research, Evaluation, and Dissemination, especially Wayne Welch and Nick Smith, who brought an important outside perspective.


USER-FRIENDLY HANDBOOK FOR PROJECT EVALUATION: Science, Mathematics, Engineering and Technology Education

Table of Contents

About the Authors

Introduction

Chapter One: Evaluation Prototypes

Chapter Two: The Evaluation Process--An Overview

Chapter Three: Design, Data Collection, and Data Analysis

Chapter Four: Reporting

Chapter Five: Examples

Chapter Six: Selecting an Evaluator

Chapter Seven: Glossary

Chapter Eight: Annotated Bibliography


ABOUT THE AUTHORS

This Handbook is the product of several authors. Originally, it began as two separate papers describing various aspects of the evaluation process. Sensing a need for a more complete document, the National Science Foundation asked that the papers be used as the basis for a Handbook specially tailored for projects supported by the Directorate for Education and Human Resource Development (EHR). The Handbook has grown a bit beyond the original papers and the papers have been somewhat reshaped, but the message and advice contained reflect the solid advice of the NSF staff that originated it. Credit for the Handbook goes to:

Dr. Floraline Stevens
Dr. Stevens is currently a Program Director for Evaluation in the Evaluation Section of the Division of Research, Evaluation, and Dissemination (RED). While at NSF she is on an Interagency Personnel Assignment from the Los Angeles Public Schools, where she is Director of Research and Evaluation. Dr. Stevens conceived the idea for this Handbook and drafted the original versions of Chapters One, Four, and Six, and the Glossary.

Dr. Frances Lawrenz
Dr. Lawrenz is a Professor at the University of Minnesota and Director of Graduate Studies in the Department of Curriculum and Instruction. While at NSF she was an Evaluation Specialist in RED. She served as an advisor on evaluation to other divisions and provided support to the evaluation groups of the Federal Coordinating Committee on Science, Engineering, and Technology. Dr. Lawrenz wrote the original version of Chapter Two.

Ms. Laure Sharp
Ms. Laure Sharp is a social science researcher who has published extensively in the areas of survey research, high school and post high school education, and social program evaluation. She was formerly a senior research associate and assistant director at the Bureau of Social Science Research in Washington, D.C. She is currently a consultant to Booz-Allen & Hamilton Inc. and is the author of Chapter Three.

Additional writing and editing was done by the staff of Booz-Allen & Hamilton Inc. Dr. Joy Frechtling refined the original chapters and provided overall technical editing.


INTRODUCTION: ABOUT THIS HANDBOOK

This Handbook was developed to provide Principal Investigators and Project Evaluators working with the National Science Foundation's Directorate for Education and Human Resource Development (EHR) with a basic understanding of selected approaches to evaluation. It is aimed at people who need to learn more about both what evaluation can do and how to do an evaluation, rather than those who already have a solid base of experience in the field. It builds on firmly established principles, blending technical knowledge and common sense to meet the special needs of NSF's programs and projects and those involved in them.

NSF supports a wide range of programs aimed at improving the status of mathematics, science, technology, and engineering in our schools and increasing the participation of students at every level of the educational system. Each program funds many projects; some are broad-based and systemic in nature, while others are more specifically focused on a part or a small number of parts of the educational system. Evaluation is important to each one of these.

NSF and the Principal Investigators themselves need to know what these projects and programs are accomplishing, and what it is about them that makes them work or may stand in the way of success. Although the approaches to evaluation selected may differ depending on the nature of the program or project, its goals, and where it is in its "life cycle," the Foundation firmly believes that each program and project can be improved by soundly conducted evaluation studies. While the information in this Handbook should be useful in evaluating programs as well as projects, it is primarily targeted at project evaluation, which may be conducted by a member of the project staff or by an outside evaluator.

The Handbook discusses quantitative and qualitative evaluation methods, but the emphasis is on quantitative techniques for conducting outcome evaluations, those designed to assess the results of NSF funded innovations and interventions. Although there is much interest in the evaluation community in a less traditional and more qualitative approach to evaluation, at the present time this approach seldom meets NSF requirements, especially for Summative Evaluations. For activities dependent on federal funding, which are subject to periodic funding decisions by NSF managers as well as the Office of Management and Budget (OMB) and congressional staffs and decision-makers, emphasis is still on quantitatively measurable outcome information: did the treatment or innovation (the program funded by the federal agency) result in outcomes which can be attributed to these federal expenditures and might not have occurred without these expenditures? As stated in a recent report issued by GAO (the General Accounting Office, which is the agency charged with oversight of government programs for the U.S. Congress):

"Over the next few years, the federal gov-ernment will face powerful opposing pres-

sures: the need on the one hand toreduce the federal deficit, and the de-

mand, on the other, for a federal responseto some potentially expensive domestic

problems. . . (The need is for) program ef-fectiveness evaluations which estimate

the effects of federal programs using sta-tistical analysis of outcomes (such as edu-

cational achievement test scores orconditions of housing) for grout s of per-

sons receiving program services comparedwith similar groups of non participants."

Obviously, decision makers at the highest levels of the executive and legislative branches of government are looking for traditional "effectiveness indicators," although, as we have emphasized in the Handbook, these are often very difficult for evaluators to establish.

To develop this Handbook we have drawn on the skills of both NSF staff familiar with the Foundation's educational programs and outside evaluators who have experienced the challenge of examining projects in a real-world setting. This Handbook is not intended to be a theoretical treatise, but rather a practical guide to evaluating NSF/EHR funded projects.

The Handbook addresses several topics. The first four chapters focus on designing and implementing evaluation studies:

Chapter One describes the various types of evaluation prototypes

Chapter Two presents an overview of the evaluation process



Chapter Three describes the collection and analysis of data

Chapter Four examines report writing.

The remaining chapters provide support materials:

Chapter Five contains examples of project evaluations, illustrating the prototypes described earlier, and highlighting strengths and weaknesses of the approaches described.

Chapter Six provides information on selecting an evaluator.

Chapter Seven provides a glossary.

Finally (for those who are challenged or intrigued by what they have read here, or feel the need to learn more), Chapter Eight provides supplemental references in the form of an annotated bibliography.

In addition to evaluation, Project Directors need to plan the dissemination of project outcomes to a broader audience. A separate publication dealing with dissemination guidelines has been prepared by NSF.

REFERENCE

General Accounting Office (1992). Program Evaluation Issues. GAO/OCG-93-6TR.


CHAPTER ONE: EVALUATION PROTOTYPES

The purpose of this chapter is to help Principal Investigators and Project Evaluators think practically about evaluation and the kinds of information evaluations can provide. We start with the assumption that the term "evaluation" describes different models or prototypes that suit different purposes at different stages in the life of a project. A major goal of this chapter is to help Principal Investigators and Project Evaluators understand what some of these different prototypes are and to assist them in using different approaches to evaluation to meet these varying needs.

What is Evaluation?

Evaluation means different things to different people and takes place in different contexts.

The notion of evaluation has been around a long time--in fact, the Chinese had a large functional evaluation system in place for their civil servants as long ago as 2000 B.C. Not only does the idea of evaluation have a long history, but it also has varied definitions. Evaluation means different things to different people and takes place in different contexts. Thus, evaluation can be synonymous with tests, descriptions, documentation, or management. Many definitions have been developed, but a comprehensive definition is presented by the Joint Committee on Standards for Educational Evaluation (1981):

"Systematic investigation of the worth ormerit of an object. . ."

This definition centers on the goal of using evaluation for a purpose. Evaluations should be conducted for action-related reasons, and the information provided should facilitate deciding a course of action.

Over the years evaluation has frequently been viewed as an adversarial process. Its main use has been to provide "thumbs up" or "thumbs down" about a program or project. In this role, it has all too often been considered by program or project Directors as an external imposition which is threatening, disruptive, and not very helpful to project staff. Our contention is that while this may be true in some situations, this is not the case in all, nor even in most, evaluation efforts. And today, in contrast to a decade or two ago, the view is gaining ground that evaluation should be a tool that not only measures, but can contribute to, success.


What are the Different Kinds of Evaluations?

Within NSF, there are two basic levels of evaluation: Program Evaluation and Project Evaluation. Project Evaluation is sometimes further subdivided into specific project components as shown in Exhibit 1.

Exhibit 1

Let's start by defining terms and showing how they relate. First, let's define what we mean by a "program" and a "project." A program is a coordinated approach to exploring a specific area related to NSF's mission of strengthening science, mathematics, engineering, and technology. A project is a particular investigative or developmental activity funded by that program. NSF initiates a program on the assumption that a policy goal (for example, strengthening minority student development) can be attained by certain educational activities and strategies (for example, exposing students in inner-city schools to science presentations targeted at the interests and concerns of young African Americans). The Foundation then funds a series of discrete projects to explore the utility of these activities and strategies in specific situations. Thus, a program consists of a collection of projects that seek to meet a defined set of goals and objectives.


In NSF, a Program Evaluation determines the value of a collection of projects. Project Evaluation, in contrast, focuses on the individual projects funded under the umbrella of the program.

Now, let's turn to the terms "Program Evaluation" and "Project Evaluation." A Program Evaluation determines the value of this collection of projects. It looks across projects, examining the utility of the activities and strategies employed, in light of the initial policy goal. It is carried to completion after the projects have become fully operational and adequate time has passed for expected outcomes to be achieved. Frequently, the initiation of a Program Evaluation is deferred until 3 to 5 years after program initiation. Other times, reporting may be deferred, while data collection is begun simultaneously with program onset. Under this latter alternative, the evaluation could draw upon information collected on an annual basis that is aggregated across projects and summarized at an appropriate checkpoint. The program evaluator is, in this case, usually an experienced, external evaluator, selected by NSF.

Project Evaluation, in contrast, focuses on an individual project funded under the umbrella of the program. The evaluation provides information to improve the project as it develops and progresses. Information is collected to help determine whether it is proceeding as planned and whether it is meeting its stated program goals and project objectives according to the proposed timeline. Frequently these evaluation findings are also used to assess whether the particular project merits continued funding as it is currently operating, or if it needs modifications. Ideally, in a Project Evaluation, evaluation design and data collection begin soon after the project is funded. Data collection occurs on a planned schedule, e.g., every 6 months or every year, and may lead to and support recommendations to continue, modify, and/or delete project activities and strategies. Frequently, although not universally, the Project Evaluator is a member of the project staff, is selected by, and reports to the Project Director.

Project Evaluations may also include examination of specific components. A component of a project may be a specific teacher training approach, a classroom practice, or a governance strategy. An evaluation of a component frequently looks to see the extent to which its goals have been met (these goals are a subset of the overall project goals), and to clarify the extent to which the component contributes to the success or failure of the overall project.

The information contained in this Handbook has been primarily prepared for the use of Project Evaluators and Principal Investigators, although Program Evaluators may also find it useful. Our aim is to provide tools that will help those responsible for examination of individual projects gain the most from their evaluation efforts. Clearly, however, these activities will also benefit program studies and the work of the Foundation in general. The better the information about each of the NSF projects, the more we can all learn.

In the next section we describe three general types of evaluation studies: (1) Planning Evaluation, (2) Formative (Implementation and Progress) Evaluation, and (3) Summative Evaluation. While Summative Evaluation is frequently the notion that comes to mind when the term "evaluation" is used, each has its own contribution to make in understanding how well a project is doing. As each type of evaluation is discussed we present a brief definition of its purpose and some ideas of the kinds of questions that could be addressed.

What is a Planning Evaluation?

The purpose of a Planning Evaluation is to assess understanding of a project's goals, objectives, strategies, and timelines.

The purpose of a Planning Evaluation is to assess understanding of a project's goals, objectives, strategies, and timelines. "Planning Evaluation" is not as commonly carried out as the other prototypes. In fact, most project proposals typically mention only "Formative" and "Summative" Evaluation, defining these as activities to be performed once a project has been designed, written up, and funded. The evaluator enters the scene after the project has been put in place.

A strong argument can be made for a different approach. Rossi and Freeman (1993) argue strongly for the involvement of evaluators in diagnosing and defining the condition that a given project is designed to address, in stating clearly and precisely the goals of the project, and in reviewing the proposed procedures for accuracy of information and soundness of methods.

The Planning Evaluation will provide everyone--Program Directors, Principal Investigators, Project Directors/Managers, participants, and the public--with an understanding of what the project is supposed to do and the timelines and strategies for doing it. The product of the Planning Evaluation is a rich, context-laden description of a project, including its major goals and objectives, activities, participants and other major stakeholders, resources, timelines, locale, and intended accomplishments. The Planning Evaluation can also serve the purpose of describing the status of key outcome indicators prior to the project to serve as a baseline for measuring success.

To conduct a Planning Evaluation, the evaluator should be present when the project is in its developmental phase. The Planning Evaluation is typically designed to address the following questions:

Why was the project developed? What is the problem or need it is attempting to address?

Who are the stakeholders (those who have credibility, power, or other capital involved in the project)? Who are the people interested in the project who may not be involved?

What do the stakeholders want to know? What questions are most important to which stakeholders? What questions are secondary in importance? Where do concerns coincide? Where are they in conflict?

Who are the participants to be served?

What are the activities and strategies that will address the problem or need which was identified? What is the intervention? How will participants benefit? What are the expected outcomes?

Where will the program be located (educational level, geographical area)?

How many months of the school year or calendar year will the program operate? When will the program begin and end?

How much does it cost? What is the budget for the program? What human, material, and institutional resources are needed? How much is needed for evaluation? for dissemination?

What are the measurable outcomes which the project wants to achieve? What is the expected impact of the project in the short run? the longer run?

What arrangements have been made for data collection? What are the understandings regarding record keeping, responding to surveys, and participation in testing?

These questions can become a checklist to determine if all relevant elements are included in the description of the project or program. These questions also provide the basis for the formative and summative evaluative inquiries about the project.
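One practical way to use such a checklist, sketched below in Python under the assumption that the evaluator simply wants to flag gaps in a project description, is to record an answer against each question and list the unanswered ones. The abbreviated questions and the sample answers are hypothetical illustrations only; they are not drawn from any NSF project.

    # A minimal planning-evaluation checklist; the questions are abbreviated
    # from the list above, and the recorded answers are hypothetical.
    checklist = {
        "Why was the project developed?": "Low participation in SEM courses",
        "Who are the stakeholders?": "District staff, teachers, parents, NSF",
        "Who are the participants to be served?": "Grade 9-12 students in two districts",
        "What are the measurable outcomes?": None,              # not yet answered
        "What arrangements exist for data collection?": None,   # not yet answered
    }

    missing = [question for question, answer in checklist.items() if not answer]
    print("Elements still missing from the project description:")
    for question in missing:
        print(" -", question)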

What is Formative Evaluation?

The purpose of Formative Evaluation is to assess ongoing project activities.

The purpose of Formative Evaluation is to assess ongoing project activities. Formative Evaluation begins at project start-up and continues throughout the life of the project. Its intent is to provide information to improve the project. It is done at several points in the developmental life of a project. According to evaluation theorist Bob Stake, Formative Evaluation, when contrasted with Summative Evaluation, is:

"When the cook tastes the soup,that's formative; when

the guests taste the soup,that's summative."

For most NSF projects, Formative Evaluation consists of two segments: Implementation Evaluation and Progress Evaluation.

What is Implementation Evaluation?

The purpose of an Implementation Evaluation is to assess whether the project is being conducted as planned. It may occur once or several times during the life of the project. Recall the principle learned from the tale of the emperor who had no clothes and no one would tell him. The same principle applies to a new project or new program. Before you can evaluate the outcomes of a project, you must make sure the project is really operating, and whether it is operating according to its plan or description. For example, in the description for Comprehensive Regional Centers for Minorities (CRCM), these Regional Centers must be comprehensive in their coverage of science, engineering and mathematics and focus on a span of educational levels--elementary through high school. An Implementation Evaluation of a CRCM project might begin by investigating whether or not the CRCM was indeed comprehensive in its coverage and whether its focus spanned elementary through senior high school. If these two essential conditions were satisfied, it could be concluded that the CRCM was initially implemented as intended and that evaluation of outcomes and impacts associated with the implementation could proceed.


The purpose of an Implementation Evaluation is to assess whether the project is being conducted as planned.

Implementation Evaluation collects information to determine if the program or project is being delivered as planned. A series of implementation questions is needed to guide the Implementation Evaluation. Examples of these questions are:

Were the appropriate participants selected and involved in the planned activities?

Do the activities and strategies match those described in the plan? If not, are the changes in activities justified and described?

Were the appropriate staff members hired, trained, and are they working in accordance with the proposed plan? Were the appropriate materials and equipment obtained?

Were activities conducted according to the proposed timeline? by appropriate personnel?

Was a management plan developed and followed?

Sometimes the terms "Implementation Evaluation"and "Monitoring Evaluation" arc confused. They arenot the same. While Implementation Evaluation is anearly internal check by the project staff to see if all theessential elements of the project are in place andoperating, monitoring is an external check and shouldfollow the Implementation Evaluation. The monitorcomes from the funding agency and is responsible fordetermining progress and compliance on a contract orgrant for the project. The monitor investigates properuse of funds, observes progress, and provides infor-mation to the funding agency about the project.Although the two differ, Implementation Evaluation, ifeffective, can facilitate and ensure that there are nounwelcome surprises during monitoring.

What is Progress Evaluation?

The purpose of a Progress Evaluation is to assess progress in meeting the project's goals.

The purpose of a Progress Evaluation is to assess progress in meeting the project's goals. Progress Evaluation is also formative. It involves collecting information to learn whether or not the benchmarks of participant progress were attained and to point out unexpected developments. Progress Evaluation collects information to determine what the impact of the activities and strategies is on the participants at various stages of the intervention. By measuring interim outcomes, project staff eliminate the risk of waiting until participants have experienced the entire treatment to assess outcomes. If the data collected as part of the Progress Evaluation fail to show expected changes, this information can be used to "fine-tune" or terminate the project. The data collected as part of a Progress Evaluation can also contribute to, or form the basis for, a Summative Evaluation study conducted at some future date. In a Progress Evaluation, the following questions could be asked:

Are the participants moving toward the anticipated goals of the project or program?

Which of the activities and strategies are aiding the participants to move toward the goals?

For example, one of the goals for the Alliances for Minority Participation (AMP) Program is to increase the size of the pool of underrepresented minority students eligible for Science, Engineering, and Mathematics (SEM) graduate study. One of the interim indicators which shows progress towards meeting the goal is the number (percent) of participants in the Summer Bridge program (a component of the AMP Program) who successfully complete calculus by their freshman year in college. Additional progress information could be scores from calculus classroom quizzes throughout the summer before the final exam and grades given for the course. Collecting this information on course completion, test scores, and grades gives interested parties some idea of the rate and extent to which progress is being made toward the overarching goal of increasing the numbers of underrepresented minority students eligible for SEM graduate study. It gives some idea of the probability of achieving that final goal. If course completion or other indicators are not showing progress, significant project changes may be considered.
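To make the idea of an interim indicator concrete, the short Python sketch below computes one such completion rate from a handful of invented records. The participant data, field names, and values are hypothetical illustrations, not actual AMP figures.

    # Hypothetical Summer Bridge records; not actual AMP data.
    participants = [
        {"name": "A", "completed_calculus": True,  "final_grade": "B"},
        {"name": "B", "completed_calculus": True,  "final_grade": "A"},
        {"name": "C", "completed_calculus": False, "final_grade": None},
        {"name": "D", "completed_calculus": True,  "final_grade": "C"},
    ]

    # Interim indicator: percent of participants completing calculus by the freshman year.
    completed = sum(1 for p in participants if p["completed_calculus"])
    completion_rate = 100.0 * completed / len(participants)

    print(f"Calculus completion rate: {completion_rate:.0f}%")  # prints 75%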

Another example of measuring progress can be drawn from Comprehensive Regional Centers for Minorities (CRCM). A goal is that, through workshops, teachers will learn to improve and enrich their teaching strategies when teaching classes such as high school chemistry. This interim goal is related to meeting the CRCM goal of retaining precollege students' interest in science. Progress findings could include teachers' ratings of their inservice training classes, and the independent appraisals by outside observers of the quality of their performance when using new strategies in the classroom. In addition, the opinions and attitudes of the participants (students and teachers) could be collected to determine whether the impact of the activities and strategies is negative or positive.

In Progress Evaluation, quantitative and qualitative information about the participants is collected to determine if parts of the project need to be changed or deleted to improve the project. Progress Evaluation is useful throughout the life of a project, but is most vital during the early stages when activities are piloted and their individual effectiveness or articulation with other project components is unknown.

What is Summative Evaluation?

The purpose of a Summative Evaluation is to assess the project's success.

The purpose of a Summative Evaluation is to assess the project's success. Summative Evaluation takes place after ultimate modifications and changes have been made, after the project is stabilized and after the impact of the project has had a chance to be realized. (Another term frequently used interchangeably with "Summative Evaluation" is "Impact Evaluation.") Summative Evaluation answers these basic questions:

Was the project successful? What were its strengths and weaknesses?

To what extent did the project or program meet the overall goal(s)?

Did the participants benefit from the project? In what ways?

What components were the most effective?

Were the results worth the project's cost?

Is this project replicable and transportable?

Summative Evaluation collects information about processes and outcomes. The evaluation is an external appraisal of worth, value or merit. Usually this type of evaluation is needed for decisionmaking. The decision alternatives may include the following: disseminate the intervention to other sites or agencies; continue funding; increase the funding; continue on probationary status; or discontinue.


An important idea to keep in mind while conducting a Summative Evaluation is to be vigilant of "unanticipated outcomes."


Summative Evaluation informs decisionmakers about whether the activities and strategies were successful in helping the project and/or its participants reach their goals. This evaluation also describes the extent to which each goal was attained. Sample Summative Evaluation questions for a project like AMP could include the following:

Did the majority of the undergraduate students in the project graduate with majors in mathematics, engineering or science?

What proportion of graduates pursued their education until they received doctorates in mathematics, engineering, or science?

Which elements or combinations of elements (mentoring, counseling, tutoring, or financial support) were most effective in retaining students in the SEM pipeline?

An important idea to keep in mind in conducting a Summative Evaluation is what has been called "unanticipated outcomes." These are findings that come to light during data collection or data analyses that were never anticipated when the study was first designed. An example of an unanticipated finding comes from a study that started out to look at whether or not school buses should have seat belts. This study also looked at the cost of purchasing new buses that had seat belts, versus retrofitting old models. This study, prompted by a desire to assure the safety of students, was ultimately unable to reach definitive conclusions regarding the utility of seat belts from the data available. Along the way, however, it was found that buses manufactured before a certain date were missing other safety features and the safety of the transportation system could be greatly enhanced by replacing buses purchased before this date. This unanticipated outcome became the basis for significant changes in the system's transportation policy.

Summary

Evaluations can serve many different needs and provide critical data for decision-making at all steps of project development and implementation. Although some people feel that evaluation is an act that is done to a project, if done well, an evaluation is really done for the project.

It is important to remember that evaluation is not a single thing; it is a process. When done well, evaluation can help inform the managers of the project as it progresses, can serve to clarify goals and objectives, and can provide important information on what is, or is not, working, and why.

This chapter has presented information to help Principal Investigators and Project Evaluators understand the various types of evaluation, the different stages in the evaluation process at which they occur, and the different kinds of information they provide. To summarize this information, a restatement of the important issues has been developed (see the Overview of Evaluation Prototypes at the end of this chapter) to serve as a "shorthand" guide. For additional discussion of the various types of evaluation prototypes see Rossi and Freeman (1993). Chapter Five in this Handbook presents some additional examples of evaluations that further illustrate these roles and their differences.

Rossi, P. H. & Freeman, H. E. (1993). aaluation:ASystematic Approach (5th Edition). Newbury Park, CA:Sage.

Joint Committee on Standards for Educational Evaluation. (1981). Standards for Evaluation of Educational Programs, Projects, and Materials. New York, NY: McGraw-Hill.


Overview of Evaluation Prototypes

Planning Evaluation

A Planning Evaluation assesses the understanding of project goals, objectives, strategies and timelines.

It addresses the following types of questions:

Why was the project developed? What is the problem or need it is attempting to address?

Who are the stakeholders? Who are the people involved in the project? Who are the people interested in the project who may not be involved?

What do the stakeholders want to know? What questions are most important to which stakeholders? What questions are secondary in importance? Where do concerns coincide? Where are they in conflict?

Who are the participants to be served?

What are the activities and strategies that will involve the participants? What is the intervention? How will participants benefit? What are the expected outcomes?

Where will the program be located (educational level, geographical area)?

How many months of the school year or calendar year will the program operate? When will the program begin and end?

How much does it cost? What is the budget for the program? What human, material, and institutional resources are needed? How much is needed for evaluation? for dissemination?

What are the measurable outcomes? What is the expected impact of the project in the short run? the longer run?

What arrangements have been made for data collection? What are the understandings regarding record keeping, responding to surveys, and participation in testing?

Formative Evaluation


A Formative Evaluation assesses ongoing project activities. It consists of two types: Implementation Evaluation and Progress Evaluation.

Implementation Evaluation

An Implementation Evaluation assesses whether the project is being conducted as planned. It addresses the following types of questions:


Were the appropriate participants selected and involved in the planned activities?

Do the activities and strategies match those described in the plan? If not, are the changes in activities justified and described?

Were the appropriate staff members hired and trained, and are they working in accordance with the proposed plan? Were the appropriate materials and equipment obtained?

Were activities conducted according to the proposed timeline? by appropriate personnel?

Was a management plan developed and followed?

Progress Evaluation

A Progress Evaluation assesses the progress made by the participants in meeting the project goals. It addresses the following types of questions:

Are the participants moving toward the anticipated goals of the project?

Which of the activities and strategies are aiding the participants to move toward the goals?

Summative Evaluation

A Summative Evaluation assesses project success--the extent to which the completed project has met its goals. It addresses the following types of questions:

Was the project successful?

Did the project meet the overall goal(s)?

Did the participants benefit from the project?

What components were the most effective?

Were the results worth the project's cost?

Is this project replicable and transportable?


CHAPTER TWO: THE EVALUATION PROCESS--AN OVERVIEW

In the preceding chapter, we outlined the types of evaluations that Principal Investigators and Project Evaluators may want to carry out. In this chapter we talk further about how to carry out an evaluation, expanding, in particular, on two types of studies, Formative and Summative. In the sections that follow we provide an orientation to some of the basic language of evaluation, as well as some hints about technical, practical, and political issues that should be kept in mind in conducting evaluation efforts. Our goal is to capture a snapshot of the various pieces that make up the evaluation process from planning to report writing.

This overview is limited to topics pertinent to content and technique and does not cover other practical issues, such as budget planning, time tables, etc. Information on such topics can be found in the Project Evaluation Kit described in Chapter 8 (Bibliography). We have also limited the discussion to the types of evaluation most frequently carried out for NSF.

What are the Steps in Conducting a Formative or Summative Evaluation?

Whether they are Summative or Formative, evaluations can be thought of as having five phases:

Develop evaluation questions

Match questions with appropriate information-gathering techniques

Collect data

Analyze data

Provide information to interested audiences.

All five phases are critical for provision of useful information. If the information gathered is not perceived as valuable or useful (the wrong questions were asked), or the information is not credible or feasible (the wrong techniques were used), or the report is presented too late or is written inappropriately, then the evaluation will not contribute to the decisionmaking process.

In the sections below we provide an overview of each of these phases, describing the activities that need to take place in each. This overview is intended to provide a basic understanding of conducting an evaluation from start to finish. In Chapters Three and Four we provide greater detail in selected areas. A checklist summarizing the complete process is presented at the end of this chapter.

How Do You Develop Evaluation Questions?

Getting started right can have a major impact on the progress of an evaluation all along the way.

The development of evaluation questions consists of several steps:

Clarify goals and objectives of the evaluation

Identify and involve key stakeholders and audiences

Describe the intervention to be evaluated

Formulate potential evaluation questions of interest to all stakeholders and audiences

Determine resources available

Prioritize and eliminate questions.

Although it may sound trivial, at the outset of an evaluation it is important to describe the project or intervention briefly and clarify goals and objectives of the evaluation. Getting started right can have a major impact on the progress of an evaluation all along the way. Patton (1990) suggests considering the following questions in developing an evaluation approach:

Who is the information for and who will use the findings?

What kinds of information are needed?

How is the information to be used? For what purpose is the evaluation being conducted?

When is the information needed?

What resources are available to conduct the evaluation?

Given the answers to the preceding questions, what methods are appropriate?

A critical component of clarifying goals and objectives is the identification of the evaluation's focus. Is the evaluation to be Formative, looking at, for example, whether or not a teacher enhancement activity has been implemented as planned? Or is it Summative, looking at the impact of the program on teaching practices and, ultimately, student learning?

It is critical to identify the major stakeholders, their questions, and their needs for information.

Equally important is the identification of either stakeholders in the project or potential audiences for the evaluation information. In all projects, multiple audiences are likely to be involved. Being clear about your audience is very important as different audiences will have different information needs. For example, the kinds of information needed by those who are concerned about the day-to-day operations of a project will be very different from those needed by policymakers who may be dealing with more long-term issues or who have to make funding decisions.

The next step is a goal-oriented description of the project including the rationale given for its existence and its goals and objectives as seen by the stakeholders. The essence of the intervention should also be documented: where it is situated, who is involved, how it is managed, and how much it costs. An in-depth understanding of the intervention is usually necessary to determine the full range of evaluation questions. This type of goal-centered description is often a significant part of the evaluation effort.

After the purpose and stakeholders are identified and the project is described, specific questions about the project should be formulated. The process of identifying target audiences and formulating potential questions will usually result in many more questions than can be addressed in a single evaluation effort. This comprehensive look at potential questions, however, makes all of the possibilities explicit to the planners of the evaluation and allows them to make an informed choice among evaluation questions. Each potential question should be considered for inclusion on the basis of the following criteria:

Who would use the information

Whether the answer to the question would provide information not now available

Whether the information is important to a major group or several stakeholders

Whether information would be of continuing interest


A general guideline is for evaluations to be funded at about 5-10 percent of the total project cost.

Whether it would be possible to obtain the information, given financial and human resources

Whether the time span required to obtain the information would meet the needs of decisionmakers.

These criteria determine the relevance of each potential question.

The final selection of questions depends heavily on the resources available. Some evaluation activities are more costly than others. For example, it may be that the only way to answer the question "How has a project designed to enhance teachers' classroom activities affected classroom practices?" is through extensive classroom observations, an expensive and time-consuming technique. If sufficient funds are not available to carry out observations, it may be necessary to reduce the sample or use a different data collection method such as a survey. A general guideline is to allocate 5-10 percent of project costs for the evaluation of large-scale projects (those exceeding $100,000); for smaller projects, the percentage may need to be higher to meet minimum costs of fielding evaluation activities.
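As a rough worked example of this guideline, the Python sketch below computes an evaluation budget range from a total project cost. The dollar amounts and the minimum floor for small projects are illustrative assumptions, not NSF requirements.

    def evaluation_budget_range(project_cost, minimum=10000):
        """Return a (low, high) evaluation budget using the 5-10 percent guideline.

        The `minimum` floor stands in for the fixed costs of fielding any
        evaluation; the figure is hypothetical, not an NSF-specified amount.
        """
        low, high = 0.05 * project_cost, 0.10 * project_cost
        return max(low, minimum), max(high, minimum)

    print(evaluation_budget_range(250000))  # (12500.0, 25000.0): 5-10 percent applies
    print(evaluation_budget_range(60000))   # (10000, 10000): the floor dominates, about 17 percent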

How Do You Determine the Information-Gathering Techniques?

The next stage is the determination of the appropriate information-gathering techniques, including several steps:

Select a general methodological approach

Determine what sources of data would provide the information needed and assess the feasibility of the alternatives

Select data collection techniques that would gather the desired information from the identified sources

Develop a design matrix.

If you are limited in your evaluation resources, it is best to stick to the simpler approaches.

After the evaluation questions have been formulated, the most appropriate methods for obtaining answers must be chosen. In determining what approach to use, some initial questions need to be answered. First, is it better to do case studies, exploring the experiences of a small number of participants in depth, or is it better to use a survey approach? In the latter case, do you need to survey all participants or can you select a sample? Do you want to look only at what happens to project participants or do you want to compare the experiences of participants with those of some appropriately selected comparison group of nonparticipants? How you answer some of these questions will affect the kinds of conclusions you can draw from your study. Rigorous, "controlled" designs are not always needed for Formative (Process) Evaluations, although they are always the preferred design. But Summative Evaluations, or Impact Assessments, gain a great deal from being based on experimental or quasi-experimental designs. For more information on this and the implications of different evaluation designs, see Cook and Campbell (1979).
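To illustrate, in the simplest possible terms, the kind of treatment-versus-comparison contrast that such designs formalize, the Python sketch below compares hypothetical post-test scores for participants and nonparticipants. The scores are invented, and a real Summative Evaluation would use an appropriate statistical test and attend carefully to how the comparison group was formed.

    from statistics import mean, stdev

    # Hypothetical post-test scores; not data from any NSF project.
    participants = [78, 85, 91, 69, 88, 74, 82, 90]  # received the intervention
    comparison   = [72, 80, 75, 68, 77, 70, 79, 73]  # similar nonparticipants

    diff = mean(participants) - mean(comparison)

    # Standard error of the difference in means (equal group sizes assumed).
    n = len(participants)
    se = ((stdev(participants) ** 2 + stdev(comparison) ** 2) / n) ** 0.5

    print(f"Difference in means: {diff:.1f} points (standard error {se:.1f})")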

Next you need to determine the kinds of data you want to use. Some alternatives are listed in Exhibit 2. Which one or ones to use depends on a number of factors, including the questions, the timeline and the resources available. Another factor to take into account is the technical skill level of the evaluator or evaluation team. Some of the techniques require more skills than others to design and analyze. If you are limited in your evaluation resources, it is best to stick to the simpler approaches. For example, observational techniques can produce a rich database which, analyzed properly, can be highly informative. The trick here is to design instruments which are either suitable for statistical analysis, or for other analytic strategies which have been developed for case study evidence (Yin, 1989). In the absence of careful advance planning for the analysis, many an evaluator has wound up with a massive investment (both in time and in money) of data collected via observation that elude reasonable analysis.

Finally, you need to decide on the appropriate mix of data collection techniques, including both quantitative and qualitative approaches.

In a broad sense, quantitative data can be defined as any data that can be represented numerically, whereas qualitative data are more frequently expressed through narrative description. Quantitative data also are useful in measuring the reactions or skills of large groups of people on a limited set of questions, whereas qualitative data provide in-depth information on a smaller number of cases (Patton, 1990).


Exhibit 2

Sources and Techniques for Collecting Evaluation Information

I. Data Collected Directly From Individuals Identified as Sources of Information

A. Self-Reports: (from participants and control group members)

1. Diaries or Anecdotal Accounts

2. Checklists or Inventories

3. Rating Scales

4. Semantic Differentials

5. Questionnaires

6. Interviews

7. Written Responses to Requests for Information (for example, letters)

8. Sociometric Devices

9. Projective Techniques

B. Products from participants:

1. Tests

a. Supplied answer (essay, completion, short response, and problem-solving)

b. Selected answer (multiple-choice, true-false, matching, and ranking)

2. Samples of Work

II. Data Collected by an Independent Observer

A. Written Accounts

B. Observation Forms:

1. Observation Schedules

2. Rating Scales

3. Checklists and Inventories

III. Data Collected by a Mechanical Device

A. Audiotape

B. Videotape

C. Time-Lapse Photographs

D. Other Devices:

1. Graphic Recordings of Performance Skills

2. Computer Collation of Student Responses

IV. Data Collected by Use of Unobtrusive Measures

V. Data Collected from Existing Information Resources

A. Review of Public Documents (proposals, reports, course outlines, etc.)

B. Review of Institutional or Group Files (files of student records, fiscal resources, minutes of meetings)

C. Review of Personal Files (correspondence files of individuals reviewed by permission)

D. Review of Existing Databases (statewide testing program results)

From: Educational Evaluation: Alternative Approaches and Practical Guidelines by Blaine R. Worthen and James R. Sanders. Copyright ©1987 by Longman Publishing Group.


These distinctions are not, however, absolute. Rather, they can be thought of as representing two ends of a continuum rather than two discrete categories. Furthermore, in some instances qualitative data can be transformed into quantitative data using judgmental coding (for example, grouping statements or themes into larger broad categories and obtaining frequencies). Conversely, well-designed quantitative studies will allow for qualitative inputs.
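To make the idea of judgmental coding concrete, the minimal Python sketch below groups hypothetical coded statements into broader categories and tallies frequencies. The themes, categories, and labels are invented for illustration and are not drawn from the handbook.

    from collections import Counter

    # Hypothetical theme codes assigned by a reader to open-ended teacher comments.
    coded_statements = [
        "more hands-on labs", "students ask more questions", "more hands-on labs",
        "less lecturing", "students ask more questions", "more group projects",
    ]

    # Judgmental grouping of specific themes into broader categories.
    categories = {
        "more hands-on labs": "instructional materials",
        "more group projects": "instructional materials",
        "less lecturing": "teaching style",
        "students ask more questions": "student engagement",
    }

    frequencies = Counter(categories[s] for s in coded_statements)
    for category, count in frequencies.most_common():
        print(f"{category}: {count}")

The resulting counts can then be reported alongside the narrative evidence from which they were derived.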

Both types of data can provide bases for decisionmaking; both should be considered in planning an evaluation. And evaluations frequently use a mix of techniques in any one study. Further details on data collection and analysis techniques and the pros and cons of different alternatives are presented in Chapter Three of this Handbook.

Once these decisions are made, it is very helpful to summarize them in a "design matrix." Although there is no hard and fast rule, a design matrix usually includes the following elements:

General evaluation questions

Evaluation subquestions

Variables to be examined and instruments/approaches for gathering the data

Respondents

Data collection schedule.

Exhibit 3 presents an example of a design matrix for a study of the effects of a teacher enhancement program.
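A design matrix can also be kept as a small machine-readable structure so that it stays consistent as the evaluation plan evolves. The Python sketch below shows one possible layout under that assumption; the entries are hypothetical and only loosely patterned on Exhibit 3, not a prescribed format.

    # A minimal, hypothetical design matrix kept as plain data.
    design_matrix = [
        {
            "question": "Did the project change instructional practices?",
            "subquestion": "Did teachers use different materials?",
            "approach": "Questionnaire",
            "respondents": "Teachers, Supervisors",
            "schedule": "Pre/post training",
        },
        {
            "question": "Did the project change instructional practices?",
            "subquestion": "Did teachers use different materials?",
            "approach": "Observation",
            "respondents": "NA",
            "schedule": "3 x during year",
        },
    ]

    # Print a simple planning view, one data collection activity per line.
    for row in design_matrix:
        print(f"{row['subquestion']}: {row['approach']} "
              f"({row['respondents']}; {row['schedule']})")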

How Do You Conduct Data Collection?

Once the appropriate information-gathering techniques have been determined, the information must be gathered. Both technical and political issues need to be addressed. The technical issues are discussed in Chapter Three. The political factors to be kept in mind are presented below:

Obtain necessary clearances and permission

Consider the needs and sensitivities of the respondents

Make sure your data collectors are adequately trained and will operate in an objective, unbiased style

Cause as little disruption as possible to the ongoing effort.


Exhibit 3: Summary of the Design for a Study of Project SUCCEED

(For each subquestion, the data collection approach is listed with the respondents and the data collection schedule in parentheses.)

Question 1: Did Project SUCCEED change teachers' mathematics instructional practices?

1a. Did teachers use different materials?
    Questionnaire (Teachers, Supervisors; pre/post training)
    Observation (NA; 3 x during year)

1b. Did teachers change testing practices?
    Questionnaire (Teachers; end of year)

1c. Was cooperative learning increased?
    Questionnaire (Teachers; end of year)
    Observation (NA; 3 x during year)

Question 2: What impact did Project SUCCEED have on teachers' use of planning time?

2a. Did teachers spend more time planning for instruction?
    Questionnaire (Teachers; end of year)

2b. Did teachers develop lesson plans reflecting new approaches?
    Questionnaire (Teachers; end of year)
    Review of plans (NA; 3 x during year)

2c. Did teachers use the sample lessons as models?
    Questionnaire (Teachers; end of year)
    Review of plans (NA; 3 x during year)



First, before data are collected, the necessary clearances and permission must be obtained. Many groups, especially school systems, have a set of established procedures for gaining clearance to collect data on students, teachers, or projects. This may include who is to receive/review a copy of the report, restrictions on when data can be collected, or procedures to safeguard the privacy of students or teachers. Find out what these are and address them as early as possible, preferably as part of the initial proposal development. When seeking cooperation, it is always helpful to offer to provide feedback to the participants on what is learned. Personal feedback or a workshop in which findings can be discussed is frequently looked upon favorably. If this is too time-consuming, a copy of the report or executive summary may well do. The main idea here is to provide incentives for people or organizations to take the time to participate in your evaluation.

Second, the needs of participants must be considered. Being part of an evaluation can be very threatening to participants. Participants should be told clearly and honestly why the data are being collected and the use to which the results will be put. On most survey-type studies, assurances are given and honored that no personal repercussions will result from information presented to the evaluator and that, if at all possible, individuals and their responses will not be publicly associated in any report. This guarantee of anonymity frequently makes the difference between a cooperative and a recalcitrant respondent. There may, however, be some cases when identification of the respondent is deemed necessary, perhaps to support the credibility of an assertion. In such cases, the evaluator should seek informed consent before including such information. Informed consent may also be advisable where a sensitive comment is reported which could be identified with a given respondent, despite the fact that the report itself includes no names. Common sense is the key here.

Third, data collectors must be carefully trained and supervised, especially where multiple data collectors are used. They must be trained to see things in the same way, to ask the same questions, and to use the same prompts. Periodic checks need to be carried out to make sure that well-trained data collectors do not "drift" away from the prescribed procedures over time. (More details on training of data collectors are presented in Chapter Three.)

In addition, it is important to guard against possible distortion of data due to well-intentioned but inappropriate "coaching" of respondents, an error frequently made by inexperienced or overly enthusiastic staff. They must be warned against providing value-laden feedback to respondents or engaging in discussions that might well bias the results. One difficult but important task is understanding one's own biases and making sure that they do not interfere with the work at hand. This is a problem all too often encountered when dealing with volunteer data collectors, such as parents in a school or teachers in a center. They volunteer because they are interested in, advocates for, or critics of the project that is being evaluated. Unfortunately, the data they produce may reflect their own perceptions of the project as much as or more than those of the respondents, unless careful training is undertaken to avoid this "pollution." Bias or perceived bias may compromise the credibility of the findings and the ultimate use to which they are put. An excellent source of information on these issues is the section on accuracy standards in Standards for Evaluation of Educational Programs, Projects and Materials (Joint Committee on Standards for Educational Evaluation, 1981).

Finally, the data should be gathered causing as little disruption as possible. Among other things, this means being sensitive to the schedules of the people or the project, as well as the schedule of the evaluation itself. It also may mean changing approaches as situations come up. For example, instead of asking a respondent to provide data on the characteristics of project participants (a task that may require considerable time on the part of the respondent to pull the data together and develop summary statistics), the data collector may have to work from raw data, applications, monthly reports, etc., and personally do the compilation.

How Do You Analyze the Data?

Once the data are collected, they must be analyzed and interpreted. The steps to be followed in preparing the data for analysis and interpretation differ, depending on the type of data. The interpretation of qualitative data may in some cases be limited to descriptive narratives, but other qualitative data may lend themselves to systematic analyses through the use of quantitative approaches such as thematic coding or content analysis. Analysis includes several steps:

Check the raw data and prepare data for analysis

Conduct initial analysis based on the evaluation plan

Conduct additional analyses based on the initial results

Integrate and synthesize findings.

The first step in quantitative data analysis is the checking of data for responses that may be out of line or unlikely. Such instances include: selecting more than one answer when only one can be selected; always choosing the third alternative on a multiple-choice test of science concepts; reporting allocations of time that add up to more than 100 percent; inconsistent answers; etc. Where such problematic responses are found, it is frequently necessary to eliminate the item or items from the data to be analyzed.

After this is done, the data are prepared for computer analysis; usually this involves coding and entering (keying) the data with verification and quality control procedures in place.
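As one illustration of such checks, the short Python sketch below flags hypothetical survey records with out-of-range answers, a single alternative chosen throughout, or time allocations exceeding 100 percent. The record layout, field names, and valid response range are assumptions made for the example, not part of the handbook.

    # Minimal data-checking sketch; record layout and field names are hypothetical.
    records = [
        {"id": 1, "answers": [3, 3, 3, 3, 3], "time_pcts": [60, 30, 10]},
        {"id": 2, "answers": [1, 4, 2, 9, 3], "time_pcts": [70, 50, 20]},
    ]

    VALID_RANGE = range(1, 6)  # assume items are answered on a 1-5 scale

    for rec in records:
        problems = []
        if any(a not in VALID_RANGE for a in rec["answers"]):
            problems.append("response out of range")
        if len(set(rec["answers"])) == 1:
            problems.append("same alternative chosen on every item")
        if sum(rec["time_pcts"]) > 100:
            problems.append("time allocations exceed 100 percent")
        if problems:
            print(f"record {rec['id']}: " + "; ".join(problems))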

The next step is to carry out the data analysis specified in the evaluation plan. While new information gained as the evaluation evolves may well cause some analyses to be added or subtracted, it is a good idea to start with the set of analyses that seemed to be of interest originally. For the analysis of both qualitative and quantitative data, there are statistical programs currently available on easily accessible software that make the data analysis task considerably easier today than it was 25 years ago. These should be used. Analysts still need to be careful, however, that the data sets they are using meet the assumptions of the technique being used. For example, in the analysis of quantitative data, different approaches may be used to analyze continuous data as opposed to categorical data. Using an incorrect technique can result in invalidation of the whole evaluation project. (See Chapter Three for more discussion of alternative analytic techniques.)
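By way of illustration, the sketch below applies two conventional choices, an independent-samples t-test for a continuous outcome and a chi-square test for a categorical one, to invented data. It assumes the SciPy library is available and is not a prescription for any particular evaluation.

    # Illustrative only: invented scores and counts; assumes SciPy is installed.
    from scipy import stats

    # Continuous outcome (e.g., test scores) for participants and a comparison group.
    participants = [78, 85, 90, 72, 88, 95, 81]
    comparison   = [70, 75, 80, 68, 77, 82, 74]
    t_stat, p_value = stats.ttest_ind(participants, comparison)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

    # Categorical outcome (e.g., persisted vs. dropped out, by group).
    table = [[40, 10],   # participants: persisted, dropped out
             [30, 20]]   # comparison:   persisted, dropped out
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

The point of the sketch is the match between the type of data and the technique, not the particular tests, which must still be chosen to fit the actual design.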

It is very likely that the initial analyses will raise as many questions as they answer. The next step, therefore, is conducting a second set of analyses to address these further questions. If, for example, the first analysis looked at overall teacher performance, a second analysis might subdivide the total group into subunits of particular interest (i.e., more experienced versus less experienced teachers) and examine whether any significant differences were found between them. These reanalysis cycles can go through several iterations as emerging patterns of data suggest other interesting avenues to explore. Sometimes the most intriguing of these are results which emerge from the data, ones that were not anticipated or looked for. In the end, it becomes a matter of balancing the time and money available against the inquisitive spirit in deciding when the analysis task has been completed.

The final task is to choose the analyses to be presented, to integrate the separate analyses into overall pictures, and to develop conclusions regarding what the data show. Sometimes this integration of findings becomes very challenging as the different data sources do not yield completely consistent results. While it is always preferable to produce a report that is able to reconcile differences and explain apparent contradictions, sometimes the findings must simply be allowed to stand as they are, unresolved and thought-provoking.

How Do You Communicate Evaluation Results?

The final stage of the Project Evaluation is reporting what has been found. While reporting can be thought of as simply creating a written document, successful reporting rests on giving careful thought to the creation and presentation of the information. In fact, while funding agencies like NSF require a written report, many projects use additional strategies for communicating evaluation findings to other audiences.

The communication of evaluation findings involves several steps:

Provide information to the targeted audiences

Customize reports and other presentations to make them compelling

Deliver reports and other presentations in time to be useful.


Providing the evaluation information should not present a problem if the evaluation has been successful so far and if some simple steps are followed. Again, special attention should be given to the stakeholders and the constructive part they can play. The specification of questions and selection of data-gathering techniques should have already involved the stakeholders, so that the information should be relevant and important to them. By also involving the stakeholders at the end of the study, the utility and probable attention given to the evaluation findings are sure to be increased. One way of accomplishing this is through a pre-release review of the report with selected stakeholder representatives. Such a session provides an important opportunity for discussion of the findings, for resolving any final issues that may arise, and for setting the stage for the next steps to be taken as a result of the successes and failures that the data may show.

Second, the information must be delivered when it is needed. Sometimes there is leeway in when the information will be used; but the time of decisionmaking is often fixed, and information that arrives too late is useless. There is nothing so frustrating to a Principal Investigator as being told by a funding agency or community group:

"Oh, I wish I had known that two monthsago! That's when I had to make some deci-sions about the projects we were going to

support next year."

Our earlier discussion stressed the importance of agreeing up front what is needed and when the needs must be met. As the evaluation is carried out, the importance of meeting the agreed-upon time schedule must be kept in mind.

Finally, the information needs to be provided in a manner and style that is appropriate, appealing, and compelling to the person being informed. For example, a detailed numerical table with statistical test results might not be the best way to provide a school board member with achievement data on students. Different reports may have to be provided for different audiences. And, it may well be that a written report is not even the preferred alternative. While most evaluations will include some written product, other alternatives are becoming increasingly popular.

It should be noted that while discussions of communicating study results generally stop at the point of presenting a final report of findings, there are important additional steps that should be considered. Where a new product or practice turns out to be successful, as determined by a careful evaluation, dissemination is an important next step. This topic is covered in a separate NSF publication.

Summary


There are several phases to conducting and implementing an evaluation. No one stage is more important than the rest. And, as can be seen from the discussion of the role of the stakeholders in both the first step (developing questions) and the last (provision of information), the groundwork laid at the earliest stages can have important implications for the success of the evaluation in the long run.

Evaluation isn't easy, but there also is very little mystery about the steps that need to be taken and the activities that need to be carried out. While there certainly are technical skills needed to do an evaluation that is helpful and credible (and that is why trained evaluators are important), there is also a lot of "common sense" involved. Sound advice is to blend these two factors: technical skills and common sense. In the best evaluations, both of these inevitably exist.

Cook, T. D. & Campbell, D. T. (1979). Quasi-experi-mentation: Design and Analysis Issues or Field. Set-tings, Chicago, IL: Rand McNally.

Joint Committee on Standards for Educational Evaluation. (1981). Standards for Evaluation of Educational Programs, Projects, and Materials. New York, NY: McGraw-Hill.

Patton, M. Q. (1990). Qualitative Evaluation and Research Methods. Newbury Park, CA: Sage.

Worthen, B. & Sanders. J. (1987). Educational Evalu-ation: Alternative Approaches and Practical Guide-lines. New York: Longman, Inc.

Yin, R. (1989). Case Study Research. Newbury Park, CA: Sage.


Tips for Conducting an Evaluation

1. Develop Evaluation Questions

Clarify goals and objectives of the evaluation.

Identify and involve key stakeholders and audiences.

Describe the intervention to be evaluated.

Formulate potential evaluation questions of interest to all stakeholders and audiences.

Determine resources available.

Prioritize and eliminate questions.

2. Match Questions with Appropriate Information-Gathering Techniques

Select a general methodological approach.

Determine what sources of data would provide the information needed.

Select data collection techniques that would gather the desired information from the identified sources.

3. Collect Data

Obtain the necessary clearances and permission.

Consider the needs and sensitivities of the respondents.

Make sure data collectors are adequately trained and will operate in an objective, unbiased manner.

Cause as little disruption as possible to the ongoing effort.

4. Analyze Data

Check raw data and prepare data for analysis.

Conduct initial analysis based on the evaluation plan.

Conduct additional analyses based on the initial results.

Integrate and synthesize findings.

5. Provide Information to Interested Audiences

Provide information to the targeted audiences.

Deliver reports and other presentations in time to be useful.

Customize reports and other presentations.


CHAPTER THREE: DESIGN, DATA COLLECTION AND DATA ANALYSIS

In Chapter Two we outlined the steps in the development and implementation of an evaluation. Another name for that chapter could be the "Soup to Nuts" of evaluation because of its broad-based coverage of issues. In this chapter we focus more closely on selected technical issues, the "Nuts and Bolts" of evaluation, issues that generally fall into the categories of design, data collection, and analysis.

In selecting these technical issues, we were guided by two priorities:

We devoted most attention to topics relevant to quantitative evaluations because, as emphasized in the introduction, in order to be responsive to executive and congressional decisionmakers, NSF is usually required to furnish outcome information based on quantitative measurement.

We have given the most extensive coverage to topics for which we have located few concise reference materials suitable for NSF/EHR project evaluators. But for all topics, we urge project staff who plan to undertake comprehensive evaluations to make use of the reference materials mentioned in this chapter and in the annotated bibliography.

The chapter is organized into four sections:

How do you design an evaluation?

How do you choose a specific data collection technique?

What are some major concerns when collecting data?

How do you analyze the data you have collected?

How Do You Design an Evaluation?

Once you have decided the goals for your study and the questions you want to address, it is time to design the study. What does this mean? According to Scriven (1991), design means:


"The process of stipulating the investigatory procedures to be followed in doing a certain evaluation."

Designing an evaluation is one of those "good news, bad news" stories. The good news is that there are many different ways to develop a good design. The bad news is that there are many ways to develop bad designs. There is no formula or simple algorithm that can be relied upon in moving from questions to an actual study design. Thoughtful analysis, sensitivity, common sense, and creativity are all needed to make sure that the actual evaluation provides information that is useful and credible.

This section examines some issues to consider in developing designs that are both useful and methodologically sound. They are:

Choosing an approach

Selecting a sample

Deciding how many times to measure

Choosing an Approach

Since there are no hard and fast rules about designing the study, how should the evaluator go about choosing the procedures to be followed? This is usually a two-step process. In step 1, the evaluator makes a judgment about the main purpose of the evaluation and about the overall approach which will provide the best framework for this purpose. This judgment will lead to a decision whether the methodology will be essentially qualitative (relying on case studies, observations, and descriptive materials), whether the method should rely on statistical analyses, or whether a combined approach would be best. Will control or comparison groups be part of the design? If so, how should these groups be selected?

While some evaluation experts feel that qualitative evaluations should not be treated as a technical, scientific process (Guba and Lincoln, 1989), others (for example, Yin, 1989) have adopted design strategies which satisfy rigorous scientific requirements. Conversely, competently executed quantitative studies will have qualitative components. The experienced evaluator will want to see a project in action and conduct observations and informational interviews before designing instruments for quantitative evaluation; he or she will also provide opportunities for "open-ended" responses and comments during data collection.

There is a useful discussion about choosing the general evaluation approach in Herman, Morris, and Fitz-Gibbon (1987), which concludes with the following observation:

"There is no single correct approach to allevaluation problems. The message is this:some will need a quantitative approach;some will need a qualitative approach;

probably most will benefit from acombination of the two."

In all cases, once fundamental design decisions have been made, the design task generally follows the same course in step 2. The evaluator:

Lists the questions which were raised by stakeholders and classifies them as requiring an Implementation, Progress, or Summative Evaluation.

Identifies procedures which might be used to answer these questions. Some of these procedures probably can be used to answer several questions; clearly, these will have priority.

Looks at possible alternative methods, taking into account the strength of the findings yielded by each approach (quality) as well as practical considerations, especially time and cost constraints, staff availability, access to participants, etc.

An important consideration at this point is minimizing interference with project functioning: making as few demands as possible on project personnel and participants, and avoiding procedures which may be perceived as threatening or critical.

All in all, the evaluator will need to use a great deal of judgment in making choices and adjusting designs, and will seldom be in a position to fully implement textbook recommendations. Some of the examples detailed in Chapter Six illustrate this point.


When and How to Sample


It is sometimes assumed that an evaluation must include all of the persons who participate in a project. Thus in teacher enhancement programs, all teachers need to be surveyed or observed; in studies of instructional practices, all students need to be tested; and in studies of reform, all legislators need to be interviewed. This is not the case.

Sampling may be considered or necessary for qualitative and quantitative studies. For example, if a project is carried out in a large number of sites, the evaluator may decide to carry out a qualitative study in only one or a few of them. When planning a survey of project participants, the investigator may decide to sample the participant population, if it is large. Of course, if the project involves few participants, sampling is unnecessary and inappropriate.

For qualitative studies, purposeful sampling is often most appropriate. Purposeful sampling means that the evaluator will seek out the case or cases which are most likely to provide maximum information, rather than a "typical" or "representative" case. The goal of the qualitative evaluation is to obtain rich, in-depth information, rather than information from which generalizations about the entire project can be derived. For the latter goal a quantitative evaluation is needed.

For quantitative studies, some form of random sampling is the appropriate method. The easiest way of drawing random samples is to use a list of participants (or teachers, or classrooms, or sites) and select every 2nd or 5th or 10th name, depending on the size of the population and the desired sample size. A stratified sample may be drawn to ensure sufficient numbers of rare units (for example, minority members, or schools serving low-income students).
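The sketch below illustrates both ideas with a hypothetical roster: a systematic random sample (a random start, then every kth name) and a simple stratified sample that draws separately from each stratum. The roster, strata, and sample sizes are invented for the example.

    import random

    # Hypothetical roster; in practice this would come from project records.
    roster = [f"teacher_{i:03d}" for i in range(1, 241)]

    # Systematic random sample: a random starting point, then every k-th name.
    k = 5                                   # sampling interval (every 5th name)
    start = random.randrange(k)             # random start between 0 and k-1
    systematic_sample = roster[start::k]

    # Simple stratified sample: sample within each stratum so that rare groups
    # (e.g., schools serving low-income students) are adequately represented.
    strata = {
        "low_income_schools": roster[:40],
        "other_schools": roster[40:],
    }
    stratified_sample = {
        name: random.sample(members, k=min(10, len(members)))
        for name, members in strata.items()
    }

    print(len(systematic_sample), {s: len(v) for s, v in stratified_sample.items()})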

The most common misconception about sampling is that large samples are the best way of obtaining accurate findings. While it is true that larger samples will reduce sampling error (the probability that if another sample of the same size were drawn, different results might be obtained), sampling error is the smallest of the three components of error which affect the soundness of sample designs. Two other errors, sample bias (primarily due to loss of sample units) and response bias (responses or observations which do not reflect "true" behavior, characteristics, or attitudes), are much more likely to jeopardize the validity of findings (Sudman, 1976). When planning allocation of resources, evaluators should give priority to procedures which will reduce sample bias and response bias, rather than to the selection of larger samples.

Let's talk a little more about sample and response bias. Sample bias occurs most often because of non-response (selected respondents or units are not available or refuse to participate, or some answers and observations are incomplete). Response bias occurs because questions are misunderstood or poorly formulated, or because respondents deliberately equivocate (for example, to protect the project being evaluated). In observations, the observer may misinterpret or miss what is happening. Exhibit 4 describes each type of bias and suggests some simple ways of minimizing them.

Exhibit 4

Three Types of Errors and Their Remedies

Sampling Error
    Description: Using a sample, not the entire population to be studied.
    Remedy: Larger samples; these reduce but do not eliminate sampling error.

Sample Bias
    Description: Some of those selected to participate did not do so or provided incomplete information.
    Remedy: Repeated attempts to reach non-respondents. Prompt and careful editing of completed instruments to obtain missing data; comparison of characteristics of non-respondents with those of respondents to describe any suspected differences that may exist.

Response Bias
    Description: Responses do not reflect "true" opinions or behaviors because questions were misunderstood or respondents chose not to tell the truth.
    Remedy: Careful pretesting of instruments to revise misunderstood, leading, or threatening questions. No remedy exists for deliberate equivocation in self-administered questionnaires, but it can be spotted by careful editing. In personal interviews, this bias can be reduced by a skilled interviewer.


Determining an adequate sample size sounds threatening, but is not as difficult as it might seem to be at first. Statisticians have computed recommended sample sizes for various populations. (See Fitz-Gibbon and Morris, 1987.) For practical purposes, however, in project evaluations sample size is primarily determined by available resources, by the planned analyses, and by the need for credibility.

In making sampling decisions, the overriding consideration is that the actual selection must be done by random methods, which usually means selecting every nth case from listings of units (students, instructors, classrooms). Sudman (1976) emphasizes that there are many scientifically sound sampling methods which can be tailored to all budgets:

"In far too many cases, researchers areaware that powerful sampling methods

are available, but believe they cannot usethem because these methods are too diffi-

cult and expensive. Instead incrediblysloppy ad hoc procedures are invented,

often with disastrous results."

Deciding How Many Times to Measure


For all types of evaluations (Implementation, Progress, and Summative), the evaluator must decide the frequency of data collection and the method to be used if multiple observations are needed.

For many purposes, it will be sufficient to collect data at one point in time; for others, one-time data collection may not be adequate. Implementation Evaluations may utilize either multiple or one-time data collections, depending on the length of the project and any problems that may be uncovered along the way. For Summative Evaluations, a one-time data collection may be adequate to answer some evaluation questions: How many students enrolled in the project? How many were persisters versus dropouts? What were the most popular project activities? Usually, such data can be obtained from records. But impact measures are almost always measures of change. Has the project resulted in higher test scores? Have teachers adopted different teaching styles? Have students become more interested in considering science-related careers? In each of these cases, at a minimum two observations are needed: baseline (at project initiation) and at a later point, when the project has been operational long enough for possible change to occur.

Quantitative studies using data collected from the same population at different points in time are called longitudinal studies. They often present a dilemma for the evaluator. Conventional wisdom suggests that the correct way to measure change is the "panel method," by which data are obtained from the same individuals (students, teachers, parents, etc.) at different points in time. While longitudinal designs which require interviewing the same students or observing the same teachers at several points in time are best, they are often difficult and expensive to carry out because students move, teachers are reassigned, and testing programs are changed. Furthermore, loss of respondents due to failure to locate or to obtain cooperation from some segment of the original sample is often a major problem. Depending on the nature of the evaluation, it may be possible to obtain good results with successive cross-sectional designs, which means drawing new samples for successive data collections from the treatment population. (See Love, 1991 for a fuller discussion of logistics problems in longitudinal designs.)

For example, to evaluate the impact of a program of field trips to museums and science centers for 300 high school students, baseline interviews can be conducted with a random sample of 100 students before the project starts. Interviewing another random sample of 100 students after the project has been operational for one year is an acceptable technique for measuring project effectiveness, provided that at both times samples were randomly selected to adequately represent the entire group of students involved in the project. In other cases, this may be impossible.
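The following sketch illustrates the successive cross-sectional logic with invented numbers: two independent random samples of 100 students are drawn, one at baseline and one at follow-up, and the waves are compared as groups rather than student by student. The response rates are made up solely to show the mechanics.

    import random

    # Hypothetical: 300 students are enrolled in the field-trip program.
    students = list(range(300))

    # Each wave is a fresh, independent random sample of 100 students.
    baseline_sample = random.sample(students, 100)
    followup_sample = random.sample(students, 100)

    # Invented interview outcomes: whether a student reports interest
    # in a science-related career (30% at baseline, 45% at follow-up).
    baseline_interested = sum(random.random() < 0.30 for _ in baseline_sample)
    followup_interested = sum(random.random() < 0.45 for _ in followup_sample)

    print(f"interested at baseline: {baseline_interested}/100; "
          f"at follow-up: {followup_interested}/100")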

Designs that involve repeated data collection usually require that the data be collected using identical survey instruments at all times. Changing question wording or formats or observation schedules between time 1 and time 2 impairs the validity of the time comparison. At times, evaluators find after the first round of data collection that their instruments would be improved by making some changes, but they do so at the risk of not being able to use altered items for measuring change. Depending on the particular circumstances, it may be difficult to sort out whether a changed response is a treatment effect or the effect of the modified wording. There is no hard and fast rule for deciding when changes should or should not be made; in the end technical concerns must be balanced with common sense.

How Do You Choose a Specific Data Collection Technique?

In Chapter Two we provided an overview of ways in which evaluators can go about collecting data. As shown in that chapter, there are many different ways to go about answering the same questions. However, the great majority of evaluation designs for projects supported by NSF/EHR rely at least in part on quantitative methods using one or several of the following techniques:

Surveys based on self-administered questionnaires or interviewer-administered instruments

Focus groups

Results from tests given to students

Observations (most often carried out in classrooms)

Review of records and data bases (not created primarily for the evaluation needs of the project).

The discussion in this section focuses on these techniques. Evaluators who are interested in using techniques not discussed here (for example, designs using unobtrusive measures or videotaped observations) will find relevant information in some of the reference books cited in the bibliography.

Surveys

Surveys are a popular tool for project evaluation. They are especially useful for obtaining information about opinions and attitudes of participants or other relevant informants, but they are also useful for the collection of descriptive data, for example personal and background characteristics (race, gender, socio-economic status) of participants. Survey findings usually lend themselves to quantitative analysis; as in opinion polls, the results can be expressed in easily understood percentages or means. As compared to some other data collection methods (for example, in-depth interviews or observations), surveys usually provide wider ranging but less detailed data, and some data may be biased if respondents are not truthful. However, much has been learned in recent years about improving survey quality and coverage, and compared to more intensive methods, surveys are relatively inexpensive and easier to analyze using statistical software.

The cheapest surveys are self-administered: a questionnaire is distributed (in person or by mail) to eligible respondents. Relatively short and simple questionnaires lend themselves best to this treatment. The main problem is usually non-response: persons not present when the questionnaire is distributed are often excluded, and mail questionnaires will yield relatively low response rates unless a great deal of careful preparation and follow-up work is done.

When answers to more numerous and more complex questions are needed, it is best to avoid self-administered questionnaires and to employ interviewers to ask questions either in a face-to-face situation or over the telephone. Survey researchers often differentiate between questionnaires, where a series of precisely worded questions are asked, and interviews, which are usually more open-ended, based on an interview guide or protocol, and yield richer and often more interesting data. The trade-off is that interviews take longer, are best done face-to-face, and yield data which are often difficult to analyze. A good compromise is a structured questionnaire which provides some opportunity for open-ended answers and comments.

The choice between telephone and personal interviews depends largely on the nature of the projects being evaluated and the characteristics of respondents. For example, as a rule children should be interviewed in person, as should respondents who do not speak English, even if the interview is conducted by a bilingual interviewer.

Creating a good questionnaire or interview instrument requires considerable knowledge and skill. Question wording and sequencing are very important in obtaining valid results, as shown by many studies. For a fuller discussion, see Fowler (1993, ch. 6) and Love (1991, ch. 2).

Focus Groups

Focus groups have become an increasingly popular information-gathering technique. Prior to designing survey instruments, a number of persons from the population to be surveyed are brought together to discuss, with the help of a leader, the topics which are relevant to the evaluation and should be included in developing questionnaires. Terminology, comprehension, and recall problems will surface, which should be taken into account when questionnaires or interview guides are constructed. This is the main role for focus groups in Summative Evaluations. However, there may be a more substantive role for focus groups in Progress Evaluations, which are more descriptive in nature and often do not rely on statistical analyses. (See Stewart and Shamdasani, 1990, for a full discussion of focus groups.)

The usefulness of focus groups depends heavily on the skills of the moderator, the method of participant selection and, last but not least, the understanding of evaluators that focus groups are essentially an exercise in group dynamics. Their popularity is high because they are a relatively inexpensive and quick information tool, but while they are very helpful in the survey design phase, they are no substitute for systematic evaluation procedures.

Test Scores

Many evaluators and program managers feel that if a project has been funded to improve the academic skills of students so that they are prepared to enter scientific and technical occupations, improvements in test scores are the best indicator of a project's success. Test scores are often considered "hard" and therefore presumably objective data, more valid than other types of measurements such as opinion and attitude data, or grades obtained by students. But these views are not unanimous, since some students and adults are poor test-takers, and because some tests are poorly designed and measure the skills of some groups, especially White males, better than those of women and minorities.

Until recently, most achievement tests were either norm-referenced (measuring how a given student performed compared to a previously tested population) or criterion-referenced (measuring if a student had mastered specific instructional objectives and thus acquired specific knowledge and skills). Most school systems use these types of tests, and it has frequently been possible for evaluators to use data routinely collected in the schools as the basis for their summative studies.


Because of the many criticisms which have been directed at tests currently in use, there is now a great deal of interest in making radical changes. Experiments with performance assessment are under way in many states and communities. Performance tests are designed to measure problem-solving behaviors, rather than factual knowledge. Instead of answering true/false or multiple-choice formats, students are asked to solve more complex problems, and to explain how they go about arriving at answers and solving these problems. Testing may involve group as well as individual activities, and may appear more like a project than a traditional "test." While many educators and researchers are enthusiastic about these new assessments, it is not likely that valid and inexpensive versions of these tests will be ready for widespread use in the near future.

A good source of information about test vendors and for the use of currently available tests in evaluation is Morris, Fitz-Gibbon, and Lindheim (1987). An extensive discussion of performance-based assessment by Linn, Baker, and Dunbar can be found in Educational Researcher (Nov. 1991).

Whatever type of test is used, there are two critical questions that must be considered before selecting a test and using its results:

Is there a match between what the test measures and what the project intends to teach? If a science curriculum is oriented toward teaching process skills, does the test measure these skills or more concrete scientific facts?

Has the program been in place long enough for there to be an impact on test scores? With most projects, there is a start-up period during which the intervention is not fully in place. Looking for test score improvements before a project is fully established can lead to erroneous conclusions.

A final note on testing and test selection. Evaluators may be tempted to develop their own test instruments rather than relying on ones that exist. While this may at times be the best choice, it is not an option to be undertaken lightly. Test development is more than writing down a series of questions, and there are some strict standards formulated by the American Psychological Association that need to be met in developing instruments that will be credible in an evaluation. If at all possible, use of a reliable and validated, established test is best.

Observations

Surveys and tests can provide good measurements of the opinions, attitudes, skills, and knowledge of individuals; surveys can also provide information about individual behavior (how often do you go to your local library? what did you eat for breakfast this morning?), but behavioral information is often inaccurate due to faulty recall or the desire to present oneself in a favorable light. When it comes to measuring group behavior (did most children ask questions during the science lesson? did they work cooperatively? at which museum exhibits did the students spend most of their time?), systematic observations are the best method for obtaining good data.

Evaluation experts distinguish between three observation procedures: (1) systematic observations, (2) anecdotal records (semi-structured), and (3) observation by experts (unstructured). For NSF/EHR project evaluations, the first and second are most frequently used, with the second to be used as a planning step for the development of systematic observation instruments.

Procedure one yields quantitative information, which can be analyzed by statistical methods. To carry out such quantifiable observations, subject-specific instruments will need to be created by the evaluator to fit the specific evaluation. A good source of information about observation procedures, including suggestions for instrument development, can be found in Henerson, Morris, and Fitz-Gibbon (1987, ch. 9).

The main disadvantage of the observation technique is that behaviors may change when observed. This may be especially true when it comes to teachers and others who feel that the observation is in effect carried out for the purpose of evaluating their performance, rather than the project's general functioning. But behavior changes for other reasons as well, as noted a long time ago when the "Hawthorne effect" was first reported. Techniques have been developed to deal with the biasing effect of the presence of observers: for example, studies have used participant observers, but such techniques can only be used if the study does not call for systematically recording observations as events occur. Another possible drawback is that, perhaps more than any other data collection method, the observation method is heavily dependent on the training and skills of data collectors. This topic is more fully discussed later in this chapter.

Review of Records and Data Bases

Most agencies and funded projects maintain systematic records of some kind about the population they serve and the services they provide, but the extent of available information and their accessibility differ widely. The existence of a comprehensive Management Information System or database is of enormous help in answering certain evaluation questions which in their absence may require special surveys. For example, simply by looking at personal characteristics of project participants, such as sex, ethnicity, family status, etc., evaluators can judge the extent to which the project recruited the target populations described in the project application. As mentioned earlier, detailed project records will greatly facilitate the drawing of samples for various evaluation procedures. Project records can also identify problem situations or events (for example, exceptionally high drop-out rates at one site of a multi-site project, or high staff turnover) which might point the evaluator in new directions.

Existing data bases which were originally set up for other purposes can also play a very important role in conducting evaluations. For example, if the project involves students enrolled in public or private institutions which keep comprehensive and/or computerized files, this would greatly facilitate the selection of "matched" control or comparison groups for complex outcome designs. However, gaining access to such information may at times be difficult because of rules designed to protect data confidentiality.

Exhibit 5 summarizes the advantages and drawbacks of the various data collection procedures.

What are Some Major Concerns When Collecting Data?

It is not possible to discuss in one brief chapter the nitty-gritty of all data collection procedures. The reader will want to consult one or more of the texts recommended in the bibliography before attacking any one specific task. Before concluding this chapter, however, we want to address two issues which affect all data collections and deserve special mention here: the selection, training, and supervision of data collectors, and pretesting of evaluation instruments.

Exhibit 5

Advantages and Drawbacks of Various Data Collection Procedures

Self-administered questionnaire
    Advantages: Inexpensive. Can be quickly administered if distributed to a group. Well suited for simple and short questionnaires.
    Disadvantages: No control for misunderstood questions, missing data, or untruthful responses. Not suited for exploration of complex issues.

Interviewer-administered questionnaires (by telephone)
    Advantages: Relatively inexpensive. Avoids sending staff to unsafe neighborhoods or difficulties gaining access to buildings with security arrangements. Best suited for relatively short and non-sensitive topics.
    Disadvantages: Proportion of respondents without a private telephone may be high in some populations. As a rule not suitable for children, older people, and non-English speaking persons. Not suitable for lengthy questionnaires and sensitive topics. Respondents may lack privacy.

Interviewer-administered questionnaires (in person)
    Advantages: Interviewer controls situation, can probe irrelevant or evasive answers; with good rapport, may obtain useful open-ended comments.
    Disadvantages: Expensive. May present logistics problems (time, place, privacy, access, safety). Often requires lengthy data collection period unless project employs large interviewer staff.

Open-ended interviews (in person)
    Advantages: Usually yields richest data, details, new insights. Best if in-depth information is wanted.
    Disadvantages: Same as above (interviewer-administered questionnaires); also often difficult to analyze.

Focus groups
    Advantages: Useful to gather ideas, different viewpoints, new insights, improving question design.
    Disadvantages: Not suited for generalizations about population being studied.

Tests
    Advantages: Provide "hard" data which administrators and funding agencies often prefer; relatively easy to administer; good instruments may be available from vendors.
    Disadvantages: Available instruments may be unsuitable for treatment population; developing and validating new, project-specific tests may be expensive and time consuming. Objections may be raised because of test unfairness or bias.

Observations
    Advantages: If well executed, best for obtaining data about behavior of individuals and groups.
    Disadvantages: Usually expensive. Needs well qualified staff. Observation may affect behavior being studied.

Selection, Training, and Supervision of Data Collectors

Selection

All too often, project administrators, and even evaluators, believe that anybody can be a data collector and typically base the selection on convenience factors: an available research assistant, an instructor or clerk willing to work overtime, college students available for part-time or sporadic work assignments. All of these may be suitable candidates, but it is unlikely that they will be right for all data collection tasks.

Most data collection assignments fall into one of three categories:

Clerical tasks (abstracting records, compiling data from existing lists or data bases, keeping track of self-administered surveys)

Personal interviewing (face-to-face or by telephone) and test administration

Observing and recording observations.

There are some common requirements for the successful completion of all of these tasks: a good understanding of the project, and the ability and discipline to follow instructions consistently and to give punctilious and detailed attention to all aspects of the data collection. Equally important is a lack of bias and a lack of vested interest in the outcome of the evaluation. For this reason, as previously mentioned (Chapter Two), it is usually unwise to use volunteers or regular project staff as data collectors.

Interviewers need additional qualities: a pleasant voice, a tactful personal manner, and the ability to establish rapport with respondents. For some data collections, it may be advisable to attempt a match between interviewer and respondent (for example, with respect to ethnicity or age). Fluency in a language other than English (usually Spanish) may also be needed; in this case it is important that the interviewer be bilingual, with U.S. work experience, so that instructions and expected performance standards are well understood.


Observers need to be highly skilled and competent professionals. Although they too will need to follow instructions and complete structured schedules, it is often important that they alert the evaluator to unanticipated developments. Depending on the nature of the evaluation, their role in generating information may be crucial: often they are the eyes and the ears of the evaluator. They should also be familiar with the setting in which the observations take place, so that they know what to look for. For example, teachers (or former teachers or aides) can make good classroom observers, although they should not be used in schools with which they are or were affiliated.

Training

In all cases, sufficient time must be allocated to training. Training sessions should include performing the actual task (extracting information from a data base, conducting an interview, performing an observation). Training techniques might include role-playing (for interviews) or comparing recorded observations of the same event by different observers. When the project enters a new phase (for example, when a second round of data collection starts), it is usually advisable to schedule another training session and to check inter-rater reliability again.

If funds and technical resources are available, other techniques (for example, videotaping of personal interviews or recording of telephone interviews) can also be used for training and quality control after permission has been obtained from participants.

Supervision

Only constant supervision will ensure quality control of the data collection. The biggest problem is not cheating by interviewers or observers (although this can never be ruled out), but gradual burnout: more transcription errors, more missing data, fewer probes or follow-ups, fewer open-ended comments on observation schedules.

The project evaluator should not wait until the end of the data collection to review completed work, but should do so at least once a week. See Fowler (1991) and Henerson, Morris and Fitz-Gibbon (1987) for further suggestions on interviewer and observer recruitment and training.


Pretest of Instruments

When the evaluator is satisfied with the instruments designed for the evaluation, and before starting any data collection in the field, all instruments should be pre-tested to see if they work well under field conditions. The pre-test also reveals if questions are understood by respondents and if they capture the information sought by the evaluator. Pre-testing is a step that many evaluators "skip" because of time pressures. However, as has been shown many times, they may do so at their own peril. The time taken up front to pre-test instruments can result in enormous savings in time (and misery) later on.

The usual procedure consists of using the instruments with a small number of cases (for example, abstracting data from 10 records, asking 10-20 project participants to fill out questionnaires, conducting interviews with 5 to 10 subjects, or completing half a dozen classroom observations). Some of the shortcomings of the instruments will be obvious as the completed forms are reviewed, but most important is a debriefing session with the data collectors and, in some instances, with the respondents themselves, so that they can recommend to the evaluator possible modifications of procedures and instruments. It is especially important to pre-test self-administered instruments, where the respondent cannot ask an interviewer for help in understanding questions. Such pre-tests are best done by bringing together a group of respondents, asking them first to complete the questionnaire, and then leading a discussion about the clarity of the instructions and their understanding of the questions and expected answers.

Data Analysis: Qualitative Data

Analyzing the plethora of data yielded by comprehensive qualitative evaluations is a difficult task, and there are many instances of failure to fully analyze the results of long and costly data collections. While lengthy descriptive case studies are extremely useful in furthering the understanding of social phenomena and the implementation and functioning of innovative projects, they are ill-suited to outcome evaluation studies for program managers and funding agencies. More recently, however, methods have been devised to classify qualitative findings through the use of a special software program (The Ethnograph) and diverse thematic codes.


This approach may enable investigators to analyze qualitative data quantitatively without sacrificing the richness and character of qualitative analysis. Content analysis, which can be used for the analysis of unstructured verbal data, is another available technique for dealing quantitatively with qualitative data. Other approaches, including some which also seek to quantify the descriptive elements of case studies and others which address issues of validation and verification, suggest that the gap between qualitative and quantitative analyses is narrowing. Specific techniques for the analysis of qualitative data can be found in some of the texts referenced at the end of this chapter.
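To make the idea of quantifying qualitative material more concrete, the following minimal Python sketch (not taken from the handbook; the theme names, keywords, and transcript excerpts are hypothetical) counts how often a few predefined thematic codes occur in interview text, which is roughly what a simple content analysis or coding pass does.

    # Minimal thematic coding / content analysis sketch.
    # The themes, keywords, and transcript excerpts are hypothetical.
    from collections import Counter
    import re

    THEMES = {
        "mentoring":   ["mentor", "role model"],
        "scheduling":  ["schedule", "mid-summer", "time conflict"],
        "self_esteem": ["confidence", "self-esteem", "boost"],
    }

    def code_transcript(text):
        """Count keyword occurrences for each theme in one transcript."""
        counts = Counter()
        lowered = text.lower()
        for theme, keywords in THEMES.items():
            counts[theme] = sum(len(re.findall(re.escape(k), lowered)) for k in keywords)
        return counts

    transcripts = [
        "My mentor gave me the confidence boost I needed.",
        "The mid-summer schedule kept me from taking a job.",
    ]
    totals = Counter()
    for t in transcripts:
        totals.update(code_transcript(t))
    print(totals)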

Data Analysis: Quantitative Data

In Chapter Two, we outlined the major steps required for the analysis of quantitative data:

Check the raw data and prepare data for analysis

Conduct initial analysis based on the evaluation plan

Conduct additional analyses based on initial results

Integrate and synthesize findings.

In this chapter, we provide some additional advice on carrying out these steps.

Check the Raw Data and Prepare Data for Analysis

In almost all instances, the evaluator will conduct the data analysis with the help of a computer. Even if the number of cases is small, the volume of data collected and the need for accuracy, together with the availability of PCs and user-friendly software, make it unlikely that evaluators will do without computer assistance.

The process of preparing data for computer analysis involves data checking, data reduction, and data cleaning.

Data checking can be done as a first step by visual inspection of the raw data; this check may turn up responses which are out of line, unlikely, or inconsistent, or which suggest that a respondent answered questions mechanically (for example, always chose the third response category in a self-administered questionnaire).



Data reduction consists of the following steps:

Deciding on a file format. (This is usually determined by the software to be used.)

Designing codes (the categories used to classify the data so that they can be processed by machine) and coding the data. If instruments are "pre-coded," for example if respondents were asked to select an item from a checklist, coding is not necessary. It is needed for "open-ended" answers and comments by respondents and observers.

Data entry (keying the data onto tapes or disks so that the computer can read them).

Many quality control procedures for coding open-ended data and for data entry have been devised. They include careful training of coders, frequent checking of their work, and verification of data entry by a second clerk.

Data cleaning consists of a final check on the data file for accuracy, completeness, and consistency. At this point, coding and keying errors will be detected. (For a fuller discussion of data preparation procedures, see Fowler, 1991.)

If these data preparation procedures have been carefully carried out, chances are good that the data sets will be error-free from a technical standpoint and that the evaluator will have avoided the "GIGO" (garbage in, garbage out) problem, which is far from uncommon in analyses based on computer output.
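As an illustration only, here is a minimal sketch of these data checking and cleaning steps written in Python with the pandas library (a modern tool that the handbook does not prescribe); the column names and the handful of inline records are hypothetical stand-ins for a real project file.

    # Sketch of data checking and cleaning on a small, made-up response set.
    import pandas as pd

    # In practice the responses would be read from a file, e.g.
    # df = pd.read_csv("responses.csv"); a few records are built inline
    # here so the sketch is self-contained.
    df = pd.DataFrame({
        "q1": [3, 5, 2, 9, 4],
        "q2": [3, 4, 2, 5, None],
        "q3": [3, 5, 1, 4, 4],
    })
    rating_cols = ["q1", "q2", "q3"]

    # Data checking: out-of-range values on a 1-5 scale and missing answers.
    out_of_range = df[rating_cols].apply(lambda col: col.notna() & ~col.between(1, 5))
    print("Out-of-range answers per question:")
    print(out_of_range.sum())
    print("Missing answers per question:")
    print(df[rating_cols].isna().sum())

    # Flag respondents who may have answered mechanically
    # (the same response category chosen for every question).
    mechanical = df[rating_cols].nunique(axis=1) == 1
    print("Possible mechanical responders:", list(df.index[mechanical]))

    # Data cleaning: drop flagged records before analysis.
    clean = df[~mechanical & ~out_of_range.any(axis=1)].dropna(subset=rating_cols)
    print(clean)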

Conduct Initial Analysis Based on the Evaluation Plan


The evaluator is now ready to start generating information which will answer the evaluation questions. To do so, it is usually necessary to deal with statistical concepts and measurements, a prospect which some evaluators or principal investigators may find terrifying. In fact, much can be learned from fairly uncomplicated techniques easily mastered by persons without a strong background in mathematics or statistics. Many evaluation questions can be answered through the use of descriptive statistical measures, such as frequency distributions (how many cases fall into a given category) and measures of central tendency (such as the mean or median, which are statistical measures that seek to locate the "average" or the center of a distribution).



For frequency distributions, the question is most often a matter of presenting such data in the most useful form for project managers and stakeholders. Often the evaluator will look at detailed distributions and then decide on a summary presentation, using tables or graphics. An example is the best way of illustrating these various issues.

Let us assume that a project had recruited 200 high school students to meet with a mentor once a week over a one-year period. One of the evaluation questions was: "How long did the original participants remain in the program?" Let us also assume that the data were entered in weeks. If we ask the computer to give us a frequency distribution, we get a long list (if every week at least one participant dropped out, we may end up with 52 entries for 200 cases). Eyeballing this unwieldy table, the evaluator noticed several interesting features: only 50 participants (one-fourth of the total) stayed for the entire length of the program; a few people never showed up or stayed for only one session. To answer the evaluation question in a meaningful way, the evaluator decided to ask the computer to group the data into a shorter table, as follows:

Length of Participation

Time                  No. of Participants
1 week or less                 10
2-15 weeks                     30
16-30 weeks                    66
31-51 weeks                    44
52 weeks                       50
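A minimal Python sketch of this grouping step is shown below; the raw weekly values are randomly generated stand-ins, since the handbook's own data exist only as the grouped table above.

    # Group raw "weeks of participation" values into the categories used
    # in the table above; the raw values here are made up for illustration.
    import random

    random.seed(0)
    weeks = [random.randint(0, 52) for _ in range(200)]  # hypothetical raw data

    bins = [
        ("1 week or less", lambda w: w <= 1),
        ("2-15 weeks",     lambda w: 2 <= w <= 15),
        ("16-30 weeks",    lambda w: 16 <= w <= 30),
        ("31-51 weeks",    lambda w: 31 <= w <= 51),
        ("52 weeks",       lambda w: w == 52),
    ]

    for label, in_bin in bins:
        count = sum(1 for w in weeks if in_bin(w))
        print(f"{label:15s}{count:5d}")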

A bar chart might be another way of presenting these data, as shown in Exhibit 6.

Let us now assume that the evaluator would like a single figure which would provide some indication of the length of time during which participants remained in the project. There are three measures of central tendency which provide this answer: the mean (or arithmetic average), the median (the point at which half the cases fall below and half above), and the mode, which is the category with the largest number of cases. Each of these requires that the data meet specific conditions, and each has advantages and drawbacks. (See the glossary for details.)


Exhibit 6

Length of Participation in Mentoring Project (200 High School Students)

[Bar chart showing the number of participants in each category: 0-1 week, 2-15 weeks, 16-30 weeks, 31-51 weeks, and 52 weeks.]


In the above example, the only way of computing the mean, median, and mode would be from the raw data, prior to grouping. However, to simplify the discussion we will deal only with the mean and median (usually the most meaningful measures for evaluation purposes), which can be computed from grouped data. The mean would be slightly above 30 weeks; the median would be slightly above 28 weeks. The mean is higher because of the impact of the last two categories (31-51 weeks and 52 weeks). Both measures are "correct," but they tell slightly different things about the length of time participants remained in the project: the average was 30 weeks, which may be a useful figure for estimating future project costs; half of all participants stayed for 28 weeks or less, which may be a useful figure for deciding how to time retention efforts. Exhibit 7 illustrates differences in the relative position of the median, mean, and mode depending on the nature of the data, such as a positively skewed distribution of test scores (more test scores at the lower end of the distribution) and a negatively skewed distribution (more scores at the higher end).


Exhibit 7

Comparison of Central Tendency Measures in Skewed Score Distributions

[Two distribution curves: in the positively skewed distribution the mode lies below the median, which lies below the mean; in the negatively skewed distribution the mean lies below the median, which lies below the mode.]

In many evaluation studies, the median is the preferred measure of central tendency because, for most analyses, it describes the distribution of the data better than the mode or the mean. For a useful discussion of these issues, see Jaeger (1990).
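For readers who want to see the computation spelled out, the short Python sketch below computes the mean, median, and mode from raw (ungrouped) weeks-of-participation values; the values themselves are hypothetical.

    # Compute the three measures of central tendency from raw data.
    # The list of weeks is a small, made-up sample.
    from statistics import mean, median, mode

    weeks = [1, 1, 8, 12, 20, 25, 25, 30, 35, 40, 45, 52, 52, 52]

    print(f"mean   = {mean(weeks):.1f} weeks")
    print(f"median = {median(weeks):.1f} weeks")
    print(f"mode   = {mode(weeks)} weeks")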

Conduct Additional Analyses Based on the Initial Results

The initial analysis may give the evaluator a good feel for project operations, levels of participation, project activities, and the opinions and attitudes of participants, staff, and others involved in the project, but it often raises new questions.


These may be answered by additional analyses which examine the findings in greater detail. For example, as discussed in some of the earlier examples, it may be of interest to compare more and less experienced teachers' assessments of the effectiveness of new teaching materials, or to compare the opinions of men and women who participated in a mentoring program. Or it might be useful to compare the opinions of women who had female mentors with those of women who had male mentors. These more detailed analyses are often based on cross-tabulations, which, unlike frequency distributions, deal with more than one variable. If, in the earlier example about length of participation in mentoring programs, the evaluator wants to compare men and women, the cross-tabulation would look as follows:

Length of Participation by Sex

Time                All Students    Men    Women
1 week or less            10         10       0
2-15 weeks                30         20      10
16-30 weeks               66         40      26
31-51 weeks               44         10      34
52 weeks                  50         20      30

Exhibit 8, a bar graph, is a better way of showing the same data. Because the table and graph show that on the whole women dropped out later than men, but that most of them also did not complete the entire program, the evaluator may want to re-group the data, for example, break down the 31-51 week group further to see if most women stayed close to the end of the program.

Cross-tabulations are a convenient technique for examining several variables simultaneously; however, they are often inappropriate because sub-groups become too small. One rule of thumb is that a minimum of 20 cases is needed in each subgroup for analysis and for the use of statistical tests to judge the extent to which observed differences are "real" or due to sampling error. In the above example, it might have been of interest to look further at men and women in different ethnic groups (African American men, African American women, White men, and White women), but among the 200 participants there might not have been a sufficient number of African American men or White women to carry out the analysis.
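A minimal sketch of building such a cross-tabulation with Python's pandas library follows; the ten records are hypothetical, standing in for the 200 student records an evaluator would actually hold.

    # Build a length-of-participation by sex cross-tabulation.
    import pandas as pd

    records = pd.DataFrame({
        "weeks": [1, 10, 20, 35, 52, 52, 8, 25, 40, 52],
        "sex":   ["M", "M", "F", "F", "F", "M", "M", "F", "F", "M"],
    })

    labels = ["1 week or less", "2-15 weeks", "16-30 weeks", "31-51 weeks", "52 weeks"]
    records["category"] = pd.cut(records["weeks"],
                                 bins=[0, 1, 15, 30, 51, 52], labels=labels)

    table = pd.crosstab(records["category"], records["sex"], margins=True)
    print(table)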


Exhibit 8

Length of Participation in Mentoring Project (200 High School Students, 100 Men and 100 Women)

[Horizontal bar chart comparing the number of men and women participants in each category: 1 week or less, 2-15 weeks, 16-30 weeks, 31-51 weeks, and 52 weeks.]


There are other techniques for examining differences between groups and testing the findings to see if the observed differences are likely to be "true" ones. To use any one of them, the data must meet specific conditions. Correlation, t-tests, chi-square, and variance analysis are among the most frequently used and have been incorporated in many standard statistical packages. More complex procedures designed to examine a large number of variables and measure their respective importance, such as factor analysis, regression analysis, and analysis of covariance, are powerful statistical tools, but their use requires a higher level of statistical knowledge. There are special techniques for the analysis of longitudinal (panel) data. Many excellent sources are available for deciding about the appropriateness and usefulness of the various statistical methods (Jaeger, 1990; Fitz-Gibbon and Morris, 1987).
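As one concrete illustration of such a test, the sketch below runs a chi-square test of independence on the men/women cross-tabulation shown earlier, using Python's SciPy library; this is only a sketch, and the handbook itself does not prescribe any particular package.

    # Chi-square test of independence on the length-of-participation by
    # sex table; the counts are those shown in the cross-tabulation above.
    from scipy.stats import chi2_contingency

    # Rows: 1 week or less, 2-15, 16-30, 31-51, 52 weeks; columns: men, women.
    observed = [
        [10,  0],
        [20, 10],
        [40, 26],
        [10, 34],
        [20, 30],
    ]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")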

Exploring the data by various statistical procedures in order to detect new relationships and unanticipated findings is perhaps the most exciting and gratifying evaluation task. It is often rewarding and useful to keep exploring new leads, but the evaluator must not lose track of time and money constraints and needs to recognize when the point of diminishing returns has been reached.


By following the suggestions made so far in this chapter, the evaluator will be able to answer many questions about the project asked by stakeholders concerned about implementation, progress, and some outcomes. But the questions most often asked by funding agencies, planners, and policy makers who might want to replicate a project in a new setting are: Did the program achieve its objectives? Did it work? What feature(s) of the project were responsible for its success or failure? Outcome evaluation is the evaluator's most difficult task. It is especially difficult for an evaluator who is familiar with the conceptual and statistical pitfalls associated with program evaluation. To quote what is considered by many the classic text in the field of evaluation (Rossi and Freeman, 1993):

"The choice (of designs) always involvestrade-offs, there is no single,

always-best design that can beused as the 'gold standard'."

Why is outcome evaluation or impact assessment so difficult? The answer is simply that educational projects do not operate in a laboratory setting, where "pure" experiments can yield reliable findings pointing to cause and effect. If two mice from the same litter are fed different vitamins, and one grows faster than the other, it is easy to conclude that vitamin x affected growth more than vitamin y. Some projects will try to measure the impact of educational innovations by using this scientific model: observing and measuring outcomes for a treatment group and a matched comparison group. While such designs are best in theory, they are by no means fool-proof: the literature abounds in stories about "contaminated" control groups. For example, there are many stories about teachers whose students were to be controls for an innovative program, and who made special efforts with their students so that their traditional teaching style would yield exceptionally good outcomes. In other cases, students in a control group were subsequently enrolled in another experimental project. But even if the control group is not contaminated, there are innumerable questions about attributing favorable outcomes to a given project. The list of possible impediments is formidable. Most often cited is the fallacy of equating high correlation with causality. If attendance in the mentoring program correlated with higher test scores, was it because the program stimulated the students to study harder and helped them to understand scientific concepts better? Or was it because those who chose to participate were more interested in science than their peers?


Or was it because the school changed its academic curriculum requirements? Besides poor design and measurements, the list of factors which might lead to spurious outcome assessments includes invalid outcome measures as well as competing explanations, such as changes in the environment in which the project operated and Hawthorne effects. On the basis of their many years of experience in evaluation work, Rossi and Freeman (1993) formulated "The Iron Law of Evaluation Studies":

"The better an evaluation study istechnically, the less likely it is to show

positive program effects."

There is no formula which can guarantee a flawless and definitive outcome assessment. Together with a command of analytic and statistical methods, the evaluator needs the ability to view the project in its larger context (the real world of real people) in order to make informed judgments about outcomes which can be attributed to project activities. And, at the risk of disappointing stakeholders and funding agencies, the evaluator must stick to his guns if he feels that the available data do not enable him to give an unqualified or positive outcome assessment. This issue is further discussed in Chapter Four.

Integrate and Synthesize Findings

When the data analysis has been completed, the final task is to select and integrate the tables, graphs, and figures which constitute the salient findings and will provide the basis for the final report. Usually the evaluator must deal with several dilemmas:

How much data must be presented to support a conclusion?

Should data be included which are interesting or provocative, but do not answer the original evaluation questions?

What to do about inconsistent or contradictory findings?

Here again, there are no hard and fast rules. Because usually the evaluator will have much more information than can be presented, judicious selection should guide the process. It is usually unnecessary to belabor a point by showing all the data on which the conclusion is based: just show the strongest indicator.



On the other hand, "interesting" data which do not answer one of the original evaluation questions should be shown if they will help stakeholders to understand or address issues of which they may not have been aware. A narrow focus of the evaluation may fulfill contractual or formal obligations, but it deprives the evaluator of the opportunity to demonstrate substantive expertise and the stakeholders of the full benefit of the evaluator's work. Finally, inconsistent or contradictory findings should be carefully examined to make sure that they are not due to data collection or analytic errors. If this is not the case, they should be put on the table as pointing to issues which may need further thought or examination.

REFERENCES

American Psychological Association, Educational Research Association, and National Council on Measurement in Education (1974). Standards for Educational and Psychological Tests. Washington, DC: American Psychological Association.

Fitz-Gibbon, C. T. & Morris, L. L. (1987). How to Design a Program Evaluation. Newbury Park, CA: Sage.

Fowler, F. J. (1993). Survey Research Methods. Newbury Park, CA: Sage.

Guba, E. G. & Lincoln, Y. S. (1989). Fourth Generation Evaluation. Newbury Park, CA: Sage.

Henerson, M. E., Morris, L. L., & Fitz-Gibbon, C. T. (1987). How to Measure Attitudes. Newbury Park, CA: Sage.

Herman, J. L., Morris, L. L., & Fitz-Gibbon, C. T. (1987). Evaluator's Handbook. Newbury Park, CA: Sage.

Jaeger, R. M. (1990). Statistics: A Spectator Sport. Newbury Park, CA: Sage.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8).

Love, A. J. (ed.) (1991). Evaluation Methods Sourcebook. Ottawa, Canada: Canadian Evaluation Society.


Morris, L. L., Fitz-Gibbon, C. T., & Lindheim, E. (1987). How to Measure Performance and Use Tests. Newbury Park, CA: Sage.

Rossi, P. H. & Freeman, H. E. (1993). Evaluation: A Systematic Approach (5th Edition). Newbury Park, CA: Sage.

Scriven, M. (1991). Evaluation Thesaurus. Newbury Park, CA: Sage.

Seidel, J. V., Kjolseth, R., & Clark, J. A. (1988). The Ethnograph. Littleton, CO: Qualis Research Associates.

Stewart, D. W. & Shamdasani, P. N. (1990). Focus Groups. Newbury Park, CA: Sage.

Sudman, S. (1976). Applied Sampling. New York: Academic Press.

Yin, R. (1989). Case Study Research. Newbury Park, CA: Sage.


CHAPTER FOUR: REPORTING

The product of an evaluation is almost always a formal report. While the report may frequently be supplemented by other forms of communication (overheads, conference presentations, workshops), a formal written report is standard for NSF. Depending on exactly who the audience is, a specific report may vary in format, length, and level of technical discussion. For example, a report to a Board of Education will be far more concise, and less technical, than a report to a professional association or a funding agency.

In this chapter, we discuss the development of a formal report for an agency like the National Science Foundation. The specific type of report on which we focus is one that would be the product of an experimental or quasi-experimental design. For details on developing reports for other methodologies, specifically case studies, see Yin (1989).

What are the Components of a Formal Report?

Most reports include six major sections:

Background

Evaluation Study Questions

Sample, Data Collection, Instrumentation

Data Analysis

Findings

Conclusions (and recommendations).

The Background Section

The background section includes and describes the following: (1) the problem or needs addressed; (2) the stakeholders and their information needs; (3) the participants; (4) the project's objectives; (5) the activities and components; (6) the location and planned longevity of the project; (7) the resources used to implement the project; and (8) the project's expected measurable outcomes.

Notable constraints on what the evaluation was able to do are also pointed out in this section.


For example, it may be important to point out that the conclusions are limited by the fact that no appropriate comparison group was available or that only the short-term effects of program participation could be examined.

Evaluation Study Questions

The evaluation is based on the need for specific information; stakeholders such as Congress, NSF-funded program and project directors, and the participants have distinct needs. There are many questions to be asked about a project; however, not all of these questions can be answered at one time. This section of the report describes the questions that the study addressed. As relevant, it also points out some important questions that could not be addressed due to factors such as time, resources, or inadequacy of available data-gathering techniques.

Evaluation Procedures

This section of the report describes the groups that participated in the evaluation study. For quantitative studies, it describes who these groups were and, if sampling was used, how the particular sample of respondents included in the study was selected from the total population available. Important points noted are how representative the sample was of the total population; whether the sample volunteered (self-selected) or was chosen using some sampling strategy by the evaluator; and whether or not any comparison or control groups were included.

This section also describes the types of data collected and the instruments used for the data collection activities. For example, they could be:

Quantitative data for identified critical indicators, e.g., grades for specific subjects, Grade Point Averages (GPAs)

Ratings obtained in questionnaires and interviews designed for project directors, students, faculty, and graduate students

Descriptions of classroom activities from observations of key instructional components of the project

Examinations of extant data records, e.g., letters, planning papers, and budgets.


It is helpful at the end of this section to include a "matrix" or table which summarizes the evaluation questions, the variables, the data-gathering approaches, the respondents, and the data collection schedule.

Data Analysis

This section describes the techniques used to analyze the data collected above. It describes the various stages of analysis that were implemented and the checks that were carried out to make sure that the data were free of as many confounding factors as possible. Frequently, this section contains a discussion of the techniques used to make sure that the sample of participants that actually participated in the study was, in fact, representative of the groups from which they came. (That is, there is sometimes an important distinction between the characteristics of the sample that was selected for participation in the evaluation study and the characteristics of those who actually participated, that is, returned questionnaires, attended focus groups, etc.)

Again, a summary matrix is a very useful illustrative tool.

Findings

This section presents the results of the analyses described previously. The findings are usually organized in terms of the questions presented in the section on Evaluation Study Questions. Each question is addressed, regardless of whether or not a satisfactory answer is provided. It is just as important to point out where the data are inconclusive as where the data provide a positive or negative answer to an evaluation question. Visuals such as tables and graphical displays are an appropriate complement to the narrative discussion.

While the discussion in the findings section usually focuses most heavily on quantitative information, qualitative information may also be included. In fact, including both can turn a rather "dry" discussion of results into a more meaningful communication of study findings. An easy way to do this is to include quotes from the project participants which help to illustrate the point being made.

At the end of the findings section, it is helpful to have a summary that presents the major conclusions. Here "major" is defined in terms of both the priority of the question in the evaluation and the strength of the finding from the study. For example, in a Summative Evaluation, the summary of findings would always include a statement of what was learned with regard to outcomes, regardless of whether or not the data were conclusive.



Conclusions (and Recommendations)

The conclusions section reports the findings with more broad-based and summative statements. These statements must relate to the findings for the project's evaluation questions and to the goals of the overall program. Sometimes the conclusions section goes a step further and includes recommendations, either for the Foundation or for others undertaking projects similar in goals, focus, and scope. Care must be taken to base any recommendations solely on robust findings and not on anecdotal evidence, no matter how persuasive.

Other Sections

In addition to these six major sections, formal reports also include one or more summary sections. These would be:

An Abstract: a summary of the study and its findings presented in approximately one-half a page of text.

An Executive Summary: a summary, which may be as long as four pages, that provides an overview of the evaluation, its findings, and implications. Sometimes the Executive Summary also serves as a non-technical digest of the evaluation report.

How Do You Develop an Evaluation Report?

Although we usually think about report writing as the last step in an evaluation study, a good deal of the work actually can and does take place before the project is completed. The "Background" section, for example, can be based largely on the original proposal. While there may be some events that cause minor differences between the study as planned and the study as implemented, the large majority of the information, such as the research background, the problem addressed, the stakeholders, and the project's goals, will remain essentially the same.



If you have developed a written evaluation design, the material in this design can be used for the sections on "Evaluation Study Questions" and "Sample, Data Collection, Instrumentation." The "Data Analysis" section is frequently an updated version of what was initially proposed. However, as we noted in Chapter Two, data analysis can take on a life of its own, as new ideas emerge when data are explored; the final data analysis may be far different from what was initially envisioned.

The "Findings" and "Conclusions" sections are themajor new sections to be written at the end of anevaluation study. These may present somewhat of achallenge because of the need to balance comprehen-siveness with clarity, and rigorous, deductive think-ing with intuitive leaps.

One of the errors frequently made in developing a"Findings" section is what we might call the attitude of"I analyzed it, so I am going to report it." That is,evaluators may feel compelled to report on analysesthat appeared fruitful, but ultimately resulted in littleinformation of interest. In most cases, it is sufficientto note the . :nalyses were conducted and that theresults were inconclusive. Presentation of tables show-ing that no differences occurred or no patterns emerged,is probably not a good idea unless there is a strongconceptual or political reason for doing so. Even in thelatter case, it is prudent to note the lack of findings inthe text and to provide the back-up evidence inappendices or some technical supplement.

One tip to follow when writing these last sections is toask colleagues to review what you have written andprovide feedback before the report reaches its fmalform. Your colleagues can assist in assessing theclarity and completeness of what you have written, aswell as provide another set of eyes to examine yourarguments and, possibly, challenge your interpreta-tions. It is sometimes very hard to get enough distancefrom your own analyses after you have been immersedin them. Ask a colleague for help and return that favorin the future.

What Might a Sample Report Look Like?

In the pages that follow, we present sample sections from an evaluation report. The sections have been created to illustrate further the kinds of information included in each section. This report is a progress report developed after the first year of funding of a project that will ultimately continue for a total of 5 years.


Report of the "Higher Education University" Alliances for Minority Participation (AMP) Project

Background

Overview of the AMP Program

The Alliances for Minority Participation (AMP) Program is funded by the National Science Foundation's Human Resource Development division, part of the minority student development initiative. The program was developed in response to concerns raised by the low number of underrepresented minority students who successfully completed science and engineering baccalaureate degree programs. A major goal of the AMP Program is to increase substantially the pool of interested and academically qualified underrepresented minority students who go on for graduate study in these fields. Students eligible to participate in this program are United States citizens or legal residents in undergraduate colleges and universities who are African American, American Indian/Alaskan Native, or Hispanic.

AMP Program objectives are listed below:

Establish partnerships among community colleges, colleges and universities, school systems, Federal/state/local agencies, major national Science, Engineering and Mathematics (SEM) laboratories and centers, industry, private foundations, and SEM professional organizations.

Provide activities that facilitate the transition and advancement of minority students through one or more critical decision points during SEM education: high school to college, 2-year to 4-year college, undergraduate to graduate school, or college to the workplace.

Achieve a demonstrated increase in the number of underrepresented minority students receiving undergraduate SEM degrees.

Demonstrate the involvement and commitment of SEM departments and faculty in the design and implementation of improvements of SEM undergraduate education.

Demonstrate the existence of an infrastructure and management plan for ensuring long-term continuance of AMP or similar activities among the participating organizations and institutions.

Identify for evaluative purposes the critical data elements associated with demonstrating the increases of undergraduate and graduate students in SEM programs.

Project Description

The "Higher Education University" AMP project was funded initially for 5 years,renewable each year. The project operated during 1991-92 from September to June andduring the 1992 summer session from July to August. According to the project plan,some students participated during both periods, while others participated in either the


The "Higher Education University" AMP project included a coalition of colleges, universities, K-12 schools, and nonprofit organizations.

Students were involved in the following core activities:

Summer research using the skills of science, engineering, and/or mathematics

Travel to attend professional meetings for scientists, engineers, and mathematicians

Attendance at programs to hear presentations by special speakers on subjects related to science, engineering, or mathematics

Enrollment in the Summer Bridge Program to improve and enrich SEM skills

Participation in peer study groups to improve skills in science and/or mathematics.

A Black History program that was added to enhance African American students' self-esteem.

The project was fully funded for $1 million per year. This funding provided for project managers, faculty/teachers, graduate students, support personnel, development of a project database, student financial support, and the implementation of the previously described activities.

The Purpose of the Project and Its Evaluation

The goal of the project was to provide appropriate activities and support to students involved in the AMP project to improve and enrich their science, engineering, and mathematics skills so that their interest and retention in the SEM pipeline continues through undergraduate to graduate school, and eventually to the Ph.D. The initial evaluation of the project focused on identifying those activities that successfully met the AMP objectives and on reporting student outcomes related to the success of the activities. On an annual basis, the evaluation will identify which of the project activities need to be modified or deleted prior to the project's Summative Evaluation.

"Higher Education University" AMP Project objectives are listed below:

Establish and maintain a partnership among community colleges, colleges and universities, school systems, and industry

Provide activities that facilitate the transition and advancement of minority students through two critical decision points during SEM education: 2-year to 4-year college and undergraduate to graduate school

Retain 95 percent of the AMP students in SEM courses

Increase the number of underrepresented minority students in SEM courses each year by 10 percent


Increase by 25 percent each year the number of minority students receiving undergraduate SEM degrees

Demonstrate the involvement and commitment of SEM departments and faculty in the design and implementation of improvements of SEM undergraduate education

Demonstrate the existence of an infrastructure and management plan for ensuring long-term continuance of AMP or similar activities among the participating organizations and institutions

Identify which components and activities helped recruit, retain, and increase the number of undergraduate and graduate students in SEM programs.

Evaluation Questions

The evaluation looked at a broad range of questions related to both the project's implementation and its success. Specifically, the evaluation addressed the following questions:

Did the AMP project result in the establishment of adequate partnerships?

What was the impact of the partnerships on promoting the AMP project's objectives?

What activities were most successful in recruiting, retaining, and increasing underrepresented minorities in science, engineering, and mathematics?

What evidence is there that the project may successfully reach its long-term outcomes (e.g., SEM baccalaureate degree, acceptance into graduate school seeking a SEM degree)?

How could the project be improved and/or changed to better serve the needs of underrepresented minority students who are enrolled in science, engineering, and mathematics courses?

Sample, Data Collection, Instrumentation

The primary sources of information about the AMP project were the Annual Reports, the database of Minimum Obligatory Set (MOS) elements, project-focused questionnaires and interviews, and observations of the instructional components. The principal group for study was all AMP students in the project. Aggregates of grades by race, gender, and class status, pass and fail records, GPAs, and retention rates were the basis for analyzing the impact of the "Higher Education University" AMP project sites and components. These same data were used for tracking the longitudinal progress of the AMP project for SEM underrepresented minority students, that is, the students' movement towards SEM baccalaureate and graduate degrees. For comparative purposes, selected scholastic information for all SEM students by race, gender, and class status was collected. Table 1 summarizes the evaluation design. Variables, measures, and samples (participants) are presented for each evaluation question.


Table 1: Summary of the Evaluation Design

Question 1: Did the AMP project result in the establishment of adequate partnerships?

1a. What partnerships were established?
    Data collection approach: Review of records; interviews
    Respondents: NA (records); Principal Investigator (interviews)
    Schedule: End of year

1b. Were the partnerships established in a timely fashion?
    Data collection approach: Review of records; interviews
    Respondents: NA (records); Principal Investigator (interviews)
    Schedule: End of year

1c. Were the goals for the number and mix of partners achieved?
    Data collection approach: Comparison of proposal and data from 1a
    Respondents: NA
    Schedule: End of year

Question 2: What impact did the partnerships have?

2a. How effective were the partnerships?
    Data collection approach: Questionnaires
    Respondents: All staff; selected students
    Schedule: End of year

2b. What were the most effective activities provided by them?
    Data collection approach: Questionnaires; observation
    Respondents: All staff (questionnaires); NA (observation)
    Schedule: End of year (questionnaires); ongoing (observation)


Data Analysis

Descriptive statistics (frequencies, percentages, means, medians, standard deviations, etc.) were used to report the results of the evaluation. These statistics were also used to make comparisons among the ratings and statements from the various respondents. Tests of significance were computed to determine if there were real differences among certain quantitative data, that is, the grade point averages and grades for AMP students and all SEM students.

Table 2 summarizes the data analysis design. The measures, variables, and analyses are presented for each evaluation question.

Table 2: Summary of the Data Analysis Plan

Question 1: Did the AMP project result in the establishment of adequate partnerships?

1a. What partnerships were established?
    Data collection approach: Review of records; interviews
    Analysis plan: Descriptions; simple numerical tallies

1b. Were the partnerships established in a timely fashion?
    Data collection approach: Review of records; interviews
    Analysis plan: Frequency distribution of time of partnership establishment

1c. Were the goals for the number and mix of partners achieved?
    Data collection approach: Comparison of proposal and data from 1a
    Analysis plan: Matching of goals with achievements

Question 2: What impact did the partnerships have?

2a. How effective were the partnerships?
    Data collection approach: Questionnaires
    Analysis plan: Percentages selecting various ratings

2b. What were the most effective activities provided by them?
    Data collection approach: Questionnaires; observation
    Analysis plan: Percentages selecting various ratings; summaries of running records


Findings

Did the AMP project result in the establishment of adequate partnerships?

The total number of institutions and groups that formed coalitions with the AMP project was 25. The types of groups that formed the coalitions were: 2 colleges, 5 universities, 4 junior colleges, 3 school districts, and 11 businesses and community groups. The only group that was below the target was school districts in the area. The target was to have 6 school districts involved.

The activities of these coalitions were judged to be very helpful to the project. Over 80 percent of the staff responded "helpful" or "very helpful" (the highest ratings) to the following activities:

Mentoring

Extended on-site experiences

Special training opportunities

etc.

Which components of the AMP project were the most successful in supporting and retaining AMP students?

The Summer Bridge Program was a very successful part of this project. This conclusion is based on ratings from the AMP students and the grades that they subsequently received in pre-calculus and calculus. (See Tables 3 and 4.)

The data are impressive. Over 50% of the Summer Bridge students received an "A" or a "B" in these classes. Equally important, the failure rate was low. Only 10% of the students failed pre-calculus and 15% failed calculus. These compare quite favorably with failure rates in the past, which ranged from an average of 20% in pre-calculus to 35% in calculus.

A number of the comments about the Summer Bridge Program recognized the positive effects of the support systems which were provided.

A 16-year-old female remarked:

"The Bridge Program gave me just that extra little boost that I needed to do well in my classes. I knew the basic material. I knew I knew the basic material. I was ready!"

This program was not without criticism, however. Several students pointed out that the schedule of classes, occurring as they did at mid-summer, prevented them from taking jobs that they needed. They recommended that the program be offered right after the end of the regular school year or right before the beginning of the next school year, so that a block of time would be available for employment.


Another component that received high ratings was the Peer Study Group, with its opportunities for collaborative learning. (See Table 3.)

etc.

For a moderate number of students (35%), scholarship assistance was critical for them to remain in school.

etc.

Summary of Findings

1. A large number of coalitions was formed, particularly with business and community groups. School districts met only half of the participation target.

2. The Summer Bridge Program, Peer Study Group, and scholarship assistance were the AMP components rated highly for supporting students to remain in the SEM pipeline.

etc.

Conclusions

For 1991-92, the "Higher Education University" AMP project met its objectives. Theproject established partnerships among community colleges, universities, schoolsystems, business, and community groups. However, partnership with the schoolsystem fell short of their participation target by 50 percent.

The Summer Bridge Program, Peer Study Groups, and scholarship assistance wererated by the AMP students, faculty, and AMP staff as critical to retaining and increasingundergraduate and graduate students in SEM programs. However, in the SummerBridge Program, there were some negative remarks about the scheduling of activitiesand recommendations were made for...

The Black History course received high ratings from the African American AMPstudents. Many of these students said that they were encouraged to succeed as SEMstudents after learning about successful role models in science and mathematics.


CHAPTER FIVE: EXAMPLES


Sometimes, providing an example is more worthwhile than a thousand words of advice. Because we believe that is the case with evaluation, we have provided a number of examples (both good and bad) of different evaluation efforts. These examples build on information from the previous chapters. The specific examples we present are fictitious, but are based on studies that could be and have been done.

Example 1: Evaluation of an Inservice Program for Elementary Science Teachers

This example illustrates the:

Use of formative evaluation for measuring implementation

Importance of involving a skilled evaluator early in the project

Use of both qualitative and quantitative approaches

Use of multiple informants and data collection techniques

Ability to adapt a design based on new information.

NSF funded the Center for Professional Enhancementfor a 3 year program of in-service education forelementary science teachers. The purpose of the projectwas to introduce the teachers to some of the newapproaches to elementary science instruction and toassist them in applying these techniques in theirclassrooms. Teachers.were selected for participationbased on the nomination of a supervisor (usually aprincipal), their written essay, and a commitment bothto attend regular and summer sessions and try out therk-w approaches in their classrooms.

The program consisted of 1 year of training with follow-up workshops after the first year. A mixture of teaching strategies was used. These included: lecture sessions, hands-on experiences in using techniques identified as being promising, visits to classrooms taught by model teachers, and peer coaching.

Before the project was even funded, the Principal Investigator hired an evaluator to participate in the program. The evaluator was one who had considerable experience in studies of elementary science teacher training and was aware of the new directions in which elementary science is proceeding. The Principal Investigator had considered hiring an elementary science teacher who had been surplused to fill this role, but decided against it because of the belief that a more skilled evaluator would benefit the project more. The evaluator and the Principal Investigator discussed the goals of the project, the plan for meeting these goals, and the questions that needed to be explored. Considerable time was devoted to clarifying the kinds of information needed to make sure the training was functioning as intended. The Principal Investigator was very interested in studying program implementation as early as possible to make sure everything was


"on track." The evaluator also talked to the trainers tounderstand their information needs and understandmore fully the kinds of data that would help them dotheir jobs as well as possible. Both the Principalinvestigator and the trainers expressed a strong inter-est in knowing the extent to which what was learnedwas, in fact, being transferred to the classroom.

The evaluator began observing the training sessions on an intermittent basis almost as soon as they started. Although no formal observation system was used, she wrote a brief narrative summary after each observation session detailing the focus of the session, the strategies discussed, and the involvement/engagement level of the participants. After 6 months, she administered a questionnaire to the participants which addressed a wide range of issues, including the adequacy of the training and the extent to which it was used and useful in their own classrooms. She also interviewed the trainers to get their reactions to the project and to hear any new concerns that may have arisen. She had planned to interview other teachers who worked with the participants to assess their awareness of the new teaching techniques (it was hoped that the training would have a "spill-over" effect on others in the school), but this idea was abandoned as being premature after a preliminary review of the teacher questionnaires.

The findings proved to be very useful, and the Principal Investigator was pleased with the feedback from this early investment. The observational summaries indicated that the training sessions themselves were highly effective. Even during the lecture sessions, participants were engaged and very attentive. The quality of interaction and discussion during the hands-on sessions was very good, with the participants frequently going beyond the demonstration tasks and inventing their own alternatives.

By and large, the interviews with the trainers complemented the observational data. Trainers were pleased at how smoothly their classes were going and at the high level of engagement shown by the participants. They did, however, have some problems with the model teachers and coordinating demonstrations by them with material covered in the other training sessions. Because these teachers were teaching in regular school situations, the needs of the project and the regular classroom too frequently came into conflict.


The data from the participant questionnaire were less positive. While participants had high praise for the training they were receiving, they were somewhat less enthusiastic about the project overall. While they had initially been very pleased with the opportunity to participate, they were finding that the time they spent away from the classroom was interfering with their jobs as teachers. Further, they were unable to apply what they learned to their own classes because they lacked the supplies and materials necessary. The support that had been provided for the traditional science teaching in which they previously had engaged did not meet the needs of the new lessons to which they were being exposed.

Based on these findings, several changes were made in the project. First, videotaped demonstrations were substituted for visits to model teachers during the regular school year. Arrangements were made for demonstrations of model teaching during the summer at a nearby laboratory school that had a summer session. Selection criteria for teachers were also changed so as to require some in-kind support from their institutions. The principal or someone in the central office had to agree to provide the supplies and materials needed for instructing in the new ways, if such materials were not already available. The Principal Investigator also reallocated some of the project funds to provide more materials for the teachers. A number of lessons were designated as ones in which all needed materials would be provided to the participants for use in their classes. Time was also set aside during the training sessions to discuss attempts to apply the new strategies to the classroom. Both successful and unsuccessful applications were considered.

The second year evaluation showed that these changes were having a positive effect. While the second year participants still had a certain level of frustration at being out of their classrooms, their ability to bring back and try out new strategies reduced this frustration greatly. The sharing sessions at which application attempts were discussed became favorites of both the participants and the trainers. The former gained important insights into ways of transferring their skills; the latter gleaned many tips to pass on to the next year's participants.

Example 2: Evaluation of an Integrated Learning System for the Teaching of Mathematics

This example illustrates the:

Problems that can arise when the potential needs of critical stakeholders are not considered

Limitations of relying on a single measure of program impact

Need to provide for formative progress evaluation

Misinterpretations that result from failure to appropriately disaggregate results.

NSF funded Jones University, along with the Smith School District, to conduct a study of the efficacy of the Boston Integrated Learning System (ILS). The Smith School District, once considered a very fine school system, had over the last several decades fallen on hard times. Budget cuts, population shifts, and a sliding national economy combined to give Smith new challenges that it had never before faced. The results were discouraging. Not only were test scores on the decline, but absenteeism was high, and even when in school, the students frequently missed class or were disruptive.

The Boston ILS was selected for implementation in this district because of both its ability to tailor instruction to individual needs and its motivational characteristics. It was also hoped that with an ILS like Boston in place, teachers would be able to spend more one-on-one time with each student, without reducing the quality of instruction received by the group.

The Boston ILS provides the hardware and software needed to assist students in elementary mathematics. The ILS combines teaching modules, testing modules, and a reporting component intended to provide an individualized learning experience. It has high quality graphics and an audio component.

Seven schools were selected for the project. The schools were among the most needy in the district, defined in terms of student test scores and free lunch counts. All students in these schools participated.

The goal of the project, as written in the proposal to NSF, was to increase test scores in mathematics. The key indicator of performance was defined as scores on the Schmata Test of Basic Skills, a norm-referenced test given every other year to the students.

The study was initiated in the fall of 1989. Because the evaluation design seemed to the Principal Investigator to be very straightforward (pre-post test performance on the Schmata test), no allowance was made for an evaluator at project onset. Rather, the Principal Investigator intended to rely on the normal test reporting of the school district as the means of acquiring evaluative data. He also included some funds for reanalysis of these data by his colleagues at Jones University. The NSF Program Officer queried the Principal Investigator about the scope of the evaluation, raising the question of whether or not it would meet all stakeholders' needs. She asked, specifically, whether or not all relevant parties at the school district had been consulted before


the proposal was submitted. The Principal Investigator said that he had talked to the Director of Curriculum and they were in agreement with regard to the evaluation. He said, however, that he would revisit the questions with a broader group of policymakers at the school level and amend his proposal, if necessary. In the rush of next steps, this consultation fell by the wayside.

Ten months after the project was initiated, the school district faced another budget crisis. The Board asked for evidence that the project was successful, threatening to cut back the in-kind funds that had been allocated for teacher training and planning time to the project schools. The Principal Investigator was able to give his impressions of how the program was working and several teachers also offered their support. In addition, however, the Board received a number of phone calls from other teachers saying that Boston placed too much of a burden on them and the benefit to students was negligible.

Fearing loss of support from the school system, the Principal Investigator called upon some of his colleagues for advice on what to do. Fortunately, the University had on staff some strong educational evaluators. After review of the project and the data available, they came up with an evaluation scheme. Recognizing the fact that the Schmata test results would not be available for more than a year, they gathered some interim measures of program impact. First, they looked at test performance on the assessment modules provided by Boston. Analysis of these data showed that, overall, students were making steady progress and seemed to be retaining the skills learned as measured by the retention tests. (While there was no way to compare these students' performance with that of students given traditional instruction, the analysis at least provided some promise of success.) Second, they looked at data in other areas to see whether an impact could be posited. The areas they selected were attendance and referrals for behavioral incidents. These data showed that, overall, attendance had increased and behavioral incidents had decreased compared to the same time in previous years. Further, when the data for some individual students were examined, these same students showed increased attendance and fewer referrals for disturbance than they had in the past. Taken together, these findings provided weight to the claims of the Principal Investigator, and the project continued to receive support from the school district.


Six months later, the evaluators returned to these data and did some additional analyses, disaggregating the results by grade, gender, race, and English-language proficiency. The evaluators found the picture of success which emerged from the overall data did not hold true for each subgroup of students participating. Specifically, they found that both the progress and behavioral data showed striking differences between English and limited-English speaking students, with the limited-English speaking students failing to do as well. Despite the fact that the latter were receiving the district's special language supports and were assigned to the regular classroom for their instruction, these students were clearly failing to profit from the Boston ILS. Unfortunately, this information was not obtained until after the students had spent a full school year in the project and had started a second year of participation.
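Disaggregation of this kind is a routine data-analysis step rather than a specialized technique. The sketch below is only an illustration of the idea, not part of the Boston ILS project: the student records, column names, and values are all invented. The point is simply that an overall average can look acceptable while a subgroup average does not.

    import pandas as pd

    # Invented student-level records; the column names are illustrative only.
    records = pd.DataFrame({
        "grade":        [3, 4, 3, 5, 4, 5],
        "language":     ["LEP", "English", "English", "LEP", "English", "LEP"],
        "module_score": [62, 88, 91, 58, 85, 60],
        "absences":     [11, 3, 2, 14, 4, 12],
    })

    # Overall averages can mask subgroup differences.
    print(records[["module_score", "absences"]].mean())

    # Disaggregating by language status (or grade, gender, race) exposes them.
    print(records.groupby("language")[["module_score", "absences"]].mean())

The same grouping can be repeated for each characteristic of interest, and the subgroup tables reported alongside the overall figures.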

Example 3: The Evaluation of a Special Program for Gifted Minority Students

Project REACH FOR THE STARS, a project aimed at identifying and supporting gifted minority students in math and science, was funded by NSF under the Comprehensive Regional Center for Minorities Program. The project sought to identify talented minority students at the end of 8th grade and to encourage their participation in mathematics and science courses for the duration of their high school career and beyond. It included a mentoring component, Saturday morning and summer enrichment sessions, and support groups.

Students were identified for participation based on test scores, grades, and motivation. Two subgroups were created. Subgroup I, the majority of students (75%), had the highest test scores and at least a "B" average in math and science. The second subgroup, subgroup II (25%), consisted of students who were highly motivated to participate, but had not shown strong performance in the past. Since more students qualified for the program and were interested than could be accepted, a lottery system was used to select individuals for participation. Those who were not selected were placed into a comparison group against which to measure the progress of students in subgroup I. Unfortunately, there was not a sufficiently large number of students who fell into subgroup II to allow for a similar procedure.

The evaluation used a wide variety of measures which were entered into a database student by student.


This example illustrates:

A Summative Evaluation built on progress data collected annually

The use of both survey and case study methodology

The use of multiple data sources

The problems of interpreting findings without a comparison group

The consideration of timeliness in the production of a report.

Included were grades both in the target courses, math and science, and in other major academic subjects; test scores from end-of-semester exams; standardized tests; and, as relevant, the SAT, ACT, and College Board results. At the end of the 12th grade, data on postsecondary applications and acceptances, as well as scholarships and other honors, were obtained and added.
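A student-by-student database of this sort can be kept quite simple. The record layout below is a minimal sketch of one way to hold the measures just listed; the field names and structure are our own illustration and are not taken from the project itself.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class StudentRecord:
        student_id: str
        group: str                        # "subgroup I", "subgroup II", or "comparison"
        math_grades: list = field(default_factory=list)        # one entry per semester
        science_grades: list = field(default_factory=list)
        exam_scores: dict = field(default_factory=dict)         # test name -> score
        sat: Optional[int] = None
        act: Optional[int] = None
        honors_and_awards: list = field(default_factory=list)
        postsecondary_acceptances: list = field(default_factory=list)

Keeping one such record per student makes the annual analyses, and ultimately the final report, a matter of summarizing fields that are already in place.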

In addition, focus groups were conducted each year with participants in order to get students' reactions to the program from both an academic and a social perspective. Surveys were administered to the parents and teachers at the end of the second and fourth years. Finally, a follow-up survey was sent to graduates (both the participants and the subgroup I comparison students) one year after they had graduated in order to find out what they were doing, how well they were doing, and what they thought, in retrospect, of the special program in which they had participated.

A substudy, which was turned into a dissertation by one part-time researcher, looked closely at the experiences of five students differing in gender, race, and family structure. These case studies were used to provide a thick description of program experiences and student, family, and staff reactions.

On an annual basis, the data were analyzed for the program participants overall, for the comparison group students, and for the participating students by subgroup. These annual analyses were used to monitor the progress of the students and to pinpoint individual student problems as they arose. The student focus groups also provided important input for modifications in the project, which fine-tuned the approach.

A final report built on the data collected annually, providing an overall summary for the four years of project participation. The only new data added in the fourth year that had not been collected previously were the information on honors, awards, and postsecondary acceptances. Because of the careful job that had been done documenting progress along the way, the production of a final report was greatly simplified, and there was little protest from any of the participants about the requests for data and the "burden" they caused.

The final report showed that in general the program was a success. Students made good grades in the


target courses, continued to do well in the other courses, and were accepted into strong postsecondary institutions. There was a statistically significant difference between the grades and test scores of the program participants and the comparison group students. The data collected from the students' parents showed they had high expectations for their children and were convinced by their proven success in the program that they could and should aim high in the future. However, the parents of the nonparticipating students were quite similar in their responses. Where students from subgroup I dropped out or otherwise failed to succeed, there was usually some extenuating circumstance relating to family or friends. Although the number was too small to be statistically significant overall, there was a tendency for comparison students to drop out more frequently for reasons related to school problems.

Although not all of the subgroup II students were successful, nearly half of them were. Unfortunately, the analyses did not uncover any particular predictors of who from that group might succeed or fail. Some teachers felt that despite these students' failure to attain success in absolute terms, they still performed better than they would have without the program. However, the lack of a comparison group for these students made it impossible to test this hypothesis. The evaluator felt that understanding of these students could be enhanced by some further interviews or by the data that would result from the follow-up questionnaire, and hoped to delay reporting until these data could be analyzed and the picture made more complete. However, the Principal Investigator felt that the report could not be delayed further without putting in jeopardy any chance of continued funding.

Example 4: Evaluation of a Summer Camp for Female High School Students

Project CAMP CRUSADE FOR WOMEN IN SCIENCE, a five-year project begun in 1987, is aimed at women in high school grades 9-11 and seeks to promote interest and involvement in the study of science. The goals are science-oriented high school and post-high school course and activity choices on the part of camp participants which will ultimately lead them to pursue careers in the sciences.

This project, funded by NSF under the Young Scholars Program, currently has an all-woman staff of two secondary science teachers, two undergraduate students majoring in science, and one college professor.

The project is being carried out by Hill College, a small midwestern institution located in the Mountain school district, a large district with 5 high schools. The participating college professor is the Principal Investigator.

Eligible applicants include all female students in grades 9-11 in the Mountain school district. All applicants are asked to complete a questionnaire which seeks information about previous courses taken, and includes a series of questions measuring attitudes, satisfaction, motivation, educational goals, and career goals. The Principal Investigator considered using Grade Point Average and test scores as selection criteria, but rejected this approach because of questions she had regarding how well they would predict performance on activities at the camp, and because she wanted Camp Crusade to provide opportunities for all women interested in science regardless of their previous attainments.

The summer camp provides opportunities for up to 35 participants to engage in a variety of activities such as lectures, experiments, field studies, films, and study work groups. From among 120 applicants, the 35 participants were randomly selected, with equal selections from each grade.
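A selection procedure of this kind amounts to a separate random draw within each grade. The sketch below shows the idea only; the applicant pools, identifiers, and counts are invented, and nothing here is meant to describe the project's actual selection records.

    import random

    # Invented applicant pools, keyed by grade.
    applicants = {
        9:  [f"G9-{i}" for i in range(40)],
        10: [f"G10-{i}" for i in range(45)],
        11: [f"G11-{i}" for i in range(35)],
    }

    slots_per_grade = 35 // len(applicants)   # equal selections from each grade

    selected = {grade: random.sample(pool, slots_per_grade)
                for grade, pool in applicants.items()}

    # Applicants who were not drawn remain available as a comparison pool.
    not_selected = {grade: [a for a in pool if a not in selected[grade]]
                    for grade, pool in applicants.items()}

With 35 places and three grades the division is not exact (11 per grade leaves 2 places), so in practice the remaining slots would be filled by a further random draw.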

The proposal to NSF included an evaluation component with a modest budget. It called for a Formative Evaluation to be conducted every two years. The project evaluator was a part-time instructor in the Department of Education at Hill College with prior experience in educational research, but no evaluation experience. The evaluator planned to use an experimental design based on comparisons between the treatment group (camp participants) and a control group which consisted of a random sample of 50 non-accepted applicants. The evaluator attempted to match participants and controls by grade level, but given the small applicant pool this was difficult, since the great majority of applicants were 10th graders.

The first Formative Evaluation of the program took place in 1987. However, the initial evaluation was limited to an Implementation Evaluation that determined that the project was being conducted as planned. No Progress Evaluation was done to determine


whether the participants were moving towards meeting the project's goals. Stakeholders questioned the absence of progress information and wondered whether it was an oversight or an unwillingness to look at the issue.

To meet these concerns, the Principal Investigator and the evaluator decided to modify the evaluation design. They decided that an Implementation Evaluation would be conducted every other year, and a Progress Evaluation would be conducted yearly. The revised evaluation design called for data to be collected both at the end of the summer and during the next school year. Both qualitative and quantitative approaches were to be used to capture a wide variety of information on how the summer experience was affecting the participants. Specifically, the data collection would include:

Questionnaires

Personal interviews

Observations

School records.

During the last two days of the camp, the evaluator conducted personal interviews and observed the work groups; on the last day she distributed a self-administered questionnaire to the participants. The interviews focused primarily on the camp experience while the self-administered questionnaire was essentially a replication of the questionnaire which all applicants had completed. The evaluator had planned to administer this questionnaire also to the control group but could not reach most students in the control group because of the timing of this survey (late summer, before the start of school), and this idea was abandoned.

This example illustrates:

Ability to adapt a design based on new demands

Use of a control group for measuring project effects

Utilizing an appropriate mix of data collection techniques, both quantitative and qualitative

Failure to adequately plan for follow-up data collection procedures and correction for low response rates

Overgeneralizing from one particular sample of participants to a whole population

Treating data from a Progress Evaluation as if they were part of a Summative Evaluation.

Analyses of the initial information gathered on the experimental and control students showed that the groups were very similar. There were no differences between the two groups with respect to courses selected or motivation. Overall, the data from the experimental group collected following the camp experience appeared promising. The post-test questionnaire indicated that the camp attendees were more motivated than they had been to pursue advanced and elective high school course work in the sciences, had increased positive attitudes towards science, and were more likely to consider pursuing future academic studies in the sciences than before they attended the camp. The interviews with the experimental group supported the findings from the questionnaires. They allowed the evaluator to add more qualitative data to the study and enhance the quantitative results.

The observational data suggested that the work group sessions were well liked by the girls. There was a high level of interaction among the participants, as well as between the participants and the faculty. There was an especially high volume of contact between the campers and the undergraduate counselors. The students were observed asking numerous questions of the counselors regarding their college studies and the nature and difficulty of their programs. There were similar levels of interaction with the high school teachers, but interactions with the professor were constrained. Further observations will be necessary to determine why these interactions were so low, focusing on program design, personalities, etc. This issue was highlighted for further attention in the next Implementation Evaluation.

At the end of the second semester of the school year following the 1988 camp, a review of the school records indicated that 65% of the girls who attended the camp had registered for honors or advanced placement science classes, while only 45% of the control group students had done so. Also, 25% of the experimental group, and only 10% of the control group, chose to take a science-related course as their elective.
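Before much is made of percentage gaps of this size, it is worth checking whether they could plausibly be due to chance, given the small groups involved. The sketch below runs a standard two-proportion z-test; it is purely illustrative, the counts are rounded back from the reported percentages, and there is no suggestion that the project itself carried out this calculation.

    from math import sqrt

    # Approximate counts reconstructed from the reported percentages.
    x1, n1 = 23, 35   # camp participants registering for honors/AP science (about 65%)
    x2, n2 = 22, 50   # control group students doing the same (about 45%)

    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    print(f"p1={p1:.2f}  p2={p2:.2f}  z={z:.2f}")

    # As a rule of thumb, |z| greater than about 1.96 corresponds to a
    # difference significant at the .05 level; values near that cutoff,
    # as here, argue for cautious wording in the report.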

A follow-up questionnaire was mailed to both the experimental and control groups six months after the camp. The response rate was low for both groups: 50% for the experimental group and 30% for the control group. The Principal Investigator recognized that follow-up phone calls were needed to increase the response rate and ensure the reliability of findings, but budget constraints prohibited this approach. These data were used in the evaluation, although the evaluator was careful to point out their weakness.

The findings indicated that although the measures of attitudes and motivation for the experimental group had decreased slightly since the immediate post-test questionnaire was administered, the measures were significantly higher than those of the control group. Responses to questions about the


camp itself indicated that the participants really enjoyed it and were happy about the new relationships they had established with other students and with the staff. Approximately 25% of the responses indicated that the camp participants desired more environmental science activities. Therefore, the Principal Investigator decided to accommodate this desire by replacing one of the lab activities that received very critical comments with an environmental science activity.

The report on the Progress Evaluation was very positive. Its overall conclusion was that this project had a positive effect and motivated girls to become more involved in science and oriented toward academic studies in scientific fields. In fact, the report included a recommendation that more camps like this should be established throughout the school district and more girls should be encouraged to apply.

The stakeholders who initially called for the Progress Evaluation greeted the findings and recommendation with mixed reactions. Supporters of the project were very enthusiastic about the findings. Critics questioned the conclusions, citing the very high initial motivational levels of the experimental and control students and the low response rates in the follow-up survey. They finally agreed that the data were encouraging, but that final decisions regarding program expansion needed to await additional data.


CHAPTER SIX: SELECTING AN EVALUATOR

Just as some people presume that anyone can be a teacher or an expert on educational policy, some people believe that anyone can be an evaluator. That is just not the case, and any Project Director who works on such an assumption not only endangers the quality of the evaluation effort, but risks being faced with making decisions based on poor or incorrect information. The extra time and, sometimes, money it takes to get someone who is properly trained and experienced is well worth the allocation of resources.

The selection of the appropriate person to be a Project Evaluator is critical to the data collection, monitoring, and evaluation tasks that are needed for the management of a project. In an earlier chapter, we briefly touched on the fact that an evaluator can be either external or internal to the project team. In this section, we talk more about the evaluator, focusing on the qualifications that are needed and ways to locate competent personnel.

Skills Needed

The evaluator must be knowledgeable about the project's goals and objectives, and must agree philosophically with the mission and the purpose of the project.

This person must be able to apply evaluation expertise to collect data and to monitor and evaluate the project. This person must be able to work independently with minimum supervision and to provide consultative advice when needed. Further, the evaluator must be able to listen and to deal with all participants and stakeholders in a respectful and sensitive way. There is nothing that can kill an evaluation faster than an arrogant evaluation specialist.

The specific skills needed are listed below:

Has knowledge of evaluation theory and methodology

Can differentiate between research and evaluation procedures

Can plan, design, and conduct an evaluation

Has knowledge and ability to do statistical analyses of data


Has knowledge and ability to manage and maintain a moderate-sized database

Has ability to train staff to input data into the database

Has skills to write and edit brief interpretive reports

Has experience conducting an evaluation

Can communicate clearly and effectively with project and program staff and others related to the evaluation tasks

Has ability to differentiate between what information is needed solely for the project and what information is needed for the program

Understands and can do formative evaluation.

Although it is not always possible, it is preferable to have the evaluator as a part of the team from the beginning of the project, even before the project is funded at the proposal stage. This not only benefits the evaluator, but also the entire project. An evaluator who is knowledgeable and sensitive to the kinds of information that will be needed to answer the Formative and Summative questions can set in place the mechanisms for providing the data needed at project outset. Evaluation will not be seen as an "add-on," as it so often is, and the data gathering can, in all likelihood, be far less intrusive than if it is started at a later date. The fact that evaluators and program implementors may have interests and needs that clash is all too well known. Early establishment of a shared commitment and shared understanding is essential. (In Chapter Five, where we present examples of evaluation studies, we illustrate some reasons for having an evaluator on board early on. For more on the role of the evaluator in new programs and the social conditions that may influence evaluation, see Rossi and Freeman, 1993.)

Sometimes Project Directors want, or may be pressured, to select as the evaluator someone who has been closely aligned with the project on the program development side (a teacher or curriculum specialist, for example). Assuming the person also has evaluation credentials, such a choice may seem appealing. However, one danger in this route is the strong possibility of a biased evaluation, either real or perceived, taking place. That is, the evaluator may be seen as having too


much ownership in the original idea and, therefore, be unable to conduct an objective evaluation. Indeed, the evaluator herself may feel uncomfortable, torn between possibly conflicting loyalties. Given the stakes that frequently are attached to evaluation findings, it is prudent to avoid any apparent conflicts of interest.

Finding an Evaluator

There are many different sources for locating a Project Evaluator. The one that works best will depend on a number of factors including the home institution for the project, the nature of the project, and whether or not the Principal Investigator has some strong feeling about the type(s) of evaluation that are appropriate.

There are at least three avenues that can be pursued:

If the project is being carried out at or near a college or university, a good starting point is likely to be at the college or university itself. Principal Investigators can contact the Department chairs from areas such as Education, Psychology, Administration, or Sociology and ask about the availability of staff skilled in project evaluation. In most cases, a few calls will yield several names.

A second source for evaluation assistance comes from independent contractors. There are many highly trained personnel whose major income derives from providing evaluation services. Department chairs may well be cognizant of these individuals, and requests to chairs for help might include suggestions for individuals they have worked with outside of the college or university. In addition, independent consultants can be identified from the phone book, from vendor lists kept by procurement offices in state departments of education and in local school systems, and even from resource databases kept by some private foundations, such as the Kellogg Foundation in Michigan.

Finally, suggestions for evaluators can be obtained from calls to other researchers or perusal of research and evaluation reports. A strong personal recommendation and a discussion of an evaluator's strengths and weaknesses from someone who has worked with a specific evaluator is very useful when starting a new evaluation venture.



Although it may take a chain of telephone calls to get the list started, most Principal Investigators will ultimately find that they have several different sources of evaluation support from which to select. The critical task then becomes negotiating time, content, and, of course, money.

Rossi, P. H. & Freeman, H. E. (1993). EvaluationASysternaticApproach (5th Edition). Newbury, CA: Sage.


CHAPTER SEVEN: GLOSSARY

Accountability: The responsibility for the justification of expenditures, decisions, or the results of one's own efforts.

Accuracy: The extent to which an evaluation is truthful or valid in what it says about a program, project, or material.

Achievement: A manifested performance determined by some type of assessment or testing.

Adversarial/advocacy group: A group of people who enter into cross-examination of counter plans, strategies, or outcomes.

Affective: Consists of emotions, feelings, and attitudes.

Algorithm: A step-by-step problem-solving procedure.

Anonymity (provision for): Evaluator action to ensure that the identity of subjects cannot be ascertained during the course of a study, in study reports, or in any other way.

Assessment: Often used as a synonym for evaluation. The term is sometimes recommended for restriction to processes that are focused on quantitative and/or testing approaches.

Attitude: A person's mental set toward another person, thing, or state.

Attrition: Loss of subjects from the defined sample during the course of a longitudinal study.

Audience(s): Consumers of the evaluation; those who will or should read or hear of the evaluation, either during or at the end of the evaluation process. Includes those persons who will be guided by the evaluation in making decisions and all others who have a stake in the evaluation (see Stakeholder).

Background: The contextual information that describes the reasons for the project, its goals, objectives, and stakeholders' information needs.

Baseline: Facts about the condition or performance of subjects prior to treatment or intervention.


Behavioral objectives: Specifically stated terms of attainment to be checked by observation, or test/measurement.

Bias: A consistent alignment with one point of view.

Case Study: An intensive, detailed description and analysis of a single project, program, or instructional material in the context of its environment.

Checklist approach: Checklists are the principal instrument for practical evaluation, especially for investigating the thoroughness of implementation.

Client: The person, group, or agency that commissioned the evaluation.

Coding: To translate a given set of data or items into machine-readable categories.

Cognitive: The domain of knowledge: "knowledge-that" or "knowledge-how."

Cohort: A term used to designate one group among many in a study. For example, "the first cohort" may be the first group to have participated in a training program.

Comparison group: A group that provides a basis for contrast with (in experimentation) an experimental group (i.e., the group of people participating in the program or project being evaluated). The comparison group is not subjected to the treatment (independent variable), thus creating a means for comparison with the experimental group that does receive the treatment. Comparison groups should be "comparable" to the treatment group, but can be used when close matching is not possible (see also Control group).

Component: A physically or temporally discrete part of a whole. It is any segment that can be combined with others to make a whole.

Conceptual scheme: A set of concepts that generate hypotheses and simplify description.

Conclusions (of an evaluation): Final judgments and recommendations.

Content analysis: A process of systematically determining the characteristics of a body of material or practices.


Control group: A group that does not receive the treatment (service or product). The function of the control group is to determine the extent to which the same effect occurs without the treatment. The control group must be closely matched to the experimental group.

Correlation: A statistical measure of the degree of relationship between variables.

Cost analysis: The practical process of calculating the cost of something that is being evaluated. Cost analysis looks at: (1) costs to whom; (2) costs of what type; and (3) costs during what period.

Cost-benefit analysis: This process estimates the overall cost and benefit of each alternative product or program.

Cost-effectiveness: This analysis determines what a program or procedure costs against what it does (effectiveness). Is this product or program worth its costs?

Criterion, criteria: A criterion (variable) is whatever is used as a measure of success, e.g., grade point average.

Criterion-referenced test: Tests whose scores are interpreted by referral to well-defined domains of content or behaviors, rather than by referral to the performance of some comparable group of people.

Cross-sectional study: A cross-section is a random sample of a population, and a cross-sectional study examines this sample at one point in time. Successive cross-sectional studies can be used as a substitute for a longitudinal study. For example, examining today's first year students and today's graduating seniors may enable the evaluator to infer that the college experience has produced or can be expected to accompany the difference between them. The cross-sectional study substitutes today's seniors for a population that cannot be studied until 4 years later.

Delivery system: The link between the product or service and the immediate consumer (the recipient population).

Dependent variable: One that represents the outcome; the contrast is with independent variables, some of which can be manipulated.

Descriptive statistics: Those that involve summarizing, tabulating, organizing, and graphing data for the purpose of describing objects or individuals that have been measured or observed.
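As a small illustration of the Correlation entry above, the sketch below computes Pearson's r for two invented lists of paired values; the numbers have no connection to any project discussed in this handbook.

    from statistics import correlation  # available in Python 3.10 and later

    # Invented paired measurements for eight students.
    hours_of_instruction = [5, 8, 10, 12, 15, 18, 20, 22]
    post_test_score      = [52, 55, 61, 60, 68, 72, 75, 79]

    # Pearson's r runs from -1 to +1; values near +1 mean the two variables
    # rise together, values near 0 mean little linear relationship.
    r = correlation(hours_of_instruction, post_test_score)
    print(round(r, 2))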


Design: The process of stipulating the investigatory procedures to be followed in doing a certain evaluation.

Dissemination: The process of communicating information to specific audiences for the purpose of extending knowledge and, in some cases, with a view to modifying policies and practices.

Effectiveness: Refers to the conclusion of a Goal Achievement Evaluation. "Success" is its rough equivalent.

Executive report: An abbreviated report that has been tailored specifically to address the concerns and questions of a person whose function is to administer an educational program or project.

Executive summary: A nontechnical summary statement designed to provide a quick overview of the full-length report on which it is based.

Experimental design: The plan of an experiment, including selection of subjects who receive treatment and control group (if applicable), procedures, and statistical analyses to be performed.

Experimental group: The group that is receiving the treatment.

External evaluation: Evaluation conducted by an evaluator from outside the organization within which the object of the study is housed.

Extrapolate: To infer an unknown from something that is known. (Statistical definition: to estimate the value of a variable outside its observed range.)

False positive: When an event is predicted and it does not occur (Type I error).

False negative: When an event is not predicted and it occurs (Type II error).

Feasibility: The extent to which an evaluation is appropriate for implementation in practical settings.

Field test: The study of a program, project, or instructional material in settings like those where it is to be used. Field tests may range from preliminary primitive investigations to full-scale summative studies.


Flow chart: A graphic representation of a set of decisions that is set up to guide the management of projects, including evaluation projects.

Focus group: A group selected for its relevance to an evaluation that is engaged by a trained facilitator in a series of discussions designed for sharing insights, ideas, and observations on a topic of concern.

Formative evaluation: Evaluation designed and used to improve an intervention, especially when it is still being developed.

Gain scores: The difference between a student's performance on a test and his or her performance on a previous administration of the same or parallel test.

Generalizability: The extent to which information about a program, project, or instructional material collected in one setting can be used to reach a valid judgment about how it will perform in other settings.

Goal-free evaluation: Evaluation of outcomes in which the evaluator functions without knowledge of the purposes or goals.

Hawthorne effect: The tendency of a person or group being investigated to perform better (or worse) than they would in the absence of the investigation, thus making it difficult to identify treatment effects.

Hypothesis testing: The standard model of the classical approach to scientific research, in which a hypothesis is formulated before the experiment to test its truth. The results are stated in terms of the probability that they were due solely to chance. A significance level of one chance in 20 (.05) or one chance in 100 (.01) represents a high degree of improbability.

Impact evaluation: An evaluation focused on outcomes or pay-off.

Implementation evaluation: Assessing program delivery (a subset of Formative Evaluation).

Indicator: A factor, variable, or observation that is empirically connected with the criterion variable; a correlate. For example, judgment by students that a course has been valuable to them for pre-professional training is an indicator of that value.

Inferential statistics: Statistics that are inferred from characteristics of samples to characteristics of the population from which the sample comes.


Informed consent: Agreement by the participants in an evaluation to the use of their names and/or confidential information supplied by them in specified ways, for stated purposes, and in light of possible consequences, prior to the collection and/or release of this information in evaluation reports.

Instrument: An assessment device (test, questionnaire, protocol, etc.) adopted, adapted, or constructed for the purpose of the evaluation.

Interaction: Two factors or variables interact if the effect of one, on the phenomenon being studied, depends upon the magnitude of the other. For example, mathematics education interacts with age, being more or less effective depending upon the age of the child.

Internal evaluator: Internal evaluations are those done by project staff, even if they are special evaluation staff, that is, external to the production/writing/teaching/service part of the project.

Level of significance: The probability that the observed difference occurred by chance.

Longitudinal study: An investigation or study in which a particular individual or group of individuals is followed over a substantial period of time to discover changes due to the influence of the treatment, or maturation, or environment.

Mastery level: The level of performance needed on a criterion. The mastery level is often arbitrary.

Matching: An experimental procedure in which the subjects are so divided, by means other than lottery, that the groups are regarded for the purposes at hand to be of equal merit or ability. (Often matched groups are created by ensuring that they are the same or nearly so on such variables as sex, age, grade point averages, and past test scores.)

Matrix: An arrangement of rows and columns used to display components of an evaluation design.

Mean: Also called "average" or arithmetic average. For a collection of raw test scores, the mean score is obtained by adding all scores and dividing by the number of people taking the test.


Measurement: Determination of the magnitude of a quantity.

Median: The point in a distribution which divides the group into two, as nearly as possible. For example, in a score distribution, half the scores fall above the median and half fall below.

Meta-analysis: The name for a particular approach to synthesizing quantitative studies on a common topic, involving the calibration of a specific parameter for each ("effect size").

Metric data: Data which include a unit of measurement (i.e., dollars, inches).

Mode: The value which occurs more often than any other. If all scores (in a score distribution) occur with the same frequency, there is no mode. If the two highest score values occur with the same frequency, there are two modes.
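A small worked example may help fix the difference among the Mean, Median, and Mode entries. The scores below are invented for illustration only.

    from statistics import mean, median, mode

    scores = [70, 85, 85, 90, 95, 100, 100, 100]

    print(mean(scores))    # 90.625: sum of the scores divided by their number
    print(median(scores))  # 92.5: the middle of the distribution (average of 90 and 95)
    print(mode(scores))    # 100: the most frequent value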

Needs assessment: Using a diagnostic definition, need is anything essential for a satisfactory mode of existence or level of performance. The essential point of a needs assessment for evaluation is the identification of performance needs.

Nominal data: Data which consist of categories only, without order to these categories (i.e., region of the country, courses offered by an instructional program).

"No significant difference"

Nonreactive measures

Norm

Norm-referenced tests

Objective

Observation

Operational definition

A decision that an observed difference between twostatistics occurred by chance.

Assessments done without the awareness of thosebeing assessed.

single value, or a distribution of values, constitutingthe typical performance of a given group.

Tests that measure the relative performance of theindividual or group by comparison with the performanceof other individuals or groups taking the same test.

A specific description of an intended outcome.

The process of direct sensory impection involvingtrained observers.

A defmition of a term or object achieved by staling theoperations or procedures employed to distinguish itfrom others.


Ordered data: Non-numeric data in ordered categories (for example, students' performance being categorized as excellent, good, adequate, and poor).

Outcome: Post-treatment or post-intervention effects.

Paradigm: A general conception of or model for a discipline or subdiscipline which may be very influential in shaping its development. (For example, "the classical social science paradigm in evaluation.")

Peer review: Evaluation done by a panel of judges with qualifications approximating those of the author or candidate.

Performance-based assessment: The use of global ratings of behavior, which is a movement away from paper-and-pencil testing. This assessment is costly and there may be a loss of validity and reliability.

Pilot test: A brief and simplified preliminary study designed to try out methods to learn whether a proposed project or program seems likely to yield valuable results.

Planning evaluation: Evaluation planning is necessary before a program begins, both to get baseline data and to evaluate the program plan, at least for evaluability. Planning avoids designing a program that is unevaluable.

Population: All persons in a particular group.

Post test: A test to determine performance after the administration of a program, project, or instructional material.

Pretest: A test to determine performance prior to the administration of a program, project, or instructional material. Pretests serve two purposes: diagnostic and baseline. Also, the use of an instrument (questionnaire, test, observation schedule) with a small group to detect need for revisions.

Process evaluation: Refers to the evaluation of the treatment or intervention. It focuses entirely on the variables between input and output.

Product: A pedagogical process or material coming from research and development.

Program: The general effort that marshals staff and projects toward defined and funded goals.

Progress evaluation: A subset of Formative Evaluation.


Prompt: Reminders used by interviewers to obtain complete answers.

Qualitative evaluation: The part of the evaluation that is primarily descriptive and interpretative, and may or may not lend itself to quantitative treatment.

Quantitative evaluation: An approach involving the use of numerical measurement and data analysis based on statistical methods.

Quasi-experimental: When a random allocation of subjects to experimental and control groups cannot be done, a quasi-experimental design can seek to simulate a true experimental design by identifying a group that closely matches the experimental group.

Random: Affected by chance.

Random sampling: Drawing a number of items of any sort from a larger group or population so that every individual item has a specified probability of being chosen.

Recommendations: Suggestions for specific appropriate actions based upon analytic approaches to the program components.

Reliability: Statistical reliability is the consistency of the readings from a scientific instrument or human judge.

Remediation: The process of improvement, or a recommendation for a course of action or treatment that will result in improvement.

Replication: Repeating an intervention or evaluation with all essentials unchanged. Replications are often difficult to evaluate because of changes in design or execution.

Research: The general field of disciplined investigation.

Response bias: Error due to incorrect answers.

Sample: A part of a population.

Sample bias: Error due to non-response or incomplete response from selected sample subjects.

Sampling error: Error due to using a sample instead of the entire population from which the sample is drawn.

Secondary data analysis: A reanalysis of data using the same or other appropriate procedures to verify the accuracy of the results of the initial analysis or for answering different questions.
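As an illustration of the Random sampling and Sample entries above, the short sketch below draws a simple random sample from an invented population list; the names and sizes are arbitrary.

    import random

    # Invented population of 200 teachers; each has an equal chance of selection.
    population = [f"teacher_{i}" for i in range(1, 201)]

    sample = random.sample(population, k=10)   # a sample of 10, drawn without replacement
    print(sample)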


Self-administered instrument A questionnaire or report completed by a study participant without the assistance of an interviewer.

Self-report instrument A device in which persons make and report judgments about the functioning of their project, program, or instructional material.

Significance Overall significance represents the total synthesis of all you have learned about the merit or worth of the program or project. This is different from statistical significance, which may be testing one of several conditions of a program or project.

Stakeholder A program's stakeholder is one who has credibility, power, or other capital invested in the project, and thus can be held to be to some degree at risk with it.

Standard deviation A measure of the spread of a variable, based on deviation from the mean value for metric data.
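As a small, purely illustrative example of the entry above (the posttest scores are invented), the standard deviation summarizes how far individual values typically fall from the mean, and Python's standard statistics module computes the sample version directly:

    import statistics

    scores = [62, 70, 74, 68, 81, 77, 65]   # hypothetical posttest scores

    mean = statistics.mean(scores)      # a statistic describing the sample
    spread = statistics.stdev(scores)   # sample standard deviation (n - 1 denominator)

    print(f"mean = {mean:.1f}, standard deviation = {spread:.1f}")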

Standardized tests Tests that have standardized instructions for administration, use, scoring, and interpretation with standard printed forms and content. They are usually norm-referenced tests but can also be criterion-referenced.

Statistic A summary number that is typically used to describe a characteristic of a sample.

Strategy A systematic plan of action to reach predefined goals.

Summary A short restatement of the main points of a report.

Summative evaluation Evaluation designed to present conclusions about the merit or worth of an intervention and recommendations about whether it should be retained, altered, or eliminated.

Time series study A study in which periodic measurements are obtained prior to, during, and following the introduction of an intervention or treatment in order to reach conclusions about the effect of the intervention.
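Purely as an illustration of the layout of such a study (the monthly attendance rates and the month at which the intervention begins are invented), periodic measurements taken before and after the intervention can be compared; a real analysis would also model trend and seasonality rather than simply compare means:

    import statistics

    # Hypothetical monthly attendance rates; the intervention starts at month 7.
    series = [71, 72, 70, 73, 72, 71,   # months 1-6: before the intervention
              78, 80, 79, 81, 80, 82]   # months 7-12: after the intervention

    before, after = series[:6], series[6:]

    print("Mean before:", statistics.mean(before))
    print("Mean after: ", statistics.mean(after))
    print("Change:     ", statistics.mean(after) - statistics.mean(before))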

Treatment Whatever is being investigated; in particular, whatever is being applied or supplied to, or done by, the experimental groups that is intended to distinguish them from the comparison groups.

Triangulation In an evaluation, an attempt to get a fix on a phenomenon or measurement by approaching it via several independent routes (there can be more than three). This effort provides redundant measurement.


Unanticipated outcomes A result of a program or intervention that was unexpected. Often used as a synonym for side effects, but only a loose equivalent.

Utility The extent to which an evaluation produces and disseminates reports that inform relevant audiences and have beneficial impact on their work.

Utilization (of evaluations) Use and impact are terms used as substitutes for utilization. Sometimes seen as the equivalent of implementation, but this applies only to evaluations which contain recommendations.

Validity The soundness of the use and interpretation of a measure.

SOURCES

Jaeger, R. M. (1990). Statistics: A Spectator Sport. Newbury Park, CA: Sage.

Joint Committee on Standards for Educational Evaluation (1981). Standards for Evaluations of Educational Programs, Projects, and Materials. New York, NY: McGraw Hill.

Rogers, E. M. (1983). Diffusion of Innovations. New York, NY: Free Press.

Scriven, M. (1991). Evaluation Thesaurus. Fourth Edition. Newbury Park, CA: Sage.

Authors of Chapters 1-4.


CHAPTER EIGHT: ANNOTATED BIBLIOGRAPHY

In selecting publications for inclusion in this short bibliography, an effort was made to incorporate those which can be useful for evaluators who want to find information relevant to the tasks they will be faced with and which this brief handbook could not cover in depth. Thus, we have not necessarily included all books which might be on a reading list in a graduate course dealing with educational evaluation, but have selected those which NSF/HRD grantees should find most useful. This includes most of those referenced in Chapters 1-4, as well as others which provide perspectives and methods not extensively covered in the handbook.

We have divided this bibliography into two sections. Section One includes books that deal in large part with the broad topics of theory and practice in the field of evaluation, with emphasis on those which deal specifically with education. Section Two is more narrowly focused on technical topics and/or single issues. Of course, this is not an air-tight division: many of the broad-based works contain a great deal of technical information and hands-on advice.

Section One: Theory and Practice

House, Ernest R. (1993). Professional Evaluation: Social Impact and Political Consequences. Newbury Park, CA: Sage.

The author is a professor in the School of Education at the University of Colorado and an experienced evaluator of major social and educational programs. In the author's own words, this book "is about analyzing the social, political, economic, historical, and cultural influences on evaluation." It provides a thoughtful overview of the field's evolution, from its earlier reliance on "value-free" experimental and primarily quantitative methods to the current emphasis on methods more appropriate to the diverse, multicultural, and politically charged environment in which social and educational programs operate.

Rossi, Peter H. & Freeman, Howard E. (1993). Evaluation: A Systematic Approach (5th Edition). Newbury Park, CA: Sage.

This is the newest edition of one of the most comprehensive and widely used texts about evaluation. It provides extensive and sophisticated discussions of all aspects of designing and assessing the implementation and utility of social programs. Many of the projects cited and discussed in this volume deal with educational programs and innovations, although the bulk of the programs with which these authors had firsthand experience were in the fields of housing, health services, and criminal justice. Most tasks that evaluators are likely to be asked to perform, and most problems they will have to deal with, technical as well as political, are covered here. The authors adhere to the social science model in their approach to evaluation, with clear preference for randomized and quasi-experimental designs, but they also cover other evaluation methods, including the use of qualitative and judgmental approaches.


Worthen, Blaine R. & Sanders, James R. (1987). Educational Evaluation: Alternative Approaches and Practical Guidelines. White Plains, NY: Longman.

This book was designed primarily as a basic text for graduate courses in evaluation, or related administration, curriculum, or teacher education courses, where efforts are made to teach practitioners how to assess the effectiveness of their educational endeavors. It seeks to familiarize readers with alternative approaches for planning evaluations, and provides step-by-step practical guidelines for conducting them. The book is very systematically organized, describes at length all evaluation approaches which have been developed over the years, and includes a great deal of useful information, especially for the inexperienced evaluator. A detailed, naturalistic description of the conduct of an evaluation program, including problems encountered with school staff, other stakeholders, and administrators, provides a useful example of "real world" issues in evaluation.

Scriven, Michael (1991). Evaluation Thesaurus (4th Edition). Newbury Park, CA: Sage.

A highly original, wide-ranging collection of ideas, concepts, positions, and techniques which reflect the critical, incisive, and often unconventional views held by this leader in the field of evaluation. It contains a 40-page introductory essay on the nature of evaluation and nearly 1,000 entries which range from one-paragraph definitions of technical terms and acronyms to philosophical and methodological discussions extending over many pages. The Thesaurus is not focused on the field of education, but it provides excellent coverage of issues and concepts of interest to educational evaluators.

Guba, E. G. & Lincoln, Y. S. (1989). Fourth Generation Evaluation. Newbury Park, CA: Sage.

The authors propose a monumental shift in evaluation practice, advocating the constructionist position in its most extreme form. Guba and Lincoln describe problems faced by previous generations of evaluators (politics, ethical dilemmas, imperfections and gaps, inconclusive deductions) and lay the blame for failure and nonutilization at the feet of the unquestioned reliance on the scientific/positivist paradigm of research. Fourth generation evaluation moves beyond science to include the myriad human, political, social, cultural, and contextual elements that are involved in the evaluation process.

The book describes the differences between the earlier (and still widely used) evaluation model, based on positivist/scientific assumptions and statistical techniques, and the "naturalistic" approach to evaluation, and outlines methodological guidelines for the conduct of naturalistic evaluations.


Joint Committee on Standards for Educational Evaluation (1981). Standards for Evaluations of Educational Programs, Projects, and Materials. New York, NY: McGraw Hill.

For more than 10 years this volume has provided guidance to educational evaluators. It contains 30 standards for program evaluation along with examples of their application. It is the authoritative reference on principles of good and ethical program evaluation. Sponsored by several professional associations concerned with the quality of program evaluations in education, the Joint Committee has defined state-of-the-art principles, guidelines, and illustrative cases that can be used to judge the quality of any evaluation. The standards fall into four categories: utility, feasibility, propriety, and accuracy.

Patton, Michael Q. (1986). Utilization-Focused Evaluation (2nd Edition). Newbury Park, CA: Sage.

In a book that combines both the theoretical and the practical, Patton examines how and why to conduct evaluations. In this revised and updated edition, the author provides practical advice grounded in evaluation theory and practice, and shows how to conduct evaluations from beginning to end in ways that will be useful, and used. This volume discusses the ferment and changes in evaluation during the eighties and the tremendous growth of "in-house" evaluations conducted by internal evaluators. Patton also discusses a methodological synthesis of the "qualitative versus quantitative" methods debate, as well as the cross-cultural development of evaluation as an internationally recognized profession. Presented in an integrated framework, the author discusses topics such as utilization built on a new, concise definition of evaluation which emphasizes providing useful and usable information to specific people.

Section Two: Technical Topics

Jaeger, R. M. (1990). Statistics: A Spectator Sport (2nd Edition). Newbury Park, CA: Sage.

This book takes the reader to the point of understanding advanced statistics without introducing complex formulas or equations. It covers most of the statistical concepts and techniques which evaluators commonly use in the design and analysis of evaluation studies, and most of the examples and illustrations are from actual studies performed in the field of education. The topics included range from descriptive statistics, including measures of central tendency and fundamentals of measurement, to inferential statistics and advanced analytic methods.

Campbell, Donald T. & Stanley, Julian C. (1966). Experimental and Quasi-experimental Designs for Research. Boston, MA: Houghton Mifflin.

This slim (84 pages) volume is a slightly enlarged version of the chapter originally published in the 1963 Handbook of Research on Teaching and is considered the classic text on valid experimental and quasi-experimental designs in real-world situations where the experimenter has very limited control over the environment. To this day, it is the most useful basic reference book for evaluators who plan the use of such designs.


Herman, Joan L. (Editor, 1987). Program Evaluation Kit. Newbury Park, CA: Sage.

This kit, prepared by the Center for the Study of Evaluation at the University of California, Los Angeles, contains nine books written to guide and assist evaluators in planning and executing evaluations, with emphasis on practical, field-tested, step-by-step procedures and with considerable attention to the management of each phase. The kit makes heavy use of charts, illustrations, and examples to clarify the material for novice evaluators. Volume 1, Evaluator's Handbook, provides an overview of evaluation activities, describes the evaluation perspective which guides the kit, and discusses specific procedures for conducting Formative and Summative evaluations. The remaining eight volumes deal with specific topics:

Volume Two How to Focus an Evaluation

Volume Three How to Design a Program Evaluation

Volume Four How to Use Quantitative Methods in Evaluation

Volume Five How to Assess Program Implementation

Volume Six How to Measure Attitudes

Volume Seven How to Measure Performance and Use Tests

Volume Eight How to Analyze Data

Volume Nine How to Communicate Evaluation Findings

Depending on their needs, evaluators will find every one of these volumes useful. Volume Seven, How to Measure Performance and Use Tests, covers a topic for which we have not located another suitable text for inclusion in this bibliography.

The Kit can be purchased as a unit or by ordering individual volumes.

Yin, Robert K. (1989). Case Study Research: Design and Method. Newbury Park, CA: Sage.

The author's background in experimental psychology may explain the emphasis in this book on the use of rigorous methods in the conduct and analysis of case studies, thus minimizing what many believe is a spurious distinction between quantitative and qualitative studies. While arguing eloquently that case studies are an important tool when an investigator (or evaluator) has little control over events and when the focus is on a contemporary phenomenon within some real-life context, the author insists that case studies be designed and analyzed so as to provide generalizable findings. Although the focus is on design and analysis, data collection and report writing are also covered.


Fowler, Floyd J., Jr. (1993). Survey Research Methods (2nd Edition). Newbury Park, CA: Sage.

Using non-technical language, the author has provided a comprehensive discussion of survey design (including sampling, data collection methods, and the design of survey questions) and procedures which constitute good survey practice, including attention to data quality and ethical issues. According to the author, "this book is intended to provide perspective and understanding to those who would be designers or users of survey research, at the same time as it provides a sound first step for those who actually may go about collecting data."

Stewart, David W. & Shamdasani, Prem N. (1990). Focus Groups: Theory and Practice. Newbury Park, CA: Sage.

This book differs from many others published in recent years which address primarily techniques of recruiting participants and the actual conduct of focus group sessions. This volume pays considerable attention to the fact that focus groups are by definition an exercise in group dynamics, which must be taken into account when conducting such groups and especially when interpreting the results obtained. However, it also covers very adequately practical issues such as recruitment of participants, the role of the moderator, and appropriate techniques for data analysis.

Linn, Robert L., Baker, Eva L., & Dunbar, Stephen B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, Vol. 20, No. 8, pp. 15-22.

In recent years there has been an increasing emphasis on assessment results, as well as increasing concern about the nature of the most widely used forms of student assessment and the uses that are made of the results. These conflicting forces have helped create a burgeoning interest in alternative forms of assessment, particularly complex, performance-based assessments. It is argued that there is a need to rethink the criteria by which the quality of educational assessments is judged, and a set of criteria that are sensitive to some of the expectations for performance-based assessments is proposed.


The Science & Technology Information System (STIS) at the National Science Foundation

What is STIS?

STIS is an electronic dissemination system that provides fast, easy access to National Science Foundation (NSF) publications. There is no cost to you except for possible long-distance phone charges. The service is available 24 hours a day, except for brief weekly maintenance periods.

What Publications are Available?

Publications currently available include:

- The NSF Bulletin
- Program announcements and "Dear Colleague" letters
- General publications and reports
- Press releases
- Other NSF news items
- NSF organizational and alphabetical phone directories
- NSF vacancy announcements
- Award abstracts (1989-now)

Our goal is for all printed publications to be available electronically.

Access Methods

There are many ways to access STIS. Choose the method that meets your needs and the communication facilities you have available.

Electronic Documents Via E-Mail. If you have access to Internet or BITNET e-mail, you can send a specially formatted message, and the document you request will be automatically returned to you via e-mail.

Anonymous FTP. Internet users who are familiar with this file transfer method can quickly and easily transfer STIS documents to their local system for browsing and printing.

On-Line STIS. If you have a VT100 emulator and an Internet connection or a modem, you can log on to the on-line system. The on-line system features full-text search and retrieval software to help you locate the documents and award abstracts that are of interest to you. Once you locate a document, you can browse through it on-line, download it using the Kermit protocol, or request that it be mailed to you.

Direct E-Mail. You can request that STIS keep you informed, via e-mail, of all new documents on STIS. You can elect to get either a summary or the full text of new documents.

Internet Gopher and WAIS. If your campus has access to these Internet information resources, you can use your local client software to search and download NSF publications. If you have the capability, it is the easiest way to access STIS.


Getting Started with Documents Via E-Mail

Send a message to stisserv@nsf.gov (Internet) or stisserv@NSF (BITNET). The text of the message should be as follows (the Subject line is ignored):

get index

You will receive a list of all the documents on STIS and instructions for retrieving them. Please note that all requests for electronic documents should be sent to stisserv, as shown above. Requests for printed publications should be sent to pubs@nsf.gov (Internet) or pubs@NSF (BITNET).
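The stisserv mail server itself is historical, but purely as an illustration of the request format described above (the sender address and the SMTP relay name are hypothetical placeholders), the message could be assembled and sent with Python's standard library like this:

    import smtplib
    from email.message import EmailMessage

    # Build the request exactly as described: the body carries the command;
    # the Subject line is ignored by the server.
    msg = EmailMessage()
    msg["From"] = "researcher@example.edu"   # hypothetical sender
    msg["To"] = "stisserv@nsf.gov"           # STIS document server (historical)
    msg["Subject"] = "STIS request"          # ignored by stisserv
    msg.set_content("get index")

    # "mail.example.edu" stands in for whatever outgoing SMTP relay is available.
    with smtplib.SMTP("mail.example.edu") as smtp:
        smtp.send_message(msg)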

Getting Started with Anonymous FTP

FTP to stis.nsf.gov. Enter anonymous for the username, and your e-mail address for the password. Retrieve the file "index". This contains a list of the files available on STIS and additional instructions.
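Again purely as an illustration of the anonymous FTP steps above (the host no longer offers the service, and the e-mail address used as the password is a hypothetical placeholder), the same procedure in Python's standard ftplib looks roughly like this:

    from ftplib import FTP

    # Anonymous login: username "anonymous", e-mail address as the password.
    with FTP("stis.nsf.gov") as ftp:                      # historical host name
        ftp.login("anonymous", "researcher@example.edu")  # hypothetical address

        # Retrieve the "index" file, which lists everything available on STIS.
        with open("index", "wb") as fh:
            ftp.retrbinary("RETR index", fh.write)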

Getting Started with the On-Line System

If you are on the Internet: telnet stis.nsf.gov. At the login prompt, enter public.

If you are dialing in with a modem: Choose 1200, 2400, or 9600 baud, 7-E-1. Dial (202) 357-0359 or (202) 357-0360.

[(703) 306-0212 or (703) 306-0213]**

When connected, press Enter. At the login prompt, enter public.

Getting Started with Direct E-Mail

Send an E-mail message to stisserv@nsf.gov (Internet) or stisserv@NSF (BITNET). Put the following in the text:

get stisdirm

You will receive instructions for this service.

Getting Started with Gopher and WAIS

The NSF Gopher server is on port 70 of stis.nsf.gov. The WAIS server is also on stis.nsf.gov. You can get the ".src" file from the "Directory of Servers" at quake.think.com. For further information on using Gopher and WAIS, contact your local computer support organization.

For Additional Assistance Contact:

E-mail: stis@nsf.gov (Internet), stis@NSF (BITNET)
Phone:  202-357-7555 (voice mail) [(703) 306-0214]**
TDD:    202-357-7492 [(703) 306-0090]**

**These numbers become effective November 15, 1993.

NSF 91-10 Revised 8/31/93


NATIONAL SCIENCE FOUNDATION
4201 Wilson Blvd.
Arlington, VA 22230
(703) 306-1650


NSF 93-152 (NEW)