MEASUREMENT PRACTICE IN EVALUATION

CAPACITY BUILDING

Roy Guanco Ponce

Master of Assessment and Evaluation, Bachelor of Science in Statistics

Graduate Diploma in Econometrics

Submitted in partial fulfilment of the requirements of the degree of

Doctor of Education

December 2014

Centre for Program Evaluation

Melbourne Graduate School of Education

The University of Melbourne

ABSTRACT

Evaluation Capacity Building (ECB) aims to enable individuals and organizations to

adopt the concepts and practices of evaluation. Its purpose is to mainstream the

generation and utilization of evaluation information within organizational systems and

structures. It is important in that it mediates the true impact of evaluation on

organizational outcomes. The aim of this thesis is to explore measurement practice in

relation to content, implementation and context, and to examine how outcomes are

measured in ECB initiatives. When ECB is viewed as a learning intervention, it calls

for a progressive approach to measurement. This means that ECB outcomes

measurement must consider the developmental proficiency of ECB content. This study

used the Broadbased Research Synthesis Method and examined sixty-three (63)

published ECB reports with respect to content, implementation, context and

measurement practices in ECB. Item Response Theory Analysis documented the ECB

content construct and its hierarchical structure, and Exploratory Factor Analysis revealed

ECB content sub-domains. The findings illustrate that ECB content topics delivered in

practice fit this developmental progression continuum. The main contribution of this

study to the field of Evaluation, in particular in the area of measurement in Evaluation

Capacity Building, is the introduction of the notion that ECB is a learning

intervention. Through this assumption, it was demonstrated that ECB follows a

developmental proficiency construct. This study has clearly established that ECB can

be viewed as a learning progression. Based on this perspective, it makes a case for

reframing ECB content, implementation and measurement practice. It is suggested that

future ECB initiatives utilize this alternative framework for ECB content delivery and

measurement of ECB outcomes.

DECLARATION

This thesis contains no material which has been accepted for any other degree in any

university. Furthermore, to the best of my knowledge and belief, this thesis contains

no material previously published or written by any other person, except where due

reference is given in the text.

Signature:

Roy G. Ponce

ACKNOWLEDGMENTS

I express my deepest appreciation and gratitude to the following persons who assisted in

the completion of this thesis:

Associate Professor Janet May Clinton and Dr. Amy Marie Gullickson, my supervisors,

whose critical feedback, guidance, encouragement and gentle forbearance made me

understand and appreciate the challenges and rewards of conducting research.

Professor John Hattie, chairman of my thesis committee, for the efficient facilitation of

the progress meetings and for investing time to examine critically my data analysis and

thesis argument. Dr. Ghislain Arbour, member of my thesis committee, who challenged

me to think outside the box and consider other perspectives.

My professional circle whose assistance, critical inputs and suggestions reduced the

isolation of this research journey: Dr. Edito Sumile, Dr. Amanda Bayliss, Dennis Alonzo,

Timoci O’Connor, Brad Astbury, Daniel Arifin, and Marion Joy Brown.

My family and friends, whose love and support are my inspiration: Papang Valentin,

Mamang Edita, May, Amy and Jonathan, Elias and Gina, Lee and Helen, Omar and

Melissa, Joeven and Arbeth, Bing and Marl, Hannah, Eden, Earvs and Norm.

The people of Australia for the Australia Awards Scholarship grant through the Philippine

Australia Human Resource Development Facility and the Davao Oriental State College of

Science and Technology. The Melbourne Graduate School of Education and Centre for

Program Evaluation of the University of Melbourne for the excellent student support

services and demonstrating the values of “Growing Esteem”.

May GOD, the source of all wisdom and understanding, in whom I believe, remember

and bless you for all your kindness and goodness.

Table of Contents

Abstract ………………………………………………………………………... ii

Declaration ……………………………………………………………………. iii

Acknowledgments …………………………………………………………….. iv

Table of Contents …………………………………………………………….. v

List of Tables ………………………………………………………………….. viii

List of Figures ………………………………………………………………… x

List of Appendices ……………………………………………………………. xi

CHAPTER Page

1. INTRODUCTION ……………………………………………….. 1

Statement of the Problem ……………………………………… 4

Purpose of the Study …………………………………………... 5

Aims of the Study ……………………………………………... 6

The Research Hypothesis ……………………………………… 7

Research Questions ……………………………………………. 7

Significance of the Study ……………………………………… 8

Limitations …………………………………………………….. 8

Outline of the Thesis …………………………………………... 9

2. REVIEW OF LITERATURE …………………………….............. 10

Overview of the Chapter ………………………………………. 10

The Study Perspective: Learning Intervention ………………... 11

Assessment of Learning and ECB Measurement ……………... 12

The Emergence of ECB ……………………………………….. 14

ECB Definitions ……………………………………………….. 17

Evaluation Approaches and ECB……………………………… 22

Program Theory and ECB ……………………………………... 24

Measurement in Evaluation and ECB …………………………. 37

Measurement in ECB ………………………………………….. 41

Knowledge Gaps: The Case for Investigating ECB Measurement Practice …………………………………………. 46

3. RESEARCH DESIGN …………………………………………… 48


The Problem and Research Questions ………………………… 48

Research Design ………………………………………………. 49

Conceptual Framework of the Study ………………………….. 55

Research Instrument Development ……………………………. 59

Sources of Potential Error ……………………………………... 63

Data Management ……………………………………………... 64

Statistical Analysis …………………………………………….. 65

Role of the Researcher ………………………………………… 67

Ethical Concerns ………………………………………………. 68

Conclusion …………………………………………………….. 68

4. RESULTS AND ANALYSIS …………………………………….. 69


The Sample Profile ……………………………………………. 71

ECB Contextual Profile ……………………………………….. 77

Research Question 1: How can ECB measurement practice be described from empirical evidence? ………………………… 84

Research Question 1A: What are the content and implementation approaches of ECBs found in published ECB reports? ………………………… 84

Answer to Research Question 1A ……………………………... 100

Research Question 1B: What is the rigor of measurement practice in published ECB reports? …………………………… 101

Answer to Research Question 1B ……………………………... 108

Research Question 1C: What determines practice of measuring ECB outcomes? ………………………………………………... 109

Answer to Research Question 1C ……………………………... 115

Answer to Research Question 1 ……………………………….. 115

Research Question 2 …………………………………………... 117

Item Response Theory (IRT) Analysis ……………………… 118

What is being measured in ECB? ……………………………... 124

Factor Analysis: Multidimensional Assumption ……………… 131

ECB Content and Decision to Measure ……………………….. 140

Answer to Research Question 2 ……………………………….. 141

Chapter Conclusion ……………………………………………. 142

5. SYNTHESIS AND CONCLUSION ……………………………... 144


Contribution of the Study ……………………………………... 144

Limitations of the Study ………………………………………. 152

Future Research Directions ……………………………………. 152

Conclusion …………………………………………………….. 153

REFERENCES ………………………………………………………... 154

APPENDICES…………………………………………………………. 159

List of Tables

TABLE Page

2.1 Conceptual Components of ECB Definitions …………………… 20

2.2 Participation-oriented Evaluation Approaches ………………….. 23

2.3 Three-Component Framework for ECB ………………………… 31

2.4 ECB Assessment Instruments …………………………………… 42

3.1 Information Sources Search ……………………………………... 52

3.2 Evaluation Capacity Building Content, Implementation and Context Variables ……………………………………………….. 58

3.3 Developmental Model Proficiency as Applied to Rigor of ECB Measurement Instrument ………………………………………... 62

4.1 Case Sample of this Study and the Labin et al. (2012) Sample ….. 72

4.2 Journals that Published ECB Case Reports in the Sample ………. 73

4.3 Countries where ECB Case Reports were Conducted …………… 76

4.4 Distribution of ECB Domain …………………………………….. 78

4.5 Type of Organization …………………………………………….. 78

4.6 Type of Program Delivered ……………………………………… 79

4.7 Number of Organizations in an ECB Activity …………………… 80

4.8 Number of Programs in an ECB Activity ………………………... 81

4.9 Number of Program Sites ………………………………………... 81

4.10 Affiliation of ECB Facilitators …………………………………... 82

4.11 ECB Case Report Methodological Paradigm ……………………. 83

4.12 Intended Target of ECB …………………………………………. 93

4.13 Participant Focus of ECB ………………………………………... 94

4.14 Type of ECB Teaching Strategies ……………………………….. 94

4.15 Mode of Strategies Reported …………………………………….. 98

4.16 ECB Contact Duration …………………………………………… 99

4.17 Rigor of ECB Measurement Practice …………………………….. 103

4.18 Rigor of ECB Measurement Practice…………………………….. 108

4.19 Simple Logistic Regression Analysis: Publication Profile and Decision to Measure …………………………………………….. 111

4.20 Simple Logistic Regression Analysis: ECB Context Profile and Decision to Measure ……………………………………………... 112

4.21 Simple Logistic Regression Analysis: Implementation and Decision to Measure ……………………………………………... 113

4.22 Topic Number List and Levels of Developmental Proficiency ….. 122

4.23 Item Mean Square Fit Statistics ………………………………….. 124

4.24 Presence-Absence Matrix of Reported ECB Outcomes with Reference to ECB Content ……………………………………….. 127

4.25 Guttman Ordering of Reported ECB Outcomes with Reference to ECB Content ……………………………………………………... 129

4.26 Some Process and Outcome Areas Measured in Reported ECBs .. 130

4.27 Structure Matrix of the ECB Content Using Maximum Likelihood Method of Extraction and Direct Oblimin Rotation … 133

4.28 Factor Correlation Matrix ………………………………………... 134

4.29 Sub-domain Groupings for ECB Context and IRT Hierarchy Classification …………………………………………………….. 137

4.30 Simple Logistic Regression Analysis: ECB Content and Decision to Measure ………………………………………………………... 140

List of Figures

FIGURE Page

2.1 A Five-Step Approach to Developmental Assessment, Learning and Teaching …………………………………………………….. 14

2.2 Evaluation Timeline: Development and Institutions …………….. 16

2.3 Integrated Evaluation Capacity Building Model ………………… 25

2.4 Multidisciplinary Model of ECB ………………………………… 30

2.5 Logic Model for ECB Theory of Change ………………………... 32

2.6 Model for Measuring Evaluation Capacity ………………………. 34

2.7 Evaluation Capacity Index ……………………………………….. 35

3.1 Multistage Selection Process …………………………………….. 53

3.2 Analysis Diagram and Research Questions Map ………………… 57

4.1 Analysis Diagram and Research Questions Map ………………… 70

4.2 Timeline and Distribution of Published ECB Case Reports ……... 74

4.3 Venn-Diagram of the Methodological Paradigms of ECB Reports 83

4.4 ECB Content Targeting Individual Level Capacity (N = 63) ……. 87

4.5 ECB Content Targeting Organizational Level Capacity (N=63) … 88

4.6 Venn-Diagram of the Capacity Change Target of ECB Reports … 89

4.7 Integrated Evaluation Capacity Building Model ………………… 92

4.8 Pairwise Combination of ECB Strategies ………………………... 96

4.9 ECB Outcomes Measurement ……………………………………. 101

4.10 ECB Teaching Strategies ………………………………………… 114

4.11 ECB Cases and ECB Developmental Proficiency ……………….. 120

4.12 ECB Outcomes Measurement …………………………………… 125

4.13 Scree Plot for ECB Content ……………………………………... 132

4.14 ECB Content Sub-Domains Frequencies ………………………... 139

List of Appendices

APPENDIX Page

Appendix A List of Published ECB Cases in the Study Sample 158

Appendix B Coding Form 166

CHAPTER 1

INTRODUCTION

Evaluation Capacity Building (ECB) seeks to enable individuals and

organizations to adopt the concepts and practices of evaluation. Its purpose is to

mainstream the generation and utilization of evaluation information within

organizational systems and structures. ECBs are mostly aimed at improving

organizational accountability, organizational learning, and program outcomes.

Moreover, there is now a collective understanding that ECB is an intentional process

and that the ultimate goal is sustainable evaluation within the organization (Labin,

2008; Stockdill, Baizerman, & Compton, 2002).

This study takes the perspective that ECB is a designed intervention that

targets improvement of evaluation capacities of individuals or organizations. This

idea has long been supported by practitioners of the field, and there are many logic

models for ECB that have been developed by evaluation theorists and practitioners.

Also, there have been several attempts to synthesize these ECB models to attain a

unified understanding of the whole logic of ECB (Labin, 2014; Labin, Duffy, Meyers,

Wandersman, & Lesesne, 2012; Milstein, Chapel, Wetterhall, & Cotton, 2002;

Preskill & Boyle, 2008; Taylor-Powell & Boyd, 2008; Taylor-Ritzler, Suarez-

Balcazar, Garcia-Iriarte, Henry, & Balcazar, 2013). From this perspective it follows

that evaluation of ECB is essentially a notion of program evaluation that can be

embedded in ECB designs.

Furthermore, the nature of ECB as a designed intervention may be

considered as a learning intervention within organizations. This is a crucial

assumption if ECB is to be considered an evaluand itself, the object of an evaluation.

This could mean that ECB may be considered to have close similarity to learning

interventions in the education setting, although differing in many ways according to

context and purpose. For example, in organizations, adult learning can be achieved by

direct or indirect training which is mostly conducted in actual workplace settings.

When ECB is viewed as a learning intervention, three possibilities may be

explored. First, it enables an investigation of ECB through the lens of learning and

measurement theories that could be borrowed from the education discipline. The field

of program evaluation is accustomed to the idea of eclectic approaches and to drawing

from multidisciplinary approaches. Second, it warrants the necessity to examine the

content of ECB whether it has a single unifying construct. That is, to verify whether

the practice of ECB holds together as an entity itself called ECB. Third, it calls for an

investigation of how measures are carried out with respect to evaluating the effects of

the learning intervention. This means that learning outcomes are central to ECB

measurements.

It is an issue as to whether these ECB learning outcomes occur individually

or collectively, whether they affect individual behaviour or collective practice, or

whether they influence systems and structures. These ECB learning outcomes could

be examined if they follow a structured developmental learning progression from the

ECB learning content being delivered. Learning progression refers to the structure

that learners build as they progress towards mastery of the knowledge and skills

needed for evaluation capacity, as in the case of ECB. Once learning progression is

identified, the implications would be important in ECB practice because it could mean

purposeful sequencing of teaching and learning expectations across multiple

developmental stages. Hence, it is possible that this notion of developmental stages

holds in building evaluation capacities and needs to be investigated.

While it is possible that improved evaluation capacities among individuals

and organizations manifest themselves in improved program delivery, and, by

extension, program outcomes, an organization’s program outcomes are a different

evaluand with different sets of intervening factors compared with ECB learning

outcomes as an evaluand. A new understanding of ECB measurement and evaluation

may be found by being clear about what the ECB learning outcomes really are. This

could be the necessary preliminary step in understanding measurements in ECB.

Several implications come to mind when ECB is considered as a learning

intervention. First, it implies that ECB is not simply perceived as a mere

demonstration of evaluation skills or approaches to conducting evaluation, but it can

be viewed as programmatic intervention, following a designed logic. Often,

evaluators teach evaluation to organizations when there is opportunity, and to some

extent persuade these organizations to make evaluation a way of life. This is what can

be called a positively opportunistic ECB, as opposed to a more formal ECB that

perceives it as a serious intervention that requires accountability to measure learning

outcomes.

Second, viewing ECB as a learning intervention could clearly demand

cooperation of the stakeholders engaged in ECB. Because of their significant roles in

the organizational system, it is important to consider target stakeholders who

undertake these learning interventions. Clinton (2014) argued that the primary reason

for doing ECB is how the stakeholders – their willingness and readiness to adopt

evaluation – mediate the true impact of evaluation on organizations. This proposition

claims that ECB is a deliberate intervention in organizations that are affected by

stakeholders’ decisions.

Lastly, to consider ECB as a learning intervention implies that there is an

associated program theory that would lend itself to the rigors of program evaluation.

A program theory may identify with an implicit auxiliary measurement theory, which

is necessary to establish methodological rigor for any program evaluation initiative.

Several authors have suggested that for a program theory to be operational for

evaluative investigation, an accompanying measurement theory needs to be in place to

allow examination of its methodological rigor (Blalock, 1979, 1982; Braverman,

2013; Braverman & Arnold, 2008). Thus, investigation of the measurement practice

of an ECB is a necessary preliminary exercise to ascertain how ECBs fare with

respect to methodological rigor as a programmatic intervention and determine

whether practitioners understand what ECB measurement practice is really about.

Hence, the motivation of this study stems from the perspective that ECB is a

learning intervention. The implications of this perspective could possibly provide new

ways of thinking about ECB measurement and evaluation. An examination of

empirical data through this lens may yield useful results to inform ECB practice.

Statement of the Problem

It seems surprising to find that ECB practitioners pay little attention to

measuring ECB outcomes. A research synthesis of ECB literature has documented

“very limited reporting of measures and quantitative data… for a field embedded in

evaluation and populated by evaluators” (Labin et al., 2012). In addition, the

reported evaluations of ECB are commonly carried out in qualitative narratives and

use anecdotal evidence. This is not to say that qualitative reports are inferior.

However, there are higher expectations of quantitative evidence-based claims if

evaluation practitioners are to convince organizations to mainstream evaluation – that

is to make evaluation part of the regular routine in an organization. While resting on

the assumption that practitioners in the evaluation field are mostly accustomed to

measurement, the phenomenon of low measurement of ECB outcomes in practice is

sufficient to warrant this investigation. This study aims to investigate the

measurement practice in evaluation capacity building. This is important because

through an understanding of the existing measurement practice it could be revealed

how practitioners understand the content structure, delivery, and evaluation of ECBs.

The lack of attention to measures, and ultimately to evaluations of ECB, may

not necessarily imply lack of skills and competencies on the part of ECB practitioners.

Evaluators – who are most likely to be the consultants, resource persons and trainers

for ECBs – are accustomed to measurement principles and methodologies. The

evaluation profession demands methodological rigor for the evaluations the

practitioners perform and so it is permissible to assume that evaluators are familiar

with the critical role of measurements in evaluation. Perhaps one plausible

explanation for this lack of attention to ECB evaluation is Braverman’s (2013) notion

of the trade-off between measurement rigor and feasibility of measurement

implementation. However, what levels of measurement rigor occur, how ECB

measurements are carried out in practice, and how much measurement is conducted

all remain to be investigated in the empirical field.

Purpose of the Study

This research looks at the broader evaluation practice of Evaluation Capacity

Building (ECB) in published ECB reports. It attempts to examine two key

components of ECB evaluation from the learning intervention perspective. It looks at

the ECB measurement practice as a way to document and determine whether ECB has

a verifiable content construct, and possibly a structure of developmental proficiency,

and how ECB outcomes measurement is supposed to be carried out.

Aims of the Study

This study hopes to achieve two major aims. First, it aims to document and

describe the measurement practice that occurred in ECB initiatives as reported in

published ECBs. Second, it seeks to investigate whether empirical evidence supports

the notion of ECB developmental proficiency that follows from the learning

intervention perspective of ECB. To achieve these primary aims, the following

detailed objectives are to be carried out:

Describe context, implementation, and content of ECB initiatives;

Describe the ECB measurement practice with respect to what is being

measured, the rigor of measurement and how much measurement is

undertaken;

Determine what influences the decision to measure ECB outcomes in practice;

and

Investigate whether ECB content delivered follows a unified learning

construct, and possibly a progressive structure, and whether outcomes

measured demonstrate this thinking in practice.

The Research Hypothesis

The premise of the research is the view that ECB is a learning intervention.

From this perspective, the ECB content delivered in ECB activities could be the

focus of ECB outcomes measurement. Formally, the research hypothesis is stated as

follows:

Evaluation capacity building as a learning intervention would call for a

progressive approach to content delivery and outcomes measurement.

Research Questions

The main questions for this research are:

Research Question 1:

How can ECB measurement practice be described from empirical evidence?

Research Question 2:

Is there evidence to demonstrate that ECB content exhibits a unified

learning construct and possibly a progressive structure?

These questions may be broken down into the following sub-questions:


What are the contexts, implementation approaches, and content of

ECBs delivered in published ECB reports?

What is the rigor of measurement practice in published ECB reports?

What determines practice of measuring ECB outcomes?


Does ECB content demonstrate a unified construct and progressive

structure?

Does ECB content group together in specific ways?

Significance of the Study

This study hopes to contribute to the body of knowledge in evaluation

teaching. The characterization of ECB measurement practice as well as understanding

the nature of its relationship with ECB content, implementation and contextual factors

may provide answers to the problem of low response to ECB evaluation. Findings of

this investigation may provide alternative ways of looking at how practitioners

conceptualize, deliver and measure ECB.

Limitations

This study is limited to completed and published ECB reports. Published

ECB reports provide a feasible opportunity to examine how measurement practices in

ECB were carried out across a range of organizational contexts, an examination that would

otherwise be impossible to conduct individually on ECB initiatives in situ. This means that

conclusions from this study are limited to the population represented by the sample. In

addition, published ECB reports were aimed at different audiences and not for the

purpose of reporting measurement practices. This means that information may not be

complete or readily extracted from the report. This possible source of bias will be

minimized by establishing clear inclusion criteria for sample selection. A coding and

assessment instrument will be developed for data gathering consistency.

Outline of the Thesis

The thesis is outlined as follows. Chapters 1 to 3 set the scene of the study.

The first chapter establishes the rationale and identifies the central question that the

study will attempt to answer. The literature on ECB is explored in Chapter 2 to

identify the existing understandings of ECB and the factors already

established. A particular emphasis is placed on the emergence of ECB in the field of

evaluation, the influence of evaluation approaches and program theory thinking on

ECB, the state of ECB measurement studies and the case for the need to examine

measurement practices in ECB. The introductory part of the thesis ends in Chapter 3

with a detailed overview of the research design including its conceptual framework,

theoretical underpinnings and methodology used. Also, this chapter includes detailed

descriptions of the sample selection and inclusion criteria and the analysis tools that

are used to answer the research questions. Chapter 4 deals with the findings of the

study. Chapter 5 provides the synthesis and conclusion of the study. The synthesis of

the findings elaborates the significance of the study results in relation to the concepts

and ideas presented in the literature review. The conclusion summarizes the findings

of the study and makes the case for the research contribution. The chapter closes with

suggestions for further research studies and opportunities to apply the

recommendations from the findings of the study.

CHAPTER 2

REVIEW OF LITERATURE

Overview of the Chapter

This chapter has three objectives. First, it aims to locate the research topic in

the broad field of evaluation by identifying its position particularly in the areas of

evaluation capacity building (ECB) and measurement in evaluation. Second, it

presents the perspective through which the study investigates the problem and the

possible literature gaps it addresses through this lens. Lastly, it highlights why ECB

measurement practice has to be examined and demonstrates what this exercise can

reveal to further the development of ECB practice.

The first section presents the perspective this study adopts to frame the

concepts and ideas that existing literature may provide regarding ECB. Although the

approach to this study is through the quantitative method and positivist view, it is

recognized that the framing of the research questions to some degree has subjective

bias with respect to views and beliefs about the nature of ECB. This explication of the

research perspective is intended to provide a better understanding of the significance

of this study. The following sections begin with a brief narrative about the emergence

of this branch of evaluation practice and then provide an examination of the

definitions and conceptualizations of ECB. This is followed by a survey of the

dominant ideas of ECB approaches and models. Existing studies and issues on ECB

measurement are presented, and the chapter concludes by identifying the possible

gaps in ECB measurement literature.

The Study Perspective: Learning Intervention

This investigation is based on the view that ECB is a learning intervention.

This is the lens through which this study is conducted. The first component of this

perspective is the idea of “learning”. At its most basic level, it means ECB

initiatives can be perceived as analogous to the teaching-learning situation in adult

professional learning. Although ECB in organizations is more complex than the

picture of a classroom learning setting, the analogy could simplify and drive the point

that ECB as a learning intervention could provide a different understanding of ECB.

The classroom parallels ECB initiatives in several characteristics. The students

correspond to adult learners, mostly professionals who are key players or stakeholders

of the organization. The classroom learning environment, which includes all factors

that enable or hinder learning, corresponds to an organizational environment that

could enable or hinder the development of evaluation capacities across individuals or

organizations. The classroom management systems, structures and rules also parallel

those of many organizations. Most importantly, the classroom learning content, which

can be defined and structured as the foundation for learning assessment and

evaluation, equates with the knowledge and skills and abilities that organizational

training aims to develop.

However, this classroom-organization analogy for ECB diverges particularly

in that organizations are expected to perform collective actions through

processes and systems that cannot be performed by individuals. Classroom learning

settings are often focused on individual learning, but organizational learning involves

collective and collaborative processes and systems.

For an ECB, whether the focus is on individual, team or organizational

learning, the teaching-learning processes need investigation. Theories of learning and

assessment may help reveal how ECBs could work, for example, by examining the

content material, the learning activities, and the way learning is assessed. Keeping this

analogy in mind while recognizing the fact that organizations are more complex than

classroom learning situations, this study focuses on drawing and integrating concepts

from educational and organizational paradigms in the hope of contributing to a deeper

understanding of ECB practice.

The second component of the perspective is the concept of “intervention”.

Intervention in this study suggests the idea of an intentional program design. This

means that ECB has an implicit program theory with the basic structure of inputs-

activities-outputs-outcomes components. Thus, to treat ECB as a “learning

intervention” is to recognize the view that ECB is both a teaching-learning process in

the area of educational theory as well as a programmatic intervention in the area of

program evaluation.

Assessment of Learning and ECB Measurement

Some concepts and approaches from educational measurement could be

applied to ECB measurement. The two prominent ideas from educational

measurement that appear to be useful with respect to ECB measurement are: (1)

developmental constructs of learning; and (2) developmental approach to assessment.

In learning intervention settings, developmental constructs are formulations

of the steps or stages of increasing competence. It is important that practitioners use

these stages to think developmentally about the intervention to support learning. The

importance of developmental constructs is that they can provide a basis for identifying

the Zone of Proximal Development (ZPD) of the learners, that is, the position in the

learning progression at which a learner is ready to learn (Griffin, 2007). Once the learner’s

ZPD is identified, this information can be used to plan and monitor the teaching-

learning intervention. Developmental framework theories include Krathwohl’s

Affective Domain, Bloom’s Taxonomy and Dreyfus’ Model of Skill Acquisition

("Assessment and learning partnerships: A short course for school leaders," 2012).

Thus, in ECB, the idea of content progression may also be considered. This is the

possible structuring of ECB content topics in developmental progression as a

reference for ECB implementation and measurement. This ECB developmental

progression will be referred to in this study as ECB developmental proficiency.

The developmental approach to assessment follows from the idea of

developmental constructs of learning and the measurement theories of Rasch (1960, 1980)

and Glaser (1963), as cited by Griffin (2007). This approach proposes that once a

developmental progression to learning is identified, it can be used for assessment that

could subsequently be used as a starting point for learning and the beginning of

change. This is explicated by the Five-Step Approach to Developmental Assessment

proposed by Griffin (2007) and shown in Figure 2.1. With regard to ECB, the key

point to be made from Griffin's developmental assessment framework is that ECB

measurement can only find meaning when it can be interpreted as a performance level

on the developmental progression.
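
A minimal sketch of what it means to read a measure as a position on a developmental progression, assuming the dichotomous Rasch model, in which the probability of success depends only on the difference between a learner's ability and a topic's difficulty. The topic names, the difficulty values and the 0.4 to 0.6 probability band are hypothetical and are not drawn from Griffin (2007) or from the data of this study. Treating the ZPD as the band where the odds of success are roughly even is likewise an assumption made for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def zone_of_proximal_development(ability, item_difficulties, low=0.4, high=0.6):
    """Return the topics whose success probability lies within [low, high].

    The band is an illustrative assumption: topics the learner succeeds on
    about half the time are treated as the ones they are ready to learn.
    """
    return [
        name
        for name, difficulty in item_difficulties.items()
        if low <= rasch_probability(ability, difficulty) <= high
    ]


# Hypothetical ECB content topics with made-up difficulties (in logits),
# ordered from easiest to hardest to mimic a developmental progression.
topics = {
    "evaluation terminology": -1.5,
    "logic modelling": -0.2,
    "data collection design": 0.1,
    "evaluation use in decision-making": 1.8,
}

# A hypothetical participant located at 0.0 logits on the same scale.
ready = zone_of_proximal_development(ability=0.0, item_difficulties=topics)
print(ready)
```

Under these assumed values, the two middle topics fall inside the band, while the easiest topic is already likely to be mastered and the hardest is still out of reach. This is the sense in which a score is read as a level on the progression rather than as a raw total.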

These educational theories of learning and measurement provide a fresh look

at ECB as a learning intervention. They provide a basis for the need to examine ECB

measurement practice. The following questions could be asked: (1) Does ECB

measurement consider the idea of ECB as a developmental learning construct? (2) Is

there evidence to show that ECB is a unified learning construct that demonstrates

progressive structure? (3) Can the idea of developmental proficiency be applied to

ECB? Answers to these questions could perhaps redefine how ECB content,

implementation and measurement should be approached.

Figure 2.1 A Five-Step Approach to Developmental Assessment, Learning

and Teaching (Griffin, 2007)

The Emergence of ECB

An awareness of the emergence of ECB in the historical timeline of the

evaluation discipline could provide some background to the rise of the need for ECB.

The description of the continuing rise of the evaluation discipline may help position ECB

in relation to the development of ideas in the field. This provides a backdrop for

conceptualizations and definitions of ECB that currently exist in the literature.

Many authors recognize that ECB has been a practice for some time

(Compton, Baizerman, & Stockdill, 2002; Milstein, et al., 2002; Preskill, 2008);

otherwise the idea of evaluation would not have developed into a distinct discipline as

it is today. ECB is recognized as a process "long practiced but only recently named,

illuminated and explicated" (Compton, et al., 2002). Perhaps the most prominent

events that have become significant turning points for the highlighting of ECB were

the 2000 American Evaluation Association conference theme "Evaluation Capacity

Building" (Leviton, 2001) and the 2001 AEA conference theme "Mainstreaming

Evaluation" (Sanders, 2002). The first decade of the 21st century could be considered

as an important evolutionary stage in the evaluation profession in that this was the

time when evaluators and organization leaders became interested in ECB along with

the wide acceptance of participatory, collaborative and stakeholder forms of

evaluation (Preskill & Boyle, 2008).

The development and institutionalization of evaluation, as a distinct field, is

fairly young (Brisolara, 1998; Preskill & Russ-Eft, 2005; Stufflebeam & Shinkfield,

2007). A sketch of the historical timeline of evaluation, shown in Figure 2.2, indicates

how young the discipline is: less than a hundred years old. The

location of ECB in the timeline as an emerging area of the field of evaluation falls

within the most recent decade, and it took over ten years for ECB to be

conceptualized in the shape of a program theory. It was in this latter period of the

timeline that ECB practitioners became serious about ECB measurements and began

to think about evidence-based ECB outcomes.

Observing the events that unfolded prior to the emergence or, more

appropriately, emphasis of ECB, it can be seen that major conceptualizations and

approaches in evaluation have already taken ground. For example, the ideas of

participatory approaches to evaluation, utilization-focused evaluation and program

theory had already been established prior to the emphasis of teaching and

mainstreaming of evaluation during the AEA conferences. There is no evidence to

suggest that ECB has developed because the concepts of evaluation have matured.

Figure 2.2 Evaluation Timeline: Development and Institutions

Timeline entries (1950 to 2020), in chronological order: Evaluation as Educational Assessment (1950s); Scriven's Formative and Summative Evaluation (1967); Publication of Evaluation Journals (1970s); Experimental and Quasi-experimental Approaches to Evaluation (1970s to 1980s); Scriven's Meta-evaluation (1975); Evaluation Network and Evaluation Research Society (1976); Participatory Evaluation, Utilization-Focused Evaluation, Transformative Evaluation (1980s to 1990s); American Evaluation Association (1985); Wholey's Program Theory (1987); National Evaluation Societies (1990s to Present); Program Evaluation Standards (1994); AEA's Evaluation Capacity Building (2000); AEA's Mainstreaming Evaluation (2001); Integrated Evaluation Capacity Building Model (2008); Scriven's Evaluation as Alpha Discipline (?) (2013)

Perhaps the rise of ECB could be attributed to the socio-political demands for good

governance in the form of accountability and evidence-based social interventions in

which evaluation has taken a vital role, as well as a rise in the need to understand and

develop the skills to carry out evaluations (Chouinard, 2013).

The forerunners of the evaluation profession succeeded in the

institutionalization of evaluation, for example, in the form of national evaluation

organizations. The notion of evaluation has expanded from the confines of academic

institutions and federal requirements. This happened first in the United States of

America, and subsequently the rest of the world. Preskill (2008) has provided a

vision for what could be imagined as the future of evaluation with this emergence and

increasing commitment to ECB. It is a world where evaluation is a "social epidemic

where individuals, groups, organizations and communities are constantly learning

about and from evaluations… creating a 'global cascade' of evaluative thinking and

practice" (Preskill, 2008, p. 127). This vision of the use, influence and impact of

evaluation aligns with Michael Scriven's concept delivered at the 2013 Australasian

Evaluation Society conference in Canberra, Australia, that the future evolutionary

status of evaluation is to become an "alpha discipline". To ensure movement towards

this evolutionary goal for evaluation, ECB has to be at the forefront. In this sense, the

advocates of evaluative thinking and practice appear to be heading in the right

direction. Having provided this brief backdrop of the position of ECB in the

evaluation field, the next section describes how ECB is defined.

ECB Definitions

The literature provides a compendium of definitions and conceptualizations

of ECB. Some of these definitions are presented here to provide a scope of

conceptualizations of ECB and describe the boundaries in which the study operates.

The prominent ones are the earlier definitions which are almost always referred to by

scholars of the field. Perhaps the most cited definition is from Stockdill, Baizerman

and Compton (2002, p. 14):

ECB is the intentional work to continuously create and sustain overall

organizational processes that make quality evaluation and its uses routine.

This definition views ECB as a systems approach. It considers the whole of

the organization as necessary to set up processes that facilitate routine evaluation

practice. The authors' influential work, the creation of evaluation systems for the

American Cancer Society (ACS) (the context for this definition), recognizes the

centrality of the organizational context to support ECB activities. On the other hand,

some scholars view ECB as an approach through which influential individuals can

influence the larger organizational structure to improve performance evaluation:

Evaluation capacity-building within an organization is typically understood as

an exercise in developing the evaluation skills and knowledge of some, or all,

of the organization's staff, with a view to increasing their ability to undertake

high-quality evaluations of the organization's projects and programs (Beere,

2005, p. 41).

Evaluation capacity building (ECB) is an intentional process to increase

individual motivation, knowledge, and skills, and to enhance a group or

organization's ability to conduct or use evaluation (Labin, et al., 2012, p. 308).

Others point to the importance of considering the full spectrum of stakeholder levels

within the organization:

ECB involves the design and implementation of teaching and learning

strategies to help individuals, groups, and organizations, learn about what

constitutes effective, useful, and professional evaluation practice. The ultimate

goal of ECB is sustainable evaluation practice—where members continuously

ask questions that matter, collect, analyze, and interpret data, and use

evaluation findings for decision-making and action. For evaluation practice to

be sustained, participants must be provided with leadership support,

incentives, resources, and opportunities to transfer their learning about

evaluation to their everyday work. Sustainable evaluation practice also

requires the development of systems, processes, policies, and plans that help

embed evaluation work into the way the organization accomplishes its mission

and strategic goals (Preskill & Boyle, 2008, p. 444).

The preceding definitions of ECB have commonalities and differences. Table

2.1 shows a comparison of the conceptual components found in these definitions. The

most common is the concept that ECB refers to improvement of evaluation capacities

through the development of individual knowledge and skills. This suggests that ECB

is seen as a program level intervention – that is, increasing the evaluation capacity of

individuals. Some definitions fall within this interpretation. Other authors view ECB

as a form of broader organizational practice. For example, Stockdill et al. (2002) and

Preskill and Boyle (2008) defined ECB with sustained or routine evaluation practice

within organizational processes.

From these definitions, it can be noted that there is no current agreement in

the field about what really constitutes ECB. However, Preskill and Boyle's (2008)

definition appears to be the most explicit and comprehensive (Table

2.1), and it is attuned to the "learning intervention" perspective of this study. It

recognizes explicitly the "teaching and learning strategies" component of ECB.

Table 2.1 Conceptual Components of ECB Definitions

References compared (columns): Stockdill, Baizerman & Compton (2002); Beere (2005); Labin, Duffy, Myers, Wandersman & Lesesne (2012); Preskill and Boyle (2008)

Concepts compared (rows): Intentional work; Organizational processes; Evaluation quality; Sustained or routine; Developing knowledge, skills or motivation; Improved ability or transfer of skills; Improved organizational programs; Individual level; Group or team level; Organization level

The other definitions only imply the teaching-learning aspect of ECB by using

the terms "develop" or "increase" evaluation capacities. Preskill and Boyle's (2008) definition is therefore selected to

set the conceptual basis of ECB in this study. It not only affirms the premise of this

study, but also includes the ideas of: (1) ECB as an intentionally designed intervention;

(2) considering both the program and organizational levels; and (3) the target

outcomes of ECB from individual to organizational capacities.

As ECB literature has grown, more scholars have continued to debate and

examine the different facets of ECB. Taut (2007), for example, emphasized that ECB

is "not an area where a blue print approach could work". She proposed the idea that

no single definition can be comprehensive for ECB and that evaluation capacity is

situational and ever-changing in accordance with the local contexts.

Some have maintained the view that ECB is primarily about sustainability

of evaluation practice. McDonald, Rogers and Kefford (2003), in their work "Teaching

people to fish: Building the evaluation capability of public sector organizations",

contend that evaluation 'capability' is a more appropriate term to use instead of

evaluation 'capacity', since evaluation capability aims to provide "enduring

organizational benefits, including a sustaining resource for producing evaluation as

well as a system for encouraging and using evaluation" (McDonald, et al., 2003, p.

10). While McDonald and colleagues (2003) associate the term "capability" with

sustainability with respect to evaluation production and utility, some evaluators use

the terms interchangeably (Adams & Dickinson, 2010; McGeary, 2009).

This section concludes with the thought that although there are differences in

the conceptualizations of ECB from the definitions that have been explored, the

Preskill and Boyle (2008) definition appears to be the most explicit and

comprehensive in terms of the ECB components. The definition includes components

that are missing from the other sampled definitions. Although there is no current

agreement in the field about which definition to subscribe to for the practice of ECB,

this study takes Preskill and Boyle's (2008) definition as its starting point. The primary

reason is that their definition grounds this study in the perspective of ECB as a

learning intervention. Furthermore, the definition is explicit in terms of teaching-

learning components, establishing a foundation of ECB outcomes as comprising ECB

learning outcomes from the ECB content delivered.

Evaluation Approaches and ECB

Perhaps one of the evaluation concepts most associated with ECB is the

school of thought related to "participant-oriented evaluation" (Fitzpatrick, Sanders, &

Worthen, 2011). The term school of thought is used to recognize the current diverse

ideas that participant-oriented or stakeholder-focused approaches introduced to

evaluation theory and practice. The term collectively refers to the following related

concepts: participatory evaluation, collaborative evaluation, empowerment evaluation

and utilization-focused evaluation (Fetterman, Rodriguez-Campos, Wandersman, &

O'Sullivan, 2014). A useful summary of the semantics and distinctions of these

concepts was provided by O'Sullivan (2012) and is shown in Table 2.2. O'Sullivan clarified the four

approaches and related them to the implementation of evaluation and ECB.

Furthermore, these participant-oriented approaches appear in many articles published

on ECB, for example: collaborative evaluation (Arnold, 2006), participatory

evaluation (Atkinson, Wilson, & Avula, 2005; Kuzmin, 2012), empowerment evaluation

(Andrews, Motes, Floyd, Flerx, & Fede, 2005; Diaz-Puente, Yague, & Afonso, 2008;

Wandersman, 2014) and utilization-focused evaluation (Compton, Baizerman,

Preskill, Rieker, & Miner, 2001).

Given the influence of these ideas in the evaluation field, it appears that they

could also shape the core approaches and concepts of ECB. This leads to a need

for an investigation of the implementation strategies of ECB. This is an issue for ECB

as a program level intervention which aims to increase the capacity of individuals

through working on program evaluations. This study addresses this issue by

examining what practitioners actually do as an approach to deliver ECB content. It

examines whether practitioners use a participatory approach to teaching evaluation,

direct training or a combination of both approaches.

Table 2.2 Participant-oriented Evaluation Approaches (O'Sullivan, 2012)

Aspects of Evaluation | Collaborative Evaluation | Participatory Evaluation | Empowerment Evaluation | Utilization-focused Evaluation
1. Primary evaluation focus | Promote participation throughout | Engage some stakeholders | Stakeholders use evaluation tools to achieve results | Promote the use of evaluation findings
2. Evaluation decision-making | Negotiated | Evaluator and participants | Participants | Negotiated
3. Stakeholder roles | Clients, partners, assistants, data sources | Clients, data sources | In charge of or partners | Key stakeholders collaborate
4. Evaluator roles | Team leader, collaborator | From participant observer to team leader | Guide, facilitator, critical friend | Active-reactive-interactive-adaptive
5. Pre-evaluation clarification activities | Probe program purposes and resources | Unknown | Addressed in conduct of evaluation | Extensive
6. Design | As rigorous as possible | Varies with evaluator role | Participant-centered | As rigorous as appropriate
7. Types of data collection | Quantitative and qualitative | Quantitative and qualitative | Quantitative and qualitative | Quantitative and qualitative
8. Types of data reporting | As agreed upon | Unknown | Process, results, outcomes | On-going data as available
9. Evaluation Capacity Building | Yes | Unknown | Yes | Yes
10. Cultural responsiveness | Yes | Unknown | Yes | Yes
11. Systems or networking considerations | Yes | Unknown | Yes | Yes
12. Implementation-stakeholders as: | | | |
Instrument developers | Yes | No | Yes | No
Data collectors | Yes | No | Yes | No
Data analyzers | Yes | No | Yes | No
Data interpreter | Yes | Yes | Yes | Yes
Data reporter | Yes | No | Yes | No

Program Theory and ECB

Interventions such as ECB can be perceived as a designed program. A

program is a set of resources and activities directed toward one goal or common

goals. A program theory identifies program resources, program activities, and

intended program outcomes, and specifies a chain of causal assumptions linking

program resources, activities, intermediate outcomes and ultimate goals (Wholey,

1987). The key in this definition is the identification of the components of a program

theory. These are the resources, activities and outcomes. Another important aspect of

this definition is the emphasis on causality in the chain of assumptions. Wholey

(1987) put it this way in ordinary language: "If the following program resources are

available, then the following program activities will be undertaken… If these program

activities occur, then the following program outcomes will be produced… If these

activities and outcomes occur, then progress will be made toward the following

program goals" (Wholey, 1987, pp. 78-79). This understanding of the concept of

program theory is important in understanding the nature of ECB conceptualizations

through models and frameworks.

Integrative Evaluation Capacity Building Model

Program theory thinking pervades ECB thinking. The Integrative Evaluation

Capacity Building (IECB) model (Labin, 2014; Labin, et al., 2012; Leviton, 2014)

that has been widely circulated in the evaluation community has been modelled on

program theory. For example, this model has the structure of a program theory having

the basic components of inputs, activities and outcomes. The merit of IECB is that it

is empirically grounded and was established and updated using a synthesis approach

to investigating published ECB reports. Figure 2.3 shows a diagram of the IECB

(Labin, 2014).

Figure 2.3 Integrative Evaluation Capacity Building Model (Labin, 2014)

At the heart of the model are the identification of the program theory

components and links between the components Needs/Reasons-Activities/Mediators-

Outcomes. This is a clear adaptation of the program theory approach to understanding

and defining the concept of ECB. There are two key features of the IECB model

worth emphasizing other than the identification of the components and the assumed

links between these components. First, the IECB

model implicitly assumes a divide between individual and organizational levels of the

intervention. Although it is not clear how individual and organizational capacities

influence each other, this assumption appears to be widely accepted and often ECB

content delivered is based on this divide (Brown & Reed, 2002; Henry & Mark, 2003;

Taylor-Ritzler, et al., 2013). To an extent, this divide is also extended to several levels

such as individuals, teams or groups, organizational, community and even national

level (Holvoet & Dewachter, 2013; King, 2010; Preskill, 2008). Second, the

organizational program is also embedded in this model. The organization's program

goals appear as part of the Needs/Reasons component, the evaluation of the programs

in the Activities component and the program outcomes in the Outcomes component.

This shows the implicit assumption that ECB as an intervention is intertwined with

the interventions the organizations are running. While the ultimate goal of most

ECB initiatives is improved organizational outcomes, ECB intervention outcomes could be

confused with organizational intervention outcomes. This matters when deciding

which ECB outcomes to measure. The IECB model is quite clear in its distinction

between the individual, organizational and program level outcomes. It can be argued

that when ECB is viewed as a learning intervention, then the ECB outcomes that can

logically be linked to ECB activities are the individual and organizational outcomes.

The program level outcomes have a different set of context, implementation and

mediating factors. The improved evaluation capacity of individuals and organizations

is only one of the inputs for improved program outcomes. Hence, program outcomes,

while recognized as an ultimate ECB goal, belong to a broader program theory in which the

ECB program theory is embedded. ECB evaluation must be different from organizational program

evaluation.

In summary, the above discussion has shown that IECB is a program theory

that recognizes different possible levels of intervention. An organization's

intervention outcomes and ECB outcomes could be mixed up, adding confusion

regarding what to measure for ECB. This explains why there is a need to examine

ECB measurement practice. Documenting and investigating ECB measurement

practice could reveal not only how the programmatic concept of ECB actually occurs but

also whether the measurement of organizational outcomes and the measurement of ECB

outcomes are clearly delineated. The investigation findings could provide insights into

what could be done for future ECB measurement practice.

More ECB Models

There are several ECB frameworks and models that have contributed to the

current understanding of ECB. This is not a comprehensive review of the ECB models

but the ones presented here are prominent in the literature search. The intention of the

survey of these ECB models was to identify the range of conceptual ideas that were

published in this emerging area of the evaluation discipline. Some of these models

concern program theory or logic models about how to conduct ECB, while others

relate to the assessment of evaluation capacity. These models and frameworks are:

Program Theory/Logic Models

General Framework for ECB (Milstein, et al., 2002)

Multidisciplinary Model of ECB (Preskill & Boyle, 2008)

Three-Component Framework and Logic Model for ECB Theory of

Change (Taylor-Powell & Boyd, 2008)

Synthesis Model of Evaluation Capacity (Taylor-Ritzler, et al., 2013)

Assessment of Evaluation Capacity

ECB Supply and Demand Model (Nielsen, Lemire, & Skov, 2011)

Getting to Outcomes (Wandersman, 2014).

General Framework for ECB

The General Framework for ECB reported by Milstein, et al. (2002) is in the

context of the Centers for Disease Control and Prevention and Public Health system in

the United States. The framework is essentially a systems approach adhering to the

belief that ECB is about organizational principles, processes and procedures within its

organizational cultural and infrastructure contexts. However, it recognizes that

training in evaluation is a key component:

Evaluation capacity in public health would require a process of culture change,

including significant reforms to their own organizations… should build an

evaluation literate workforce and maintain a cadre of applied evaluation

scientists throughout the agency… These goals are partly accomplished

through training in evaluation, with additional strategies focusing on

leadership and other aspects of organizational infrastructure (Milstein, et al.,

2002, pp. 32-33).

The framework provides the ECB principles and guidelines that are deemed able to

promote program evaluation in the organization. This framework appears to be tied to

organizational management with a view that ECB can be embedded in the

organizational system and form part of the organizational operations. The framework

has an implicit goal of mainstreaming evaluation in the organization. Success

is defined in terms of functional evaluation systems in the organization.

Multidisciplinary Model of ECB

Preskill and Boyle (2008) proposed a multidisciplinary model of ECB (Figure

2.4). The model draws from the fields of evaluation, organizational learning and

change, and adult and workplace learning. Its purpose is to provide a perspective for

understanding cohesion and organization of ECB. The model diagram in Figure 2.4

represents the key aspects of the model. It shows "Transfer of Learning" as a link

between the ECB component and the organizational "Sustainable Evaluation

Practice" component. The ECB component includes the goals of ECB, the motivation,

assumptions and expectations of ECB, the ECB design, and the ECB strategies. This

model identifies the idea that at the core of the ECB component is the teaching-

learning component and the transfer of learning. The model does not distinguish

between individual and organizational levels but, instead, emphasizes the

organizational learning capacity context in which this ECB and sustainable evaluation

practice could thrive and diffuse evaluation learning. This model is similar to the

General Framework for ECB; however it expands on the details in the ECB practice

and the organizational sustainable evaluation practice.

Figure 2.4 Multidisciplinary Model of ECB (Preskill & Boyle, 2008)

Three-Component Framework and Logic Model for ECB Theory of Change

Taylor-Powell and Boyd (2008) presented the Three-Component Framework

for ECB (Table 2.3) and a logic model for evaluating ECB (Figure 2.5). This

framework is in the context of complex organizations, specific to the case of a state

education extension organization. The three components for ECB are identified as

professional development, resources and support, and organizational environment.

This is a framework where ECB is viewed as a form of professional development,

emphasizing that learning could take place in the workplace or during formal training

in educational institutions. It also recognizes the significance of resource support and

organizational environment in undertaking ECB. The proponents believe that when

these key components and elements are present in an organization, it sets the system

for ECB to diffuse in the organization represented by the logic of ECB theory of

change in Figure 2.5.

Table 2.3 Three-Component Framework for ECB (Taylor-Powell & Boyd, 2008)

Component | Elements
Professional development | Training; Technical assistance; Collaborative evaluation projects; Mentoring and coaching; Communities of practice
Resources and supports | Evaluation and ECB expertise; Evaluation materials; Evaluation champions; Organizational assets; Financing; Technology; Time
Organizational environment | Leadership; Demand; Incentives; Structures; Policies and procedures

In this theory of change, the ECB components are considered as activity

inputs and the outcomes include individual change with respect to the four learning

domains. This shows implicitly the 'teaching-learning' assumption that is required for

professional development, whether through in-house or formal training. The model

also assumes that changes in the team and program could be simultaneous and that

organization change and social betterment are cumulative effects of individual, team

and program improvements.

Figure 2.5 Logic Model for ECB Theory of Change (Taylor-Powell & Boyd,

2008)

This model is similar to that of the IECB but emphasizes the idea of

cumulative effects. The ECB component of the model is not as detailed as presented

by the Multidisciplinary model for ECB. As with the first two models presented, this

model does not provide any reference to ECB evaluation or measurement, although

Preskill and Boyle (2008) suggested that it could be expected in practice.

Synthesis Model of Evaluation Capacity

The Synthesis Model of Evaluation Capacity described by Taylor-Ritzler, et

al. (2013) is a model produced by a systematic review of existing ECB conceptual

models, principles and factors in the context of non-profit organizations. It identifies

individual and organizational factors that are believed to predict evaluation capacity

outcomes. The individual factors are: awareness of the benefits of evaluation;

motivation to conduct evaluation; and competence (knowledge and skills) to engage

in evaluation practices. The organizational factors include: leadership for evaluation;

a learning climate that fosters evaluative thinking; and resources that support

evaluation. The synthesis model also identifies two evaluation capacity outcomes:

mainstreaming evaluation into work processes, and use of evaluation

findings. The model also emphasizes that organizational factors and organizational

learning capacity mediate ECB outcomes. Compared with the other models, this

model is consistent with Preskill and Boyle's Multidisciplinary model and also

consistent with Labin's IECB. The contribution of this model includes empirical

validation of the factor relationship in the context of non-profit organizations. This

model has also become the basis of the development of an evaluation capacity

assessment instrument by the same team of ECB scholars.

ECB Supply and Demand Model

One existing ECB model that provides an operational framework for

measurement is that of Nielsen, Lemire and Skov (2011). The idea is modelled from

the supply and demand concept from economics. It identifies developing human

capital, tools and resources as ECB supply while ECB demand comprises

organizational policies, plans, structures, processes and culture. The main idea is that

for any organization to be able to define the scope and objectives for ECB, it should

be able to determine the relative quantitative measures of these two sides of ECB.

This model, shown in Figures 2.6 and 2.7, identifies the components of ECB supply

and demand. It is important to mention that this model assumes that it could cut across

three levels: macro (societal level), meso (organizational level), and micro (individual

level).

Figure 2.6 Model for Measuring Evaluation Capacity (Nielsen, et al., 2011)

Figure 2.7 Evaluation Capacity Index (Nielsen, et al., 2011)

This conceptualization is entirely different from the program theory based

models presented earlier. The model provides a way of measuring existing demand

and supply of ECB and could be useful for a needs assessment for an ECB teaching and

learning plan. The assumption here is that ECB is not necessarily an organizational

intervention but part of the inherent qualities of the human capital, derived from

evaluation training, experience and the education they receive. The point of this

model is that scores on supply need to be matched with scores on

organizational demand for sustainable evaluation practice to take place in an

organization. The significant contribution of this line of thinking is that this could

serve as a way to assess the organizational context for setting the scope and objectives

of ECB. This concept could be made operational through a quantitative measurement

approach presented in Figure 2.7.

Getting to Outcomes

The final ECB framework included in this survey is the Getting to Outcomes

(GTO) framework proposed by Wandersman (2014). This is the latest framework

following the IECB proposed by Labin and colleagues in 2012. The purpose of GTO

is to provide an operational framework for the IECB model. Grounded in the

principles of empowerment evaluation, this framework addresses the dissatisfaction

that came with evaluations showing a lack of outcomes. The idea of GTO is to

provide key stakeholders of a program initiative with outcomes “up-front”. That is, “if

key stakeholders including program staff had the capacity to use the knowledge and

tools of evaluation to help them plan more systematically, implement with quality, self

evaluate, and use the information for continuous quality improvement, then they

would more likely to achieve outcomes”. The framework then provides a 10-step

approach to guide ECB practice matched to the science of ECB, as provided by the

IECB model. This operational framework shows that GTO in itself is an ECB

approach targeting outcomes-oriented evaluation planning, implementation and

evaluation. The contribution of this framework includes practical steps to follow for

ECB training.

In conclusion, three key points can be identified from the collective

contributions of these models. First, a range of principles, components, factors and

relationships are thought to be important in ECB. Second, program theory

thinking is prominent in some ECB models. Lastly, the models all

provide frameworks from which ECB can be evaluated. Perhaps one of the reasons

why IECB stands out among the models is its clear identification with a program

theory. This ensures the possibility of program evaluation which, in turn, means ECB

evaluation is essentially program evaluation. The idea of measuring ECB outcomes

takes a central role. The following sections examine the concepts of measurement in

the area of evaluation and ECB.

Measurement in Evaluation and ECB

Measurement in evaluation could refer to several things. It could mean the

process of identifying indicators, setting of standards and development of assessment

tools to assist the process of evidence building in evaluation. It could also refer to the

range of quantitative methodological processes of data collection, management,

analysis and interpretation that would provide some means to answer the evaluation

questions. Within this range, perhaps, the most important factor is providing evidence

that is quantifiable and verifiable. Thus, in this thesis the idea of measurement cannot

be separated from a quantitative paradigm. This section will discuss the major

thoughts relating measurement to evaluation and then to ECB.

In a dialogue on measurement in evaluation, Braverman (2013) proposed a

strong case for measurement in evaluation and brought attention to the realities and

challenges it involves. He argued that the convincing power of evaluation,

when used by stakeholders, is only as good as its credibility to all

stakeholders. This view regarding measurement builds on the view of Patton (2002) in

his seminal and influential work on Utilization-Focused Evaluation. Braverman

(2013) argued that utilization of evaluation information only gains credibility when

methodological rigor is established well enough to convince evaluation users. The

most feasible way to attain methodological rigor is to consider the whole gamut of

validity issues of an evaluation activity: measurement decisions, standards for

strengths of evidence, alternative measurement options, measurement requirements

and the like. Central to his argument is the critical role that measurement holds in

evaluation and the contextual issues that surround measurement decision-making in

evaluation:

The technical aspects of an evaluation study that are associated with

methodological rigor are directly linked to the quality of evidence that

the study is able to produce… An evaluation’s measurement-related

planning decisions and implementation activities, that is, its

measurement-related rigor, will influence the quality of evidence

(Braverman, 2013, p. 101).

This position on the significance of measurement is not a shallow

afterthought. Braverman (2013) draws the theoretical underpinning of this claim from

social science theory. The sociologist Hubert Blalock (Blalock, 1979, 1982) made a

case for the relationship between theory and measurement. He noted that social

science theories, whether explicit or not, are accompanied by “auxiliary measurement

theories” that underlie the use of whatever specific measures have been chosen.

Quoting from Blalock (1982) as cited by Braverman (2013):

In short, we must become more attentive to the need for stating explicit

auxiliary measurement theories and for examining comparability of

measurement, just as we must also be concerned about the

generalizability of our substantive theories (p.31).

This case on the necessity of measurement is straightforward: methodological rigor

requires an underlying measurement rigor. The extent to which measurement rigor is

valid and acceptable among intended users determines the quality of methodological

rigor of any evaluation activity.

While arguing for the significance of rigorous measures for evaluation,

Braverman (2013) considered the important issues of feasibility in carrying out these

measurements. He emphasized that evaluators, at the negotiation stage of evaluation

planning, need to be upfront with stakeholders about the tradeoffs between the

demands of rigorous measurements and feasibility. This means that the negotiation

stage of planning for evaluation is critical for considering the feasibility-rigor

dynamics of measurement. Measurement rigor refers to the quality of

measures with respect to the psychometric properties of the measurement instruments

while feasibility refers to the resources needed (such as time, finances and expertise in

developing and carrying out the measurements). Running measurements is relatively

easy, while developing a valid and reliable measure requires a substantial investment

of resources, including expertise.

This concept of rigor-feasibility dynamics in evaluation measurement

carries important significance for ECB measurement. This dynamic could readily

answer the question, “What might prevent rigorous measurement of ECB?” A

straightforward answer would be that it depends on the feasibility of measuring given

the context at hand. If we assume that the rigor-feasibility dynamic exists, then the nature and quality

of measurement of completed ECBs in the published reports would be products of this

dynamic. This means that whatever measurement rigor level we observe in the reports

had already been decided by the stakeholders at the time and in the context of the ECB

initiative. This leads to one of the research questions for this study, “What is the rigor

of measurement practice in published ECB reports?” An answer to this question could

provide information on the status of measurement practice in the field, possibly

providing information on the quality of this rigor-feasibility dynamic in the empirical

world. To observe low levels of measurement rigor in practice would then possibly

imply different priorities in the evaluation of ECB other than measurement.

It may appear that the focus of measurement rigor discussed here is limited

to quantitative primary data and types of data that lend themselves to reliability and validity

standards. It may also seem that there is no recognition of the contributions of

qualitative evaluation (that is, when methodological rigor is only associated with

measurement rigor). This concern is at the heart of Weitzman and Silver’s (2013)

critique of Braverman’s (2013) position in the dialogue. Weitzman and Silver (2013)

challenge evaluation practitioners to a perspective shift. They argue that while

rigorous measurement is ideal, this ideal is seen through the lens of measurement

experts and psychologists, for whom measurement perfection can appear

to be the primary goal. They further argue that evaluators recognize that

in other disciplines (where most evaluation demands are made), rigorous

measurements are not as popular as measurement-oriented practitioners think.

Weitzman and Silver (2013) proposed that what are needed are not ‘perfect’ measures

but ‘good’ measures that are timely, relevant and good enough for the stakeholders.

In effect, Weitzman and Silver (2013) state that in evaluation measurement,

what matters is what is measured, rather than spending energy and resources on trying

to perfect rigorous measurement approaches. They submit that in evaluation

measurement it would be a good practice to take stock of all available and relevant

data that can be feasibly measured (much like low-hanging fruit), and invest ‘thickly’

in the most important aspects of the program. They are not arguing against rigorous

measurement but for the idea of the rigor-feasibility dynamics. This line of thought is

also addressed by Braverman (2013). He emphasizes the consideration of “alternative

approaches for generating evidence in support to causal claims”. In an earlier article,

Braverman and Arnold (2008) emphasized “context-dependent decision making in

levels of methodological rigor,” and “the importance of relevance and feasibility”. In

the light of this exchange about evaluation measurement, the concepts and arguments

for approaches to measurement in evaluation also apply to ECB measurement. This

is because ECB measurement is an essential aspect of ECB evaluation.

Looking now at this study on ECB measurement practices, it is imperative to

examine what practitioners in the field are measuring in ECB. Identifying what is

being measured will reveal the scope of variables that are measured in ECB, and

possibly set them against the backdrop of the bigger question of what really matters in

ECB measurement. Whatever is identified as mattering in ECB is where ECB

practitioners should be investing thickly when it comes to developing measures for

ECB. The next section will provide a survey of the studies that deal with

measurement in ECB.

Measurement in ECB

There are several existing studies on measurement in ECB. These are

reviewed to examine the focus of ECB measurement these studies have carried out.

The studies can be grouped into three categories: (1) those studies that developed

measurement tools for various components or elements of ECB; (2) studies that

critique measurement approaches; and (3) research that validates ECB models.

The group with most studies on measurement in ECB comprises those that

developed measurement tools for various components or elements of ECB. For

example, Taylor-Ritzler et al. (2013) documented studies with existing ECB

assessment instruments. Table 2.4 shows this list of published ECB measurement

tools, updated for the present study. It can be observed that from among the published

ECB measurement tools, the majority deal with organizational evaluation capacity

measurements. Three of the seven ECB assessment instruments measure

organizational evaluation “readiness”. Other areas include organizational learning

culture, leadership, systems and structures, motivation, organizational contexts and

other relevant ECB variables. The last two entries of Table 2.4 are new additions

to the published ECB assessment instruments.

Table 2.4 ECB Assessment Instruments

| Name of Instrument | Author and Year | Components Measured by the Instrument |
| --- | --- | --- |
| Readiness for Organizational Learning and Evaluation (ROLE) | Preskill and Torres (2000) | Culture (organizational); Leadership; Systems and structures; Communication of information; Teams (working as a team) |
| Assessing Learning Culture | Botcheva, White, and Huffman (2002) | Outcome measurement practices; Learning culture |
| Organizational readiness for change (TCU-ORC) | TCU Institute of Behavioral Research (2005) | Motivation for change (program needs, training needs, pressure for change); Resources; Staff attributes; Organizational climate |
| Evaluation process use measure | Taut (2007) | Evaluation; Section 1: Views of evaluation, decision making, expectations, sharing knowledge, and learning culture; Section 2: Opinions and experiences with evaluation, available resources, internal and external monitoring and reporting; Section 3: Previous experience with evaluation |
| A checklist for building organizational evaluation capacity | Volkov and King (2007) | Organizational Context; ECB Structures; Resources |
| Evaluation and organizational capacity | Cousins, Goh, Elliot, and Aubry (2008) | Organizational Learning Capacity; Organizational support systems; Capacity to do evaluation; Specific types of evaluation activities; Stakeholder participation in evaluation; Use of evaluation findings; Use of evaluation process; Conditions mediating evaluation use |
| Readiness assessment tool for evaluation capacity building | Danseco, Halsall, and Kasparzak (2009) | Experience with evaluation; Leadership and collaboration; Systems and structures; Evaluation practice |
| Cultural Competence of Program Evaluators | Dunaway, Morrow and Porter (2012) | Cultural Competence in Program Evaluation |
| Systems Evaluation Protocol | Urban, Burgermaster, Archibald, and Byrne (2013) | Quality of evaluation plans and models |

The second group of ECB studies are those that critique ECB measurement

approaches. For example, the most common measurement tools used after ECB activities

are self-assessments to detect workshop success (D'Eon, Sadownik, Harrison, &

Nation, 2008). Lam (2009) refuted the argument that the measure could work. He

pointed out that, among other things, self-assessments – even with balanced over- and

underestimates – remain biased and should not be used to evaluate workshops. He

argued that participants’ performance should not be attributed directly to training even

if the self-assessments are psychometrically valid and obtained prior to the workshop.

He further cautioned that self-assessment findings should not be generalized to other

situations without further analysis. The study concluded with principles for using

assessments for evaluating training such as ECB. Another example from this group of

ECB measurement literature is Braverman’s (2013) concern for program evaluations

(possibly a large number relative to practice) that are outside the rigor requirements of

journal editorial boards and professional peer review mechanisms. The study pointed

out that for small scale program settings, rigorous measurement strategies are often

not given attention. He argued that sound evaluation planning requires numerous

decisions about how constructs in a program theory will be translated into measures

and instruments that produce evaluation data. The study concludes with a suggestion

that in making measurement decisions, standards for strength of evidence that a given

measure produces must be established, alternative measurement options weighed and

measurement requirements carefully communicated to clients and stakeholders.

The third set of ECB studies comprises studies that validate ECB models.

Often, evaluation theorists seek empirical data to validate a proposed model of a

phenomenon. The same is true for ECB conceptualizations. Researchers turn to

empirical measurements to validate, refine and test conceptualized ECB models. For

example, in a study of the Danish public sector, Nielsen et al. (2011) proposed the

Supply and Demand model for ECB. Along with this conceptualization was the

inclusion of a strategy to measure the Evaluation Capacity Index to map Danish public

sector organizations. The study concludes with results that support the validity of the

proposed model. For the proposed Synthesis Model of Evaluation Capacity, Taylor-Ritzler

et al. (2013) turned to developing the Evaluation Capacity Assessment

Instrument (ECAI). The instrument was tested on 169 staff of not-for-profit

organizations. The 68-item measure assessed participants’ perceptions of individual

and organizational predictors of two ECB outcomes, the mainstreaming and

use of evaluation, and the study demonstrated that the instrument met internal consistency

criteria. The study also concluded that the ECAI validated the synthesis model and its

depiction of relationships between the evaluation capacity predictors and outcomes.

Also in this third set of ECB studies is the Integrative Evaluation Capacity

Building (IECB) model proposed by Labin et al. in 2012 and updated in 2014. This team

of researchers turned to quantitative measures to describe and validate the proposed

ECB program theory.
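
The internal consistency check reported for the ECAI in the paragraph above can be illustrated with a short computation. The source does not say which statistic was used. Cronbach's alpha is assumed here because it is a common choice for multi-item instruments. The function name and the toy scores are hypothetical and are not data from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    scores: list of respondents, each a list of k item scores (k >= 2).
    Assumption: sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: three respondents answering three items identically.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

With perfectly consistent items, as in the toy data, alpha reaches its upper bound of 1. Real instruments are usually judged against a threshold chosen by the researcher.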

Labin’s (2014) latest work on ECB, as of this writing, provides a stock-take

of the existing ECB measurement tools documented in the literature. She identifies

the links between these measurement tools and the IECB model. The mapping of

these measurement tools to the constructs and indicators shows convergence of ECB

concepts, affirming the validity of the framework and at the same time providing for

the expansion of the details of the model. The Labin et al. (2012) synthesis study and the

follow-up study (2014) appear to unify the concepts and ideas on ECB using mapping

of these ECB measurement tools. For example, in the Needs/Reasons component of

the ECB framework on Motivation, a measurement tool was identified that provides

specific items that measure the indicator and construct for Motivation. In this case, the

measurement tool used by Botcheva, White and Huffman (2002) addressed the

indicator “Internal (motivation): incentives, rewards, and recognition”. As to the

overall content of the measurement tools, Labin (2012) noted that specifics on training

and technical assistance appear to be more widely included compared with specifics

on leadership and collaborative skills.

Labin’s (2014) findings have also shown that while the primary reason for

doing ECB is to improve program outcomes among organizations, there are only two

measurement tools that focus on program outcomes. This suggests that

measurement developers for ECB may perceive a distinction

between measures for ECB outcomes and measures for program outcomes. Program

outcomes and ECB outcomes could have different mediating factors and contexts.

ECB outcomes are usually mediated by contexts of the organization while program

outcomes are mediated by contexts of the organization and the environment of the

intervention target.

The findings presented by Labin (2014), in the light of the analysis of the

research problem in this study, establish the need for the first research question of this

study: “What are the content and implementation approaches of ECBs found in

published ECB reports?” There is a need to investigate independently the content and

implementation factors of ECB as practiced. This is important because in teaching-learning

environments, one can examine the competency learned from the content

delivery, and for that matter assess or measure the competency taught in a particular

learning environment. Hence, measures can focus on the things that ECB activities

have delivered and this delineates the difference between ECB outcomes and program

outcomes.

In summary, the studies on measurement in ECB focus on developing ECB

measurement tools, analysis and critiques of measurement approaches and

confirmation of proposed models. In the light of ECB as a learning intervention, ECB

outcomes could be perceived as learning outcomes of the ECB initiative. This differs

from program outcomes that organizations expect from program design and

implementation. These two sets of outcomes might help clearly define what needs to

be measured in ECB.

Knowledge Gaps: The Case for Investigating ECB Measurement Practice

At this point, this chapter has presented the perspective adopted in this study.

From this viewpoint, ECB is regarded as a learning intervention in a complex

organizational setting. The conceptualizations and definitions of ECB were presented

favouring the view that ECB outcomes include mainstreaming and use of evaluation

from learning interventions directed at individual, team or organizational level

targets with the ultimate goal of improving organizational outcomes. A brief overview

of the evaluation timeline positioned the emergence of ECB as formally recognized

by the body of professionals in the recent decade. The chapter proceeded with the

examination of the influence of program theory and evaluation approaches in the

practice of ECB. It then covered the different conceptualizations of ECB models and

frameworks, investigated how measurement in evaluation relates to ECB, and

presented a brief survey of ECB measurement. This chapter is therefore able to

position the present study in the evaluation field timeline under ECB studies in

furthering discussions on ECB measurements.

Some knowledge gaps were documented in this literature review, warranting

examination of ECB measurement practice in the field. First, while there are ECB

models that determine the factors that relate ECB determinants to ECB outcomes,

there is no existing study on the structure of the ECB content delivered. The

researcher’s argument here is that under the assumption that ECB is a learning

intervention, ECB outcomes can be considered as learning outcomes. Content

structure is critical because it provides a basis for measurement of learning outcomes.

In the survey of the ECB measurement studies, most of the existing studies were

focused on developing ECB assessment tools in the various components and elements

of proposed ECB models, critiques on measurement approaches and validation studies

on ECB models. There were no studies that investigated how the measurement

practices were carried out regarding rigor of measurement. Lastly, while the idea of

developing measurement tools and mapping of these tools is addressed in the overall

program theory for ECB, this does not answer what ECB really seeks to measure. An

investigation of what empirically happens in the field might be able to further

knowledge in this area of the evaluation field. The next chapter details the research

design of this study.

CHAPTER 3

RESEARCH DESIGN


A review of the literature in the previous chapter positioned the research

topic in the broader field of evaluation, evaluation capacity building (ECB),

assessment of learning, and measurement in ECB. This chapter presents the actual

mechanics of the research, the procedures and instruments adopted in this study. The

research problem is restated at the beginning of the chapter to refresh focus on the

main concerns of this research. The conceptual framework of the study is then

presented to provide the background understanding of the assumptions and set the

direction of the inquiry of this research. This is followed by a description of the data

management and statistical analysis applied in the study and concludes with notes on

the role of the researcher and ethical concerns.

The Problem and Research Questions

The overarching objective of this study is to look at the measurement

practices in ECB to draw empirical evidence that could possibly explain the level of

attention practitioners give to evaluation of ECB outcomes. This is approached by

investigating what has been happening in the field as can be examined from

completed and published ECB reports.

The main questions for this research are:

Research Question 1: How can ECB measurement practice be described

from empirical evidence?

Research Question 2: Is there evidence to demonstrate that ECB content

follows a unified learning construct and possibly a progressive structure?

These questions may be broken down into the following sub-questions:


What are the contexts, implementation approaches, and content of

ECBs delivered in published ECB reports?




Does ECB content demonstrate a unified construct and progressive

structure?

Does ECB content group together in specific ways?

Research Design

This section elaborates the way the research is conducted. It presents,

explains, and justifies the approach adopted to address the research questions of the

study.

Broad-based Research Synthesis Method

This study employed an adaptation of the broad-based research synthesis

method proposed by Labin (2008). It is a type of research synthesis that aggregates

findings from primary research as a secondary data analysis. Its main aim is to

establish a base of current knowledge of the subject of interest. Research syntheses

are useful tools to clarify and direct future research. Broad-based research synthesis is

different from the traditional research synthesis approach that usually has restrictive

inclusion criteria, often based on randomized controlled trial (RCT) designs. Labin

(2008) argued that research synthesis like meta-analysis is highly restrictive in its

tendency to use RCT data and pooling of quantitative results from similar designs.

Broad-based research synthesis is explicitly systematic but has the characteristics of a

qualitative review. It emphasizes systematic decision rules, uses qualitative and

quantitative means to summarize findings, and integrates qualitative and quantitative

data from various sources and designs, hence the term “broad”. In employing this

method, the researcher takes the view that “random assignment”, the basis of most

RCT and meta-analysis synthesis, cannot stand alone. In addition, researchers and

policymakers need to find other approaches to synthesis study beyond the experimental

versus non-experimental debate (Labin, 2008).

This method is deemed suitable for this study for several reasons. First, the

nature of the main research questions demands a reasonable scope of information if it

is to describe the ECB measurement practice in the field. To do this, a research

synthesis approach is necessary since this allows the methods of systematic review

that would ensure a certain level of information adequacy. Second, Labin’s notion of

“broad-based”, which means not restricting samples on the basis of a particular

statistical design, is applicable to the situation of ECB studies. ECB studies are often

carried out and reported on a wide range of research designs and approaches. If the

systematic review of ECB cases is limited to specific evaluation approaches, then it

may defeat the purpose of the synthesis to provide a reasonable scope of information

with regard to ECB measurement practice in the field. Third, this research is

intended as an extension of the work of Labin et al. (2012). The study hopes to deepen

the understanding in the area of ECB measurements following the successful and

well-accepted presentation of the synthesis program theory for ECB. Finally, this

approach is feasible for a single researcher with limited resources. The survey of

literature and ECB cases using this approach could reasonably provide breadth and

depth of information on ECB measurement practices in a relatively short period of

time. This approach is more efficient compared with actual immersion in ECBs, which is

costly and time-demanding. This means that, given the student researcher context, the

broad-based synthesis approach provides the best alternative to respond to the

problem at hand, affording a wide range of ECB perspectives, approaches and

contexts which cannot be captured by single actual immersion case studies.

The method stresses the importance of the documentation of the decision

rules. It addresses the questions: What databases were searched? What key words

were used? What design or quality features were used as selection criteria for

inclusion? What coding criteria were used for outcomes or effect sizes? What level of

reliability was obtained by coders of results?

The following steps guide the conduct of broad-based research synthesis

(Labin, 2008):

1. Define the research question

2. Collect information sources

3. Select information sources based on inclusion criteria

4. Extract and code data

5. Analyze data

6. Present findings

Study units and Selection Procedure

This research is based on completed and published ECB reports, therefore

the units of analysis of the study are the ECB reports that made it through the

selection procedure and criteria. Table 3.1 provides the information sources searched

to collect the case units. The initial listing was adapted from Labin et al. (2008) and

was supplemented with more recent sources. The search protocol follows a

systematic approach using a selection procedure adapted from Miller and Campbell’s

(2006) multistage literature selection design (Figure 3.1).

Table 3.1 Information Sources Search

| Source type | Entries |
| --- | --- |
| Databases Searched | Dissertations and Theses (ProQuest); Academic Search Premier (EBSCO); Education Research Abstracts; ERIC; Expanded Academic ASAP; Informit, Informaworld, ProQuest (CSA); International Bibliography of the Social Sciences; JSTOR; PsychInfo; Sociological Abstracts; Social Work Abstracts; Web of Science |
| Search Terms Used | Developing evaluation capacity; Empowerment evaluation; Evaluation capacity building; Evaluation capacity development; Evaluation skill building; Evaluation technical assistance; Evaluation training; Evaluative inquiry; Mainstreaming evaluation; Participatory evaluation; Evaluation capacity measurement |
| Journals | The American Journal of Evaluation; The Canadian Journal of Evaluation; The Evaluation Journal of Australasia; Evaluation; Evaluation and Program Planning; Evaluation Review; Evaluation and the Health Professions; The Journal of Multidisciplinary Evaluation; Journal of Development Effectiveness; New Directions for Evaluation; Educational Evaluation and Policy Analysis; Evaluation and Research in Education; Journal of Educational Evaluation for Health Professions; The Journal of Evaluation in Clinical Practice; Electronic Journal of Information Systems Evaluation; The Journal of Nondestructive Evaluation; The Journal of Personnel Evaluation in Education; Language Resources in Evaluation; Educational Research and Evaluation; Measurement and Evaluation in Counselling and Development; Practical Assessment Research and Evaluation; Studies in Educational Evaluation |

Figure 3.1 Multistage Selection Process

Stage 1: Searched N0 databases, evaluation journals and Google Scholar, yielding N1=102 articles, chapters and book reviews.

Stage 2: Reviewed references and cited works in the N1 articles, chapters and book reviews, yielding N2=130 articles, chapters and book reviews.

Stage 3: Reviewed the N2 articles, chapters and book reviews against the inclusion criteria, yielding N3=70 case examples.

Stage 4: Reviewed the N3 case examples and refined the coding criteria; cases with insufficient information were removed. Final selected case units, N=63.

The selection process (Figure 3.1) commenced by using the identified search

terms and information sources listed in Table 3.1. The Stage 1 search of articles,

chapters and book reviews yielded the initial sample (N1=102). This sample was

expanded in Stage 2 by reviewing its references and ‘cited by’ works (N2=130). The

inclusion criteria reduced the case samples to N3=70 in Stage 3 and this was further

reduced (N=63) after cases with insufficient information were removed.
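The stage-to-stage arithmetic of the selection process can be traced with a short sketch. The counts are those reported in Figure 3.1; the stage labels, variable names and function name are illustrative assumptions and are not part of the study.

```python
# Counts reported for each stage of the multistage selection process (Figure 3.1).
# The wording of the labels is paraphrased for this sketch.
STAGES = [
    ("Stage 1: initial search", 102),
    ("Stage 2: references and cited-by works reviewed", 130),
    ("Stage 3: inclusion criteria applied", 70),
    ("Stage 4: cases with insufficient information removed", 63),
]


def stage_changes(stages):
    """Return (label, count, change from the previous stage) for each stage."""
    changes = []
    previous = None
    for label, count in stages:
        delta = None if previous is None else count - previous
        changes.append((label, count, delta))
        previous = count
    return changes


if __name__ == "__main__":
    for label, count, delta in stage_changes(STAGES):
        note = "" if delta is None else f" ({delta:+d} from previous stage)"
        print(f"{label}: {count}{note}")
```

Read this way, Stage 2 adds 28 items to the initial sample, Stage 3 removes 60, and Stage 4 removes a further 7.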

Broad-based Research Synthesis Method Inclusion Criteria

The critical aspect of the Broad-based Research Synthesis method is the

inclusion criteria. The inclusion criteria will determine which case units will be

included in the study. They are also an integral part of the systematic sampling

procedure applied for the search. The criteria help focus: (1) the parameters of searching the

literature; (2) the parameters for selecting items from the literature and reference lists

through their titles and abstracts; and (3) the final selection of the case units.

The searched articles need to satisfy each of the following criteria to qualify as case

examples:

1. An ECB report published in the listed databases and journals.

2. Published in the period 1970 to present. This selection of time frame

corresponds to the years included in the Labin et al. ECB literature synthesis.

3. Articles that include a report on the measurement process and description of

the measurement models or measurement tools used. However, all ECB

reports included by the sampling procedure will have to be examined to

determine the relative percentage of reports that used measurement practice.

4. The ECB has to be in the context of an organization, a government sector or

agency. This includes ECB reports on programs that indicate involvement of

an organization for the mainstreaming of evaluation. Case units in the context

of formal education for evaluation training such as university courses will be

excluded from the selection.
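The four criteria above can be read as a screening rule applied to each searched article. The following is a minimal sketch of that reading, not the procedure actually used in the study: the record fields, function names and the `end_year` argument are hypothetical, and treating criterion 3 as a recorded flag rather than a filter is an assumption based on the statement that all sampled reports are examined to find the share that used measurement practice.

```python
from dataclasses import dataclass


@dataclass
class CandidateReport:
    """Hypothetical record for one searched article (field names are assumptions)."""
    in_listed_source: bool         # criterion 1: published in the listed databases and journals
    year: int                      # criterion 2: publication year
    reports_measurement: bool      # criterion 3: describes measurement process, models or tools
    organizational_context: bool   # criterion 4: organization, government sector or agency
    formal_education_course: bool  # criterion 4 exclusion: e.g. university evaluation courses


def qualifies_as_case(report, end_year, start_year=1970):
    """Apply criteria 1, 2 and 4 as filters.

    Criterion 3 is deliberately not used to exclude reports in this sketch.
    """
    return (
        report.in_listed_source
        and start_year <= report.year <= end_year
        and report.organizational_context
        and not report.formal_education_course
    )


def share_reporting_measurement(reports, end_year):
    """Share of qualifying cases that report a measurement process (criterion 3)."""
    cases = [r for r in reports if qualifies_as_case(r, end_year)]
    if not cases:
        return 0.0
    return sum(r.reports_measurement for r in cases) / len(cases)
```

The caller supplies `end_year` because the criteria state the period only as "1970 to present".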

Conceptual Framework of the Study

The conceptual framework of this study is diagrammatically represented in

Figure 3.2. It summarizes the main idea of the research, the relationships of variables,

the research questions, and the methods of analysis. This framework was developed

from the concepts drawn from the literature review of this study and from the research

questions raised in the previous chapter. Table 3.2 summarizes the ECB variables

identified in the literature which form the first component of the framework.

In Figure 3.2, the left-hand box labelled “Evaluation Capacity Building”

consists of three variables: content, implementation and context (details in Table 3.2).

Content refers to the ECB topics delivered in ECB initiatives; these topics are often

categorized as individual-focused or organizational-focused. They range,

for example, from basic evaluation knowledge and skills to the

strengthening of evaluation systems within the organization. The implementation

variables refer to the teaching or training approaches of the ECB. These may include

strategies such as direct or indirect training, or participatory evaluation

approaches that cater to adult learners in work settings. Context refers to the broad

range of organizational and environmental characteristics that describe the setting of

the ECB initiative. This includes, for example, characteristics such as domain of the

ECB and type of organization. Research Question 1A, “What are the implementation

approaches and content of ECBs delivered in published ECB reports?”, seeks to

document and describe these three variables. It should be noted that the “context”

variable is intentionally excluded from the research question, as these variables will

be represented as part of the independent variables in the analysis. The box at the far

right side of Figure 3.2, labelled “Measurement Practice”, refers to the variables “ECB

Learning Structure”, “Rigor of Measurement” and “Decision to Measure”.

Generally, ECB learning outcomes should correspond to the ECB

content topics. “Rigor of Measurement” refers to the score level assigned to the reported

ECB using the scoring rubrics developed for this study. For instance, an ECB study

receives a high rigor score if it scores high on most of the criteria. On

the “scope of variables measured” criterion, for example, it scores high if it covers both individual

and organizational capacities (Appendix B). The variable “Decision to Measure” is a

binary variable that records whether or not an ECB initiative measured or reported an

evaluation of its ECB. This dependent variable is included along with the

independent variables in the regression analysis to determine the likelihood

that an ECB initiative would measure its ECB outcomes. The “Measurement Practice”

box is addressed by Research Questions 1B and 1C.

Figure 3.2 Analysis Diagram and Research Questions Map

Main Research Question: How do we describe the measurement practice in Evaluation Capacity Building?

RQ1: What is the description of ECB measurement practice from empirical evidence?

RQ1A: What are the implementation approaches and content of ECBs delivered in published ECB reports?

RQ1B: What is the rigor of the reported measurements in ECB?

RQ1C: What determines the practice of measurement in ECB?

RQ2: Is there evidence to demonstrate that:

RQ2A: ECB content follows a unified learning construct and a possible progressive structure?

RQ2B: ECB content could be grouped in specific ways?

[Diagram boxes: Evaluation Capacity Building (Content, Implementation, Context); Unidimensional Assumption (IRT Analysis); Multidimensional Assumption (Factor Analysis); ECB Construct and Progression; ECB Content Sub-domains; Levels or categories of content, implementation and context variables; Predictive Influence (Logistic Regression); Measurement Practice (ECB Learning Structure, Rigor of Measurement, Decision to Measure). Research question labels shown on the diagram: RQ1A, RQ1B, RQ1C, RQ2A, RQ2B.]

Table 3.2 Evaluation Capacity Building Content, Implementation and Context Variables

ECB Components     Variables

Content            Individual-focused topics
                     Evaluation awareness and attitude
                     Evaluation terms, approaches, or methods
                     Logic models
                     Evaluation plan
                     How to do an evaluation
                     Data management, analysis, interpretation or use
                     Program planning
                     Program implementation
                   Organizational-focused topics
                     Organization evaluation practices
                     Evaluation readiness and willingness
                     Building leadership support
                     Building culture for evaluation
                     Creating/strengthening evaluation policy requirements
                     Creating/strengthening evaluation structures, systems
                     Creating/strengthening support for evaluation resources
                     Improving organizational evaluation social network

Implementation     Teaching strategy
                   Mode of strategy
                   Contact duration
                   Intended target change
                   Participant focus

Context            ECB domain
                   Type of organization
                   Type of program delivery
                   Number of organizations
                   Number of Programs
                   Number of sites
                   Affiliation of ECB facilitators
                   Methodological paradigm

The boxes in the center of the diagram identify the main analytical tools used

in the study. The Item Response Theory analysis is an analytical tool adapted in this

study to determine whether the ECB content topics tap into a single learning construct

(Hambleton & Swaminathan, 1985). In learning theory, it is important that the content

of learning material holds together as an entity that is organized in such a way that it

exhibits a developmental structure or stages of competency. IRT analysis can be used

to determine whether this construct for ECB exists and whether it exhibits a

developmental structure. In this study, this structure will be referred to as ECB

developmental proficiency. The Factor Analysis organizes the ECB content topics

into possible dimensions that may help in the understanding of the underlying

conceptual construct of ECB (Thompson, 2004). The results of both the IRT and Factor

Analysis are subsequently used as input variables to determine their influence on the

“Decision to Measure” using Binary Logistic Regression.

Research Instrument Development

The research instrument (Appendix B) developed for this study aimed to

document the ECB measurement practice. The instrument is a query format that

allows the researcher to code the ECB variables identified from the literature review

using the conceptual framework of the study. The content of the coding form was

mostly adapted from Labin et al. (2012) (See Appendix B for details). It consists of

three parts: (1) the ECB profile section, (2) the checklist for ECB content and

implementation, and (3) the scoring rubrics for the rating scale of rigor of

measurement practice. The following discussion details the development and

validation process of the instrument.

The checklist in the coding form for ECB content and implementation was

developed and peer reviewed by three evaluation colleagues to ensure the face

validity of the instrument. In the first review, several comments helped the validation

and trial of the initial phase of the instrument development. A second round of review

was conducted when the revised instrument was presented in a PhD collective

research forum. While most of the content of the checklist was taken from the ECB

synthesis report of Labin et al. (2012), several details and modifications were added to the

checklist (Appendix B). Variables were distinguished from one another

in the instrument: some were allocated to the ECB profile section and others to the

ECB content and implementation list. The researcher established the operational

definition of profile variables. These are the demographics of the context in which the

reported ECB was conducted. For example, the domain or country of the ECB project

are profile or demographic variables and are not determined by ECB practitioners or

ECB stakeholders. The variables that fall in the ECB content and implementation are

those in which the values are decided upon by the practitioners or ECB stakeholders.

These are variables that could be a result of the ECB negotiation process. For

example, the ECB content or mode of delivery is determined during the negotiation

phase of the ECB.

After expert validation of the content of the instrument, the draft instrument

was pretested, item analysed and checked for internal consistency to verify its

reliability. The measure employed for internal consistency was Cronbach’s Alpha

(0.83, indicating high reliability). This was carried out on the assumption that the

items of the instrument measure a single construct tentatively based on the experts’

validation of the content. These results were reviewed and discussed several times

with the supervisors.
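For readers unfamiliar with the internal consistency measure, the sketch below shows how Cronbach's Alpha is computed from a table of item scores. It is an illustration only: the function name and data layout are assumptions, and it is not the procedure or software run in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: one list per coded case, each holding that case's score on every item.
    Uses sample variances for both the items and the case totals.
    """
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two cases")
    # Variance of each item (column) across cases.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Variance of the total score (row sum) across cases.
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when the items vary together across cases, which is why a high value is read as support for the assumption that the items measure a single construct.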

Rigor of ECB Measurement Practice: Developmental Model of Proficiency

In developing the assessment tool to measure rigor of ECB measurement

practice, the Developmental Model of Proficiency set out by the Center on

Continuous Instructional Improvement (Corcoran, Mosher, & Rogat, 2009) was

utilized as a guide to construct the rubric contents. The rubric construction adapted

the basic rubric development using the three-level structure and rubric-making

principles from Huba and Freed (2000). While this approach focuses on the links

between assessment and instructional development, the principles are applicable to

the context of ECB. The measurement practice progression scale is not a measure of

the practitioners’ measurement competency but rather a description of the level of

measurement practitioners have applied in their ECB contexts.

The elements of Developmental Model of Proficiency were adapted (Table

3.3) in the following manner to fit this study’s context:

Learning targets: this describes the mastery criteria of a given ECB

measurement practice component.

Progress variables: the progress variables are themes of the construct to

be assessed by the instrument. In the case of measurement practice in

ECB these are the criterion variables to be measured, for example the

“scope of variables”, “obtaining evidence”, etc. (Appendix B).

Levels of achievement: this refers to the steps within each of the

progress variables that would serve as pathways for developmental

progression. In ECB measurement practice the levels could progress

from evidence that describes “no application” through “low” to “high”. Each

level would be described in such a way that it is exhaustive and

mutually exclusive.

Learning performance: this refers to the evidence described in a

particular level of the progress variable along the progression scale.

Table 3.3 Developmental Model of Proficiency as Applied to Rigor of ECB Measurement Instrument

Components | In Corcoran et al. (2009) | In this study
Learning Targets | Mastery criteria of learning targets in classrooms | Rigor criteria of ECB measurement practice
Progress Variables | Dimensions of the learning construct | Criterion variables to be measured
Levels of Achievement | Progression levels within a dimension | Progression levels within a criterion
Learning Performance | Description of the progression level within a particular dimension | Description of evidence of the progression level within a criterion

The items below constitute the assessment tool. The first three variables

were identified and adapted as components of the measurement rigor rubrics from

Braverman (2013):

ECB Construct Variables: Individuals, Systems, Structure

Measurement approaches: Self-report, Observations, Multiple Sources

Scope of measures: Single Measure, Multiple Items, Multiple Measures

The following variables are additions in this study to the list of Braverman (2013):

Utilization: Design ECB, Guide ECB, Evaluate ECB

Representativeness: Non-probability samples, Probability samples, Census

Timing: During ECB, Short Time After ECB, Extended Time After ECB

Level of generalization: Anecdotal, Descriptive, Inferential

Design: Observational, Quasi-experimental, Experimental

Reliability measures
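One way to picture how the criteria listed above could be turned into a rigor score is sketched below. The criterion names and levels are taken from the list; everything else is an assumption made for illustration: that each list runs from lowest to highest rigor, that "no application" scores 0, and that criterion scores are simply summed. The actual rubrics are in Appendix B and may differ. "Reliability measures" is left out because no levels are listed for it.

```python
# Criteria and levels as listed in the text. Treating each list as ordered from
# lowest to highest rigor is an assumption of this sketch.
RUBRIC_LEVELS = {
    "ECB construct variables": ["Individuals", "Systems", "Structure"],
    "Measurement approaches": ["Self-report", "Observations", "Multiple Sources"],
    "Scope of measures": ["Single Measure", "Multiple Items", "Multiple Measures"],
    "Utilization": ["Design ECB", "Guide ECB", "Evaluate ECB"],
    "Representativeness": ["Non-probability samples", "Probability samples", "Census"],
    "Timing": ["During ECB", "Short Time After ECB", "Extended Time After ECB"],
    "Level of generalization": ["Anecdotal", "Descriptive", "Inferential"],
    "Design": ["Observational", "Quasi-experimental", "Experimental"],
}


def criterion_score(criterion, observed_level):
    """Return 0 for no application, otherwise the 1-based position of the level."""
    if observed_level is None:
        return 0
    return RUBRIC_LEVELS[criterion].index(observed_level) + 1


def rigor_score(coded_report):
    """Sum the criterion scores; criteria absent from the coding count as no application."""
    return sum(
        criterion_score(criterion, coded_report.get(criterion))
        for criterion in RUBRIC_LEVELS
    )
```

A hypothetical report coded as `{"Design": "Quasi-experimental", "Timing": "During ECB"}` would receive 2 + 1 = 3 under these assumptions.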

Sources of Potential Error

Since the study units of this research are existing ECB reports written for

various publication purposes and standards, they may not truly reflect or emphasize

the rigor of measurement practice. The emphasis could be on ECB approach or any

other aspect of ECB. There is also an issue with the perspective taken in this

study. Due to the focus on ECB measurement practice and measurement rigor, there

may be reports that do not suit this approach, such as an ECB report that adopts a

qualitative approach to evaluating an ECB. Inclusion of such a report in this study would

naturally give the report a low level of measurement rigor when it should have been

examined on a different set of criteria for qualitative approaches. For example, these

kinds of reports may be examined using rubrics drawn from what Patton (2002) called

the ‘quality and credibility of qualitative evidence’. However, this is beyond the scope

of this research.

Another possible area of contention is the fact that the researcher developed

and applied the assessment instrument as the only rater of the ECB case units. To

strengthen this area of weakness, the standard practice of validating the instrument

was followed, using peer review of a panel of ECB practitioners and assessment

experts. The internal consistency and reliability of the tool were checked by pretesting

and item analysis. While inter-rater reliability is not established, this aspect is of little

concern because there is only one rater and it could be assumed that with the help of

the rubrics, some level of objectivity is achieved. This issue is presented in detail in

the section on research instrument development. There is a risk that fatigue and time

factors during the rating may affect the scores. This is addressed by the researcher

limiting case unit assessments to only 2 to 3 papers per day to avoid fatigue. Also, the

relative position of each paper in the sequence will be noted so that the variability of

scoring can be examined across the rating time and could be adjusted and addressed if

significant variance is found.
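The check on rating position described above could be carried out along the following lines. This is a minimal sketch under stated assumptions: the function name and the simple first-half versus second-half split are illustrative, and the text does not specify which statistic or test was used.

```python
from statistics import mean, variance


def split_half_summary(scores_in_rating_order):
    """Compare rigor scores from the first and second halves of the rating sequence.

    scores_in_rating_order: scores listed in the order the papers were rated.
    """
    midpoint = len(scores_in_rating_order) // 2
    first = scores_in_rating_order[:midpoint]
    second = scores_in_rating_order[midpoint:]
    if len(first) < 2 or len(second) < 2:
        raise ValueError("need at least two scores in each half")
    return {
        "first_half": {"mean": mean(first), "variance": variance(first)},
        "second_half": {"mean": mean(second), "variance": variance(second)},
    }
```

A marked difference in mean or variance between the two halves would suggest that rating order, and possibly fatigue, influenced the scores.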

Data Management

The coding procedure was conducted by examining each ECB report and

was carried out in two steps. The first coding was for the categories of ECB

context, content and implementation variables and the second coding for the

rating of ECB measurement rigor using the established rubrics. A qualitative

software package called ATLAS.ti ("ATLAS.ti," 2014) was used to manage the coding of

ECB report textual data into categorical codes. This provides a tracking reference that

can link sections of the report to codes. In other words, the procedure is essentially

like assessing a piece of academic work, such as an essay, but using a well-developed

set of criteria, checklists and rubrics. A descriptive summary of the measurement

practice is written as an annotation to each case to qualitatively examine the piece of

work. The purpose of this step is to discern emerging themes and patterns that could

possibly validate, explain and provide a description of the quantitative values

produced in the quantitative analysis.

The scores generated by the scoring rubrics and some numerical data were

coded quantitatively to allow quantitative aggregation and statistical analysis

procedures. These data will be organized in a spreadsheet format suitable for

statistical analysis using IBM SPSS Statistics ("IBM SPSS Statistics," 2011). The

statistical software generated descriptive statistics such as frequency distributions and

descriptive summaries. Statistical tests such as comparisons, data reduction and

regression analysis were also conducted.

Statistical Analysis

The statistical analyses that will be applied in this study are selected on the

basis of the nature of the research questions. The analysis diagram presented earlier in

Figure 3.2 shows the main statistical analyses that are suitable to answer the posed

research questions. The following statistical tools are used in this study:

Descriptive Statistics

These are the summary statistics in the form of frequencies, averages and

standard deviations used to provide descriptions of the quantitative and

categorical data sets. These tools will be used to generate answers for the first

research question and its sub-questions.

Binary Logistic Regression Analysis

Binary logistic regression analysis is a statistical modeling technique used to

predict the outcome of a categorical dependent variable based on one or more

predictors. The predictor variables could be numerical or categorical. This

regression estimates the odds that the dependent variable is a success

(Freedman, 2009). In this study, this analysis is applied to determine what

influences the decision to measure ECB, where the decision to measure is

taken as a binary categorical variable. The binary logistic regression analysis

in this study will be carried out using the IBM SPSS statistical software ("IBM

SPSS Statistics," 2011).
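The relationship between the predictors, the log-odds and the predicted probability in a binary logistic regression can be sketched as follows. The function names are illustrative assumptions and any coefficient values passed in are hypothetical; the sketch shows only how a fitted model is interpreted, not the estimation performed in SPSS.

```python
import math


def log_odds(intercept, coefficients, predictors):
    """Linear predictor: intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def probability(log_odds_value):
    """Convert log-odds to the probability that the outcome is a success."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)
```

In the context of this study, "success" corresponds to an ECB initiative having measured its ECB outcomes, and an odds ratio above 1 for a predictor indicates that higher values of that predictor go with higher odds of measuring.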

IRT Analysis

Item Response Theory (IRT) analysis, also known as latent trait theory

analysis, is an approach for the design, analysis and scoring of tests,

questionnaires, and similar instruments that measure ability, attitude, or other

variables. Central to the concept of IRT is a modeling technique that would

determine whether items of a test or content of a learning material (as in the

case of ECB) constitute a unified construct. This analysis also provides

information about whether the construct examined forms a hierarchical

structure of difficulty (Hambleton & Swaminathan, 1985). In this study, this

analytical technique is applied to address the second research question, that is,

to determine whether there is evidence to demonstrate that ECB content has a

unified learning construct. The application of the analysis will be carried out

using ConQuest, a statistical software for IRT (Wu, Adams, & Haldane, 2006).
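To illustrate what a hierarchical structure of difficulty means, the sketch below uses the dichotomous Rasch model, one common IRT model. This is an assumption for illustration: the text does not state which IRT model was specified in ConQuest, and the function names and any difficulty values supplied are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a positive response under the dichotomous Rasch model.

    Both arguments are on the same logit scale; the probability is 0.5 when
    ability equals difficulty and falls as difficulty rises.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def order_by_difficulty(item_difficulties):
    """Return item labels sorted from easiest to hardest."""
    return sorted(item_difficulties, key=item_difficulties.get)
```

Applied to ECB content, the "items" would be content topics; topics with low estimated difficulty would sit at the lower end of the developmental progression and topics with high difficulty at the upper end.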

Factor Analysis

Factor analysis is a statistical method used to describe variability among observed, correlated

variables in terms of a potentially lower number of unobserved variables, called

factors. It attempts to represent a set of observed variables in terms of a

number of common factors plus a factor which is unique to each variable. The

factors (also called latent variables) are hypothetical variables which explain

why a number of variables are correlated with each other. This will be used in

this study to determine whether the ECB content can be grouped into

categories or factors of similar traits (Thompson, 2004). The factor analysis

will be carried out using the IBM SPSS statistical software ("IBM SPSS

Statistics," 2011).
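The description above corresponds to the standard common factor model, which can be written as follows. The notation is introduced here for illustration and is not taken from the source: $x_j$ is the $j$-th observed variable, $f_1, \dots, f_m$ are the common factors, $\lambda_{jk}$ is the loading of variable $j$ on factor $k$, and $u_j$ is the factor unique to variable $j$.

```latex
x_j = \lambda_{j1} f_1 + \lambda_{j2} f_2 + \dots + \lambda_{jm} f_m + u_j,
\qquad j = 1, \dots, p, \quad m < p
```

Variables that load strongly on the same common factor are correlated with one another, which is the basis for grouping ECB content topics into sub-domains.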

Role of the Researcher

A critical aspect of this research is the role of the researcher as the developer

and implementer of the research instrument. It is important that although the

development of the research instrument is carried out by a single researcher, peer

review is undertaken in the process. This approach ensured that the researcher

received critical feedback and diversity of views to examine and address possible

viewpoint biases. In the research instrument development, this was achieved first by

considering what the literature had to say about the construct to be measured. This

incorporated the views of authorities on ECB concepts and models, measurement in

evaluation and assessment principles. After drafting the instrument, the next step was

organizing a small group of colleagues in the field of ECB and assessment through

one-on-one meetings, small group discussions, online chat and emails on its content.

The comments and suggestions were carefully noted and considered in the revision of

the instrument draft. After a trial run and basic psychometric procedures, the final

instrument was examined and approved by the research supervisors, thus ensuring an

acceptable level of face validity of the research instrument.

Regarding the researcher’s role as the only rater of ECB reports in the

study, the sources of assessment errors are identified and minimized. For example, to

make sure that fatigue does not affect scoring, a limited number of papers are

examined in a day depending on the length of the report. Assessment is limited to

two long papers per day, or at most three shorter reports. The variability of

scores is examined with respect to the relative position of the papers in the assessment

sequence as well as the classification of papers with respect to length of report. That is

to see if score variability differs, for example, in the first half of the set or the second

half of the set.

The approaches taken confirm the credibility and trustworthiness of the

study despite the fact that the development and implementation of the research

instrument to assess ECB reports is conducted by a single researcher.

Ethical Concerns

This study used non-human subjects. This means that this falls into the

category of a low risk study which did not require ethics approval from the

University. However, the researcher used correct referencing of the case units

included in the study to ensure their representation and to disguise them. Even though

the ultimate paradigm used is quantitative, it was intended that the descriptive aspects

would minimize the bias and would clearly present the point of view of the study.

Conclusion

In summary, this chapter has presented the steps that will be undertaken to

answer the research questions. The study will use the Broadbased Research Synthesis

approach to ensure the scope and systematic identification of the case units which are

to be included in the study. The chapter has described the research conceptual

framework that was constructed to identify the major components of the study, to

show relationships of the variables to be investigated and to outline the analytical

approaches to be carried out. The study’s research instruments, the data management

plan and the selected statistical analyses were also discussed.

CHAPTER 4

RESULTS AND ANALYSIS


The results of the investigation plan outlined in Chapter 3 are reported in this chapter.

The results are presented in the same order as that in which the main research

questions are posed. The chapter outlines the story of ECB measurement practice, and

how ECB practitioners deal with finding evidence of learning when they teach

evaluation to individuals and organizations. An analysis diagram is reproduced in

Figure 4.1 to facilitate understanding of the progressive presentation of the results.

The first research question is about the description of the content and

implementation of ECBs. Therefore, the sample and contextual profiles of the ECB

case samples included in this study are presented first in the first two sections. This

provides a picture of what has been taught in ECB practice. ECB content takes

prominence as this is the basis for examining what has been measured. The

researcher answers the second research question by examining in detail the

characteristics of ECB content topics to determine whether there is evidence that the ECB content

follows a unified construct with a possible progressive structure, and whether

these content topics could be grouped in specific ways. The second part stems from

the notion that ECB as a learning intervention implies possible existence of the

learning content construct as a basis for measurements.

Figure 4.1 Analysis Diagram and Research Questions Map

Main Research Question: How do we describe the measurement practice in Evaluation Capacity Building?

RQ1: What is the description of ECB measurement practice from empirical evidence?

RQ1A: What are the implementation approaches and content of ECBs delivered in published ECB reports?

RQ1B: What is the rigor of the reported measurements in ECB?

RQ1C: What determines the practice of measurement in ECB?

RQ2: Is there evidence to demonstrate that:

RQ2A: ECB content follows a unified learning construct and a possible progressive structure?

RQ2B: ECB content could be grouped in specific ways?

[Diagram boxes: Evaluation Capacity Building (Content, Implementation, Context); Unidimensional Assumption (IRT Analysis); Multidimensional Assumption (Factor Analysis); ECB Construct and Progression; ECB Content Sub-domains; Levels or categories of content, implementation and context variables; Predictive Influence (Logistic Regression); Measurement Practice (ECB Learning Structure, Rigor of Measurement, Decision to Measure). Research question labels shown on the diagram: RQ1A, RQ1B, RQ1C, RQ2A, RQ2B.]

The Sample Profile

This study includes 63 ECB cases published in various journals in the

field of evaluation. These cases follow the inclusion criteria presented in detail in

Chapter 3. The ECBs considered in this investigation pertain only to ECBs that

engaged in evaluation capacity building in relation to organizations. This means that

formal trainings in evaluation, for example in universities, with no organizational

involvement, are excluded. Published ECB reports that focus on research skills

building but did not explicitly target evaluation capacity were also excluded even

though they target similar technical competencies in ECB among individuals and

organizations.

Since Labin et al. (2012) conducted the definitive work to date on

ECB cases, Table 4.1 was produced to compare the case sample of this study with that examined in

the Labin report. As their work has been recognized as an important benchmark for

ECB studies, it is imperative to relate the study sample to their work. This study only

accounts for 79 percent (48 of the 61) of the cases in Labin’s synthesis study. This is

because of the selection criteria for this study which some of the Labin sample did not

satisfy. Also, some articles could not be located online, while the full text of other

studies was not available. The online search for articles in this study was extended to

include the most recent published report using the same search terms, journals and

databases as applied in the Labin study. The scope of this investigation is limited to

published ECB reports from 1978 to 2013 and restricted to ECB exercises in

organizations. The sample in this investigation does not represent ECB practices in

general, as many organizations do not publish their work and the search execution has

limitations. This means that statistical conclusions are limited only to the population

of ECB reports represented by the sample and cannot be generalized to the entirety of

ECB practice in the field. However, the results and findings may be sufficient to

create a picture of ECB measurement.

Table 4.1 Case Sample of this Study and the Labin et al. (2012) Sample

Case Reference | Years Published | Number of Cases
Labin et al. (2008) | 1978 - 2008 | 48
Recent search | 2008 - 2013 | 15
Total cases in this study | 1978 - 2013 | 63

Note: Total number of studies in Labin et al. (2008) = 61

Table 4.2 presents the distribution of ECB case reports according to the

journals in which they are published. Five journals published the highest number of

ECB case reports: (1) New Directions for Evaluation; (2) Evaluation and Program

Planning; (3) American Journal of Evaluation; (4) The Canadian Journal of Program

Evaluation; and (5) Evaluation. These are among the leading journals in the

evaluation field. These five journals published 70 percent of the case reports

considered in this study. The remaining journals that published ECB case reports

represented the multidisciplinary domain of evaluation; these are the primary users of

evaluation such as the fields of education, health and social interventions.
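The 70 percent figure can be checked directly from the counts reported in Table 4.2 for the five leading journals and the stated total of 63 case reports. The variable and function names below are illustrative assumptions.

```python
# Counts for the five leading journals and the stated total, as reported in Table 4.2.
TOP_FIVE_COUNTS = {
    "New Directions for Evaluation": 12,
    "Evaluation and Program Planning": 11,
    "American Journal of Evaluation": 10,
    "The Canadian Journal of Program Evaluation": 6,
    "Evaluation": 5,
}
TOTAL_CASE_REPORTS = 63


def share_of_total(counts, total):
    """Fraction of all case reports accounted for by the given journals."""
    return sum(counts.values()) / total


if __name__ == "__main__":
    share = share_of_total(TOP_FIVE_COUNTS, TOTAL_CASE_REPORTS)
    print(f"{sum(TOP_FIVE_COUNTS.values())} of {TOTAL_CASE_REPORTS} = {share:.1%}")
```

The five journals account for 44 of the 63 reports, which rounds to the 70 percent stated in the text.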

Table 4.2 Journals that Published ECB Case Reports in the Sample

Name of Journal | Number of ECB Reports
New Directions for Evaluation | 12
Evaluation and Program Planning | 11
American Journal of Evaluation | 10
The Canadian Journal of Program Evaluation | 6
Evaluation | 5
Journal of Prevention and Intervention in the Community | 3
AIDS Education and Prevention | 2
American Journal of Community Psychology | 2
Professional School and Counseling | 2
Evaluation Journal of Australasia | 1
Evaluation Review | 1
Gifted Child Quarterly | 1
Health and Social Work | 1
Journal of Community Practice | 1
Journal of Health Care for the Poor and Underserved | 1
Journal of Organizational Behavior Management | 1
R&D Management | 1
The Journal of Experiential Education | 1
Total | 63

Figure 4.2 displays the timeline plot of these case reports. While the range

appears to be wide, only six cases are within the period prior to year 2000. The graph

appears skewed to the left through time with several peaks after the year 2000. This

may indicate the interest generated by AEA conferences, conducted in 2000 and 2001,

which respectively highlighted ECB and evaluation mainstreaming (Leviton, 2001).

The decade after year 2000 seemed to be a period of empirical ECB reporting with a

total of 49 cases (78%) in that span of time alone. Although there seems to be a

declining trend after 2010, it should be noted that this graph only pertains to ECB

case reports and not to ECB literature in general or ECB activity, which cannot be

tracked in this study.

Figure 4.2 Timeline and Distribution of Published ECB Case Reports (horizontal axis: Year, 1975 to 2015; vertical axis: Number of Published ECB Reports)

The publication on ECB literature in general is a different story. Search

results produced a total of 58 publications on ECB related literature from 2008 to

2014. After applying the selection criteria, only 15 cases were included in the sample.

Thus, there are more ECB conceptual and theoretical conversations but fewer ECB

field case reports appearing in the literature. Conceptual and theoretical discussions

on ECB are possibly on the rise but fieldwork reports such as those included in this

study appear to be declining. This comparison of publications of theoretical and

conceptual ECB versus empirical ECB appears to support the claim that empirical and

evidence-based conceptualizations of ECB are still wanting. Also, it could be possible

that journals stopped publishing ECB case reports because they are no longer new or unique

contributions.

The United States of America is the leading country when it comes to

publishing ECB reports, accounting for 67% of the total case reports (Table 4.3).

Canada and Australia follow, but pale in comparison with the U.S., which published

about six times more than Canada and ten times more than Australia. Of

the four ECBs conducted in Australia, three were published elsewhere; only one

appeared in the Evaluation Journal of Australasia. This may indicate that the U.S. is

still the leading country in publishing evaluation research and activities,

consistent with its role in the history of evaluation; or it may only indicate that U.S.

constituents were quick to publish what they have been doing.

The ECB reports examined here were mostly (75%) from countries that

are members of the Organization for Economic Co-operation and Development

(OECD), which are economically advanced countries. The published ECBs from non-

OECD countries (Afghanistan, Ghana, and Latin America and the Caribbean) were

conducted in partnership with an agency from an OECD country; for example, the

Ghana report was conducted in partnership with a United Kingdom development

agency. Therefore, the sample of ECB cases in this study does not represent the

developing world. It is highly probable that international development

agencies conduct ECB among recipient developing countries, but the results may not

have found their way to publication, or they may be published only within the

agencies’ circulations. The sample for this study fails to capture the global situation of

ECB, as there is a growing movement of adoption and adaptation of evaluation in

Southeast Asia and other parts of the world (Grob, 2010). The implication of this

particular background is that the results of this study can only be attributed to the

population of ECBs conducted in developed countries, as represented by the sample.

Table 4.3 Countries where ECB Case Reports were Conducted

Country/Region                     Number of ECB Reports
United States of America           42
Canada                             7
Australia                          4
Afghanistan                        1
Denmark                            1
Ghana                              1
Japan                              1
Latin America and the Caribbean    1
Mexico                             1
New Zealand                        1
Spain                              1
Sweden                             1
Total                              63

To summarize, this study includes mostly ECB cases in developed

countries, in particular the United States, predominantly published in evaluation

journals. Although the period covered runs from 1978 to 2013, the majority of the cases

were published between 2000 and 2010. The profile also points to a limitation:

ECB cases that may have occurred in international development domains

outside developed countries are not captured in this report. These characteristics

define the scope within which the conclusions of this investigation can be inferred.

ECB Contextual Profile

The ECB contextual profile provides a general description of the

environments in which these ECB cases in the sample were conducted. This section

will detail where these ECBs have occurred: the domain of disciplines; the type of

organizations; the number of organizations and programs; the kinds of program

delivery; and even the affiliation of ECB practitioners and their methodological

paradigms.

These ECBs occurred mostly in organizations, institutions or agencies that

deal with social interventions. These domains include health, education, community

development, child and youth development, and research and policy (Table 4.4).

These domains are broad classifications of the fields where the ECBs were conducted.

For example, Fetterman and Bowman’s (2002) experiential education and

empowerment evaluation of the Mars Rover Educational Program is classified under

the domain ‘Education’.

Seventy-six percent of the ECB cases were in the domains of health and

education. These include, for example, organizations that provided services for HIV

prevention or schools that tested a new guidance and development program. Agencies

that perform social work, such as child, youth and community development programs,

comprised eight percent of the cases. Examples included violence prevention, adult

education and afterschool care programs. ECBs appear to have low occurrence in

agencies that focus on research and policy (10%).

Table 4.4 Distribution of ECB Domain

ECB Domain                     Number of ECB Reports   Percentage
Health                         28                      44%
Education                      20                      32%
Community Development          12                      19%
Research and Policy            6                       10%
Child and Youth Development    5                       8%

Note: An ECB case may have multiple domains
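Because a case may be coded under more than one domain, each percentage in Table 4.4 is a share of the 63 reports rather than a share of all domain codes, so the column can sum to more than 100 percent. The following is a minimal sketch of that arithmetic, assuming the counts in Table 4.4 and whole-number rounding; the names `share_of_reports`, `domain_counts` and `domain_shares` are illustrative and not part of the study.

```python
# Counts copied from Table 4.4; N_REPORTS is the number of ECB case reports.
N_REPORTS = 63
domain_counts = {
    "Health": 28,
    "Education": 20,
    "Community Development": 12,
    "Research and Policy": 6,
    "Child and Youth Development": 5,
}


def share_of_reports(count, n_reports=N_REPORTS):
    """Percentage of reports coded with a category, rounded to a whole number."""
    return round(100 * count / n_reports)


# Each share uses the number of reports as its base, not the number of codes.
domain_shares = {name: share_of_reports(c) for name, c in domain_counts.items()}
total_codes = sum(domain_counts.values())

for name, pct in domain_shares.items():
    print(f"{name:<30}{domain_counts[name]:>4}{pct:>5}%")
print(f"Domain codes assigned: {total_codes} across {N_REPORTS} reports")
```

The same base of 63 reports would apply to the other multiple-response tables in this section, which is why their percentage columns also need not total 100.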

Table 4.5 shows the type of organizations (also institutions or agencies)

where these ECBs have occurred. About half of the ECB initiatives were undertaken

by non-profit organizations (51%) followed by government agencies (29%). Schools

and school districts comprise only 16 percent of the share. This may signify the active

role of non-profit organizations in social development programs. In terms of ECB,

results indicate that these are the primary consumers or sources of ECB demand.

Evaluators that specialize in ECB may find practice niches in NGOs, government

agencies and education institutions.

Table 4.5 Type of Organization

Type of Organization           Number of ECB Reports   Percentage
Non-profit                     32                      51%
Government                     18                      29%
School or School District      10                      16%
University                     2                       3%
For-profit                     1                       2%

The types of program summarized in Table 4.6 describe the primary

purpose for which ECBs occurred. For example, “services” refers to organizations

that specialize in the provision of health care or education; “education or capacity

building” refers to organizations that provide training programs; “advocacy” refers to

organizations that specialize in campaigns such as anti-smoking and tobacco bans;

and “research and policy” refers to organizations that focus on policy development, for

example, the Food Program for Policy in Ghana. Note that some organizations have

multiple program delivery priorities. For example, one may provide health care

services and advocacy at the same time. Hence, totals in Table 4.6 exceed the total

number of ECB cases in the sample. Findings show that most of the ECBs were

conducted in organizations that deliver “services” and “education or capacity

building” programs to their beneficiaries.

Table 4.6 Type of Program Delivered

Program Delivered                  Number of ECB Reports   Percentage
Services                           47                      75%
Education or Capacity Building     26                      41%
Advocacy                           12                      19%
Research                           9                       14%

Note: An ECB case may have multiple programs delivered

ECBs were conducted in single or multiple organizations (Table 4.7).

Multiple-organization ECBs occur, for example, when a funding agency gathers all

region-wide organizations that are recipients of funds for some training on program

evaluation. This is also the case for some national agencies operating from a head office

with independent satellite organizations nationwide, for example, a federal health

agency calling on its direct-line and allied agencies to engage in an ECB.

Single-organization ECBs were carried out for one specific organization. Data show that

ECBs conducted on single or multiple organizations were almost equal in number: 52

percent for single organizations and 48 percent for multiple organizations.

Table 4.7 Number of Organizations in an ECB Activity

Number of Organizations        Number of ECB Reports   Percentage
Single Organization            33                      52%
Multiple Organizations         30                      48%

Most ECBs were carried out by organizations that run multiple programs

(Table 4.8), that is, an organization may carry out many intervention programs at

once. Only about a third (32%) of ECB cases were running single programs. This has

possible implications with respect to ECB demand. Organizations with multiple

programs may have the tendency to engage more in ECB. As to program sites, almost

all programs run parallel implementation in multiple sites (Table 4.9). For example, in

a school district, a new guidance program was simultaneously implemented in several

locations. Multi-site programs account for close to 90 percent (87%) of

the ECB reports.

Table 4.8 Number of Programs in an ECB Activity

Number of Programs             Number of ECB Reports   Percentage
Single Program                 20                      32%
Multiple Programs              43                      68%

Table 4.9 Number of Program Sites

Number of Sites                Number of ECB Reports   Percentage
One-site                       8                       13%
Multi-site                     55                      87%

Considering those who carried out reporting and publication of these ECBs,

it appears that most ECB facilitators were university affiliated (65%). This is not

surprising, as those who are connected with universities are typically expected to

publish their work. Authors who are affiliated with the internal evaluation unit of

organizations were often co-authors of university-based ECB facilitators. The sample

included very few private consultancy evaluation practitioners who publish their

work. Based on the data at hand, there was no way of determining how

many of these private practice evaluators conduct ECBs in the field. However, this

information indicates the significant role of universities in (1) publishing

on ECB, and (2) promoting and advocating evaluation to organizations,

often through partnership.

Table 4.10 presents the affiliation of ECB facilitators that reported ECB

practice. Thirty percent of the ECB reports were published by internal evaluation

units, suggesting some interest in self-reflection or self-analysis.

Table 4.10 Affiliation of ECB Facilitators

Affiliation                    Number of ECB Reports   Percentage
University                     41                      65%
Private Consultancy            6                       10%
Internal Evaluation Unit       19                      30%

Note: Categories overlap, as some university-affiliated evaluators were also involved in

organizations’ internal evaluation units.

The final contextual factor examined was the methodological

paradigm used to report the results of ECB efforts. Table 4.11 reveals

that the majority of the ECB facilitators (close to 90 percent of the ECB reports) made

use of qualitative methods. This was more than double the number that used quantitative

methods. About a fifth of all the cases (22%) used both qualitative and quantitative methods

(multiple methods). In this multiple-methods group, there were no explicit statements in the reports

about specific research or reporting paradigms used. Therefore, it could not be

determined whether some of these multiple methods intentionally followed a mixed

methodology; that is, systematically integrating both methods in the analysis

(as opposed to the eclectic use of methods) to give a more robust picture of the

phenomenon. This distribution of methodological paradigms can be best depicted

relative to their sizes using a Venn diagram (Figure 4.3). This is an important finding

in relation to the researcher’s main query regarding the measurement practices in

ECB. The figure demonstrates that quantitative methods are much less common in

ECB publications. This will have implications for the subsequent analysis and the

inferences of this study.

Table 4.11 ECB Case Report Methodological Paradigm

Methodological Paradigm        Number of ECB Reports   Percentage
Qualitative                    55                      87%
Quantitative                   22                      35%
Multiple Methods               14                      22%

Note: Categories are not mutually exclusive; qualitative and quantitative counts include multiple

methods counts
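Since the qualitative and quantitative counts in Table 4.11 both include the multiple-methods cases, the number of distinct reports follows from inclusion-exclusion. The sketch below illustrates that step using only the counts in Table 4.11; the helper `venn_regions` and its key names are hypothetical, introduced here for illustration.

```python
def venn_regions(n_a, n_b, n_both):
    """Split two overlapping category counts into mutually exclusive regions."""
    only_a = n_a - n_both          # cases in category A but not B
    only_b = n_b - n_both          # cases in category B but not A
    union = only_a + only_b + n_both  # distinct cases in A or B
    return {"only_a": only_a, "only_b": only_b, "both": n_both, "union": union}


# Table 4.11: qualitative = 55, quantitative = 22, multiple methods = 14.
paradigms = venn_regions(n_a=55, n_b=22, n_both=14)
print(paradigms)
# {'only_a': 41, 'only_b': 8, 'both': 14, 'union': 63}
```

The union of 63 equals the full sample, which indicates that every report used at least one of the two paradigms.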

Figure 4.3 Venn-Diagram of the Methodological Paradigms of ECB Reports

[Venn diagram: Qualitative, $N_1 = 55$; Quantitative, $N_2 = 22$; Multiple, $N_1 \cap N_2 = 14$; qualitative only, 41; quantitative only, 8]

$N = N_1 + N_2 - (N_1 \cap N_2) = 55 + 22 - 14 = 63$

In summary, the contextual profile of published ECBs included in the sample

reveals that the majority of the ECB cases were focused on social intervention

domains with priorities on health, education, and community development areas. It

also shows that about half of these were conducted by non-profit organizations,

followed by government agencies and schools, that delivered social

services and education or capacity building activities. ECBs were conducted in

similar proportions in single organizations and in clusters of multiple organizations,

mostly on multiple programs delivered at multiple sites. University-affiliated ECB

practitioners largely authored the ECB publications in partnership with internal

evaluation units of the organizations. The reporting of results from ECBs was mostly

presented in a qualitative paradigm. This summary of contextual profile points out the

clear scope and limitations of the following analysis, inferences and conclusions.

Bearing in mind this background information, the profile of the sample and

the ECB contextual descriptions, the subsequent sections will set out to answer the

stated research questions. The first research question is stated as follows:

Research Question 1: How can ECB measurement practice be described

from empirical evidence?

This research question is dealt with by considering the three related sub-

questions. Answers to these sub-questions will form the composite response to

Research Question 1.

Research Question 1A: What are the content and implementation

approaches of ECBs found in published ECB reports?


This first research sub-question covers the content and implementation

strategies of ECBs included in the sample for this study. This describes what is taught

and what occurs in ECB activities. It also seeks to validate and affirm the descriptions

offered by earlier ECB systematic reviews and syntheses (Labin, 2014; Labin et al.,

2012; Suarez-Balcazar & Taylor-Ritzler, 2014). The presentations here are essentially

descriptive, providing an overall background picture of the content and

implementation of ECB activities in the field. More importantly, these results will be

related later in the investigation to measurement practices. This section is

divided into two parts: content and implementation. The contextual profile was

presented in the preceding section, ECB Contextual Profile.

ECB Content

This section presents the documentation of ECB content topics delivered

by the ECB initiatives considered in the ECB sample. There were 63 cases examined

in this study of which only 57 cases reported ECB content topics. An initial checklist

was developed in the coding form to determine the ECB content, where content is

also referred to as ECB “topic” throughout this report. After the coding, some topics

were added but, essentially, there was little variation from the Labin et al. (2012)

report (see Appendix B for a comparison with the Labin et al. instrument). Most of the

topics in the checklist focused on evaluation awareness and attitudes, evaluation

terms, approaches or methods, logic models and the like. However, when reviewing

the case samples, it was noted that some cases documented several topics, such as

ECBs that focused on both program planning and program implementation. There

were also several cases that referred to an ECB activity as “training in evaluation” but

which comprised topics that seemed to focus more on technical research skills such as

data management and analysis. The final checklist of ECB content based on topical

themes generated 17 topic categories. The ECB content categories generated from the

checklist were initially categorized broadly into two groups based on Labin et al.

(2012). The groupings refer to ECB topics that target individual capacities and those

that target organizational capacities.

Figure 4.4 presents the ranked frequency distribution of the topics found in

the ECB case samples which target individual evaluation capacities. The bar graph

reveals the counts relative to each topic. As can be observed from the graph, an ECB

may have multiple related topics on evaluation, depending on the learning needs of

individuals and organizations that might have been negotiated prior to the conduct of

the ECB initiative.

The counts on ECB content show that the topic with the greatest frequency

was the creation of an “evaluation plan”. In some ECB cases, the topic “evaluation

plan” co-occurred with the topics “program planning” and “program implementation”.

Along with the topic “evaluation plan”, more than 50 percent of the ECB cases

covered the topic of data management, the “how to” of conducting an evaluation, and

evaluation basics such as evaluation terms, approaches or methods. Nearly half of the

cases included logic models. While these contents related mostly to evaluation, some

essential skills in program management, as well as research skills (such as data

management, analysis, interpretation or use), were also included.

“Evaluation awareness and attitudes” was the least frequent topic (14%)

in all the ECB cases. This does not mean that the theme of “evaluation awareness and

attitudes” is not popular or important. It may signify that the organizations requesting

the ECBs already had a strong conviction of the importance of evaluation; hence,

there was no need to prime for a positive attitude towards evaluation or awareness of the

significance of evaluation. However, this count may also represent organizations that

still struggle to convince their constituents of the relevance of evaluation for them.

Figure 4.4 ECB Content Targeting Individual Level Capacity (N=63)

[Bar chart, “ECB Content: Individual Level”; horizontal axis: Number of ECBs (0 to 60)]

Topic                                        Number of ECBs
Evaluation Plan                              42
Data management, analysis,…                  37
…                                            37
Evaluation terms, approaches or methods      33
Logic Models                                 30
Program planning                             20
Program implementation                       15
Evaluation awareness and attitudes           9

Figure 4.5 ECB Content Targeting Organizational Level Capacity (N=63)

[Bar chart, “ECB Content: Organizational Level”; horizontal axis: Number of ECBs (0 to 60)]

Topic                                                      Number of ECBs
Creating/strengthening evaluation systems                  13
… structures                                               10
Building culture for evaluation                            10
Improving organizational evaluation social network         7
Creating/strengthening support for evaluation resources    6
Building leadership support                                5
Organization evaluation practices                          5
Creating/strengthening evaluation policy requirements      4
Evaluation readiness and willingness                       3

The results of organizational level content topics of ECB are presented in

Figure 4.5. Nearly 20 percent of the cases covered topics on creating, strengthening

or building evaluation systems, evaluation structures, or culture for evaluation.

Evaluation Systems here referred to the establishment of organizational flow of

evaluative information, feedback loops and utilization. Evaluation Structures referred

to the establishment of evaluation units or networks as well as the management

capability for data and information; and Evaluation Culture referred to topics on

beliefs and behaviors relating to evaluation as an organization.

Approximately 10 percent of the cases covered “improving the social

network” within and outside the organization for evaluation, and less than 10 percent

of the cases touched on “evaluation support” such as leadership, resources, and

policies. As was the case with “evaluation awareness and attitudes” for individual

evaluation capacity content, the low number of ECB cases that discuss “evaluation

readiness and willingness” at the organizational level does not necessarily reflect the

unpopularity of evaluation; rather, it could indicate a reduced need for ECB

motivation, as these organizations could already be convinced of the significance of

evaluation and the need to improve the organization’s individual and organizational

evaluation capacities.

Overall, about 40 percent of the ECBs (23 of 57 cases) reported content

areas that were related to organizational evaluation capacity (Figure 4.6). This group

of topics is broadly categorized as ‘organizational level’ content because they each

address the organizational capacities for evaluation. Ninety-eight percent reported

content areas pertaining to individual evaluation capacity, while only one case

consisted entirely of systems work for organizational evaluation capacity.

Figure 4.6 Venn-diagram of the Capacity Change Target of ECB Reports

[Venn diagram: Individual, $N_1 = 56$; Organizational, $N_2 = 23$; Blended, $N_1 \cap N_2 = 22$; individual only, 34; organizational only, 1]

$N = N_1 + N_2 - (N_1 \cap N_2) = 56 + 23 - 22 = 57$*

(* Six cases with no content report)

Of the 23 cases that reported ECB at the organizational level, 57 percent (13

of 23 cases) completed evaluation systems and 43 percent (10 of 23) focused on

evaluation structures and cultures. This reveals two critical points: (1) there is a need

to measure evaluation systems, structures and cultures to determine effectiveness of

these evaluation training inputs; and (2) there has to be a way of measuring them. This

result attests that the majority of the existing ECB measurement tools were centered

on individual evaluation capacities (Labin, 2014).

At this point, there is a clear picture of what ECB content had been delivered

to organizations. ECB topics covered mostly involved creating an evaluation plan,

improving data management and analysis skills, learning about the basics of

evaluation, evaluation implementation and logic models. Few organizations touched

on organizational level content such as creating systems and structures for evaluation

and developing a culture for evaluation in the organization.

This description of content that has been delivered during ECBs has strong

implications as to what outcomes should be sought in ECBs. Clearly, one cannot

expect learning outcomes from content that has not been delivered or organized. If

most ECBs are focused on improving individual evaluation capacities, then improved

organizational evaluation capacities cannot be expected. Thus, there is a need to

examine the level of expectations with regard to program outcomes.

ECB Implementation

Evaluators recognize the fact that they have multiple roles in the

implementation of ECB. Evaluators are not only evaluators, but also teachers,

trainers, facilitators, managers, critical friends, coaches, and policy advisers. In ECB

implementation, these roles usually occur in combination, sometimes in a conflicting

manner. Primarily, however, the main role of an evaluator in an ECB is to teach about

evaluation. The approach could vary from informal demonstration and

participatory techniques to formal full-time in-house training sessions. These roles

ensure that ECB activities are delivered to, and address, the intended target capacity

change using the appropriate strategies. This section considers ECB

implementation with respect to the intended ECB target, the focus participants and the

strategies of implementation.

The Integrated ECB Synthesis Model (Labin, 2014) shown in Figure 4.7

could provide a reference framework for where these implementation factors could be

located. In the model, ECBs are intended to improve the evaluation capacities of an

organization, at both the individual and organizational levels. This is carried out

through the ECB “Activities” component of the model.

Figure 4.7 Integrated Evaluation Capacity Building Model (Labin, 2014)

Intended Target of ECB

ECBs aim to improve individual or organizational evaluation capacities.

Table 4.12 presents the intended target capacities of ECBs included in this study. This

differs from the ECB content target categorization presented earlier, as this one

is the explicit intended target as reported. Sixty-four percent targeted individual

capacity change only. A third of the cases targeted both individual and organizational

evaluation capacities. Combining the categories, 97 percent of ECBs

focused on improvement of individual evaluation capacities and 36 percent

on improvement of organizational evaluation

capacities. These findings suggest that most ECBs are limited to individual evaluation

improvement rather than organizational evaluation improvement.

Table 4.12 Intended Target of ECB

Target Change                                 Number of ECB Reports   Percentage
Individual capacities only                    40                      64%
Organizational capacities only                2                       3%
Individual and organizational capacities      21                      33%
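The combined figures quoted for Table 4.12 can be checked by adding the "both" category to each single-target category. The following is a minimal sketch of that calculation, assuming the counts in Table 4.12 and N = 63; the variable names are illustrative. It prints the unrounded shares, which land close to the 97 and 36 percent reported in the text; the small difference may reflect the text summing already-rounded table percentages.

```python
# Counts copied from Table 4.12; N_REPORTS is the number of ECB case reports.
N_REPORTS = 63
target_counts = {"individual_only": 40, "organizational_only": 2, "both": 21}

# A case targets individual capacity if it is "individual only" or "both",
# and organizational capacity if it is "organizational only" or "both".
any_individual = target_counts["individual_only"] + target_counts["both"]
any_organizational = target_counts["organizational_only"] + target_counts["both"]

pct_individual = 100 * any_individual / N_REPORTS
pct_organizational = 100 * any_organizational / N_REPORTS

print(f"Any individual target:     {any_individual} cases ({pct_individual:.1f}%)")
print(f"Any organizational target: {any_organizational} cases ({pct_organizational:.1f}%)")
```

The two shares overlap by the 21 "both" cases, so they are not expected to sum to 100 percent.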

Table 4.13 presents the participant focus of ECB. Counts show that 90

percent of the cases focused on evaluation training of the program staff and a little over

half (59%) focused on program managers. Results also show that only about one-third

(30%) involved the leadership in the evaluation capacity training and a few included

program beneficiaries (14%). These findings imply that if ECBs are to target

organizational change, then ECBs might need to improve on engaging the leadership.

In Labin’s IECB model (Figure 4.7), it is assumed that “program outcomes” are the

ultimate outcome of an ECB and are a function of both the individual and

organizational evaluation capacities. However, it appears from these findings that the

ECB targets in practice were skewed towards individual evaluation capacities and

largely omitted the leadership in the process.

Table 4.13 Participant Focus of ECB

Participant                    Number of ECB Reports   Percentage
Staff                          57                      90%
Managers                       37                      59%
Leadership                     19                      30%
Beneficiaries                  9                       14%

Note: ECBs may have multiple participant focus

ECB Implementation Strategies

ECB implementation strategies refer to the collective ECB delivery

approaches. These are the ECB teaching strategies, the delivery mode and the contact

duration during the ECB activity.

Table 4.14 Type of ECB Teaching Strategies

Strategies                                                  Number of ECB Reports   Percentage
Direct Training                                             39                      62%
Technical Assistance: Coaching, Mentoring, Consultations    39                      62%
Participatory Evaluation                                    40                      63%
Learning Materials (Print or Online)                        13                      21%

Note: ECBs may use multiple teaching strategies

Table 4.14 shows that three of the ECB teaching strategies were employed in similar

proportions (62 to 63 percent). These are the teaching of evaluation through direct training; the

supplementation of various technical assistance modes; and the involvement in actual

evaluation or participatory evaluation. Only about 20 percent of the reports

mentioned the use of evaluation manuals, templates and web resources in conjunction

with direct training, technical assistance or participatory evaluation. Since each ECB

could use multiple teaching strategies, Venn diagrams show the overlaps and

complements among these strategies for a clearer picture of how these strategies were

combined in practice (Figure 4.8).

The six pair-wise diagrams compare four strategies two at a time (4C2 = 6), to

reveal which strategies tend to be used together. Although there are cases where more

than two strategies were used, grouping them in pairs enables simple categorization

of subgroups for subsequent comparative analysis. Approximately one-third of all the

cases (32%) combine the direct training approach and participatory ECB (Venn diagram

A). This figure shows that practitioner preferences in ECB strategy are almost equally

divided into three groups: (1) those that combine direct training and participatory

approach, and those that conducted ECBs either as (2) exclusive direct training or (3)

participatory strategy only. This finding is important in the sense that ECBs can be

categorized by strategy for comparison, for instance with respect to their measurement

practices. This categorization may provide some basis for determining whether the

choice of strategy is related to measurement practices.
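The pairwise grouping described above can be sketched as follows. This is an illustration only, assuming the strategy totals in Table 4.14 (direct training 39, participatory evaluation 40), the 20 overlapping cases shown in Venn diagram A (the 32 percent quoted above), and N = 63; the helper `pair_regions` and the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

N_REPORTS = 63
strategies = [
    "Direct Training",
    "Technical Assistance",
    "Participatory Evaluation",
    "Learning Materials",
]

# Four strategies taken two at a time give the six pairwise diagrams.
pairs = list(combinations(strategies, 2))
assert len(pairs) == comb(4, 2)


def pair_regions(n_x, n_y, n_both, n_total=N_REPORTS):
    """Return (x only, both, y only, neither) for one pair of strategies."""
    x_only = n_x - n_both
    y_only = n_y - n_both
    neither = n_total - (x_only + y_only + n_both)
    return x_only, n_both, y_only, neither


# Venn diagram A: direct training (39) and participatory evaluation (40), 20 in both.
diagram_a = pair_regions(n_x=39, n_y=40, n_both=20)
print(diagram_a)  # (19, 20, 20, 4)
```

The first three regions correspond to the three roughly equal practitioner groups described above, and could serve as the subgroup labels when comparing measurement practices by strategy.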

Only 4 of the cases (as shown outside the two circles in Venn diagram A) do

not use direct training or participatory ECB. These cases regarded ECB as

establishing evaluation systems and processes within an organization, rather than as

teaching. For example, these are the organizations that believe in the diffusion of evaluation

culture through the influence of evaluation champions or evaluation learning groups.

This is an approach that adheres to a belief that evaluation capacities are not

necessarily products of direct training or participatory evaluation.

Figure 4.8 Pairwise Combination of ECB Strategies

[Six Venn diagrams, panels (A) to (F), each comparing two strategies; counts of ECB reports per region]

Panel  Strategy X            Strategy Y            X only  Both  Y only  Outside both
(A)    Direct Training       Participatory         19      20    20      4
(B)    Direct Training       Technical Assistance  14      25    14      10
(C)    Direct Training       Learning Materials    31      8     5       19
(D)    Participatory         Technical Assistance  12      28    11      12
(E)    Participatory         Learning Materials    33      7     6       17
(F)    Technical Assistance  Learning Materials    30      9     4       12

The ECB cases that used the participatory approach only, without direct

training, can be described as “opportunistic ECB”. The term “opportunistic” is used in

a positive sense here, connoting the desire of those in the evaluation profession to

teach about evaluation whenever there is an opportunity to do so. Often, these

opportunities arise during an evaluation activity. ECBs of this type are “add-ons” to

evaluation activities and are generally not reflected in the evaluation contracts.

Rather, they are a form of professional obligation that any evaluator might typically

undertake voluntarily.

About 44 percent, that is 28 of the 63 cases, used technical assistance and

the participatory approach for ECB (Venn diagram D). This is closely followed by

technical assistance and the direct training approach, with 25 of the 63 cases (Venn

diagram B). This shows that technical assistance appears to be a basic feature of ECBs

that practice direct training and participatory approaches. These forms of technical

assistance include consultations, coaching or mentoring using face-to-face, telephone

or online communication media.

The data on the use of supplementary learning materials may not give an

accurate picture of this strategy (Venn diagrams C, E and F). These data only record

cases that explicitly mention the use of training manuals, activity guides, templates or

web-based interactive resources. Learning materials may be used but not mentioned

during direct training or participatory strategies.

The mode of strategies refers to the media of communication used during

the conduct of ECB. Table 4.15 shows the distribution of these preferred modes. It

appears that the face-to-face mode is the most preferred approach. It comprises 95

percent of the ECB activities, that is, face-to-face only plus face-to-face combined

with other modes. Other modes include telephone and online meetings or conferences.

Only three cases of ECB used remote delivery modes exclusively. This result tells

us that ECB delivery is far from fully adopting distance learning methodologies.

Modern virtual communication technologies do not yet appear to replicate the interaction needs

for learning purposes, at least within this sample.

Table 4.15 Mode of Strategies Reported

ECB Delivery Mode                           Number of ECB Reports   Percentage
Face-to-face only                           34                      54%
Face-to-face combined with other modes      26                      41%
Other modes not including face-to-face      3                       5%

Assuming quality, content and strategies were equal, teaching “dosage” may

have a significant impact on learning. More time devoted to ECB may translate to

better ECB outcomes. In this investigation, ECBs appear to engage in

training of significant duration (Table 4.16).

Table 4.16 ECB Contact Duration

Contact Duration                                        Number of ECB Reports   Percentage
One day or less engagement                              1                       2%
Single 2-3 day engagement                               8                       13%
Multiple times a year or multiple years engagement      51                      81%

Note: Three (3) cases did not give an indication of ECB duration

The majority of the published ECBs (81%) engaged in ECB activities

multiple times in a year or once a year for multiple years. Only about 15 percent of

the cases had three days or less of engagement. This suggests that most ECB

negotiations in this sample had the underlying understanding that ECB requires

sustained engagement and is seen as a long

process. The distribution of duration indicates a recognized need for multiple

engagements over a longer period of time. Most of these engagements were periodic

and coupled with technical assistance in between.

In summary, ECBs were implemented as direct training, indirect participatory evaluation learning or a combination of both, with follow-through technical assistance engagements. The majority used the face-to-face delivery mode and engaged participants multiple times a year or over multiple years. Only about one-third focused on improving organizational evaluation capacities, compared with the majority that focused on improving individual evaluation capacities. These descriptions and categorizations of ECB content and implementation, along with the ECB contextual profiles presented earlier in this section, provide a basis for the subsequent statistical analysis in this report.

Answer to Research Question 1A

What are the content and implementation approaches of ECBs found in

published ECB reports?

In the field practice of ECB, as represented by the sample of this

investigation, ECB content and implementation tended to focus more on individual

evaluation capacity building compared with building organizational evaluation

capacity. ECB content and implementation include building fundamental knowledge

and skills in carrying out evaluations at the program level of organizations. Only a

few ECB cases focused on organizational evaluation capacity including systems,

processes, support, and evaluation culture. The majority of the reports appear to limit

ECB to individual evaluation improvement. ECB implementation is almost equally

divided among direct training, indirect participatory learning and a

combination of both. Most ECB efforts incorporated technical assistance and the

majority favoured face-to-face engagement for a longer training duration. Most ECBs

involved program staff and managers compared with a few that involved the

organization leadership and program beneficiaries.


Research Question 1B


The operational definition of rigor in this study refers to how well and how robustly the measurements were carried out in ECBs. An instrument was developed (Appendix B) to determine the rigor of ECB measurement practice based on measurement criteria that Braverman and Arnold (2008) and Braverman (2013) suggested. However, this research question does not apply to all ECB cases, as only 22 percent (14 of 63) of the ECB cases reported measurement of ECB outcomes (Figure 4.9). It follows that answers to this research question refer only to this subset of ECB cases. This section examines how these measurements were carried out with respect to rigor standards of measurement.

Figure 4.9 ECB Outcomes Measurement (pie chart: measured ECB outcomes, 22%; did not measure ECB outcomes, 78%)

Table 4.17 outlines the criteria on which ECB measurement practice can be examined. The criteria include those outlined by Braverman and Arnold (2008) and Braverman (2013), such as the scope of the variables measured, how evidence was obtained, the reliability characteristics of the measurement tools used, the intended utilization of ECB measures, representativeness, timing of measurement, validity of inference and the measurement design used. The table provides columns for comparing the percentages of cases relative to the small subsample of cases that measured their ECB outcomes and relative to the total number of ECB cases examined. The comparison percentage columns show that ECB outcomes measurement has not been a common practice relative to the number of ECB cases in this study sample. Before presenting the details of the measurement rigor criteria for ECB, the validity and reliability of the instrument used for this assessment are discussed in the following section.

Validity and Reliability of the Rubrics

The content validity of the rubrics for rating the rigor of measurement in ECB cases is based on the criteria that Braverman (2013) outlined in considering what constitutes good measurement practice. The process of developing this tool also included a review of each rubric item by a panel of practitioners in assessment. Details of the development of the rubrics to establish their content validity were discussed in Chapter 3. Cronbach's alpha, a measure of the internal consistency reliability of the items, is 0.83, which indicates high reliability. This provides confidence that the items of this instrument rate the rigor of ECB measurement practice consistently with one another, so that the overall ratings are sufficiently stable across the ECB cases examined.
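As a rough illustration of how a reliability coefficient of this kind is obtained, the sketch below computes Cronbach's alpha from a matrix of item scores using the standard formula alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). The ratings shown are hypothetical and are not the study's data, and the function name is illustrative only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per rated case, k item scores each)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across the rated cases.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total score of each case.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four cases scored on two rubric items.
ratings = [[1, 1], [2, 3], [3, 2], [4, 4]]
print(round(cronbach_alpha(ratings), 3))
```

Values closer to 1 indicate that the items vary together across cases; the coefficient describes internal consistency and not the proportion of repeated samples that would reproduce a result.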


Table 4.17 Rigor of ECB Measurement Practice

Criteria and Items                                      Cases   N = 14    N = 63

Scope of variables measured
  Measured individual's evaluation capacity (which
  may include awareness, knowledge, skills or
  attitudes).                                           10      71%       16%
  Measured the organizational evaluation capacity
  that includes evaluation leadership, policies,
  systems, resources or structures.                     2       14%       3%
  Measured the organization's contextual measures
  such as social climate, learning capacity, culture
  or social network.                                    1       7%        1.5%

Obtaining Evidence
  Indirect measurement: testing such as self-report
  or self-rating only.                                  7       50%       11%
  Direct measurement: obtained by direct testing or
  observation only.                                     0       0%        0%
  Combination of indirect and direct measurements.      7       50%       11%

Reliability of Measurement Tools
  Uses tools with unreported or unmeasured
  reliability.                                          10      71%       16%
  Uses tools with reported reliability and within
  acceptable values.                                    3       21%       5%
  Uses standardized or validated measurement
  instruments.                                          0       0%        0%

Representativeness
  The measurement used non-probability sample.          0       0%        0%
  The measurement used probability sample with
  random sampling techniques.                           0       0%        0%
  The measurement used all case units or most cases
  of the population of interest.                        13      93%       21%

Intended Utilization of ECB Measures
  ECB measures are used to establish baseline
  information to inform ECB design.                     0       0%        0%
  ECB measures guide ECB implementation.                0       0%        0%
  ECB measures are used to evaluate ECB impact.         12      86%       19%

Timing of Measurement
  Measurements were only made once at the
  beginning or at the end of ECB project.               2       14%       3%
  Measurements were made at the beginning and at
  the end of ECB; may include measures during
  ECB.                                                  9       64%       14%
  Measurements were made over an extended period
  of time after ECB to see changes in the long term.    3       21%       5%

Validity of Inference from Obtained Measures
  The conclusions are at best anecdotal with
  descriptions of evaluation capacities but no
  measures to back up claims.                           0       0%        0%
  Descriptions of evaluation capacities were made
  and backed up by figures from measures; may also
  extend to comparing measures.                         4       28%       6%
  Conclusions were carried out with sound measures
  and statistical procedures that warrant statistical
  inference, e.g. hypothesis testing or modeling.       10      71%       16%

Measurement Design Used
  The measurement design used simple observational
  method (no control or comparison groups and no
  randomization of case units made).                    2       14%       3%
  The measurement design used comparison groups
  but lacks random assignment.                          11      79%       17%
  The elements of experimental design are present
  with control and comparison groups and random
  assignments made.                                     1       7%        1.5%

Cases: number of ECB cases. N = 14: percentage relative to ECBs with outcomes measurement. N = 63: percentage relative to all ECB cases.


Scope of Variables Measured

Considering the individual items of the rubrics, with regard to the scope of variables measured, the majority (71%) of the ECB cases that measured ECB outcomes focused on individual evaluation capacity such as evaluation awareness, knowledge, skills and attitudes. Less than 30 percent of the cases measured outcomes that pertain to organizational evaluation capacities. In this rubric item, the researcher made an error in assuming that ECB "awareness" is in the same category as evaluation "knowledge, skills and attitudes". In the following analysis, particularly the analysis of the ECB content topics for Research Question 2, it is shown that "evaluation readiness and awareness" appears to be a separate sub-domain, distinct from knowledge, skills and attitudes. This reveals an error in the rubric item development, wherein a single item description covers multiple characteristics. There should have been a separate item, so that each item in the rubric criterion carries a single description or idea. Nevertheless, the scope of variables being measured in practice shows an inclination to measure individual evaluation capacity variables.

Obtaining Evidence, Reliability of Measures and Representativeness

Of the ECB cases that reported their ECB measurement outcomes, 50 percent used only indirect measurement approaches, such as self-reports and self-ratings, to obtain evidence. The remaining 50 percent used a combination of indirect measurement and direct testing or observation. Frequently, the scenario was as follows: before or after an ECB workshop, participants were asked to complete survey questionnaires. The surveys conducted prior to an ECB activity usually focused on obtaining baseline information regarding the evaluation knowledge, skills and attitudes of the participants. Immediately after the ECB activity, the participants were asked to rate the positive "change" they believed they had gained as a result of the ECB activity. Descriptions of these "evaluations" of ECB included feedback with regard to the training implementation, such as the degree of satisfaction of the attendees, the relevance of the topics to their evaluation needs, and the quality or effectiveness of the ECB facilitators. This implies that this approach to evaluating an ECB is limited to the ECB workshop, training or engagement that was carried out. While this approach has its merits, it appears to be myopic, focusing only on the ECB activity rather than on an evaluation of the programmatic concept of an ECB as an intervention.

Only 21 percent of the ECBs that reported measurements provided reliability

measures of their measurement tools. The remainder did not report the reliability for

their measures. Often, these measurement tools were developed by the ECB

practitioners. This indicates that the quality of the measurement tool, often in

survey questionnaire form, has not been given emphasis in measurement

practice.

Nearly all of the ECB cases that reported ECB outcomes measurement (93%) reported their data using all or most of the responses of the ECB participants, based on surveys conducted immediately after the ECB training or workshop. No cases used a sampling approach to gather data. In this respect, the ECBs performed relatively well with regard to representativeness of the population of interest in their reports.


Intended Utilization of ECB Measures and Timing of Measurements

Concerning the intended utilization of ECB measures, 86 percent sought to measure the impacts of ECB, although "impact" in this case refers only to the "before and after" effects of the ECB training. The information provided by the "timing of measurement" data also supports this observation. Most measurements (64%) were carried out before and after the training. Few cases (21%) extended measurement activities beyond the training to ascertain the long-term effects. These extended measurements are most likely organized at the negotiation stage of ECB engagement, as they entail resource use and cost. Most ECBs end their ECB outcomes measurements immediately after the ECB activities cease.

Measurement Design and Validity of Inference

The reports also indicate high competence of ECB practitioners with regard to the proper use of statistical inference and measurement design. In most cases, ECB measurements are limited to quasi-experimental and observational designs, as opposed to fully experimental designs (although one case managed to do this). Most of the reports (71%) were carried out with sound measures and statistical procedures that warranted statistical inference. The remainder provided descriptions of evaluation capacities backed up by figures, and some extended to statistical comparisons.

Overall Rigor of Measurement

Table 4.18 provides a summary of scores of the rigor of measurement practice in ECB using the criteria presented above. Using the rubrics developed in this study to rate the performance of these ECB outcomes measurements, the weighted means of the scores were calculated. To aid interpretation, the score bands are defined as Low: 1 – 2.33; Moderate: 2.34 – 3.66; High: 3.67 – 5, although a potential problem here is the arbitrariness of the cut-off scores and their assigned qualitative descriptions. However, the purpose here is to provide an objective assessment, using the rubrics, of the relative performance of the ECBs by evaluating the weighted scores against a logical progression from low to high. The weighted means reveal that, across the identified criteria, the levels of rigor are mostly moderate. Only the "representativeness" of the measurements scored high. Performance is low for the "scope of variables measured" and the "reliability of measurement tools". These results imply that, aside from the fact that few ECB practitioners reported ECB outcomes measurement, the quality of how measurements are carried out needs to improve.

Table 4.18 Rigor of ECB Measurement Practice

Criteria                            Mean Score (Scale: 1 to 5)   Level of Rigor
Representativeness                  3.67                         High
Validity of inference               3.61                         Moderate
Utilization of measures             3.42                         Moderate
Timing of measurement               2.76                         Moderate
Obtaining evidence                  2.67                         Moderate
Measurement design                  2.57                         Moderate
Scope of variables measured         1.61                         Low
Reliability of measurement tools    1.52                         Low

Scale: Low: 1 – 2.33; Moderate: 2.34 – 3.66; High: 3.67 – 5
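The banding of mean scores described above can be sketched as a small lookup. This is a minimal illustration, assuming that scores are rounded to two decimal places before banding (the text does not say how values between bands are handled); the function name is hypothetical.

```python
def rigor_band(mean_score):
    """Map a mean rubric score on the 1 to 5 scale to its qualitative band."""
    score = round(mean_score, 2)  # assumption: band on the two-decimal value
    if not 1 <= score <= 5:
        raise ValueError("score must lie on the 1 to 5 scale")
    if score <= 2.33:
        return "Low"
    if score <= 3.66:
        return "Moderate"
    return "High"


# Mean scores as reported in Table 4.18.
for criterion, score in [
    ("Representativeness", 3.67),
    ("Validity of inference", 3.61),
    ("Reliability of measurement tools", 1.52),
]:
    print(criterion, rigor_band(score))
```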


Answer to Research Question 1B


Rigor of measurement practice is defined in terms of the criteria established by good measurement practices. The approach to answering this research question was to develop rubric items for each of the criteria and then subject the developed instrument to a process of ensuring content validity and acceptable reliability (Cronbach's alpha = 0.83).

Measurement results reveal that for the subset of ECBs (22%) that measured

their ECB outcomes, the variables measured were mostly individual

capacity level outcomes. Results further show that in this small subset of ECB cases,

there is a need to be transparent with the psychometric properties of the instruments

being used. Representativeness ranked high in that most ECB measures involved most

of the participants in ECB. In terms of validity of inference and utilization of

measures the ECB initiatives performed moderately high. Overall, there is some level

of acceptable rigor in terms of how the ECB outcomes measurements were carried

out. However, there is room for improvement in identifying the scope of the variables

being measured which is strongly linked to the ECB content construct as well as

establishing the psychometric properties of the measurement tools used.

Research Question 1C


The ECB cases investigated in this study can be categorized into two groups: those that measured ECB outcomes and those that did not. About four-fifths (78%) of published ECBs did not report outcomes measurement. To answer the question, "Why are ECB outcomes not measured?", one approach is to explore which possible predictor variables influence measurement practice.

Having taken stock of ECB profile characteristics such as content, implementation and context variables as prospective independent variables, the decision to measure or not to measure ECB outcomes can be considered as the dependent variable. This dependent variable, that is, the practice of measurement (those that measured ECB outcomes or those that did not), is a dichotomy. In this case, Binary Logistic Regression analysis allows for the examination of relationships between a binary dependent variable and numerical or categorical predictor variables. The aim of the analysis is to identify which variables influence the decision to measure ECB outcomes.

The Binary Logistic Regression results on predictor variables are shown in Table 4.19 for publication profile variables, Table 4.20 for the ECB context variables and Table 4.21 for implementation variables. Preliminary inspection showed that these independent variables were correlated. To remedy this situation, the analysis was conducted using the simple binary logistic regression approach for each variable to identify which variables showed significant influence. Binary Logistic Regression needs a sufficiently large sample size to predict well; this study did not meet that requirement. Thus, the resulting predictions may not be robust. The intent of this analysis is to identify the potential influencing variables rather than to provide a robust prediction model. The predictive aspect of the model could be improved as additional sample cases are added in the future.
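For a single binary predictor, a simple binary logistic regression can be summarized from a 2 × 2 table of counts: B is the log odds ratio, Exp(B) is the odds ratio, and the Wald statistic is (B / SE)² with one degree of freedom. The sketch below illustrates the quantities reported in the tables under that assumption. The counts and function names are hypothetical and are not taken from the study's data or software.

```python
import math


def wald_p_value(wald):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))


def simple_logistic_2x2(a, b, c, d):
    """Simple logistic regression with one binary predictor, from cell counts.

    a: predictor present, outcome present    b: predictor present, outcome absent
    c: predictor absent, outcome present     d: predictor absent, outcome absent
    """
    if min(a, b, c, d) <= 0:
        # With an empty cell the log odds ratio is not finite.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    coefficient = math.log(odds_ratio)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    wald = (coefficient / standard_error) ** 2
    return {"B": coefficient, "Wald": wald, "P": wald_p_value(wald), "Exp(B)": odds_ratio}


# Hypothetical counts for illustration only.
result = simple_logistic_2x2(8, 6, 6, 43)
print({name: round(value, 3) for name, value in result.items()})
```

An Exp(B) above 1 indicates higher odds of the outcome when the predictor is present, and a value below 1 indicates lower odds.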


From the publication profile variables (Table 4.19), the analysis reveals that the propensity to measure ECB outcomes is independent of the journal in which these reports are published. The journals were grouped into two categories: those that focus on evaluation and those from other disciplines that reported ECBs. Journal type is unrelated to, and does not influence, reports on ECB outcomes measurement. This shows that there is no bias in where an ECB is published, whether in an evaluation or a non-evaluation journal. The other publication variables, year and country, were included in the analysis to detect possible temporal and geographical trends that could influence the decision to measure.

Similar conclusions can be made for the year of publication and the country in which the ECBs are published. There is no evidence that earlier or later published ECBs were more likely to measure ECB outcomes. Thus, there is no discernible trend with respect to time regarding ECB measurement practice. Measuring ECB outcomes appears to be independent of the rise of ECB practice over time. Likewise, ECBs published in the U.S. and outside the U.S. show no difference with respect to the prospect of measuring ECB outcomes.

Table 4.19 Simple Logistic Regression Analysis: Publication Profile and Decision to Measure

Predictor Variable                     B        Wald χ²   P       Exp(B)   Sig.   Prediction %
Journal (Evaluation/Non-evaluation)    0.592    2.217     0.137   2.591    NS     77.8
Year (1978 to 2013)                    -0.097   3.506     0.061   0.908    NS     81.0
Country (US/Outside US)                -1.159   1.996     0.158   0.314    NS     77.8

Base %: 77.8. NS: not significant.


Table 4.20 Simple Logistic Regression Analysis: ECB Context Profile and Decision to Measure

Predictor Variable                B        Wald χ²   P       Exp(B)   Sig.   Prediction %
ECB Domain (1)
  Education                       0.231    0.131     0.718   1.259    NS     77.8
  Health                          1.045    2.748     0.097   2.842    NS     77.8
  Community Development           -1.325   1.471     0.225   0.266    NS     77.8
  Research                        -20.08   0.000     0.999   0.000    NS     77.8
  Child and Youth Development     0.938    0.938     0.333   2.556    NS     77.8
Type of Organization (2)
  Non-profit                      -1.273   0.742     0.389   0.280    NS     77.8
  Government                      -1.609   1.079     0.299   0.200    NS     77.8
  School or School District       -0.847   0.290     0.590   0.429    NS     77.8
  University (3)                  -        1.429     0.839   -               77.8
  For-profit                      -21.20   0.000     1.000   0.000    NS     77.8
Type of Program Delivery
  Services                        0.875    1.122     0.290   2.400    NS     77.8
  Education/Capacity Building     -0.300   0.228     0.633   0.741    NS     77.8
  Advocacy                        0.192    0.066     0.797   1.212    NS     77.8
  Research                        -0.931   0.707     0.401   0.394    NS     77.8
Number of organizations           0.875    1.943     0.163   2.400    NS     77.8
Number of programs                0.192    0.084     0.773   1.212    NS     77.8
Number of sites                   -0.875   1.185     0.276   0.417    NS     77.8
Affiliation of facilitators
  University                      0.463    0.401     0.527   1.589    NS     77.8
  Private Consultancy             0.486    0.274     0.600   1.625    NS     77.8
  Internal evaluation unit        0.140    0.047     0.828   1.151    NS     77.8
Methodological paradigm
  Qualitative                     -21.89   0.000     0.356   0.997    NS     88.9
  Quantitative                    3.157    12.350    0.000   23.50    **     85.7
  Combined Methods                2.175    9.302     0.002   8.800    **     81.0

Base %: 77.8. (1): Multiple response, analyzed separately; (2): Mutually exclusive: single analysis, categorical case; (3): parameter base of a single categorical model, no estimate output; NS: Not significant; **: significant at 0.01 level of significance.

In a similar manner, Table 4.20 presents the simple logistic regression analysis of the decision to measure ECB outcomes with respect to ECB context variables as regressors. ECB characteristics such as domain, type of organization, type of program delivery, number of organizations, number of programs, number of sites and affiliation of facilitators were not found to influence the decision to measure ECB outcomes. Not surprisingly, the methodological paradigm of an ECB report influences this likelihood. Upon closer inspection, reports using quantitative methods and combined methods are most likely to provide ECB outcomes measurement. The binary regression models correctly predict 85.7 percent and 81 percent of the cases, respectively. The qualitative methods variable, although not a significant predictor, provides correct prediction 88.9 percent of the time in the opposite direction (negative B, -21.89), which means that qualitative reports are most likely not to report ECB measurements.
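The prediction percentages above can be read against the base rate. The sketch below assumes that the "Base %" in the tables is the share of cases classified correctly when every case is assigned to the larger group (here, the cases that did not measure outcomes). That reading is an assumption, and the function name is illustrative.

```python
def baseline_accuracy(n_measured, n_not_measured):
    """Share of cases classified correctly by always predicting the larger group."""
    total = n_measured + n_not_measured
    if total == 0:
        raise ValueError("at least one case is required")
    return max(n_measured, n_not_measured) / total


# Counts from the text: 14 of the 63 cases measured ECB outcomes.
print(round(100 * baseline_accuracy(14, 63 - 14), 1))
```

Under this reading, a predictor improves on the baseline only when its prediction percentage exceeds this figure.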

Table 4.21 Simple Logistic Regression Analysis: Implementation and Decision to Measure

ECB Delivery                      B        Wald χ²   P       Exp(B)   Sig.   Prediction %
Teaching strategy
  Direct Teaching                 0.731    1.348     0.246   2.077    NS     77.8
  Participatory                   -20.47   0.000     0.998   0.000    NS     77.8
  Combined Methods                1.527    5.705     0.017   4.606    *      77.8
Mode of strategy
  Face-to-face                    -0.575   0.883     0.347   0.563    NS     77.8
  Face-to-face and other modes    0.831    1.821     0.177   2.296    NS     77.8
  Other modes                     -20.01   0.000     0.999   0.000    NS     77.8
Contact duration                  -0.333   0.237     0.626   0.717    NS     76.7
Intended target change
  Individual change               -0.724   1.385     0.239   0.485    NS     77.8
  Organizational change           -19.99   0.000     0.999   0.000    NS     77.8
  Combined                        0.916    2.177     0.140   2.500    NS     77.8
Participant focus
  Staff                           -2.241   5.763     0.016   0.106    *      81.0
  Managers                        0.300    0.228     0.663   1.350    NS     77.8
  Leadership                      0.670    0.731     0.392   1.955    NS     77.8
  Beneficiaries                   0.329    0.262     0.608   1.389    NS     77.8

Base %: 77.8. NS: Not significant; *: significant at 0.05 level of significance.


The implementation variables of ECB reveal some significant findings. Teaching strategy matters when it comes to predicting the likelihood of measuring ECB outcomes. Recall that when the ECB cases were grouped into three mutually exclusive subgroups with respect to teaching strategy, they divided almost equally (Figure 4.10). The binary logistic regression analysis reveals that implementation characteristics influence the likelihood of the measurement of ECB outcomes. In particular, ECBs that take an eclectic approach combining direct training and participatory strategies (the intersection of the circles in the Venn diagram) are more likely to measure ECB outcomes than those using participatory strategies alone or direct training alone.

Figure 4.10 ECB Teaching Strategies (Venn diagram of Direct Training and Participatory strategies; region counts 19, 20 and 20, with a further region for cases outside both strategies)

Participant focus is another implementation characteristic associated with the measurement of ECB outcomes. ECBs that focus on "staff" evaluation capacity development are more likely to measure ECB outcomes. ECBs with other participant focus categories such as "program managers", "leaders" and "beneficiaries" do not appear to influence the decision to measure ECB outcomes. Thus, ECB cases that have the program staff as a participant focus of ECB teaching are most likely to present measurement of ECB outcomes, compared with programs that include leadership, managers and beneficiaries.

Answer to Research Question 1C

What determines the practice of measurement in ECB?

The binary category of the decision to measure ECB outcomes (those ECB cases that measured and those that did not) allows for binary logistic regression analysis to examine which variables determine this practice. Findings reveal that the "methodological paradigm of the ECB report", "teaching strategy" and "participant focus" are likely determinants that influence measurement of ECB outcomes. In particular, those ECB practices that use a combined teaching strategy and those that teach program staff have a higher likelihood of measuring their ECB outcomes. Methodological paradigm as a determinant suggests that ECB cases using combined quantitative and qualitative methods and those using solely quantitative methods have a high propensity to measure ECB outcomes.

Answer to Research Question 1: How can ECB measurement practice be

described from empirical evidence?

ECB measurement practice can be described by first understanding the

context, content and implementation characteristics of the ECB initiatives. ECB

content and implementation tended to focus more on individual evaluation capacity

building compared with building organizational evaluation capacity. Only a few ECB

cases focused on organizational evaluation capacity building that would include


building systems, processes, support, and evaluation culture. ECB implementation

approaches used direct training, indirect participatory learning or a combination of

both. Most ECBs targeted individual evaluation capacity change compared with

organizational evaluation capacity change, and mostly involved program staff and

managers compared with a few that involved the organization leadership and program

beneficiaries.

Overall, ECBs tended to overlook the measurement of ECB outcomes. Only

22 percent of the examined cases measured ECB outcomes. Of those that measured

ECB outcomes, there was some level of acceptable rigor with regard to how the

outcomes measurements were conducted. However, there is room for improvement in

identifying the scope of the variables being measured which is strongly linked to the

ECB content construct as well as establishing the psychometric properties of the

measurement tools used. Those that measured ECB appeared to be influenced by the

methodological paradigms to which the practitioners adhere.

For ECB and evaluation practice, these results mean that when it comes to ECB measurement, not much groundwork has been laid. ECB practice has to move from an individual and program level focus to an organizational level focus if the aim of ECB is to effect organizational change with respect to making evaluation mainstream and improving organizational outcomes. While ECB program theory assumes both the individual level and the organizational level as targets of the learning intervention, in practice, evidence suggests that ECBs are skewed towards an individual level focus.


Research Question 2

Is there evidence to demonstrate that:

Research Question 2A: ECB content follows a unified learning construct

and a possible progressive structure?

Research Question 2B: ECB content could be grouped in specific ways?

In order to understand how ECB should be assessed as a learning

intervention, it is important to explore the construct that is ECB. That is, if ECB is a

single construct then all the ECB content topics delivered during an ECB initiative tap

into this construct. The term construct in this study refers to the concept of evaluation

capacity that could not be directly observed but can be explained by observable

phenomena – such as the ECB content topics that practitioners assumed to be the

components that demonstrate evaluation capacity. It could also be possible that this

ECB construct is a higher order construct which suggests that it may have several

domains that constitute it.

Therefore, this study attempts to examine two aspects with respect to ECB

learning content. First, it investigates whether the ECB content topics as delivered in

practice follow a unified learning construct. This is carried out by using the Item

Response Theory (IRT) analysis which will quantitatively reveal whether the ECB

topics identified from the ECB sample data adhere together as one latent entity, that

is, the unobservable unified concept that can be inferred from observed indicators.

The IRT analysis could further reveal whether these content topics form a hierarchical

or progressive structure in terms of "difficulty". Second, it is possible that the ECB

topics could be grouped in specific ways such that they constitute sub-domains of

ECB construct. This can be revealed by employing exploratory factor analysis. This


quantitative technique can assist in determining whether the ECB content topics

examined can be organized into several factors or components.

Item Response Theory (IRT) Analysis

Central to the idea of IRT is the assumption of unidimensionality. The analysis examines the empirical data to ascertain whether they fit this assumption. Hence, data in the form of items are examined statistically to determine whether they fit the assumed model. In this study, the items are the ECB content topics documented and coded for the analysis. Another important notion in IRT is the concept of "difficulty". The analysis can organize the items that fit a latent construct into a hierarchical order of difficulty (Hambleton & Swaminathan, 1985). In this research context, however, difficulty is used to describe the ECB content topic "level" in terms of what is most likely to be delivered in an ECB activity, given the ECB practitioner's ability, among other factors, to deliver such a content topic. For ease of reference, item difficulty, a common term in IRT, will be referred to as "ECB developmental proficiency" in this study. The Item Response Analysis in this study uses the Rasch Model, against which empirical data can be examined to confirm whether each ECB content topic fits a single latent construct.
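The dichotomous Rasch model places ability and item difficulty (here, developmental proficiency) on one logit scale and gives the probability of a positive response as exp(ability - difficulty) / (1 + exp(ability - difficulty)). The sketch below is a minimal illustration of that standard formula; the function name and the example values are illustrative and do not reproduce the study's estimates.

```python
import math


def rasch_probability(ability, proficiency):
    """Probability that a case at `ability` delivers a topic at `proficiency` (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - proficiency)))


# When ability equals the topic's proficiency level, the chance of delivery is one half.
print(rasch_probability(0.0, 0.0))
# Topics located lower on the scale are more likely to be delivered by the same case.
print(round(rasch_probability(0.0, -3.0), 3))
print(round(rasch_probability(0.0, 2.0), 3))
```

This is why a case located at the same point on the scale as a topic is read as having a 50 percent chance of delivering that topic.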

The approach to this analysis begins with the identification of ECB topics that were delivered in an ECB activity as reported by the ECB cases examined in this study. The checklist of ECB content topics, recording which ECB cases delivered which topics, forms a binary data matrix. A binary data set is simply a set whose entries take the value Zero or One: One representing 'yes' and Zero representing 'no'. The resulting record of Zeros and Ones can be considered the 'response matrix' of ECB cases with respect to ECB content topics delivered. This situation allows for the use of Item Response Analysis. The analysis can yield two things: (1) a determination of whether the content topics constitute a single construct, through an analysis of the fit statistics, and (2) the ordered ranking of the ECB topics on a developmental continuum through the IRT "person-item" map, referred to here as the "ability-proficiency" map.
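The coding step described above can be sketched as follows. The topic names, case labels and function names are hypothetical and serve only to show how a checklist becomes a matrix of Zeros and Ones; they are not the study's coding scheme.

```python
def build_response_matrix(delivered_by_case, topics):
    """Return one row per ECB case and one 0/1 column per content topic."""
    return [
        [1 if topic in delivered else 0 for topic in topics]
        for delivered in delivered_by_case.values()
    ]


def topic_counts(matrix):
    """Count how many cases delivered each topic (column sums)."""
    return [sum(column) for column in zip(*matrix)]


# Hypothetical topics and cases, for illustration only.
topics = ["evaluation planning", "data collection", "data analysis"]
delivered_by_case = {
    "Case A": {"evaluation planning", "data collection"},
    "Case B": {"evaluation planning"},
    "Case C": {"evaluation planning", "data collection", "data analysis"},
}

matrix = build_response_matrix(delivered_by_case, topics)
print(matrix)
print(topic_counts(matrix))
```

Topics delivered by fewer cases would be expected to sit higher on the developmental continuum, although the Rasch estimation itself involves more than these raw counts.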

The analysis yields estimates of an ordered ranking of ECB content topics by levels of "ECB developmental proficiency" and levels of "ability" for an ECB case, representing practitioners' ability to deliver the topic. This estimation approach is considered to be an advancement over traditional item difficulty and population ability estimation in that the estimates are mapped onto the same scale and units (Hambleton & Swaminathan, 1985). The "ability-proficiency" graph provides information as to which ECB topics are of low or high order on a progression scale with respect to ECB practitioners' ability to deliver. Thus, the analysis results provide a picture of how the overall ECB content delivery performs with respect to these ECB content topics.

Figure 4.11 shows the IRT analysis ability-proficiency graph. This graph is conventionally termed the “ability-difficulty” distribution on a logistic scale. The “ECB content topics” are the “items”, while the population displaying the “ability” on the scale is the set of ECB cases in the sample, representing ECB practice. This graph provides a picture of how ECB cases perform (ability) in relation to an ordered level of “developmental proficiency” of the ECB topics (items). The items are positioned on the right side of the graph. The left side of the graph shows the horizontal bars formed by Xs, which represent the ECB cases.

Figure 4.11 ECB Cases and ECB Developmental Proficiency (ability-proficiency map; topic bands labelled High Level Topics, Moderate Level Topics and Low Level Topics)

The graph provides information not only with regards to how the content

topics are scaled in levels of proficiency, but also with regards to the relative count of

ECB cases that have the abilities to perform at this level. It should be noted that the

scale ranges from -3 to +2 for both content topics and ability score: those topics that

are easiest to deliver are those approaching -3, while those that are most difficult to

deliver are approaching +2.

To interpret the graph, the position at a particular score indicates which topics the ECB cases at that score have a 50 percent chance of delivering. For example, ECB cases with an ability score of “0” are “50 percent likely to deliver ECB content topic numbers 8, 7, 3, 2, 5, 6 and 4” (see Table 4.21 for item numbering reference). Put another way, ECB cases have less than a 50 percent chance of delivering ECB content topics positioned above their ability score, whereas for content topics positioned below that score, the ECB practitioners for that case have a high probability of delivering those topics. To simplify interpretation, ECB cases that fall within the bands “high level topics”, “moderate level topics” or “low level topics” are those that have about a 50 percent chance of attaining that level of ECB developmental proficiency.
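The 50 percent reading follows from the standard dichotomous Rasch model, in which the probability depends only on the difference between the two locations on the logit scale. The sketch below is a minimal illustration; the function name and the example logit values are assumptions, not estimates from this study.

```python
import math


def rasch_probability(ability, proficiency):
    """Probability that a case located at `ability` delivers a topic located
    at `proficiency`, with both values expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - proficiency)))


# A topic located exactly at the case's ability has a 50 percent chance.
print(rasch_probability(0.0, 0.0))
# A topic positioned above the case's ability is less likely than 50 percent.
print(rasch_probability(0.0, 1.5) < 0.5)
# A topic positioned below the case's ability is more likely than 50 percent.
print(rasch_probability(0.0, -1.5) > 0.5)
```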

In Figure 4.11, the ECB content topics are ordered by progressive proficiency level, independent of the ECB cases’ performance levels. This means that the topics can be grouped independently of the population distribution on the left side of the graph. There is no hard and fast rule for these groupings; however, taking the middle interval of the scale, between -1 and +1, as the moderate group, the items or ECB topics can be organized into three groups of “high level”, “moderate level” and “low level” proficiency ECB content topics. This is not an indication of popularity or of the most common topics, which frequency counts can readily provide. Instead, it indicates the estimated “intrinsic” proficiency characteristics of the topics, independent of whether or not they are popular. Table 4.22 gives the clustering of the ECB topics by difficulty level.

Table 4.22 Topic Number List and Levels of Developmental Proficiency

Topic    Topic Number Reference

High Level

Evaluation readiness and willingness 10

Creating or strengthening evaluation policy requirements 13

Organization evaluation practices 9

Building leadership support 11

Creating or strengthening support for evaluation resources 16

Moderate Level

Improving organizational evaluation social network 17

Evaluation awareness and attitudes 1

Building culture for evaluation 12

Creating or strengthening evaluation structure 14

Creating or strengthening evaluation systems 15

Program implementation 8

Program planning 7

Low Level

Logic models 3

Evaluation terms, approaches or methods 2

How to do an evaluation 5

Data management, analysis, interpretation or use 6

Evaluation Plan 4
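The banding rule described above can be sketched as a simple cut-off function. The -1 and +1 cut-offs come from the text; the treatment of values falling exactly on a cut-off, the function name and the example locations are assumptions made only for illustration.

```python
def proficiency_band(logit, lower=-1.0, upper=1.0):
    """Label a topic location in logits. Values on a cut-off are treated as
    moderate here, which is an assumption rather than the study's rule."""
    if logit < lower:
        return "Low Level"
    if logit > upper:
        return "High Level"
    return "Moderate Level"


# Hypothetical topic locations in logits (not the estimated values).
example_locations = {"topic_x": -2.1, "topic_y": 0.3, "topic_z": 1.6}
bands = {name: proficiency_band(value) for name, value in example_locations.items()}
print(bands)
```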

This simple ordering of ECB content developmental proficiency with respect to ECB practitioners’ ability to deliver may provide an important developmental progression structure for the ECB content being delivered. This is significant in the sense that the ECB learning outcomes expected, and the competencies to be gained by ECB participants, can be similarly structured. In this respect, ECB measurement practice could be structured in a similar manner. This result may imply that there is a possible developmental progression structure on which existing ECB outcome measurement tools can be positioned. This would allow practitioners to examine whether what was delivered and measured in ECB is well represented across this developmental progression scale.

The shape of the distribution of the ECB cases along the ECB developmental proficiency continuum in Figure 4.11 reveals some notable findings. With the figure rotated to the horizontal, the plot of the Xs representing the ECB cases included in the study approximates the classic bell shape, though slightly positively skewed. Judging from its position along the scale, the majority of ECB cases fall at logistic scores of -1 and below, which corresponds to the low level ECB topics in the developmental proficiency continuum. This means that most of the ECB cases delivered content at this low level only. The distribution appears to be positively skewed; however, with sufficient data it could possibly extend to the high level position on the scale, filling out the distribution at the positive end. It is also discernible that this distribution is possibly bi-modal, that is, having two modes, as indicated by the rise at the positive end. This could be a superimposition of two normal curves: the bigger one straddling the low level part of the scale, and a small normal curve positioned between the moderate and high level parts of the scale. This indicates that a small sub-group of ECB cases have focused on topics in the upper level of the ECB developmental proficiency continuum.

The empirical data fit the Rasch Model well in this Item Response Analysis. Table 4.23 presents the mean square (MNSQ) fit statistics for each of the ECB content topics. Results show that all items (ECB content topics) have MNSQ values within the confidence intervals (|T| < 2). Since the ECB content topics are a combination of individual and organizational evaluation capacities, this result seems to suggest that this divide blurs when developmental progression is considered.

Table 4.23 Item Mean Square Fit Statistics

Item    Weighted Fit: MNSQ    CI    T

High Level

Evaluation readiness and willingness (10) 1.07 (0.02,1.98) 0.3

Creating or strengthening evaluation policy requirements (13) 0.90 (0.19,1.81) -0.1

Organization evaluation practices (9) 0.89 (0.31,1.69) -0.2

Building leadership support (11) 0.86 (0.30,1.70) -0.3

Creating or strengthening support for evaluation resources (16) 0.94 (0.39,1.61) -0.1

Moderate Level

Improving organizational evaluation social network (17) 0.95 (0.45,1.55) -0.1

Evaluation awareness and attitudes (1) 1.11 (0.54,1.46) 0.5

Building culture for evaluation (12) 0.89 (0.58,1.42) -0.5

Creating or strengthening evaluation structure (14) 0.83 (0.58,1.42) -0.8

Creating or strengthening evaluation systems (15) 1.18 (0.65,1.35) 1.0

Program implementation (8) 1.04 (0.68,1.32) 0.3

Program planning (7) 1.14 (0.74,1.26) 1.0

Low Level

Logic models (3) 0.80 (0.78,1.22) -1.9

Evaluation terms, approaches or methods (2) 0.98 (0.77,1.23) -0.2

How to do an evaluation (5) 1.14 (0.75,1.25) 1.0

Data management, analysis, interpretation or use (6) 1.04 (0.75,1.25) 0.3

Evaluation Plan (4) 0.96 (0.69,1.31) -0.2

Note: Estimation using ConQuest Software: Chi-square Test of Parameter Equality = 468.71, df = 16, Sig level = 0.000; Separation reliability = 0.965.
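The fit criterion stated above can be checked mechanically against the values reported in Table 4.23. The sketch below is only an illustration of that check, not the ConQuest procedure itself; the `fits` helper is a hypothetical name, and the rows are keyed by the topic numbers shown in parentheses in the table.

```python
# (topic number, MNSQ, CI lower, CI upper, T) as reported in Table 4.23.
fit_rows = [
    (10, 1.07, 0.02, 1.98, 0.3),
    (13, 0.90, 0.19, 1.81, -0.1),
    (9, 0.89, 0.31, 1.69, -0.2),
    (11, 0.86, 0.30, 1.70, -0.3),
    (16, 0.94, 0.39, 1.61, -0.1),
    (17, 0.95, 0.45, 1.55, -0.1),
    (1, 1.11, 0.54, 1.46, 0.5),
    (12, 0.89, 0.58, 1.42, -0.5),
    (14, 0.83, 0.58, 1.42, -0.8),
    (15, 1.18, 0.65, 1.35, 1.0),
    (8, 1.04, 0.68, 1.32, 0.3),
    (7, 1.14, 0.74, 1.26, 1.0),
    (3, 0.80, 0.78, 1.22, -1.9),
    (2, 0.98, 0.77, 1.23, -0.2),
    (5, 1.14, 0.75, 1.25, 1.0),
    (6, 1.04, 0.75, 1.25, 0.3),
    (4, 0.96, 0.69, 1.31, -0.2),
]


def fits(mnsq, lower, upper, t, t_limit=2.0):
    """True when MNSQ lies inside its confidence interval and |T| is below the limit."""
    return lower <= mnsq <= upper and abs(t) < t_limit


# Topic numbers that fail the criterion (an empty list means every topic fits).
misfitting = [topic for topic, mnsq, lo, hi, t in fit_rows if not fits(mnsq, lo, hi, t)]
print(misfitting)
```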

What is being measured in ECB?

As stated earlier, most of the ECB cases in the study sample did not measure ECB outcomes (Figure 4.12). This means that the answer to the question “What is being measured in ECB?” refers to only a subset of the total cases under investigation. In the first research question, the ECB content and implementation were described, with the answers derived from the majority of cases. The details of ECB content and the way ECBs perform in delivering that content were also examined. The primary reason for clarifying the content details is to connect “what has been delivered” with what “learning outcomes” should be measured.

Figure 4.12 ECB Outcomes Measurement (pie chart: 22% of cases measured ECB outcomes; 78% did not measure ECB outcomes)

Learning outcomes are referred to as ECB outcomes, as distinct from the program outcomes of organizations. It is imperative to distinguish between these two programmatic outcomes. ECB outcomes refer to outcomes of ECB as an intervention, as opposed to program outcomes from the various intervention programs that organizations deliver. Ultimately, the improvement of organizational program outcomes is the end goal for which ECB intervention is believed to be a solution. It is unfortunate that data are meager when it comes to reports of measurements of ECB outcomes in this study sample. Nevertheless, making use of the available information, it is possible to describe what is being measured in ECB, based on the small sample.

Table 4.24 documents the reported measurement of ECB outcomes. ECB practitioners provide these reports as an evaluation of their ECB initiatives. In the sample, there were 14 ECB cases that reported their ECB outcomes. For definitional clarity, measurement here refers to the “changes” in various metrics defined and observed by those who reported their ECB activities. Often, these measures are self-reports by participants carried out almost immediately after an ECB training or workshop.

Table 4.24 is a “presence-absence” matrix of reported ECB outcomes with respect to the ECB content topics organized in the Item Response Analysis (Assessment and Learning Partnerships, 2012). Recall from the previous section that these content topics are organized according to levels of developmental proficiency. The matrix marks with a “1” each ECB outcome reported as measured in relation to the organized ECB content topics. The first column of the matrix, which contains the ECB content topics, serves as the reference against which the ECB outcomes reports are matched. In a sense, this matrix confirms the validity of ECB outcomes measurement; that is, it provides evidence that what was taught or delivered has been measured. There would be a dissonance if ECB practitioners taught or delivered one thing and measured another.

The findings reveal that the pattern of what is being measured in ECB follows the pattern of the ranked ECB content developmental proficiency (Table 4.22). That is, reported ECB measurements correspond to the difficulty pattern produced by the Item Response Analysis of the ECB content topics. This is verified by employing a tool called the Guttman chart (Table 4.25). The Guttman chart is readily produced by ranking the ECB cases by their total scores. In this study sample, the Guttman chart produced the classic “diagonal pattern” when the ECB content developmental proficiency ordering was used as a reference. This implies that the reported ECB outcomes measurements had a similar developmental progression structure. It shows that easy topics are the most likely to be measured in practice, while the difficult ones are less likely to be measured.
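The column-ranking step behind the Guttman chart can be sketched with the case totals reported in the Total row of Table 4.24. The rows are assumed to stay in the IRT order already used in that table, and resolving ties by original case number is an assumption, although it reproduces the column order shown in Table 4.25.

```python
# Column totals for case units 1..14, from the Total row of Table 4.24.
case_totals = [7, 4, 5, 0, 6, 2, 2, 0, 10, 5, 3, 9, 4, 4]

# Rank case units from highest to lowest total. sorted() is stable, so tied
# cases keep their original left-to-right order.
case_order = sorted(
    range(1, len(case_totals) + 1),
    key=lambda case: -case_totals[case - 1],
)
ordered_totals = [case_totals[case - 1] for case in case_order]

print(case_order)
print(ordered_totals)
```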


Table 4.24 Presence-Absence Matrix of Reported ECB Outcomes with Reference to ECB Content

ECB Content Delivery¹

Measured by Case Units during or after ECB² (N=14, 22% of Cases)

Case unit: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Total

High Difficulty Level


Creating or strengthening evaluation policy requirements 1 1

Organization evaluation practices 1 1 2

Building leadership support 1 1 2

Creating or strengthening support for evaluation resources 1 1 1 1 4

Moderate Difficulty Level

Improving organizational evaluation social network 1 1 1 3

Evaluation awareness and attitudes 1 1 1 1 1 5



Creating or strengthening evaluation systems 1 1 1 3

Program implementation 1 1 1 3

Program planning 1 1 1 3

Low Difficulty Level

Logic models 1 1 1 1 1 1 1 7

Evaluation terms, approaches or methods 1 1 1 3

How to do an evaluation 1 1 1 1 1 1 6

Data management, analysis, interpretation or use 1 1 1 1 1 1 1 1 1 9

Evaluation Plan 1 1 1 1 1 1 1 1 1 1 10

Total 7 4 5 0 6 2 2 0 10 5 3 9 4 4

1: Overall ECB content delivered; organized using IRT analysis from 63 cases.

2: Using 14 cases that reported ECB outcomes measurement.


The matrix also shows that proficient ECB cases, that is, ECBs with high total scores, tend to measure more outcomes. It should be noted that the ECB content developmental proficiency was estimated from the entire sample of ECB cases used in this study, while the Guttman ordering uses only the subset of the sample that reported ECB measurement. The outcomes reporting is coded independently of reported content. The Guttman chart result confirms two things. First, it provides some validation of the Item Response Analysis organization of the content topics. Note that the set of ECB content topics is shown to be a single construct and that the ECB outcomes measurement is independent data. Second, it shows that there is a match between what is measured and what is being delivered. The conclusion that can be drawn from this is that practitioners who measured ECB outcomes know what they are measuring. Most importantly, the results established the notion that there is a possible developmental progression structure in ECB content delivery and hence that measurement practice should follow this developmental progression.

Additionally, in the Guttman chart (Table 4.25) it can be noted that there are

ECB cases with zeros and very low total scores. This does not mean that these cases

did not evaluate their ECBs. This means that the measurement metrics used in their

evaluation report of the ECB outcomes differ from what was assumed in this

investigation. The assumption was that “changes” would be measured with respect to

the competencies identified through the content topics delivered. The zeros and low

total scores reveal that other methods are used in some ECB cases to evaluate ECB

outcomes.


Table 4.25 Guttman Ordering of Reported ECB Outcomes with Reference to ECB Content

ECB Content Delivery¹

Measured by Case Units during or after ECB² (N=14, 22% of Cases)

Case unit: 9 12 1 5 3 10 2 13 14 11 6 7 4 8 Total

High Difficulty Level


Creating or strengthening evaluation policy requirements 1 1

Organization evaluation practices 1 1 2

Building leadership support 1 1 2

Creating or strengthening support for evaluation resources 1 1 1 1 4

Moderate Difficulty Level

Improving organizational evaluation social network 1 1 1 3

Evaluation awareness and attitudes 1 1 1 1 1 5



Creating or strengthening evaluation systems 1 1 1 3

Program implementation 1 1 1 3

Program planning 1 1 1 3

Low Difficulty Level

Logic models 1 1 1 1 1 1 1 7

Evaluation terms, approaches or methods 1 1 1 3

How to do an evaluation 1 1 1 1 1 1 6

Data management, analysis, interpretation or use 1 1 1 1 1 1 1 1 1 9

Evaluation Plan 1 1 1 1 1 1 1 1 1 1 10

Total 10 9 7 6 5 5 4 4 4 3 2 2 0 0

1: Overall ECB content delivered; organized using IRT analysis from 63 cases.

2: Using 14 cases that reported ECB outcomes measurement.


As an example of other methods used, some cases focused on the evaluation of outputs produced as well as evaluation systems established, rather than measuring the individual evaluation capacity changes that ECB purports to bring. This finding implies that some ECBs in practice evaluated ECB outcomes through holistic product outputs, such as the portfolio products in authentic assessment approaches, rather than looking at the individual or organizational competency “change” targeted by the ECB activity (Table 4.26).

Table 4.26 Some Process and Outcome Areas Measured in Reported ECBs

Intention to use learning at the workplace

Level of exposure to training and materials

Number of evaluations carried out after the training

Quality of evaluation reports after the training

Quality of technical assistance

Quality of topics and learning tasks

Training satisfaction of participants

Utilization of evaluation outputs

These other ways of considering ECB outcomes, and some processes, reveal

that ECB outcomes measurement is not limited to the view of documenting individual

and organizational change as defined by the ECB content being delivered (Table

4.26). These include other factors such as the technical aspects and details of the

training, the performance of the ECB implementation, and the delivery of teaching. In

addition, there are ECB measurements that focus on the evaluation products and the

quality of these products. Some cases did not report individual or organizational


changes but rather reported on the improvement in the number and quality of

evaluation outputs, and the utilization of these evaluation products. These findings

show that measuring ECB outcomes is as diverse as the ECB conceptualization.

Factor Analysis: Multidimensional Assumption

Factor analysis allows for the exploration of the possible dimensions that

may provide further information regarding ECB content construct. This analysis looks

into the possible independent ECB content sub-domains rather than simply grouping

them into individual or organizational content topics. This exploration has important

implications for the pedagogical aspects of delivering ECB. For example, being able

to know what aspects of ECB have to be delivered is as important as the approach of

delivering them progressively.

The first step in factor analysis is to determine whether the data are suitable for data reduction and how many factors or dimensions can be extracted from the ECB content construct. The scree plot shown in Figure 4.13 is a technique for determining how many possible latent factors or dimensions underlie this set of ECB topics. The “elbow” of the scree plot, where the plot begins to taper off, suggests the number of suitable dimension groupings. The result identifies four factors corresponding to the elbow of the scree plot.


Figure 4.13 Scree Plot for ECB Content

The factor loadings are shown in Table 4.27. Factor loadings less than 0.40 are not displayed in the table, to facilitate the identification of topic groupings. The values of these factor loadings mapped onto components provide the reorganization of the ECB content topics into groups. The factor analysis details are shown in the note to Table 4.27; they satisfy the sampling adequacy index (Kaiser-Meyer-Olkin = 0.592) and the factorability criterion (Bartlett’s Test of Sphericity p-value < 0.001). This means that even though the sample size was small, it satisfied the statistical sampling adequacy criterion for factor analysis. The solution also shows that 35 percent of the correlation residuals are greater than 0.05. The Maximum Likelihood extraction method was employed with Direct Oblimin rotation. Direct Oblimin rotation was used on the assumption that the factors (or sub-dimensions) in the ECB construct could be correlated (de Winter & Dodou, 2012).


Table 4.27 Structure Matrix of the ECB Content Using Maximum Likelihood Method of Extraction and Direct Oblimin Rotation

Preliminary Factor Groupings and ECB Topics                  Factor Loadings
                                                                 1       2       3       4
Teaching Individual Evaluation Capacities
  Evaluation awareness and attitude
  Evaluation terms, approaches, or methods                                           0.466
  Logic models                                               0.403                   0.638
  Evaluation plan                                                                    0.538
  How to do an evaluation
  Data management, analysis, interpretation or use                                   0.486
  Program planning                                                   0.967
  Program implementation                                             0.844
Addressing Organizational Evaluation Capacities
  Organization evaluation practices                          0.686
  Evaluation readiness and willingness                                               0.450
  Building leadership support                                0.525           0.557
  Building culture for evaluation                            0.995
  Creating/strengthening evaluation policy requirements                      0.595
  Creating/strengthening evaluation structures               0.532           0.639
  Creating/strengthening evaluation systems                                  0.588
  Creating/strengthening support for evaluation resources                    0.472
  Improving organizational evaluation social network                         0.922

Note: Empty cells are coefficients less than 0.400; Factor analysis using Maximum Likelihood Method and Oblimin Rotation; Kaiser-Meyer-Olkin: 0.592; Bartlett’s Test of Sphericity p-value: 0.000; Correlation residuals greater than 0.05: 35%; Cumulative % of Variance: 55.44%


The analysis results show that the ECB content items could be organized into recognizable sub-domains, as indicated by the extracted factors. The factors will be referred to as sub-domains, or simply domains, to highlight the notion that these are ECB content topics delivered in practice.

The set of items on which the first domain has strong loadings can be identified as “Developing Evaluation Culture and Practices”. These are the topics that address organizational evaluation capacity building in terms of “building culture for evaluation” and “organization evaluation practices”, the leading items with high factor loadings (0.995 and 0.686 respectively). Although this factor also loads on other items, these are shared with other domains. This is expected, since the domains are correlated, as the correlation matrix in Table 4.28 shows.

Table 4.28 Factor Correlation Matrix


Factor 1 2 3 4

1 1.000 .082 .325 .339

2 .082 1.000 .004 .146

3 .325 .004 1.000 .039

4 .339 .146 .039 1.000

Extraction Method: Maximum Likelihood.

Rotation Method: Oblimin with Kaiser Normalization.

The second domain could be named “Managing Programs”. These are topics that pertain to “Program planning” and “Program implementation”, with very high factor loadings (0.967 and 0.844 respectively). Although this sub-domain is a couplet, that is, a factor with only two representative items, it appears to be independent, having low correlations with all other sub-domains. It can be deduced from this result that, in practice, topics dealing with the way organizations plan and run the programs they are handling have been an integral part of evaluation capacity building initiatives. This supports the notion that program design, development and implementation are all considered necessary aspects of the field of evaluation, in addition to program evaluation topics.
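The claim of low correlations for the second factor can be read directly off Table 4.28. The sketch below, with a hypothetical helper name, finds the largest correlation each factor has with any other factor, using the reported values.

```python
# Factor correlations as reported in Table 4.28 (rows and columns are factors 1 to 4).
factor_corr = [
    [1.000, 0.082, 0.325, 0.339],
    [0.082, 1.000, 0.004, 0.146],
    [0.325, 0.004, 1.000, 0.039],
    [0.339, 0.146, 0.039, 1.000],
]


def largest_off_diagonal(matrix, index):
    """Largest absolute correlation between one factor and any other factor."""
    return max(abs(value) for j, value in enumerate(matrix[index]) if j != index)


largest = {factor + 1: largest_off_diagonal(factor_corr, factor) for factor in range(4)}
print(largest)
```

On these figures the second factor has the smallest maximum correlation of the four, which is in line with its description as the most independent sub-domain.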

The third sub-domain could be identified as “Institutionalizing Evaluation”. This includes topics such as “Improving organizational evaluation social network” (0.922), “Creating or strengthening evaluation structures” (0.639), “Creating or strengthening evaluation policy requirements” (0.595), “Creating or strengthening evaluation systems” (0.588), “Building leadership support” (0.557) and “Creating or strengthening support for evaluation resources” (0.472). It has to be noted that two of these topics also have moderate factor loadings on the first sub-domain, “Developing Evaluation Culture and Practices”. These are “Building leadership support” and “Creating or strengthening evaluation structures”. This shows that leadership and evaluation structure in an organization are indicators of both evaluation culture and practice and the organizational social context for evaluation.

The final sub-domain could be considered “Building Evaluation Knowledge, Skills and Readiness”. This is the group of ECB topics that includes basic evaluation knowledge such as “Logic models” (0.638), “Evaluation plan” (0.538), and “Data management, analysis, interpretation or use” (0.486). “Evaluation readiness and willingness” (0.450) is an indicator of this sub-domain as well. There was an expectation that “Evaluation awareness and attitude” could belong in this cluster; however, this item has no factor loading of 0.40 or above on any of the sub-domains identified. This is also true for “How to do an evaluation”. A possible explanation might be that these ECB topics are present in most ECB initiatives and therefore provide little information. From this, it could be interpreted that ECB initiatives commonly recognize the need for ECB and that the purpose of ECB is to teach how to do an evaluation.


These dimension groupings of ECB content topics are somewhat subjective. Although this data set satisfies the sampling adequacy criteria, the sample size is still a limitation compared with the ideal sample size (at least 15 cases for each item, 17 x 15 = 255). Factor analysis can be highly sensitive to sample size; that is, factor loadings could change easily as new samples are added. There is also subjectivity in the choice of factor extraction and rotation techniques, which can yield quite different results. For this study, the factor analysis procedure fortunately yielded an interpretable result. However, this does not mean that these dimension groupings are fixed and final. The usefulness of this analysis is that it provides something to work with when comparing the frequency levels of the content topics being delivered during ECBs.

Table 4.29 provides a summary list of these ECB content construct sub-domains. While the Item Response Analysis has provided a strong case that these topics tap into a single higher order construct and has shown that ECB content topic delivery could be structured as a learning progression, exploratory factor analysis demonstrated that there are discernible sub-domains. These findings show that sub-domains could exist even for high order constructs, such as ECB content, that fit a Rasch model. Table 4.29 also maps these topics to the IRT hierarchy, indicating that some patterns of domain progression can be observed. For example, in the sub-domain “Developing Evaluation Culture and Practices”, there is a good distribution of low level to high level topics. The topics in the couplet “Managing Programs” are both moderate level, while the “Building Evaluation Knowledge, Skills and Readiness” domain has mostly low level topics, apart from evaluation readiness and willingness, which is high level. “Institutionalizing Evaluation” has topics from moderate to high level.


Table 4.29 Sub-domain Groupings for ECB Content and IRT Hierarchy Classification

ECB Content Sub-domains, Topics and IRT Hierarchy Classification

Domain 1: Developing Evaluation Culture and Practices

Building culture for evaluation (Moderate Level)

Organization evaluation practices (High Level)

Creating or strengthening evaluation structures* (Moderate Level)

Building leadership support* (High Level)

Logic models* (Low Level)

Domain 2: Managing Programs

Program implementation (Moderate Level)

Program planning (Moderate Level)

Domain 3: Institutionalizing Evaluation

Improving organizational evaluation social network (Moderate Level)

Creating or strengthening evaluation structures* (Moderate Level)

Creating or strengthening evaluation policy requirements (High Level)

Creating or strengthening evaluation systems (Moderate Level)

Building leadership support* (High Level)

Creating or strengthening support for evaluation resources (High Level)

Domain 4: Building Evaluation Knowledge, Skills and Readiness

Logic models* (Low Level)

Evaluation plan (Low Level)

Data management, analysis, interpretation or use (Low Level)

Evaluation terms, approaches, or methods (Low Level)

Evaluation readiness and willingness (High Level)

* Topics that overlap between sub-domains; Two topics have no factor loadings from any of the sub-domains: “Evaluation awareness and attitudes” and “How to do an evaluation”


Using these ECB sub-domains as new categories for grouping ECB topics,

Figure 4.14 shows the relative performances of the reported ECBs in terms of delivery

frequencies on these categories as weighted means, scaled 1 to 5. It should be

recalled that the initial groupings of these content topics are the broad individual and

organizational evaluation capacity categories. A factor analysis of the ECB topics forced onto two factors does not confirm this individual ECB versus organizational ECB divide. This means that the ECB content construct organizes itself around these four identified sub-domains rather than around the commonly held distinction between individual and organizational evaluation capacity building content and approaches.

Previous results have shown high frequencies among ECB initiatives for topics on developing individual evaluation capacities compared with topics targeting organizational evaluation capacities. In practice, however, there is ambiguity with regard to how these two divide or work together for an organization. The reorganization of topics using the factor analysis groupings reveals another perspective. Here, the ECBs can be evaluated with respect to the four identified dimensions. Results show that ECBs deliver “Building Evaluation Knowledge, Skills and Readiness” at relatively high frequency. This is followed by moderate delivery of “Managing Programs”. ECB delivery on the sub-domains “Developing Evaluation Culture and Practice” and “Institutionalizing Evaluation” has low frequency. Figure 4.14 may again appear to show the divide between the individual level content constructs and the organizational level domains, but a closer view of the topics shows that these are spread across the four domains, although Domain 3 consists entirely of organizational level topics.

What could explain this distribution pattern of topics across these sub-domains is not addressed in this investigation; rather, the purpose is to use these findings in relation to understanding measurement practices in ECB. However, one possible explanation is the nature of the demand or need for ECB. It could be that most organizations that engaged in ECB contracts perceived individual evaluation capacity building needs, in the form of evaluation management and evaluation skills development, as a higher priority than restructuring organizations for evaluation mainstreaming. Another possible reason could be the belief that individual improvement in evaluation management and skills would consequently translate into improvement of organizational evaluation capacity, with organizational mechanisms passively left to make this translation by themselves. Still another possible reason is that those engaged with ECBs were not really viewing ECB as a progressive learning intervention with several domains. The findings imply that ECB, in practice and content, is highly dependent on the perceptions of both the organizations and the ECB practitioners.

Figure 4.14 ECB Content Sub-Domains Frequencies (bar chart; axes: ECB Content Sub-domains and Frequency of Reported ECBs on a 1 to 5 scale banded Low, Moderate, High. Values: Institutionalizing Evaluation 1.65; Developing Evaluation Culture and Practice 2.05; Managing Programs 2.53; Building Evaluation Knowledge, Skills and Readiness 3.54)


ECB Content and Decision to Measure

Research Question 1C, “What determines practice of measuring ECB outcomes?”, is revisited here to include the results from the IRT analysis and the factor analysis. Table 4.30 shows the regression analysis with respect to ECB content characteristics and the decision to measure.

Table 4.30 Simple Logistic Regression Analysis: ECB Content and Decision to Measure

Base %: 75.4

ECB Content                                                          B    Wald    Sig.   Exp(B)     % Correct
Developmental Proficiency
  High Level Topics                                             -0.097   0.060   0.806    0.907 NS       75.4
  Moderate Level Topics                                          0.161   0.550   0.458    1.174 NS       75.4
  Low Level Topics                                               0.463   3.865   0.049    1.589 *        75.4
Construct Dimensions
  Developing Evaluation Culture and Practice                    -0.209   0.030   0.862    0.811 NS       75.4
  Skills in Program Management                                   0.339   0.243   0.622    1.403 NS       75.4
  Creating Evaluation Social Network, Structures and Systems    -0.653   0.202   0.653    0.520 NS       75.4
  Building Evaluation Knowledge, Skills and Readiness            2.849   4.711   0.030   17.274 *        77.2

NS: Not significant; *: significant at the 0.05 level of significance.
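The figures in Table 4.30 can be cross-checked with a short calculation. The sketch below is illustrative only and is not the software used in the study. It assumes that the first numeric value in each row is the logit coefficient B, the second a Wald chi-square with one degree of freedom, the third its p-value, and the fourth the odds ratio exp(B). The function names are hypothetical. Small differences from the reported values are expected because the table entries are rounded.

```python
import math


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a logistic regression coefficient B."""
    return math.exp(b)


def wald_p_value(wald: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > w) equals erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(wald / 2.0))


# (row label, B, assumed Wald statistic, reported p-value, reported odds ratio)
TABLE_4_30 = [
    ("High Level Topics", -0.097, 0.060, 0.806, 0.907),
    ("Moderate Level Topics", 0.161, 0.550, 0.458, 1.174),
    ("Low Level Topics", 0.463, 3.865, 0.049, 1.589),
    ("Developing Evaluation Culture and Practice", -0.209, 0.030, 0.862, 0.811),
    ("Skills in Program Management", 0.339, 0.243, 0.622, 1.403),
    ("Creating Evaluation Social Network, Structures and Systems",
     -0.653, 0.202, 0.653, 0.520),
    ("Building Evaluation Knowledge, Skills and Readiness",
     2.849, 4.711, 0.030, 17.274),
]

for label, b, wald, p_reported, or_reported in TABLE_4_30:
    print(
        f"{label}: exp(B) = {odds_ratio(b):.3f} (reported {or_reported}), "
        f"p = {wald_p_value(wald):.3f} (reported {p_reported})"
    )
```

Read this way, a coefficient of 0.463 corresponds to the odds of measuring ECB outcomes being multiplied by roughly 1.59 for each one-unit increase in the predictor.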

The decision to measure ECB outcomes also appears to be influenced by

ECB content developmental proficiency. ECBs focusing on low level topics are more

likely to measure ECB outcomes compared with those that deliver high level topics.

With respect to ECB content sub-domains, the analysis also shows that these have

influence on the likelihood of measuring ECB outcomes; the dimension “Evaluation

Readiness” shows a significant influence (p = 0.030). This means that those ECB cases

that include content topics relating to “Evaluation Readiness” are more likely to

measure ECB outcomes compared with ECBs that include all the other topics. It has

to be noted that very few cases (3 of 63) include this specific topic in their ECB

contents. From this, it could be interpreted that those who are aware of the readiness

and willingness aspects of the organization may have more acute awareness of the

measurement aspects of the ECB outcomes.

Answer to Research Question 2

Is there evidence to demonstrate that:

Research Question 2A: ECB content follows a unified learning construct

and a possible progressive structure?

Research Question 2B: ECB content could be grouped in specific ways?

The IRT result showed that ECB content demonstrates a unified learning

construct. Item Response Analysis also enabled the scaling of ECB content topics into

a continuum of developmental proficiency levels. The ability-proficiency plot for the

ECB content versus ECB case samples reveals that most ECB cases were distributed

in the lower end of the developmental proficiency continuum of ECB content topics.
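The scaling described above can be illustrated with the dichotomous Rasch model, in which the probability of a positive response depends only on the difference between two locations on a common logit scale. The sketch below assumes that ECB cases play the role of persons and ECB content topics the role of items. The function name and all logit values are hypothetical and are not estimates from this study.

```python
import math


def rasch_probability(case_location: float, topic_location: float) -> float:
    """Probability that a case includes a topic under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(case_location - topic_location)))


# Hypothetical topic locations in logits (higher = further up the continuum).
hypothetical_topics = {
    "low-level topic": -1.5,
    "moderate-level topic": 0.0,
    "high-level topic": 1.5,
}

# Hypothetical ECB case located toward the lower end of the continuum.
hypothetical_case = -0.5

for name, location in hypothetical_topics.items():
    p = rasch_probability(hypothetical_case, location)
    print(f"{name}: P(included) = {p:.2f}")
```

A case located below a topic on the scale has less than an even chance of including that topic, which is how a concentration of cases at the lower end of the continuum shows up in a plot of cases against topics.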

An additional way to view the construct of ECB content topics was the

possibility of sub-domains within this construct. For this data set, factor analysis

revealed that ECB content topics can be organized into four sub-domains:

“Building Evaluation Knowledge, Skills and Readiness”, “Managing Programs”,

“Developing Evaluation Culture and Practice”, and “Institutionalizing Evaluation”.

ECBs deliver topics in “Building Evaluation Knowledge, Skills and Readiness” with

nearly high frequency and topics in “Skills in Program Management” with moderate

frequency, but topics in “Developing Evaluation Culture and Practice” and “Creating

Evaluation Social Network, Structures and Systems” with low frequency.

Overall, evidence suggests that ECB content topics delivered in practice fit a

developmental progression continuum. This result provides an additional framework

to anchor existing and future ECB measurement tools. There is also evidence to show

that ECB is a higher-order construct with sub-domains. Within each sub-domain,

a pattern of topic hierarchy can also be discerned. The sub-domain with the most ECB

content topics at the high level of ECB developmental proficiency is

“Institutionalizing Evaluation”, and the sub-domain with topics mostly at the low

level of ECB developmental proficiency is “Building Evaluation Knowledge, Skills

and Readiness”.

Finally, findings also reveal that alternative ways of measuring ECBs were

used, including documenting the qualitative changes in evaluation process and the

outputs of evaluation products, as well as documenting improvements in evaluation

systems, rather than looking at change in capacities as the result of ECB activities.

There is also an indication that in practice there is a clear distinction between ECB

outcomes and program outcomes. The measurements are clearly of ECB outcomes as

a result of the ECB intervention, rather than of program outcomes that the ECB

intervention hopes to influence.

Chapter Conclusion

This chapter has presented answers to the posed research questions based

on the analyses of the data results and on evidence relevant to the research questions.

For published ECB cases included in this study, noteworthy findings are as follows:

• ECB content and implementation tended to be more focused on individual

evaluation capacity development than on building organizational

evaluation capacity;

• Teaching strategies are equally distributed among direct training, participatory

approaches, or a combination of both;

• ECBs mostly targeted individual capacity change rather than organizational

capacity change and mostly focused on program staff and managers but

excluded leadership;

• The majority of cases did not report ECB outcomes measurement;

• ECB content topics can be viewed as a developmental progression for

learning; there is a good fit to the Rasch model, confirming that content topics tap

a single construct on a progressive scale;

• Four sub-domains for ECB content were identified: Building evaluation

knowledge, skills and readiness; Skills in program management; Developing

evaluation culture and practice; and Creating evaluation social network,

structures and systems.

An important caveat is that conclusions drawn for these research questions pertain

only to the characteristics of the sample and contexts described. In the concluding

chapter, a synthesis of these findings will be presented to discuss the meaning of these

findings to ECB and ECB measurement practice.

CHAPTER 5

SYNTHESIS AND CONCLUSION


This chapter aims to show how the findings, beyond data and evidence, connect

to the knowledge gaps in ECB identified in the literature and, it is hoped, what

they mean for the practice of ECB and of measurement in ECB. The synthesis

begins with a recall of these gaps and how

evidence from the findings of the research questions addresses these gaps. The

contribution of this study is demonstrated by showing the relevance of these findings

in the practice of evaluation capacity building, providing possible applications of the

study results. The chapter concludes with the limitations of the study and some

possible future research directions.

Contribution of the Study

The knowledge gaps identified in the literature were distilled into two main

research questions: (1) How can ECB measurement practice be described from

empirical evidence? and (2) Is there evidence to demonstrate that ECB content

exhibits a unified learning construct and possibly follows a progressive structure? The

first question aimed to provide evidence of what measurement practice has occurred

in ECB, which in turn provided background and evidence for the second question.

From the perspective of this study, which is that ECB is a learning intervention,

several things need to be established: Is ECB a learning construct? Does ECB follow

a progressive structure? Does the ECB construct exhibit learning sub-domains? These

knowledge gaps are addressed by the study findings, the way they relate back to

the literature and what they mean in practice.

ECB Practice and Measurement

In ECB practice and measurement, Labin et al. (2012) and Labin (2014)

pointed out that ECB practitioners need to improve in the area of measuring ECB

outcomes. Also, Labin (2014) identified in her work the existing ECB measurement

tools that could be mapped to her proposed and validated Integrative Evaluation

Capacity Building (IECB) model. The IECB model has been a breakthrough in the

ECB literature. It conceptualized the programmatic nature of ECB from a strong

empirical research base. In this sense, the model has been grounded strongly in both

practice and evaluation program theory. However, measurement in ECB

has to be viewed beyond measurement tools that are used and mapped in the model

components. This way of seeing measures has its own utility (i.e. usefulness) and

importance, but to look at measurement of ECB from a program theory perspective

requires looking at measures that focus on the outcomes of what has been delivered in

ECB. This is perhaps the reason for the perplexing observation that ECBs have not

adequately measured ECB outcomes: ECBs have not yet clearly defined what

outcomes to look for and measure. While the IECB has explicated the ECB program

theory components, it has not clearly defined what is being measured in ECB. For

example, the outcomes component of the IECB model correctly identified the

individual, organizational and program outcomes that are linked to the needs/reasons

and ECB activities. However, it failed to identify whether these outcomes comprise a

unified learning construct or whether they span several learning domains. That is, the

characteristics of these ECB outcomes were not clearly defined.

In the literature with regard to measurement in evaluation and measurement

in ECB, Braverman and Arnold (2008) and Braverman (2013) argued that any good

program theory for evaluation must recognize its accompanying implicit measurement

theory if evaluation is to demonstrate methodological rigor. Findings of this study

revealed that, among ECBs that measured their ECB outcomes, the need for

methodological rigor was understood by ECB practitioners. This is also supported by

the findings in the assessment of “Rigor of Measurement Practice in ECB” performed

in this study. However, the area that needs to be sorted out in ECB measurement

practice is that of clarifying what is to be measured in ECB outcomes. This means that

ECB practitioners need to be clear about the outcomes measured that are linked with

the ECB intervention. While the ultimate goal of ECB intervention is improved

organizational outcome, program evaluation theory and measurement in evaluation

literature would dictate that what needs to be measured in the ECB interventions are

the ECB learning outcomes. This distinction has not been made clear in practice.

In this study, the findings have shown that ECB content and implementation

tended to focus more on individual evaluation capacity building compared with

building organizational evaluation capacity. This focus included building

fundamental knowledge and skills in carrying out evaluations at the program level of

organizations. Only a few cases focused on organizational evaluation capacity

building that would include building systems, processes, support, and evaluation

culture. Most ECBs targeted individual evaluation capacity change compared with

organizational evaluation capacity change and mostly involved program staff and

managers compared with a few that involved the organization leadership and program

beneficiaries.

With regard to ECB measurement, the findings revealed that only a relatively

small group of ECB practitioners measure ECB outcomes. These are mostly ECB

cases in the quantitative paradigm, delivered through combined direct and indirect

training approaches. The findings also show that the propensity to measure ECB

outcomes is likely to be influenced by the methodological paradigm of the ECB

implementation, ECB content proficiency level, construct sub-domains and

participant focus. In carrying out the measurement, there was some level of

acceptable rigor but there is need for improvement to identify the scope of the

variables being measured and to establish the properties of the measurement tools

used. These findings affirmed the need to understand the ECB content construct so as

to be able to identify what needs to be measured in ECB.

These findings in ECB practice and measurement confirmed empirically the

gaps found in the theoretical literature. There is a need to define what ECB outcomes

to measure and to understand the characteristics of the ECB outcomes to be measured.

One way to understand them is to borrow ideas from learning and measurement theory

in the field of education. This study addressed this gap by defining what is to be

measured in ECB and describing its characteristics. This is the distinctive contribution

of this study.

ECB Content Construct and ECB Developmental Proficiency Structure

Item Response Theory Analysis and Factor Analysis have provided evidence

to suggest that ECB content topics delivered in ECB initiatives fit a developmental

progression continuum and that sub-domains exist for this higher-order learning

construct. Two significant findings are thus established in this study: that

ECB is a learning construct, and that it has a progressive

structure across several domains. This study produced two important study outputs:

(1) the ECB developmental proficiency scale; and (2) sub-domain components of

ECB. These findings suggest that the ideas in educational measurement apply in ECB

and that developmental constructs of learning actually exist in ECB. This means that

the assessment of learning and developmental approach to assessment proposed by

Griffin (2007) are measurement principles that could fit in ECB measurement

practice.

The survey of ECB measurement studies in Chapter 2 showed that in the

literature examined, ECB measurement has never been conceived this way. ECB

measurement studies were focused on the development of ECB measurement

instruments. These developed ECB measurements were particularly focused on the

different components and elements of the ECB program theory. As mentioned earlier,

the published ECB assessment tools have been mapped into the components of the

Integrative Evaluation Capacity Building (IECB) model (Labin, 2008, 2014; Labin et

al., 2012). These existing ECB assessment tools, when applicable, can be mapped as

well in this ECB model of developmental proficiency.

This idea of progression and developmental models in ECB is not new. For

example, in a study that examined the link of individual and collective attributes of an

organization to build evaluation capacity and utilize outcome evaluation, Brown and

Reed (2002) documented that traditional training in evaluation tended to focus on

individual change and those that focused on organizational change efforts tended to

incorporate individual change as a necessary sub-component of organizational

change. The study further found that these approaches did not adequately

incorporate a developmental context within the evaluative framework. The study

presents an integral and developmental approach that links these individual and

organizational attributes. In a second study from an unpublished dissertation,

Gullickson (2010) generated a descriptive theory of evaluation mainstreaming by

examining four cases of the National Science Foundation’s Advanced Technological

Program. The study’s findings suggest the existence of developmental stages of

evaluation within an organization. While these studies are limited to the recognition

of the significance of progression and developmental models in ECB, the present

study has contributed to this existing knowledge by demonstrating and establishing

clearly how this idea of developmental progression can be applied using ECB

empirical data.

Implications to ECB Measurement Practice

These results are important in the delivery of ECB efforts in three significant

ways. First, they provide a clear basis to determine what level of ECB content will be

considered in an ECB effort at any level in the hierarchy, depending on the assessed

needs of the organization. Needs assessment is often unbridled and could possibly

show a full range of learning needs. The ECB hierarchy provides order to this range

of learning needs, directing stakeholders to those they should prioritize and those

which are realistic given the available time frame and resources. Second, the

identified sub-domains of the ECB content topics also identify the cluster of topics on

which the ECB initiative can focus. Lastly, the hierarchy and sub-domains of the

ECB could be the basis of evaluation for measuring ECB learning outcomes.

Consider, for example, a scenario where the results of a needs assessment for ECB

planning in an organization reveal various evaluation capacity building needs. These

needs could range from developing skills in evaluation data management, program

delivery, evaluation reporting, and the like. Without the ECB developmental

proficiency and ECB learning domain references, it would be difficult for

practitioners to identify which activities to implement and prioritize. Thus, the results

of this study have practical value for ECB planners and practitioners. Furthermore, the

study results show that ECB outcomes have to be defined in terms of the ECB

learning outcomes. To illustrate this point, suppose there is an ECB initiative. The

ECB program theory model calls for investigation of the motivational factors and

reasons for conducting the ECB. One of the approaches to this proactive evaluation

phase for ECB is to conduct a needs assessment and determine learning gaps. The ECB

facilitators then negotiate with the stakeholders for the design, implementation plan

and provision of the resources to carry out the ECB learning activities. If one

supposes that the competency gaps identified in the assessment phase are the basics of

evaluation knowledge and skills, and the learning activities were delivered to address

this need, then the expected ECB learning outcomes can only be limited to this level

of the ECB content hierarchy. There should be no expectation of seeing changes beyond

what has been delivered in the ECB effort, although they may occur. For example,

when only the basics of evaluation knowledge and skills were delivered, there should

be no expectation of seeing changes in the evaluation system, strengthening of the

resources to support evaluation activities in the organization or improvement of the

evaluation social network if these learning outcomes were not addressed in the ECB

effort. To do so would be to set up the ECB effort for failure. It is even less

reasonable to expect improvement in the program outcomes

delivered by the organization after ECB training on evaluation basics. The point of

this illustration is that ECB assessment can only measure what has been delivered in

the ECB effort. The hierarchy of ECB content to be delivered provides these target

ECB learning outcomes.
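The reasoning in this illustration can be sketched as a simple filter: only content that was actually delivered yields target ECB learning outcomes, ordered by position on the content hierarchy. The sketch below is a hypothetical illustration. The class name, function name, topic labels and level numbers are assumptions made for the example and are not the scale values produced in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentTopic:
    name: str
    level: int       # hypothetical rank on the content hierarchy (1 = lowest)
    delivered: bool  # whether the ECB effort actually delivered this topic


def target_learning_outcomes(topics):
    """Return delivered topics only, ordered from lowest to highest level."""
    return sorted((t for t in topics if t.delivered), key=lambda t: t.level)


# Hypothetical ECB plan in which only the basics were delivered.
hypothetical_plan = [
    ContentTopic("Basics of evaluation knowledge and skills", 1, True),
    ContentTopic("Resources to support evaluation activities", 2, False),
    ContentTopic("Evaluation social network", 3, False),
    ContentTopic("Evaluation system", 4, False),
]

targets = target_learning_outcomes(hypothetical_plan)
print([t.name for t in targets])
```

Under these assumptions, only the basics of evaluation knowledge and skills would be a legitimate measurement target; the undelivered topics are excluded however desirable change in them might be.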

In the IECB model, a prominent divide in the outcomes component is the

individual-level versus organizational-level outcomes as a consequence of individually

or organizationally focused ECB. There has been some investigation into how

individual evaluation capacities translate to organizational evaluation capacities

(Brown & Reed, 2002). However, the situation could be seen as a “chicken or egg”

scenario of what comes first. It has been advocated that evaluation systems and

structures need to be set up first to influence culture and practice, which in turn are

influenced by collective individual evaluation skills and vice-versa. Evidence from

this study shows that while this divide could be intuitively deduced, factor analysis of

the content construct of ECBs delivered in practice does not support this assumption.

The evidence suggests that by looking at ECB content construct, there are underlying

sub-dimensions with defensible hierarchy but the individual versus organizational

divide has not been evident.

Practitioners have long emphasized that the ultimate goal of ECB is improved

program outcomes. This means that “ECB learning outcomes” could be an

intermediary step of the ECB outcomes hierarchy. There has to be a clear distinction

between the two kinds of ECB outcomes: the ECB learning outcomes and the ECB

program outcomes (also referred to as organizational outcomes). It is important to be

clear about this distinction because organizational outcomes or program outcomes

have their own set of key and mediating factors and are most likely independent of

those of the ECB learning outcomes. For example, “Improved health outcomes” of an

organization’s program recipients will have different factors and mediators

influencing this outcome compared with “Improved health program delivery”, an

ECB learning outcome of an organization’s program staff.

In summary, the findings of this study suggest that while the IECB is

successful in defining the programmatic perspective of ECB, the ECB outcomes can

be re-configured into ECB learning outcomes as intermediate outcomes influencing the

ultimate organizational outcomes or program outcomes. Findings further suggest that

to understand ECB learning outcomes one must take into account the hierarchical

characteristics of the ECB content construct that has been delivered in actual practice.

Limitations of the Study

The limitations of the present study should be noted. This endeavour was

carried out by a single researcher, which means there may have been errors

in the coding and interpretation of the ECB reports. In the cases that did not

explicitly report the information needed for the study, the researcher was forced to use

his best judgment in assigning the data codes. In addition, the researcher did not

examine the organizational theories that may have underpinned the findings and their

interpretation. The discussions were narrowly focused on the areas of program theory

and learning assessment. It is possible, therefore, that further implications could have

been identified if the background literature had been expanded to organizational

theories.

Future Research Directions

The findings of this study introduced the notion of ECB developmental

proficiency. It opens a door to rethink measurement in ECB beyond psychometric

properties and development of assessment tools by component of the ECB program

theory. It challenges ECB practitioners to think of measuring ECB with respect to the

progress defined by the ECB hierarchical construct. It also challenges ECB

practitioners to delineate ECB evaluation between ECB learning outcomes and

program outcomes of an organization when it operates at different levels of evaluation

capacity. For example, the mapping of the ECB assessment tools could be done

according to ECB developmental proficiency and along with its sub-domains in

addition to the ECB assessment tools that have been mapped to ECB program theory.

In addition, this idea of ECB developmental proficiency can be examined through the

lens of organizational and management theories. Thus, future research directions in

ECB can explore this perspective of ECB developmental proficiency.

Conclusion

The primary aims of this study were to document and describe the

measurement practice that has occurred in ECB initiatives as reported in published

ECBs, and to investigate whether empirical evidence supports the notion of ECB

developmental proficiency that follows from the learning intervention perspective of

ECB. These aims were achieved and the results have affirmed the hypothesis of the

study: “Evaluation capacity building as a learning intervention would call for a

progressive approach to content delivery and outcomes measurement.”

The main contribution of this study to the field of Evaluation, in particular in

the area of measurement in Evaluation Capacity Building, is the introduction of the

notion that ECB is a learning intervention. Through this assumption, it was

demonstrated that ECB follows a developmental proficiency construct. This study has

clearly established that ECB can be viewed as a learning progression. It has set a case

to reframe ECB content, implementation and measurement practice according to this

point of view.

REFERENCES

Adams, J., & Dickinson, P. (2010). Evaluation training to build capability in the

community and public health workforce. American Journal of Evaluation,

31(3), 421-433.

Andrews, A. B., Motes, P. S., Floyd, A., Flerx, V. C., & Fede, A. L. (2005). Building

evaluation capacity in community-based organizations: reflections of an

empowerment evaluation team. Journal of Community Practice, 13(4), 85-104.

Arnold, M. E. (2006). Developing evaluation capacity in extension 4-H field faculty:

A framework for success. American Journal of Evaluation, 27(2), 257-269.

Assessment and learning partnerships: A short course for school leaders. (2012). In

U. o. M. Assessment Research Centre (Ed.).

Atkinson, D. D., Wilson, M., & Avula, D. (2005). A participatory approach to building

capacity of treatment programs to engage in evaluation. Evaluation and

Program Planning(3), 329.

ATLAS.ti (Version 7.1.7). (2014) [Computer Software]. Berlin: Cincom Systems,

Inc.

Beere, D. (2005). Evaluation capacity building: A tale of value adding. Evaluation

Journal of Australasia, 5(2), 41-47.

Blalock, H. M. (1979). The presidential address: Measurement and conceptualization

problems: The major obstacle to integrating theory and research. American

Sociological Review, 44(6), 881-894.

Blalock, H. M. (1982). Conceptualization and measurement in the social sciences.

Beverly Hills: Sage Publications.

Botcheva, L., White, C. R., & Huffman, L. C. (2002). Learning culture and outcomes

measurement practices in community agencies. American Journal of

Evaluation, 23, 421-434.

Braverman, M. T. (2013). Negotiating Measurement: Methodological and

Interpersonal Considerations in the Choice and Interpretation of Instruments.

American Journal of Evaluation, 34(1), 99-114.

Braverman, M. T., & Arnold, M. E. (2008). An evaluator's balancing act: Making

decisions about methodological rigor. New Directions for Evaluation(120), 71-86.

Brisolara, S. (1998). The history of participatory evaluation and current debates in the

field. New Directions for Evaluation, 1998(80), 25-41.

Brown, R. E., & Reed, C. S. (2002). An integral approach to evaluating outcome

evaluation training. American Journal of Evaluation, 23(1), 1-17.

Chouinard, J. A. (2013). The case for participatory evaluation in an era of

accountability. American Journal of Evaluation, 34(2), 237-253.

Clinton, J. (2014). The true impact of evaluation: Motivation for ECB. American

Journal of Evaluation, 35(1), 120-127.

Compton, D., Baizerman, M., Preskill, H., Rieker, P., & Miner, K. (2001). Developing

evaluation capacity while improving evaluation training in public health: the

American Cancer Society's Collaborative Evaluation Fellows Project.

Evaluation and Program Planning, 24, 33-40.

Compton, D., Baizerman, M., & Stockdill, S. H. (Eds.). (2002). The art, craft, and

science of evaluation capacity building. San Francisco: Jossey-Bass.

Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning Progressions in Science:

An Evidence-Based Approach to Reform: Consortium for Policy Research in

Education.

Cousins, J. B., Goh, S. C., Elliott, C., & Aubry, T. (2008). Government and voluntary

sector differences in organizational capacity to do and use evaluation. Paper

presented at the Annual meeting of the Canadian Evaluation Society, Quebec,

Canada. http://evaluation.ca/site.cgi?s=1

D'Eon, M., Sadownik, L., Harrison, A., & Nation, J. (2008). Using self-assessments to

detect workshop success: Do they work? American Journal of Evaluation,

29(1), 92-98.

Danseco, E., Halsall, T., & Kasprzak, S. (2009). Readiness assessment tool for

evaluation capacity building: The Provincial Centre for Excellence for Child

and Youth Mental Health at CHEO, Ottawa, Canada.

de Winter, J.C.F., & Dodou, D. (2012). Factor recovery by principal axis factoring

and maximum likelihood factor analysis as a function of factor pattern and

sample size. Journal of Applied Statistics, 39(4), 695-710.

Diaz-Puente, J. M., Yague, J. L., & Afonso, A. (2008). Building evaluation capacity in

Spain: A case study of rural development and empowerment in the European

Union. Evaluation Review, 32(5), 478-506.

Dunaway, K. E., Morrow, J. A., & Porter, B. E. (2012). Development and validation

of the Cultural Competence of Program Evaluators (CCPE) self-report scale.


Fetterman, D., Rodriguez-Campos, L., Wandersman, A., & O'Sullivan, R. G. (2014).

Collaborative, participatory, and empowerment evaluation: Building a strong

conceptual foundation for stakeholder involvement approaches to evaluation

(A response to Cousins, Whitmore, and Shulha, 2013). American Journal of

Evaluation, 35(1), 144-148.

Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation:

Alternative approaches and practical guidelines (4th ed.). Upper Saddle River,

N.J.: Pearson Education.

Freedman, D. A. (2009). Statistical models: Theory and practice. Cambridge,

England: Cambridge University Press.

Griffin, P. (2007). The comfort of competence and the uncertainty of assessment.

Studies In Educational Evaluation, 33, 87-99.

Grob, G. F. (2010). Evaluation field building in South Asia: insights from the rear

view mirror. American Journal of Evaluation, 31(2), 241-245.

Gullickson, A. (2010). Mainstreaming evaluation: Four case studies of systematic

evaluation integrated into organizational culture and practices. Western

Michigan University.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: principles and

applications. Boston: Kluwer-Nijhoff Publishing.

Henry, G. T., & Mark, M. M. (2003). Beyond use: Understanding evaluation's

influence on attitudes and actions. American Journal of Evaluation, 24(3),

293-314.

Holvoet, N., & Dewachter, S. (2013). Building national M&E systems in the context

of changing aid modalities: The underexplored potential of National

Evaluation Societies. Evaluation and Program Planning, 41, 47-57.

IBM SPSS Statistics (Version 20). (2011). International Business Machines

Corporation.

Huba, M.E., & Freed, J.E. (2000). Learner-centered assessment on college campuses:

Shifting the focus from teaching to learning. Boston: Allyn and Bacon.

King, J. A. (2010). Response to evaluation field building in South Asia: Reflections,

anecdotes, and questions. American Journal of Evaluation, 31(2), 232-237.

Kuzmin, A. (2012). Participatory Training Evaluation Method (PATEM) as a

collaborative evaluation capacity building strategy. Evaluation and Program

Planning, 35(4), 543-546.

Labin, S. N. (2008). Research synthesis: Toward broad-based evidence. In N. L.

Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation. (pp. 89-110).

New York, NY US: Guilford Press.

Labin, S. N. (2014). Developing common measures in evaluation capacity building:

An iterative science and practice process. American Journal of Evaluation,

35(1), 107-115.

Labin, S. N., Duffy, J. L., Meyers, D. C., Wandersman, A., & Lesesne, C. A. (2012). A

research synthesis of the evaluation capacity building literature. American


Lam, T. C. M. (2009). Do self-assessments work to detect workshop success? An

analysis of argument and recommendation by D'Eon et al. American Journal

of Evaluation, 30(1), 93-105.

Leviton, L. C. (2001). Presidential address: building evaluation's collective capacity.

American Journal of Evaluation, 22(1), 1.

Leviton, L. C. (2014). Some underexamined aspects of evaluation capacity building.


McDonald, B., Rogers, P., & Kefford, B. (2003). Teaching people to fish? Building

the evaluation capability of public sector organizations. Evaluation, 9(1), 9.

McGeary, J. (2009). A critique of using the Delphi Technique for assessing evaluation

capability-building needs. Evaluation Journal of Australasia, 9(1), 31.

Miller, R. L., & Campbell, R. (2006). Taking Stock of Empowerment Evaluation: An

Empirical Review. American Journal of Evaluation, 27(3), 296-319.

Milstein, B., Chapel, T. J., Wetterhall, S. F., & Cotton, D. A. (2002). Building

Capacity for Program Evaluation at the Centers for Disease Control and

Prevention. New Directions for Evaluation, 93, 27-46.

Nielsen, S. B., Lemire, S., & Skov, M. (2011). Measuring evaluation capacity: Results

and implications of a Danish study. American Journal of Evaluation, 32(3),

324-344.

O'Sullivan, R. G. (2012). Collaborative evaluation within a framework of stakeholder-

oriented evaluation approaches. Evaluation and Program Planning, 35(4),

518-522.

Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks,

California: Sage Publications.

Preskill, H. (2008). Evaluation's second act: A spotlight on Learning. American


Preskill, H., & Boyle, S. (2008). A Multidisciplinary Model of Evaluation Capacity

Building. American Journal of Evaluation, 29(4), 443-459.

Preskill, H., & Russ-Eft, D. F. (2005). Building evaluation capacity: 72 activities for

teaching and training. Thousand Oaks, Calif.: Sage Publications.

Preskill, H., & Torres, R. T. (2000). Readiness for organizational learning and

evaluation instrument. Retrieved from [email protected]

Sanders, J. R. (2002). Presidential address: On mainstreaming evaluation. American

Journal of Evaluation, 23(3), 253 - 259.

Stockdill, S. H., Baizerman, M., & Compton, D. W. (2002). Toward a Definition of


the ECB Process: A Conversation with the ECB Literature. New Directions for

Evaluation, 93, 1-25.

Stufflebeam, D. L., & Shinkfield, A. J. (2007). Evaluation theory, models, and

applications. San Francisco: Jossey-Bass.

Suarez-Balcazar, Y., & Taylor-Ritzler, T. (2014). Moving from science to practice in

evaluation capacity building. American Journal of Evaluation, 35(1), 95-99.

Taut, S. (2007). Studying self-evaluation capacity building in a large international

development organization. American Journal of Evaluation, 28(1), 45-59.

Taylor-Powell, E., & Boyd, H. H. (2008). Evaluation capacity building in complex

organizations. New Directions for Evaluation, 120, 55-69.

Taylor-Ritzler, T., Suarez-Balcazar, Y., Garcia-Iriarte, E., Henry, D. B., & Balcazar, F.

E. (2013). Understanding and measuring evaluation capacity: A model and

instrument validation study. American Journal of Evaluation, 34(2), 190-206.

TCU. (2005). Organizational readiness for change (TCU ORC). Retrieved from

http://www.ibr.tcu.edu/evidence/evi-orc.html

Thompson, B. (2004). Exploratory and confirmatory factor analysis. Washington,

DC: American Psychological Association.

Urban, J. B., Burgermaster, M., Archibald, T., & Byrne, A. (2013). Relationships

between quantitative measures of evaluation plan and program model quality

and a qualitative measure of participant perceptions of an evaluation capacity

building approach. Journal of Mixed Methods Research, 201(X)(XX(X)), 1 -

24

Volkov, B., & King, J. (2007). A checklist for building organizational evaluation

capacity. Retrieved from http://www.wmich.edu/evalctr/checklists/ecb.pdf

Wandersman, A. (2014). Getting to outcomes: an evaluation capacity building

example of rationale, science, and practice. American Journal of Evaluation,

35(1), 100-106.

Weitzman, B. C., & Silver, D. (2013). Good evaluation measures: More than their

psychometric properties. American Journal of Evaluation, 34(1), 115-119.

Wholey, J. S. (1987). Evaluability assessment: Developing program theory. New

Directions for Program Evaluation, 33, 77-92.

Wu, M., Adams, R., & Haldane, S. (2006). ConQuest: Multi-aspect test software

[Computer software]. Melbourne: Australian Council for Educational

Research.



APPENDIX A

List of Published ECB Cases in the Study Sample

Case

Number

Reference

1_CU1 Andrews, A.B., Motes, P.H., Floyd, A.G., Flerx, V.C., and Lopez-De

Fede, A. (2005). Building evaluation capacity in community-

based organizations: Reflections of an empowerment

evaluation team. Journal of Community Practice, Vol. 13(4).

2_CU2 Arnold, M.E. (2006). Developing evaluation capacity in extension 4-H

field faculty: A framework for success. American Journal of

Evaluation, Vol. 27(2).

3_CU3 Randall, A.L. (2005). Training school counselors in program evaluation.

Professional School Counseling, Vol. 9(1).

4_CU4 Atkinson, D.D., Wilson, M., and Avula, D. (2005). A participatory

approach to building capacity of treatment programs to

engage in evaluation. Evaluation and Program Planning, Vol.

28.

5_CU6 Brandon, P.R., and Higa, T.A.F. (2004). An empirical study of building

the evaluation capacity of K-12 site-managed project

personnel. The Canadian Journal of Program Evaluation,

Vol. 19(1).

6_CU7 Brown, N.L., Luna, V., Ramirez, M.H., Vail, K.A., and Williams, C.A.

(2005). Developing an effective intervention for IDU women:

A harm reduction approach to collaboration. AIDS Education

and Prevention, 17(4), 317-333.

7_CU8 Brown, R.E., and Reed, C.S. (2002). An integral approach to outcome

evaluation training. American Journal of Evaluation, 23(1), 1-

17.


8_CU9 Campbell, R., Dorey, H., Naegeli, M., Grubstein, L.K., Bennett, K.K.,

Bonter, F., Smith, P.K., Grzywacz, J., Baker, P.K, and

Davidson, W.S. II. (2004). An empowerment evaluation

model for sexual assault programs: Empirical evidence of

effectiveness. American Journal of Community Psychology,

34 (3/4), 251-262.

9_CU10 Carden, F., and Earl, S. (2007). Infusing evaluative thinking as a

process use: The case of the International Development

Research Center (IDRC). New Directions for Evaluation, 16,

61 – 73.

10_CU11 Chinman, M., Hunter, S.B., Ebener, P., Paddock, S.M., Stillman, L.,

Imm, P., and Wandersman, A. (2008). American Journal of

Community Psychology, 41, 206 – 224.

11_CU12 Cohen, C. (2006). Evaluation learning circles: a sole proprietor’s

evaluation capacity-building strategies. New Directions for

Evaluation, 111, 85 – 93.

12_CU13 Compton, D., Baizerman, M., Preskill, H., Rieker, P., and Miner, K.

(2001). Developing evaluation capacity while improving

evaluation training in public health: the American Cancer

Society’s Collaborative Evaluation Fellow’s Project.

Evaluation and Program Planning, 24, 33 – 40.

13_CU14 Diaz-Puente, J.M., Yague, J.L., and Afonso, A. (2008). Building

evaluation capacity in Spain: a case study of rural

development and empowerment in the European Union.

Evaluation Review, 32(5), 478 – 506.

14_CU17 Fetterman, D., and Bowman, C. (2002). Experiential education and

empowerment evaluation: Mars Rover Educational Program

case example. The Journal of Experiential Education, 25(2),

286 – 295.


15_CU18 Fetterman, D. (2001). Empowerment evaluation and self-determination:

A practical approach toward program improvement and

capacity building. In N. Schneiderman, M.A. Speers, J.M.

Silva, H. Tomes, & J.H. Gentry (Eds.), Integrating

behavioural and social sciences with public health. (pp. 321 –

350). Washington, D.C. US: American Psychological

Association.

16_CU21 Flaspohler, P., Wandersman, A., Keener, D., Maxwell, K.N., Ace, A.,

Andrews, A., & Holmes, B. (2003). Promoting program

success and fulfilling accountability requirements in a state-

wide community-based initiative. Journal of Prevention and

Intervention in the Community, 26(2), 37 – 52.

17_CU22 Harper, G.W., Contreras, R., Bangi, A., & Pedraza, A. (2003).

Collaborative process evaluation. Journal of Prevention and

Intervention in the Community, 26(2), 53 – 69.

18_CU23, 19_CU24, 20_CU25, 21_CU25ab Hoole, E., & Patterson, T.E. (2008). Voices from the field: Evaluation

as part of a learning culture. In J.G. Carman & K.A.

Fredericks (Eds.), Non-profits and evaluation. New

Directions for Evaluation, 119, 93 – 113.

22_CU27 Katz, S., Sutherland, S., & Earl, L. (2002). Developing an evaluation

habit of mind. The Canadian Journal of Program Evaluation,

17(2), 103 – 119.

23_CU30 King, J. (2002). Building the evaluation capacity of a school district.

New Directions for Evaluation, 93, 63 – 80.

24_CU32 Lennie, J. (2005). An evaluation capacity-building process for

sustainable community IT initiatives: Empowering and

disempowering impacts. Evaluation, 11(4), 390 – 414.

25_CU34 MacLellan-Wright, M.F., Patten, S., dela Cruz, A.M., & Flaherty, A.

(2007). A participatory approach to the development of an

evaluation framework: Process, pitfalls, and payoffs. The

Canadian Journal of Program Evaluation, 22(1), 99 – 124.


26_CU35 Maher, C.A. (1981). Training of managers in program planning and

evaluation. Journal of Organizational Behavior Management,

3(1), 45 – 56.

27_CU36 Mathews, M., & Lynch, A. (2007). Increasing research skills in rural

health boards: An evaluation of a training program for

Western Newfoundland. The Canadian Journal of Program

Evaluation, 22(2), 41 – 56.

28_CU37 McDonald, B., Rogers, P., & Kefford, B. (2003). Teaching people to

fish? Building the evaluation capability of public sector

organizations. Evaluation, 9(1), 9 – 29.

29_CU39 Milstein, B., Chapel, T.J., Wetterhall, S.F., & Cotton, D.A. (2002).

Building capacity for program evaluation at the Centers for

Disease Control and Prevention. New Directions for

Evaluation, 93, 27 – 46.

30_CU40 Moon, S.M. (1996). Using the Purdue three-stage model to facilitate

local program evaluations. Gifted Child Quarterly, 40(3), 121

– 128.

31_CU41 Myrick, R., Lemell, A., Aoki, B., Truax, S., & Lemp, G. (2005). Best

practices for community collaborative research. AIDS

Education and Prevention, 17(4), 400 – 404.

32_CU42 Naccarella, L., Pirkis, J., Kohn, F., Morley, B., Burgess, P., & Blashki,

G. (2007). Building evaluation capacity: Definitional and

practical implications from an Australian case study.


33_CU43 Nagao, M., Kuji-Shikatani, K., & Love, A.J. (2005). Preparing school

evaluators: Hiroshima pilot test of Japan Evaluation Society’s

accreditation project. The Canadian Journal of Program

Evaluation, 20(2), 125 – 155.

34_CU44a O’Sullivan, R.G., & D’Agostino, A. (2002). Promoting evaluation

through collaboration: Findings from community-based


programs for young children and their families. Evaluation,

8(3), 372 – 387.

35_CU46 Ploeg, J., de Witt, L., Hutchison, B., Hayward, L., & Grayson, K.

(2008). Evaluation of a research mentorship program in

community care. Evaluation and Program Planning, 31, 22 –

33.

36_CU47 Porteous, N.L., Sheldrick, B.J., & Stewart, P.J. (1999). Enhancing

managers’ evaluation capacity: A case study from Ontario

public health. The Canadian Journal of Program Evaluation,

Special Issue, 137 – 154.

37_CU48 Ryan, K.E., Geissler, B., & Knell, S. (1996). Progress and

accountability in family literacy: Lessons from collaborative

approach. Evaluation and Program Planning, 19(3), 263 –

272.

38_CU49 Schnoes, C.J., Murphy-Berman, V., & Chambers, J.M. (2000).

Empowerment evaluation applied. American Journal of

Evaluation, 21(1), 53 – 64.

39_CU50 Secret, M., Jordan, A., & Ford, J. (1999). Empowerment evaluation as a

social work strategy. Health and Social Work, 24(2), 120 –

127.

40_CU52 Stevenson, J.F., Florin, P., Mills, D.S., & Andrade, M. (2002). Building

evaluation capacity in human services organizations: A case

study. Evaluation and Program Planning, 25, 233 – 243.

41_CU53 Suarez-Balcazar, Y., Orellana-Damacela, L., Portillo, N., Sharma, A.,

& Lanum, M. (2003). Implementing an outcomes model in

the participatory evaluation community initiatives. Journal of

Prevention & Intervention in the Community, 26(2), 5 – 20.

42_CU54 Sullins, C.D. (2003). Adapting the empowerment evaluation model: A

mental health drop-in center case example. American Journal

of Evaluation, 24(3), 387 – 398.

43_CU55 Tang, H., Cowling, D.W., Koumjian, D.W., Roeseler, A., Lloyd, J., &

Rogers, T. (2002). Building local program evaluation

capacity toward a comprehensive evaluation. New Directions


for Evaluation, 95, 39 – 56.

44_CU56 Taut, S. (2007). Studying self-evaluation capacity building in a large

international development organization. American Journal of

Evaluation, 28(1), 45 – 59.

45_CU57 Trevisan, M. (2001). Implementing comprehensive guidance program

evaluation support: Lessons learned. Professional School

Counseling, 4(3), 225 – 229.

46_CU59 Valery, R., & Shakir, S. (2005). Evaluation capacity building and

humanitarian organization. Journal of Multidisciplinary


47_CU60 Willer, B.S., Bartlett, D.P., & Northman, J.E. (1978). Simulation as a

method for teaching program evaluation. Evaluation and

Program Planning, 1, 221 – 228.

48_CU61 Yawson, R.M., Amoa-Awua, W.K., Sutherland, A.J., Smith, D.R., &

Noamesi, S.K. (2006). Developing a performance

measurement framework to enhance the impact orientation of

the Food Research Institute, Ghana. R&D Management,

36(2), 161 – 172.

49_P58 Mayberry, R.M., Daniels, P., Yancey, E.M., Akintobi, T.H., Berry, J.,

Clark, N., & Dawaghreh, A. (2009). Enhancing community-

based organizations’ capacity for HIV/AIDS education and

prevention. Evaluation and Program Planning, 32, 213 –

220.

50_P59 Fleming, M.L., & Easton, J. (2010). Building environmental educators’

evaluation capacity through distance education. Evaluation

and Program Planning, 33, 172 – 177.

51_P60 Bourgeois, I., Hart, R.E., Townsend, S.H., & Gagne, M. (2011). Using

hybrid models to support the development of organizational

evaluation capacity: A case narrative. Evaluation and

Program Planning, 34, 228 – 235.

52_P61 Kapucu, N., Healy, B.F., & Arslan, T. (2011). Survival of the fittest:

Capacity building for small nonprofit organizations.



53_P63 Satterland, T.D., Treiber, J., Kipke, R., Kwon, N., & Cassady, D.

(2013). Accommodating diverse clients’ needs in evaluation

capacity building: A case study of the Tobacco Control

Evaluation Center. Evaluation and Program Planning, 36, 49

– 55.

54_P64 Akintobi, T.H., Yancey, E.M., Daniels, P., Mayberry, R.M., Jacobs, D.,

& Berry, J. (2012). Using evaluability assessment and

evaluation capacity-building to strengthen community-based

prevention initiatives. Journal of Health Care for the Poor

and Underserved, 23(2), 33 – 48.

55_P65 Compton, D.W. (2009). Managing studies versus managing for

evaluation capacity-building. In D.W. Compton & M.

Baizerman (Eds.), Managing program evaluation: Towards

explicating a professional practice. New Directions for


56_P66 Baron, M.E. (2011). Designing internal evaluation for small

organization with limited resources. In B.B. Volkov and M.E.

Baron (Eds.), Internal evaluation in the 21st century. New

Directions for Evaluation, 132, 87 – 99.

57_P67 Rotondo, E. (2012). Lessons learned from evaluation capacity building.

In S. Kushner & E. Rotondo (Eds.), Evaluation voices from

Latin America. New Directions for Evaluation, 134, 93 – 101.

58_P68 Adams, J. & Dickinson, P. (2010). Evaluation training to build capacity

in the community and public health workforce. American

Journal of Evaluation, 31(3), 421 – 433.

59_P69 Rogers, S.J., Ahmed, M., Hamdallah, M., & Little, S. (2010). Garnering

grantee buy-in on a national cross-site evaluation: The case of

ConnectHIV. American Journal of Evaluation, 31(4), 447 –

462.

60_P70 Garcia-Iriarte, E., Suarez-Balcazar, Y., Taylor-Ritzler, T., & Luna, M.

(2011). A catalyst-for-change approach to evaluation capacity


building. American Journal of Evaluation, 32(2), 168 – 182.

61_P72 Anderson, C., Chase, M., Johnson, J., Mekiana, D., McIntyre, D.,

Ruerup, A., & Kerr, S. (2012). It is only new because it has

been missing for so long: Indigenous evaluation capacity

building. American Journal of Evaluation, 33(4), 566 – 582.

62_P75 Hanwright, J., & Makinson, S. (2008). Promoting evaluation culture.

Evaluation Journal of Australasia, 8(1), 20 – 25.

63_P76 Karlsson, P., & Beijer, E. (2008). Evaluation workshops for capacity

building in welfare work. Evaluation, 14(4), 483 – 498.


APPENDIX B

Coding Form

Instrument for ECB Context, Content and Implementation

and Assessment Tool for Rigor of ECB Quantitative Measurement Practice

This coding form assesses the Evaluation Capacity Building (ECB) practices of

organizations using published ECB reports. Its primary aims are to:

Document ECB context, content and implementation variables. This is a checklist

on which the descriptive characteristics of ECB practice are coded. The items were developed

mostly from the Integrative ECB Model of Labin, Duffy, Meyers, Wandersman and Lesesne (2012).

Measure the rigor of ECB quantitative measurement practice. This is a rating scale for

determining the level of ECB quantitative measurement practice. It is applied to reports that

used a quantitative approach. The items were developed partly from Braverman (2013).

This instrument is divided into three parts. The first part documents the profile of the ECB

being reported, recording contextual facts about the implemented ECB as well as

about the organization and its programs. The second part documents the context, content and

implementation variables of ECB. The third part measures the rigor of ECB quantitative

measurement practice for reports that use a quantitative approach.

Scoring

The ECB, organization and program profile is not scored but determines the characteristics

and typologies of ECB.

The Rigor of ECB Quantitative Measurement Practice and the Quality and Credibility of ECB

Qualitative Evidence Practice are scored using the rubrics provided.

Note: The content items were mostly adapted from Labin et al. (2012). Additions

were: Part 1, participant focus (Item 7), ECB contact duration (Item 8),

outcomes expectations (Item 9), leadership collaboration (Item 10) and Items 19 -

23. Part 2, Item 2 (F to K). Part 3 was developed by the researcher.


Instruction: Write or tick boxes when appropriate.

PART 1 Evaluation Capacity Building, Organization and Program Profile

Reference Number

(For ATLAS.ti Code Tracking)

1. Title of ECB/Article

2. Authors

3. Journal/Publisher

4. Country

5. Year the report was published

6. Year ECB was completed

7. ECB Domain
□ Education
□ Health
□ Child/youth development
□ Community/rural development
□ Policy research

8. Type of organization
□ Non-profit
□ For-profit
□ Government
□ School/school district
□ University only
□ Multiple types
□ Other:

9. Technological capability
□ Presence of IT infrastructure (computers)
□ Availability of IT communications (internet)
□ Availability of IT skilled personnel
□ Other:

10. Paradigm of ECB report
□ Quantitative
□ Qualitative
If both:
□ Mixed methods
□ Multiple methods

11. ECB report or evaluation made by
□ ECB practitioner/facilitator
□ External or independent evaluator
□ Recipient organization internal evaluator
□ Other:

12. Evaluator Affiliation
□ University
□ Private consultancy
□ Internal evaluation unit
□ Other:

13. Report on
□ Single organization
□ Multiple organizations

14. ECB Initiator
□ Funder/Grant maker
□ Organization/Grantee
□ Government
□ University
□ Research facility

15. Purpose of Measurement
□ Establish baseline information or after action information
□ Inform ECB Design
□ Guide implementation for adjustments
□ Evaluate ECB impact

16. Program Implementation
□ Single program
□ Multiple programs

17. Program Site
□ One-site
□ Multi-site

18. Program delivery
□ Services
□ Education or capability building
□ Advocacy
□ Research

19. ECB Intervention Description

20. ECB Stakeholders

21. Program Description

22. ECB View
□ Training
□ Non-training

23. ECB Data Collection Approach
□ Questionnaire/Survey
□ Individual Interview
□ Focus groups


PART 2 ECB Content and Implementation Checklist

1. ECB content (Individual-level)
A. □ Awareness/Attitudes
B. □ Terms, approaches or methods
C. □ Logic models
D. □ Evaluation plan
E. □ How to do an evaluation
F. □ Management, analysis, interpretation or use of data
G. □ Program planning
H. □ Program implementation

2. ECB content (Organization-level)
A. □ Organization evaluation practices
B. □ Evaluation Readiness/Willingness
C. □ Building leadership support
D. □ Building culture for evaluation
E. □ Mainstreaming evaluation
F. □ Creating/Strengthening evaluation policy requirements
G. □ Creating/Strengthening evaluation structures (teams, job roles and responsibilities, evaluation units)
H. □ Creating/Strengthening evaluation systems (databases, shared measurement tools, common metrics like KPIs, monitoring systems and tools)
I. □ Creating/Strengthening support for evaluation resources
J. □ Improving organizational evaluation social context
K. □ Other:

3. Type of strategies reported
A. □ Training/Teaching
B. □ Technical Assistance/Coaching/Support/Consultations
C. □ Involvement in evaluation
D. □ Printed materials

4. Mode of strategies reported
A. □ Face-to-face only
B. □ Face-to-face combined with other modes
C. □ Combination not including face-to-face
D. □ Other:

5. Intended target of ECB
A. □ Individual only
B. □ Organization only
C. □ Individuals and organizations
D. □ Not reported

6. Evaluation of ECB work
□ Any evaluation reported
□ Not reported

7. Participant Focus
A. □ Program staff or ground staff
B. □ Program managers
C. □ Program beneficiaries
D. □ Organization top management and leadership

8. ECB contact duration
A. □ One day or less engagement (teaching, training, workshop or TA)
B. □ More than one day engagement
C. □ Multiple times a year or once a year for multiple years

9. Intervention design (outcomes expectation)
A. □ ECB as teaching/training but no ECB program design
B. □ ECB as teaching/training component with explicit ECB program design

10. Leadership collaboration
A. □ ECB process did not involve leadership
B. □ ECB process involved leadership
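For readers who wish to tabulate the checklist electronically, the ticked boxes of a single item can be stored as 0/1 indicators. The following is only a minimal sketch of one possible encoding, using Part 2, Item 1 as the example. The option letters and labels are taken from the form; the names ITEM_1_OPTIONS and encode_ticks, and the example report, are hypothetical and are not part of the original instrument or analysis.

```python
# Hypothetical encoding of Part 2, Item 1 (individual-level ECB content).
# Option letters and labels come from the coding form; all Python names
# and the example data are illustrative assumptions.
ITEM_1_OPTIONS = {
    "A": "Awareness/Attitudes",
    "B": "Terms, approaches or methods",
    "C": "Logic models",
    "D": "Evaluation plan",
    "E": "How to do an evaluation",
    "F": "Management, analysis, interpretation or use of data",
    "G": "Program planning",
    "H": "Program implementation",
}


def encode_ticks(ticked, options=ITEM_1_OPTIONS):
    """Map every option letter to 1 if it was ticked, otherwise 0."""
    unknown = set(ticked) - set(options)
    if unknown:
        raise ValueError(f"Unknown option letters: {sorted(unknown)}")
    return {letter: int(letter in ticked) for letter in options}


# Invented example: a report that covered logic models and evaluation plans.
example = encode_ticks({"C", "D"})
print(example)
```

An encoding of this kind keeps one column per checklist option, so that options left unticked are recorded explicitly as 0 rather than being absent from the record.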


PART 3 Rigor of ECB Measurement Practice

Each criterion (progress variable) is listed below with its outcome space (descriptions of evidence across progress variables). The levels are: Cannot be determined (0), Low (1), Moderate (2) and High (3).

1. Scope of variables measured
Cannot be determined (0): Not reported.
Low (1): Measured individuals’ evaluation capacity (which may include, for example, awareness, knowledge, skills or attitudes).
Moderate (2): Measured individuals’ evaluation capacity and the organizational evaluation capacity, which includes evaluation leadership, policies, systems, resources or structures.
High (3): Measured individuals’ and organizational capacities as well as the organization’s contextual measures such as social climate, learning capacity, culture or social network.

2. Obtaining evidence
Cannot be determined (0): Not reported.
Low (1): Indirect measurement or testing, such as self-report or self-rating.
Moderate (2): Direct measurement, such as that obtained by observation and direct testing.
High (3): Combination of direct and indirect measurement.

3. Reliability of measurement tools
Cannot be determined (0): Not reported.
Low (1): Uses tools with unreported/unmeasured reliability.
Moderate (2): Uses tools with reported reliability that is within acceptable values.
High (3): Standardized or validated measurement tools are used with justification of their appropriate contextual use. [Lack of justification is equivalent to low].

4. Utilization of ECB measures
Cannot be determined (0): Not reported.
Low (1): ECB measures are used to establish baseline information to inform ECB design.
Moderate (2): ECB measures not only informed ECB design but were also used to guide ECB implementation, for example through adjustments in the ECB approach.
High (3): ECB measures are ultimately used to evaluate ECB impact at the end of the program, on top of informing ECB design and guiding implementation.

5. Representativeness
Cannot be determined (0): Not reported.
Low (1): The measurement used a non-probability sample, such as purposive sampling, e.g. key informants only.
Moderate (2): The measurement used a probability sample with some form of random sampling technique.
High (3): The measurement used every case unit of the population of interest, e.g. all members of the organization.

6. Timing of measurement
Cannot be determined (0): Not reported.
Low (1): Measurements were made only once, at the beginning or at the end of the ECB project.
Moderate (2): Measurements were made at the beginning and the end of ECB; may also include measures during ECB.
High (3): Measurements were made over an extended period of time after ECB to see its changes in the long term.

7. Validity of inference from obtained measures
Cannot be determined (0): Not reported.
Low (1): The conclusions are at best anecdotal, with descriptions of evaluation capacities but no measures to back up claims.
Moderate (2): Descriptions of evaluation capacities were made and backed up by figures from measures; may also extend to comparing measures.
High (3): Conclusions were drawn using sound measures and statistical procedures that warrant statistical inference, e.g. hypothesis testing or modelling.

8. Measurement design used
Cannot be determined (0): Not reported.
Low (1): The measurement design used a simple observational method (no control or comparison groups and no randomization of case units).
Moderate (2): The measurement design used comparison groups but lacked random assignment.
High (3): The elements of experimental design are present, with control and comparison groups and random assignment.
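As an illustration of how ratings on the eight criteria might be recorded and checked, a minimal sketch follows. The criterion names and the 0 to 3 levels are taken from the rubric above. The function name summarize_ratings, the example ratings, and in particular the simple sum used as a total are assumptions made for illustration only; the form itself does not state how the ratings are to be aggregated.

```python
# Criterion names and level labels are taken from the Part 3 rubric.
CRITERIA = [
    "Scope of variables measured",
    "Obtaining evidence",
    "Reliability of measurement tools",
    "Utilization of ECB measures",
    "Representativeness",
    "Timing of measurement",
    "Validity of inference from obtained measures",
    "Measurement design used",
]
LEVELS = {0: "Cannot be determined", 1: "Low", 2: "Moderate", 3: "High"}


def summarize_ratings(ratings):
    """Validate one report's ratings and return (level labels, total).

    The total is a plain sum of the eight ratings. This aggregation is an
    assumption for illustration, not a rule stated in the coding form.
    """
    unknown = set(ratings) - set(CRITERIA)
    missing = [c for c in CRITERIA if c not in ratings]
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)}, missing={missing}")
    for criterion in CRITERIA:
        if ratings[criterion] not in LEVELS:
            raise ValueError(f"{criterion}: rating must be 0, 1, 2 or 3")
    labels = {c: LEVELS[ratings[c]] for c in CRITERIA}
    total = sum(ratings[c] for c in CRITERIA)
    return labels, total


# Invented ratings for a single hypothetical report.
example_ratings = dict(zip(CRITERIA, [1, 1, 2, 1, 1, 2, 2, 1]))
labels, total = summarize_ratings(example_ratings)
print(labels["Obtaining evidence"], total)
```

Keeping the per-criterion labels alongside any total preserves the profile of a report across the eight criteria, which a single number would hide.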

END

Minerva Access is the Institutional Repository of The University of Melbourne

Author/s:

PONCE, ROY

Title:

Measurement practice in Evaluation Capacity Building

Date:

2014

Persistent Link:

http://hdl.handle.net/11343/56512

File Description:

Measurement Practice in Evaluation Capacity Building
