MEASUREMENT PRACTICE IN EVALUATION
CAPACITY BUILDING
Roy Guanco Ponce
Master of Assessment and Evaluation, Bachelor of Science in Statistics
Graduate Diploma in Econometrics
Submitted in partial fulfilment of the requirements of the degree of
Doctor of Education
December 2014
Centre for Program Evaluation
Melbourne Graduate School of Education
The University of Melbourne
ABSTRACT
Evaluation Capacity Building (ECB) aims to enable individuals and organizations to
adopt the concepts and practices of evaluation. Its purpose is to mainstream the
generation and utilization of evaluation information within organizational systems and
structures. It is important in that it mediates the true impact of evaluation on
organizational outcomes. The aim of this thesis is to explore measurement practice in
relation to content, implementation and context, and to examine how outcomes are
measured in ECB initiatives. When ECB is viewed as a learning intervention, it calls
for a progressive approach to measurement. This means that ECB outcomes
measurement must take into account the developmental proficiency of ECB content. This study
used the Broadbased Research Synthesis Method and examined sixty-three (63)
published ECB reports with respect to content, implementation, context and
measurement practices in ECB. Item Response Theory Analysis documented ECB
content construct and hierarchical structure and Exploratory Factor Analysis revealed
ECB content sub-domains. The findings illustrate that ECB content topics delivered in
practice fit this developmental progression continuum. The main contribution of this
study to the field of Evaluation, in particular in the area of measurement in Evaluation
Capacity Building, is the introduction of the notion that ECB is a learning
intervention. Through this assumption, it was demonstrated that ECB follows a
developmental proficiency construct. This study has clearly established that ECB can
be viewed as a learning progression. Based on this perspective, it makes a case for
reframing ECB content, implementation and measurement practice. It is suggested that
future ECB initiatives utilize this alternative framework for ECB content delivery and
measurement of ECB outcomes.
DECLARATION
This thesis contains no material which has been accepted for any other degree in any
university. Furthermore, to the best of my knowledge and belief, this thesis contains
no material previously published or written by any other person, except where due
reference is given in the text.
Signature:
Roy G. Ponce
ACKNOWLEDGMENTS
I express my deepest appreciation and gratitude to the following persons who assisted in
the completion of this thesis:
Associate Professor Janet May Clinton and Dr. Amy Marie Gullickson, my supervisors,
whose critical feedback, guidance, encouragement and gentle forbearance made me
understand and appreciate the challenges and rewards of conducting research.
Professor John Hattie, chairman of my thesis committee, for the efficient facilitation of
the progress meetings and for investing time to examine critically my data analysis and
thesis argument. Dr. Ghislain Arbour, member of my thesis committee, who challenged
me to think outside the box and consider other perspectives.
My professional circle whose assistance, critical inputs and suggestions reduced the
isolation of this research journey: Dr. Edito Sumile, Dr. Amanda Bayliss, Dennis Alonzo,
Timoci O’Connor, Brad Astbury, Daniel Arifin, and Marion Joy Brown.
My family and friends, whose love and support are my inspiration: Papang Valentin,
Mamang Edita, May, Amy and Jonathan, Elias and Gina, Lee and Helen, Omar and
Melissa, Joeven and Arbeth, Bing and Marl, Hannah, Eden, Earvs and Norm.
The people of Australia for the Australia Awards Scholarship grant through the Philippine
Australia Human Resource Development Facility and the Davao Oriental State College of
Science and Technology. The Melbourne Graduate School of Education and Centre for
Program Evaluation of the University of Melbourne for the excellent student support
services and demonstrating the values of “Growing Esteem”.
May GOD, the source of all wisdom and understanding, in whom I believe, remember
and bless you for all your kindness and goodness.
Table of Contents
Abstract
Declaration
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
CHAPTER
1. INTRODUCTION
   Statement of the Problem
   Purpose of the Study
   Aims of the Study
   The Research Hypothesis
   Research Questions
   Significance of the Study
   Limitations
   Outline of the Thesis
2. REVIEW OF LITERATURE
   Overview of the Chapter
   The Study Perspective: Learning Intervention
   Assessment of Learning and ECB Measurement
   The Emergence of ECB
   ECB Definitions
   Evaluation Approaches and ECB
   Program Theory and ECB
   Measurement in Evaluation and ECB
   Measurement in ECB
   Knowledge Gaps: The Case for Investigating ECB Measurement Practice
3. RESEARCH DESIGN
   Overview of the Chapter
   The Problem and Research Questions
   Research Design
   Conceptual Framework of the Study
   Research Instrument Development
   Sources of Potential Error
   Data Management
   Statistical Analysis
   Role of the Researcher
   Ethical Concerns
   Conclusion
4. RESULTS AND ANALYSIS
   Overview of the Chapter
   The Sample Profile
   ECB Contextual Profile
   Research Question 1: How can ECB measurement practice be described from empirical evidence?
   Research Question 1A: What are the content and implementation approaches of ECBs found in published ECB reports?
   Answer to Research Question 1A
   Research Question 1B: What is the rigor of measurement practice in published ECB reports?
   Answer to Research Question 1B
   Research Question 1C: What determines practice of measuring ECB outcomes?
   Answer to Research Question 1C
   Answer to Research Question 1
   Research Question 2
   Item Response Theory (IRT) Analysis
   What is being measured in ECB?
   Factor Analysis: Multidimensional Assumption
   ECB Content and Decision to Measure
   Answer to Research Question 2
   Chapter Conclusion
5. SYNTHESIS AND CONCLUSION
   Overview of the Chapter
   Contribution of the Study
   Limitations of the Study
   Future Research Directions
   Conclusion
REFERENCES
APPENDICES
List of Tables
TABLE
2.1 Conceptual Components of ECB Definitions
2.2 Participation-oriented Evaluation Approaches
2.3 Three-Component Framework for ECB
2.4 ECB Assessment Instruments
3.1 Information Sources Search
3.2 Evaluation Capacity Building Content, Implementation and Context Variables
3.3 Developmental Model Proficiency as Applied to Rigor of ECB Measurement Instrument
4.1 Case Sample of this Study and the Labin et al. (2012) Sample
4.2 Journals that Published ECB Case Reports in the Sample
4.3 Countries where ECB Case Reports were Conducted
4.4 Distribution of ECB Domain
4.5 Type of Organization
4.6 Type of Program Delivered
4.7 Number of Organizations in an ECB Activity
4.8 Number of Programs in an ECB Activity
4.9 Number of Program Sites
4.10 Affiliation of ECB Facilitators
4.11 ECB Case Report Methodological Paradigm
4.12 Intended Target of ECB
4.13 Participant Focus of ECB
4.14 Type of ECB Teaching Strategies
4.15 Mode of Strategies Reported
4.16 ECB Contact Duration
4.17 Rigor of ECB Measurement Practice
4.18 Rigor of ECB Measurement Practice
4.19 Simple Logistic Regression Analysis: Publication Profile and Decision to Measure
4.20 Simple Logistic Regression Analysis: ECB Context Profile and Decision to Measure
4.21 Simple Logistic Regression Analysis: Implementation and Decision to Measure
4.22 Topic Number List and Levels of Developmental Proficiency
4.23 Item Mean Square Fit Statistics
4.24 Presence-Absence Matrix of Reported ECB Outcomes with Reference to ECB Content
4.25 Guttman Ordering of Reported ECB Outcomes with Reference to ECB Content
4.26 Some Process and Outcome Areas Measured in Reported ECBs
4.27 Structure Matrix of the ECB Content Using Maximum Likelihood Method of Extraction and Direct Oblimin Rotation
4.28 Factor Correlation Matrix
4.29 Sub-domain Groupings for ECB Context and IRT Hierarchy Classification
4.30 Simple Logistic Regression Analysis: ECB Content and Decision to Measure
List of Figures
FIGURE
2.1 A Five-Step Approach to Developmental Assessment, Learning and Teaching
2.2 Evaluation Timeline: Development and Institutions
2.3 Integrated Evaluation Capacity Building Model
2.4 Multidisciplinary Model of ECB
2.5 Logic Model for ECB Theory of Change
2.6 Model for Measuring Evaluation Capacity
2.7 Evaluation Capacity Index
3.1 Multistage Selection Process
3.2 Analysis Diagram and Research Questions Map
4.1 Analysis Diagram and Research Questions Map
4.2 Timeline and Distribution of Published ECB Case Reports
4.3 Venn-Diagram of the Methodological Paradigms of ECB Reports
4.4 ECB Content Targeting Individual Level Capacity (N = 63)
4.5 ECB Content Targeting Organizational Level Capacity (N = 63)
4.6 Venn-Diagram of the Capacity Change Target of ECB Reports
4.7 Integrated Evaluation Capacity Building Model
4.8 Pairwise Combination of ECB Strategies
4.9 ECB Outcomes Measurement
4.10 ECB Teaching Strategies
4.11 ECB Cases and ECB Developmental Proficiency
4.12 ECB Outcomes Measurement
4.13 Scree Plot for ECB Content
4.14 ECB Content Sub-Domains Frequencies
List of Appendices
APPENDIX
Appendix A List of Published ECB Cases in the Study Sample
Appendix B Coding Form
CHAPTER 1
INTRODUCTION
Evaluation Capacity Building (ECB) seeks to enable individuals and
organizations to adopt the concepts and practices of evaluation. Its purpose is to
mainstream the generation and utilization of evaluation information within
organizational systems and structures. ECBs are mostly aimed at improving
organizational accountability, organizational learning, and program outcomes.
Moreover, there is now a collective understanding that ECB is an intentional process
and that the ultimate goal is sustainable evaluation within the organization (Labin,
2008; Stockdill, Baizerman, & Compton, 2002).
This study takes the perspective that ECB is a designed intervention that
targets improvement of evaluation capacities of individuals or organizations. This
idea has long been supported by practitioners of the field, and there are many logic
models for ECB that have been developed by evaluation theorists and practitioners.
Also, there have been several attempts to synthesize these ECB models to attain a
unified understanding of the whole logic of ECB (Labin, 2014; Labin, Duffy, Meyers,
Wandersman, & Lesesne, 2012; Milstein, Chapel, Wetterhall, & Cotton, 2002;
Preskill & Boyle, 2008; Taylor-Powell & Boyd, 2008; Taylor-Ritzler, Suarez-
Balcazar, Garcia-Iriarte, Henry, & Balcazar, 2013). From this perspective it follows
that evaluation of ECB is essentially a notion of program evaluation that can be
embedded in ECB designs.
Furthermore, the nature of ECB as a designed intervention may be
considered as a learning intervention within organizations. This is a crucial
assumption if ECB is to be considered an evaluand itself, the object of an evaluation.
This could mean that ECB may be considered to have close similarity to learning
interventions in the education setting, although differing in many ways according to
context and purpose. For example, in organizations, adult learning can be achieved by
direct or indirect training which is mostly conducted in actual workplace settings.
When ECB is viewed as a learning intervention, three possibilities may be
explored. First, it enables an investigation of ECB through the lens of learning and
measurement theories that could be borrowed from the education discipline. The field
of program evaluation is accustomed to the idea of eclectic approaches and to drawing
from multidisciplinary approaches. Second, it warrants examining whether the
content of ECB has a single unifying construct, that is, verifying whether
the practice of ECB holds together as a single entity called ECB. Third, it calls for an
investigation of how measures are carried out with respect to evaluating the effects of
the learning intervention. This means that learning outcomes are central to ECB
measurements.
It is an issue as to whether these ECB learning outcomes occur individually
or collectively, whether they affect individual behaviour or collective practice, or
whether they influence systems and structures. These ECB learning outcomes could
be examined to see whether they follow a structured developmental learning progression from the
ECB learning content being delivered. Learning progression refers to the structure
that learners build as they progress towards mastery of the knowledge and skills
needed for evaluation capacity, as in the case of ECB. Once learning progression is
identified, the implications would be important in ECB practice because it could mean
purposeful sequencing of teaching and learning expectations across multiple
developmental stages. Hence, it is possible that this notion of developmental stages
holds in building evaluation capacities and needs to be investigated.
While it is possible that improved evaluation capacities among individuals
and organizations manifest themselves in improved program delivery, and, by
extension, program outcomes, an organization’s program outcomes are a different
evaluand with different sets of intervening factors compared with ECB learning
outcomes as an evaluand. A new understanding of ECB measurement and evaluation
may be found by being clear about what the ECB learning outcomes really are. This
could be the necessary preliminary step in understanding measurements in ECB.
Several implications come to mind when ECB is considered as a learning
intervention. First, it implies that ECB is not simply perceived as a mere
demonstration of evaluation skills or approaches to conducting evaluation, but can
be viewed as a programmatic intervention that follows a designed logic. Often,
evaluators teach evaluation to organizations when there is opportunity, and to some
extent persuade these organizations to make evaluation a way of life. This is what can
be called a positively opportunistic ECB, as opposed to a more formal ECB that
is treated as a serious intervention requiring accountability through the measurement of learning
outcomes.
Second, viewing ECB as a learning intervention could clearly demand
cooperation of the stakeholders engaged in ECB. Because of their significant roles in
the organizational system, it is important to consider target stakeholders who
undertake these learning interventions. Clinton (2014) argued that the primary reason
for doing ECB is that stakeholders, through their willingness and readiness to adopt
evaluation, mediate the true impact of evaluation on organizations. This proposition
claims that ECB is a deliberate intervention in organizations that are affected by
stakeholders’ decisions.
Lastly, to consider ECB as a learning intervention implies that there is an
associated program theory that would lend itself to the rigors of program evaluation.
A program theory may be accompanied by an implicit auxiliary measurement theory, which
is necessary to establish methodological rigor for any program evaluation initiative.
Several authors have suggested that for a program theory to be operational for
evaluative investigation, an accompanying measurement theory needs to be in place to
allow examination of its methodological rigor (Blalock, 1979, 1982; Braverman,
2013; Braverman & Arnold, 2008). Thus, investigation of the measurement practice
of an ECB is a necessary preliminary exercise to ascertain how ECBs fare with
respect to methodological rigor as a programmatic intervention and determine
whether practitioners understand what ECB measurement practice is really about.
Hence, the motivation of this study stems from the perspective that ECB is a
learning intervention. The implications of this perspective could possibly provide new
ways of thinking about ECB measurement and evaluation. An examination of
empirical data through this lens may yield useful results to inform ECB practice.
Statement of the Problem
It seems surprising to find that ECB practitioners pay little attention to
measuring ECB outcomes. A research synthesis of ECB literature has documented
“very limited reporting of measures and quantitative data… for a field embedded in
evaluation and populated by evaluators” (Labin et al., 2012). In addition, the
reported evaluations of ECB are commonly carried out in qualitative narratives and
use anecdotal evidence. This is not to say that qualitative reports are inferior.
However, there are higher expectations of quantitative evidence-based claims if
evaluation practitioners are to convince organizations to mainstream evaluation, that
is, to make evaluation part of the regular routine in an organization. While resting on
the assumption that practitioners in the evaluation field are mostly accustomed to
measurement, the phenomenon of low measurement of ECB outcomes in practice is
sufficient to warrant this investigation. This study aims to investigate the
measurement practice in evaluation capacity building. This is important because
through an understanding of the existing measurement practice it could be revealed
how practitioners understand the content structure, delivery, and evaluation of ECBs.
The lack of attention to measures, and ultimately to evaluations of ECB, may
not necessarily imply lack of skills and competencies on the part of ECB practitioners.
Evaluators – who are most likely to be the consultants, resource persons and trainers
for ECBs – are accustomed to measurement principles and methodologies. The
evaluation profession demands methodological rigor for the evaluations the
practitioners perform and so it is permissible to assume that evaluators are familiar
with the critical role of measurements in evaluation. Perhaps one plausible
explanation for this lack of attention to ECB evaluation is Braverman’s (2013) notion
of the trade-off between measurement rigor and feasibility of measurement
implementation. However, what levels of measurement rigor occur, how ECB
measurements are carried out in practice, and how much measurement is conducted
all remain to be investigated in the empirical field.
Purpose of the Study
This research looks at the broader evaluation practice of Evaluation Capacity
Building (ECB) in published ECB reports. It attempts to examine two key
components of ECB evaluation from the learning intervention perspective. It looks at
the ECB measurement practice as a way to document and determine whether ECB has
a verifiable content construct, and possibly a structure of developmental proficiency,
and at how ECB outcomes measurement is supposed to be carried out.
Aims of the Study
This study hopes to achieve two major aims. First, it aims to document and
describe the measurement practice that occurred in ECB initiatives as reported in
published ECBs. Second, it seeks to investigate whether empirical evidence supports
the notion of ECB developmental proficiency that follows from the learning
intervention perspective of ECB. To achieve these primary aims, the following
detailed objectives are to be carried out:
(1) Describe context, implementation, and content of ECB initiatives;
(2) Describe the ECB measurement practice with respect to what is being
measured, the rigor of measurement and how much measurement is
undertaken;
(3) Determine what influences the decision to measure ECB outcomes in practice;
and
(4) Investigate whether ECB content delivered follows a unified learning
construct, and possibly a progressive structure, and whether outcomes
measured demonstrate this thinking in practice.
The Research Hypothesis
The premise of the research is the view that ECB is a learning intervention.
From this perspective, the ECB content delivered in ECB activities could be the
focus of ECB outcomes measurement. Formally, the research hypothesis is stated as
follows:
Evaluation capacity building as a learning intervention would call for a
progressive approach to content delivery and outcomes measurement.
Research Questions
The main questions for this research are:
Research Question 1:
How can ECB measurement practice be described from empirical evidence?
Research Question 2:
Is there evidence to demonstrate that ECB content exhibits a unified
learning construct and possibly a progressive structure?
These questions may be broken down into the following sub-questions:
Research Question 1:
What are the contexts, implementation approaches, and content of
ECBs delivered in published ECB reports?
What is the rigor of measurement practice in published ECB reports?
What determines practice of measuring ECB outcomes?
Research Question 2:
Does ECB content demonstrate a unified construct and progressive
structure?
Does ECB content group together in specific ways?
Significance of the Study
This study hopes to contribute to the body of knowledge in evaluation
teaching. The characterization of ECB measurement practice as well as understanding
the nature of its relationship with ECB content, implementation and contextual factors
may provide answers to the problem of low response to ECB evaluation. Findings of
this investigation may provide alternative ways of looking at how practitioners
conceptualize, deliver and measure ECB.
Limitations
This study is limited to completed and published ECB reports. Published
ECB reports provide a feasible opportunity to examine how measurement practices in
ECB were carried out across a range of organizational contexts, an examination that would
otherwise be impossible to conduct individually on ECB initiatives in situ. This means that
conclusions from this study are limited to the population represented by the sample. In
addition, published ECB reports were written for different audiences and not for the
purpose of reporting measurement practices. This means that information may not be
complete or readily extracted from the report. This possible source of bias will be
minimized by establishing clear inclusion criteria for sample selection. A coding and
assessment instrument will be developed for data gathering consistency.
Outline of the Thesis
The thesis is outlined as follows. Chapters 1 to 3 set the scene of the study.
The first chapter establishes the rationale and identifies the central question that the
study attempts to answer. The literature on ECB is explored in Chapter 2 to
identify the existing understandings of ECB and the factors already
established. A particular emphasis is placed on the emergence of ECB in the field of
evaluation, the influence of evaluation approaches and program theory thinking on
ECB, the state of ECB measurement studies and the case for the need to examine
measurement practices in ECB. The introductory part of the thesis ends in Chapter 3
with a detailed overview of the research design including its conceptual framework,
theoretical underpinnings and methodology used. Also, this chapter includes detailed
descriptions of the sample selection and inclusion criteria and the analysis tools that
are used to answer the research questions. Chapter 4 deals with the findings of the
study. Chapter 5 provides the synthesis and conclusion of the study. The synthesis of
the findings elaborates the significance of the study results in relation to the concepts
and ideas presented in the literature review. The conclusion summarizes the findings
of the study and makes the case for the research contribution. The chapter closes with
suggestions for further research studies and opportunities to apply the
recommendations from the findings of the study.
CHAPTER 2
REVIEW OF LITERATURE
Overview of the Chapter
This chapter has three objectives. First, it aims to locate the research topic in
the broad field of evaluation by identifying its position particularly in the areas of
evaluation capacity building (ECB) and measurement in evaluation. Second, it
presents the perspective through which the study investigates the problem and the
possible literature gaps it addresses through this lens. Lastly, it highlights why ECB
measurement practice has to be examined and demonstrates what this exercise can
reveal to further the development of ECB practice.
The first section presents the perspective this study adopts to frame the
concepts and ideas that existing literature may provide regarding ECB. Although the
approach to this study is through the quantitative method and positivist view, it is
recognized that the framing of the research questions to some degree has subjective
bias with respect to views and beliefs about the nature of ECB. This explication of the
research perspective is intended to provide a better understanding of the significance
of this study. The following sections begin with a brief narrative about the emergence
of this branch of evaluation practice and then provide an examination of the
definitions and conceptualizations of ECB. This is followed by a survey of the
dominant ideas of ECB approaches and models. Existing studies and issues on ECB
measurement are presented, and the chapter concludes by identifying the possible
gaps in ECB measurement literature.
The Study Perspective: Learning Intervention
This investigation is based on the view that ECB is a learning intervention.
This is the lens through which this study is conducted. The first component of this
perspective is the idea of “learning”. At the most basic level, it means ECB
initiatives can be perceived as analogous to the teaching-learning situation in adult
professional learning. Although ECB in organizations is more complex than the
picture of a classroom learning setting, the analogy could simplify matters and drive home the point
that ECB as a learning intervention could provide a different understanding of ECB.
The classroom parallels ECB initiatives in several characteristics. The students
correspond to adult learners, mostly professionals who are key players or stakeholders
of the organization. The classroom learning environment, which includes all factors
that enable or hinder learning, corresponds to an organizational environment that
could enable or hinder the development of evaluation capacities across individuals or
organizations. The classroom management systems, structures and rules also parallel
those of many organizations. Most importantly, the classroom learning content, which
can be defined and structured as the foundation for learning assessment and
evaluation, equates with the knowledge, skills and abilities that organizational
training aims to develop.
However, this classroom-organization analogy for ECB breaks down particularly
in that organizations are expected to perform collective actions through
processes and systems that cannot be performed by individuals. Classroom learning
settings are often focused on individual learning, but organizational learning involves
collective and collaborative processes and systems.
For an ECB, whether the focus is on individual, team or organizational
learning, the teaching-learning processes need investigation. Theories of learning and
assessment may help reveal how ECBs could work, for example, by examining the
content material, the learning activities, and the way learning is assessed. Keeping this
analogy in mind while recognizing the fact that organizations are more complex than
classroom learning situations, this study focuses on drawing and integrating concepts
from educational and organizational paradigms in the hope of contributing to a deeper
understanding of ECB practice.
The second component of the perspective is the concept of “intervention”.
Intervention in this study suggests the idea of an intentional program design. This
means that ECB has an implicit program theory with the basic structure of
inputs-activities-outputs-outcomes components. Thus, to assume that ECB is a “learning
intervention” is to recognize the view that ECB is both a teaching-learning process in
the area of educational theory and a programmatic intervention in the area of
program evaluation.
Assessment of Learning and ECB Measurement
Some concepts and approaches from educational measurement could be
applied to ECB measurement. The two prominent ideas from educational
measurement that appear to be useful with respect to ECB measurement are: (1)
developmental constructs of learning; and (2) developmental approach to assessment.
In learning intervention settings, developmental constructs are formulations
of the steps or stages of increasing competence. It is important that practitioners use
these stages to think developmentally about the intervention to support learning. The
importance of developmental constructs is that they can provide a basis for identifying
the Zone of Proximal Development (ZPD) of the learners, that is, the position in the
learning progression at which a learner is ready to learn (Griffin, 2007). Once the learner’s
ZPD is identified, this information can be used to plan and monitor the
teaching-learning intervention. Developmental framework theories include Krathwohl’s
Affective Domain, Bloom’s Taxonomy and Dreyfus’ Model of Skill Acquisition
("Assessment and learning partnerships: A short course for school leaders," 2012).
Thus, in ECB, the idea of content progression may also be considered. This is the
possible structuring of ECB content topics in developmental progression as a
reference for ECB implementation and measurement. This ECB developmental
progression will be referred to in this study as ECB developmental proficiency.
The developmental approach to assessment followed from the ideas of
developmental constructs of learning and the measurement theories of Rasch (1960, 1980)
and Glaser (1963), as cited by Griffin (2007). This approach proposes that once a
developmental progression of learning is identified, it can be used for assessment, which
can subsequently serve as a starting point for learning and the beginning of
change. This is explicated by the Five-Step Approach to Developmental Assessment
proposed by Griffin (2007) and shown in Figure 2.1. With regard to ECB, the key
point to be made from Griffin's developmental assessment framework is that ECB
measurement can only find meaning when it can be interpreted as a performance level
on the developmental progression.
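The link between a developmental progression and measurement can be illustrated with a minimal sketch. It assumes the dichotomous Rasch model, in which the probability of success depends only on the difference between a person's ability and an item's difficulty, both expressed in logits. The topic names and difficulty values are hypothetical and are not estimates from this study. Treating a success probability near 0.5 as a proxy for the ZPD is likewise a simplifying assumption made for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of success under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical ECB content topics with made-up difficulty values in logits.
topic_difficulty = {
    "evaluation purposes": -1.5,
    "logic models": -0.5,
    "data collection": 0.4,
    "reporting and use": 1.2,
}


def order_topics(difficulties):
    """Order topics from easiest to hardest, i.e. along the progression."""
    return sorted(difficulties, key=difficulties.get)


def readiness_zone(ability, difficulties, low=0.4, high=0.6):
    """Topics whose success probability lies in [low, high] for this learner.

    Assumption: a probability near 0.5 is used as a rough proxy for the ZPD.
    """
    return [
        topic
        for topic in order_topics(difficulties)
        if low <= rasch_probability(ability, difficulties[topic]) <= high
    ]


if __name__ == "__main__":
    print(order_topics(topic_difficulty))
    print(readiness_zone(0.4, topic_difficulty))
```

In this sketch a measured ability is meaningful only because it can be located against the ordered topics, which mirrors the point that a measurement must be interpretable as a performance level on the progression.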
These educational theories of learning and measurement provide a fresh look
at ECB as a learning intervention. They provide a basis for examining ECB
measurement practice. The following questions could be asked: (1) Does ECB
measurement consider the idea of ECB as a developmental learning construct? (2) Is
there evidence to show that ECB is a unified learning construct that demonstrates
a progressive structure? (3) Can the idea of developmental proficiency be applied to
ECB? Answers to these questions could perhaps redefine how ECB content,
implementation and measurement should be approached.
Figure 2.1 A Five-Step Approach to Developmental Assessment, Learning
and Teaching (Griffin, 2007)
The Emergence of ECB
An awareness of the emergence of ECB in the historical timeline of the
evaluation discipline could provide some background to the rise of the need for ECB.
The description of the continuing rise of the evaluation discipline may help position ECB
in relation to the development of ideas in the field. This provides a backdrop for
conceptualizations and definitions of ECB that currently exist in the literature.
Many authors recognize that ECB has been a practice for some time
(Compton, Baizerman, & Stockdill, 2002; Milstein, et al., 2002; Preskill, 2008);
otherwise the idea of evaluation would not have developed into a distinct discipline as
it is today. ECB is recognized as a process "long practiced but only recently named,
illuminated and explicated" (Compton, et al., 2002). Perhaps the most prominent
events that have become significant turning points for the highlighting of ECB were
the 2000 American Evaluation Association conference theme "Evaluation Capacity
Building" (Leviton, 2001) and the 2001 AEA conference theme "Mainstreaming
Evaluation" (Sanders, 2002). The first decade of the 21st century could be considered
as an important evolutionary stage in the evaluation profession in that this was the
time when evaluators and organization leaders became interested in ECB along with
the wide acceptance of participatory, collaborative and stakeholder forms of
evaluation (Preskill & Boyle, 2008).
The development and institutionalization of evaluation, as a distinct field, is
fairly young (Brisolara, 1998; Preskill & Russ-Eft, 2005; Stufflebeam & Shinkfield,
2007). A sketch of the evaluation historical timeline, shown in Figure 2.2, gives an
indication of how young the discipline is: less than a hundred years old. The
location of ECB in the timeline as an emerging area of the field of evaluation falls
within the most recent decade, and it took over ten years for ECB to be
conceptualized in the shape of a program theory. It was in this latter period of the
timeline that ECB practitioners became serious about ECB measurements and began
to think about evidence-based ECB outcomes.
Observing the events that unfolded prior to the emergence or, more
appropriately, the emphasis of ECB, it can be seen that major conceptualizations and
approaches in evaluation had already gained ground. For example, the ideas of
participatory approaches to evaluation, utilization-focused evaluation and program
theory had already been established prior to the emphasis on teaching and
mainstreaming evaluation during the AEA conferences. There is no evidence to
suggest that ECB has developed because the concepts of evaluation have matured.
Figure 2.2 Evaluation Timeline: Development and Institutions
[Timeline figure, axis 1950 to 2020, showing the following events:
  1950s: Evaluation as Educational Assessment
  1967: Scriven's Formative and Summative Evaluation
  1970s: Publication of Evaluation Journals
  1970s to 1980s: Experimental and Quasi-experimental Approaches to Evaluation
  1975: Scriven's Meta-evaluation
  1976: Evaluation Network and Evaluation Research Society
  1980s to 1990s: Participatory Evaluation, Utilization-Focused Evaluation, Transformative Evaluation
  1985: American Evaluation Association
  1987: Wholey's Program Theory
  1990s to Present: National Evaluation Societies
  1994: Program Evaluation Standards
  2000: AEA's Evaluation Capacity Building
  2001: AEA's Mainstreaming Evaluation
  2008: Integrated Evaluation Capacity Building Model
  2013: Scriven's Evaluation as Alpha Discipline (?)]
Perhaps the rise of ECB could be attributed to the socio-political demands for good
governance in the form of accountability and evidence-based social interventions in
which evaluation has taken a vital role, as well as a rise in the need to understand and
develop the skills to carry out evaluations (Chouinard, 2013).
The forerunners of the evaluation profession succeeded in the
institutionalization of evaluation, for example, in the form of national evaluation
organizations. The notion of evaluation has expanded beyond the confines of academic
institutions and federal requirements. This happened first in the United States of
America, and subsequently in the rest of the world. Preskill (2008) has provided a
vision for what could be imagined as the future of evaluation with this emergence and
increasing commitment to ECB. It is a world where evaluation is a "social epidemic
where individuals, groups, organizations and communities are constantly learning
about and from evaluations… creating a 'global cascade' of evaluative thinking and
practice" (Preskill, 2008, p. 127). This vision of the use, influence and impact of
evaluation aligns with Michael Scriven's concept delivered at the 2013 Australasian
Evaluation Society conference in Canberra, Australia, that the future evolutionary
status of evaluation is to become an "alpha discipline". To ensure movement towards
this evolutionary goal for evaluation, ECB has to be at the forefront. In this sense, the
advocates of evaluative thinking and practice appear to be heading in the right
direction. Having provided this brief backdrop of the position of ECB in the
evaluation field, the next section describes how ECB is defined.
ECB Definitions
The literature provides a compendium of definitions and conceptualizations
of ECB. Some of these definitions are presented here to provide a scope of
conceptualizations of ECB and describe the boundaries in which the study operates.
The prominent ones are the earlier definitions which are almost always referred to by
scholars of the field. Perhaps the most cited definition is from Stockdill, Baizerman
and Compton (2002, p. 14):
ECB is the intentional work to continuously create and sustain overall
organizational processes that make quality evaluation and its uses routine.
This definition views ECB as a systems approach. It considers the whole of
the organization as necessary to set up processes that facilitate routine evaluation
practice. The authors' influential work, the creation of evaluation systems for the
American Cancer Society (ACS) (the context for this definition), recognizes the
centrality of the organizational context to support ECB activities. On the other hand,
some scholars view ECB as an approach through which influential individuals can
influence the larger organizational structure to improve performance evaluation:
Evaluation capacity-building within an organization is typically understood as
an exercise in developing the evaluation skills and knowledge of some, or all,
of the organization's staff, with a view to increasing their ability to undertake
high-quality evaluations of the organization's projects and programs (Beere,
2005, p. 41).
Evaluation capacity building (ECB) is an intentional process to increase
individual motivation, knowledge, and skills, and to enhance a group or
organization's ability to conduct or use evaluation (Labin, et al., 2012, p. 308).
Others point to the importance of considering the full spectrum of stakeholder levels
within the organization:
ECB involves the design and implementation of teaching and learning
strategies to help individuals, groups, and organizations, learn about what
constitutes effective, useful, and professional evaluation practice. The ultimate
goal of ECB is sustainable evaluation practice—where members continuously
ask questions that matter, collect, analyze, and interpret data, and use
evaluation findings for decision-making and action. For evaluation practice to
be sustained, participants must be provided with leadership support,
incentives, resources, and opportunities to transfer their learning about
evaluation to their everyday work. Sustainable evaluation practice also
requires the development of systems, processes, policies, and plans that help
embed evaluation work into the way the organization accomplishes its mission
and strategic goals (Preskill & Boyle, 2008, p. 444).
The preceding definitions of ECB have commonalities and differences. Table
2.1 shows a comparison of the conceptual components found in these definitions. The
most common is the concept that ECB refers to improvement of evaluation capacities
through the development of individual knowledge and skills. This suggests that ECB
is seen as a program level intervention – that is, increasing the evaluation capacity of
individuals. Some definitions fall within this interpretation. Other authors view ECB
as a form of broader organizational practice. For example, Stockdill et al. (2002) and
Preskill and Boyle (2008) defined ECB with sustained or routine evaluation practice
within organizational processes.
From these definitions, it can be noted that there is no current agreement in
the field about what really constitutes ECB. However, Preskill and Boyle's (2008)
definition appears to be the most explicit and comprehensive (Table
2.1), and it is attuned to the "learning intervention" perspective of this study. It
recognizes explicitly the component of "teaching and learning strategies" of ECB.
Table 2.1 Conceptual Components of ECB Definitions
Definitions compared (columns): Stockdill, Baizerman & Compton (2002); Beere (2005);
Labin, Duffy, Myers, Wandersman & Lesesne (2012); Preskill and Boyle (2008)
Concepts (rows):
  - Intentional work
  - Organizational processes
  - Evaluation quality
  - Sustained or routine
  - Developing knowledge, skills or motivation
  - Improved ability or transfer of skills
  - Improved organizational programs
  - Individual level
  - Group or team level
  - Organization level
[Cell entries marking which concepts appear in each definition are not recoverable here.]
Other definitions only imply the teaching-learning aspect of ECB by using
the terms "develop" or "increase" evaluation capacities. Preskill and Boyle's (2008)
definition is therefore selected to
set the conceptual basis of ECB in this study. It not only affirms the premise of this
study, but also includes the ideas of: (1) ECB as an intentionally designed intervention;
(2) both the program and organizational levels; and (3) the target
outcomes of ECB from individual to organizational capacities.
As ECB literature has grown, more scholars have continued to debate and
examine the different facets of ECB. Taut (2007), for example, emphasized that ECB
is "not an area where a blue print approach could work". She proposed the idea that
no single definition can be comprehensive for ECB and that evaluation capacity is
situational and ever-changing in accordance with the local contexts.
Some have maintained the view that ECB is primarily about sustainability
of evaluation practice. McDonald, Rogers and Kefford (2003), in their work "Teaching
people to fish: Building the evaluation capability of public sector organizations",
contend that evaluation 'capability' is a more appropriate term to use than
evaluation 'capacity', since evaluation capability aims to provide "enduring
organizational benefits, including a sustaining resource for producing evaluation as
well as a system for encouraging and using evaluation" (McDonald, et al., 2003, p.
10). While McDonald and colleagues (2003) link the term "capability" to
sustainability in evaluation production and utility, some evaluators use
the terms interchangeably (Adams & Dickinson, 2010; McGeary, 2009).
This section concludes with the thought that although there are differences in
the conceptualizations of ECB from the definitions that have been explored, the
Preskill and Boyle (2008) definition appears to be the most explicit and
comprehensive in terms of the ECB components. The definition includes components
that are missing from the other sampled definitions. Although there is no current
agreement in the field about which definition to subscribe to for the practice of ECB,
this study adopts Preskill and Boyle's (2008) definition. The primary
reason is that their definition grounds this study in the perspective of ECB as a
learning intervention. Furthermore, the definition is explicit in terms of
teaching-learning components, establishing a foundation of ECB outcomes as comprising ECB
learning outcomes from the ECB content delivered.
Evaluation Approaches and ECB
Perhaps one of the evaluation concepts most associated with ECB is the
school of thought related to "participant-oriented evaluation" (Fitzpatrick, Sanders, &
Worthen, 2011). The term school of thought is used to recognize the diverse
ideas that participant-oriented or stakeholder-focused approaches have introduced to
evaluation theory and practice. The term collectively refers to the following related
concepts: participatory evaluation, collaborative evaluation, empowerment evaluation
and utilization-focused evaluation (Fetterman, Rodriguez-Campos, Wandersman, &
O'Sullivan, 2014). A useful summary of the semantics and distinctions of these
concepts was provided by O'Sullivan (2012) and is shown in Table 2.2. It clarifies the four
approaches and relates them to the implementation of evaluation and ECB.
Furthermore, these participant-oriented approaches appear in many articles published
on ECB, for example: collaborative evaluation (Arnold, 2006), participatory
evaluation (Atkinson, Wilson, & Avula, 2005; Kuzmin, 2012), empowerment evaluation
(Andrews, Motes, Floyd, Flerx, & Fede, 2005; Diaz-Puente, Yague, & Afonso, 2008;
Wandersman, 2014) and utilization-focused evaluation (Compton, Baizerman,
Preskill, Rieker, & Miner, 2001).
Given the influence of these ideas in the evaluation field, they
could also shape the core approaches and concepts of ECB. This leads to a need
for an investigation of the implementation strategies of ECB. This is an issue for ECB
as a program level intervention which aims to increase the capacity of individuals
through working on program evaluations. This study addresses this issue by
examining what practitioners actually do to deliver ECB content. It
examines whether practitioners use a participatory approach to teaching evaluation,
direct training or a combination of both approaches.
Table 2.2 Participant-oriented Evaluation Approaches (O'Sullivan, 2012)
Aspects of Evaluation | Collaborative Evaluation | Participatory Evaluation | Empowerment Evaluation | Utilization-focused Evaluation
1. Primary evaluation focus | Promote participation throughout | Engage some stakeholders | Stakeholders use evaluation tools to achieve results | Promote the use of evaluation findings
2. Evaluation decision-making | Negotiated | Evaluator and participants | Participants | Negotiated
3. Stakeholder roles | Clients, partners, assistants, data sources | Clients, data sources | In charge of or partners | Key stakeholders collaborate
4. Evaluator roles | Team leader, collaborator | From participant observer to team leader | Guide, facilitator, critical friend | Active-reactive-interactive-adaptive
5. Pre-evaluation clarification activities | Probe program purposes and resources | Unknown | Addressed in conduct of evaluation | Extensive
6. Design | As rigorous as possible | Varies with evaluator role | Participant-centered | As rigorous as appropriate
7. Types of data collection | Quantitative and qualitative | Quantitative and qualitative | Quantitative and qualitative | Quantitative and qualitative
8. Types of data reporting | As agreed upon | Unknown | Process, results, outcomes | On-going data as available
9. Evaluation Capacity Building | Yes | Unknown | Yes | Yes
10. Cultural responsiveness | Yes | Unknown | Yes | Yes
11. Systems or networking considerations | Yes | Unknown | Yes | Yes
12. Implementation, stakeholders as:
    Instrument developers | Yes | No | Yes | No
    Data collectors | Yes | No | Yes | No
    Data analyzers | Yes | No | Yes | No
    Data interpreter | Yes | Yes | Yes | Yes
    Data reporter | Yes | No | Yes | No
Program Theory and ECB
Interventions such as ECB can be perceived as a designed program. A
program is a set of resources and activities directed toward one goal or common
goals. A program theory identifies program resources, program activities, and
intended program outcomes, and specifies a chain of causal assumptions linking
program resources, activities, intermediate outcomes and ultimate goals (Wholey,
1987). The key in this definition is the identification of the components of a program
theory. These are the resources, activities and outcomes. Another important aspect of
this definition is the emphasis on causality in the chain of assumptions. Wholey
(1987) put it this way in ordinary language: "If the following program resources are
available, then the following program activities will be undertaken… If these program
activities occur, then the following program outcomes will be produced… If these
activities and outcomes occur, then progress will be made toward the following
program goals" (Wholey, 1987, pp. 78-79). This understanding of the concept of
program theory is important in understanding the nature of ECB conceptualizations
through models and frameworks.
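The components and the chain of causal assumptions described above can be sketched as a small data structure. This is an illustration only: the class name `ProgramTheory`, its method `causal_chain` and the example ECB initiative are hypothetical and do not come from Wholey (1987) or from the ECB reports examined in this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgramTheory:
    """Components of a program theory: resources, activities, outcomes, goals."""

    resources: List[str]
    activities: List[str]
    outcomes: List[str]
    goals: List[str]

    def causal_chain(self) -> List[str]:
        """Return the if-then assumptions that link the components in order."""

        def join(items: List[str]) -> str:
            return ", ".join(items)

        return [
            f"If these resources are available ({join(self.resources)}), "
            f"then these activities will be undertaken ({join(self.activities)}).",
            f"If these activities occur ({join(self.activities)}), "
            f"then these outcomes will be produced ({join(self.outcomes)}).",
            "If these activities and outcomes occur, "
            f"then progress will be made toward these goals ({join(self.goals)}).",
        ]


# Hypothetical ECB initiative, for illustration only.
ecb = ProgramTheory(
    resources=["evaluation trainers", "training budget"],
    activities=["evaluation workshops"],
    outcomes=["improved evaluation knowledge and skills"],
    goals=["routine use of evaluation findings"],
)

if __name__ == "__main__":
    for statement in ecb.causal_chain():
        print(statement)
```

Writing the chain out in this way makes each causal assumption explicit, and each one then becomes a candidate point for measurement.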
Integrative Evaluation Capacity Building Model
Program theory thinking pervades ECB thinking. The Integrative Evaluation
Capacity Building (IECB) model (Labin, 2014; Labin, et al., 2012; Leviton, 2014),
which has been widely circulated in the evaluation community, is modelled on
program theory: it has the basic program theory
components of inputs, activities and outcomes. The merit of IECB is that it
is empirically grounded and was established and updated using a synthesis approach
to investigating published ECB reports. Figure 2.3 shows a diagram of the IECB
(Labin, 2014).
Figure 2.3 Integrative Evaluation Capacity Building Model (Labin, 2014)
At the heart of the model is the identification of the program theory
components and the links between the components
Needs/Reasons-Activities/Mediators-Outcomes. This is a clear adaptation of the
program theory approach to understanding
and defining the concept of ECB. There are two key features of the IECB model
worth emphasizing other than the identification of the components and the assumed
links between these components. First, the IECB
model implicitly assumes a divide between the individual and organizational levels of the
intervention. Although it is not clear how individual and organizational capacities
influence each other, this assumption appears to be widely accepted and often ECB
content delivered is based on this divide (Brown & Reed, 2002; Henry & Mark, 2003;
Taylor-Ritzler, et al., 2013). To an extent, this divide is also extended to several levels
such as individuals, teams or groups, organizational, community and even national
level (Holvoet & Dewachter, 2013; King, 2010; Preskill, 2008). Second, the
organizational program is also embedded in this model. The organization's program
goals appear as part of the Needs/Reasons component, the evaluation of the programs
in the Activities component and the program outcomes in the Outcomes component.
This shows the implicit assumption that ECB as an intervention is intertwined with
the interventions the organizations are running. While the ultimate goal of most
ECB initiatives is improved organizational outcomes, ECB intervention outcomes could be
confused with organizational intervention outcomes. This is very important in terms of
which ECB outcomes to measure. The IECB model is quite clear in its distinction
between the individual, organizational and program level outcomes. It can be argued
that when ECB is viewed as a learning intervention, the ECB outcomes that can
logically be linked to ECB activities are the individual and organizational outcomes.
The program level outcomes have a different set of context, implementation and
mediating factors. The improved evaluation capacity of individuals and organizations
is only one of the inputs for improved program outcomes. Hence, program outcomes,
while recognized as an ultimate ECB goal, belong to a broader program theory in which
the ECB program theory is embedded. ECB evaluation must therefore be distinguished
from organizational program
evaluation.
In summary, the above discussion has shown that IECB is a program theory
that recognizes different possible levels of intervention. An organization's
intervention outcomes and ECB outcomes could be mixed up, adding confusion
regarding what to measure for ECB. This explains why there is a need to examine
ECB measurement practice. Documenting and investigating ECB measurement
practice could reveal not only how the programmatic concept of ECB is applied in
practice but also whether the measurement of organizational outcomes and of ECB
outcomes is clearly delineated. The investigation findings could provide insights into
what could be done for future ECB measurement practice.
More ECB Models
There are several ECB frameworks and models that have contributed to the
current understanding of ECB. This is not a comprehensive review of ECB models;
the ones presented here were prominent in the literature search. The intention of the
survey of these ECB models was to identify the range of conceptual ideas that have been
published in this emerging area of the evaluation discipline. Some of these models
concern program theory or logic models about how to conduct ECB, while others
relate to the assessment of evaluation capacity. These models and frameworks are:
Program Theory/Logic Models
  - General Framework for ECB (Milstein, et al., 2002)
  - Multidisciplinary Model of ECB (Preskill & Boyle, 2008)
  - Three-Component Framework and Logic Model for ECB Theory of Change (Taylor-Powell & Boyd, 2008)
  - Synthesis Model of Evaluation Capacity (Taylor-Ritzler, et al., 2013)
Assessment of Evaluation Capacity
  - ECB Supply and Demand Model (Nielsen, Lemire, & Skov, 2011)
  - Getting to Outcomes (Wandersman, 2014)
General Framework for ECB
The General Framework for ECB reported by Milstein, et al. (2002) is in the
context of the Centers for Disease Control and Prevention and the public health system in
the United States. The framework is essentially a systems approach adhering to the
belief that ECB is about organizational principles, processes and procedures within its
organizational cultural and infrastructure contexts. However, it recognizes that
training in evaluation is a key component:
Evaluation capacity in public health would require a process of culture change,
including significant reforms to their own organizations… should build an
evaluation literate workforce and maintain a cadre of applied evaluation
scientists throughout the agency… These goals are partly accomplished
through training in evaluation, with additional strategies focusing on
leadership and other aspects of organizational infrastructure (Milstein, et al.,
2002, pp. 32-33).
The framework provides the ECB principles and guidelines that are deemed able to
promote program evaluation in the organization. This framework appears to be tied to
organizational management, with a view that ECB can be embedded in the
organizational system and form part of the organizational operations. The framework
has an implicit goal of mainstreaming evaluation in the organization. Success
is defined in terms of functional evaluation systems in the organization.
Multidisciplinary Model of ECB
Preskill and Boyle (2008) proposed a multidisciplinary model of ECB (Figure
2.4). The model draws from the fields of evaluation, organizational learning and
change, and adult and workplace learning. Its purpose is to provide a perspective for
understanding the cohesion and organization of ECB. The model diagram in Figure 2.4
represents the key aspects of the model. It shows "Transfer of Learning" as a link
between the ECB component and the organizational "Sustainable Evaluation
Practice" component. The ECB component includes the goals of ECB, the motivation,
assumptions and expectations of ECB, the ECB design, and the ECB strategies. This
model places the teaching-learning component and the transfer of learning at the core
of the ECB component. The model does not distinguish
between individual and organizational levels but, instead, emphasizes the
organizational learning capacity context in which ECB and sustainable evaluation
practice could thrive and diffuse evaluation learning. This model is similar to the
General Framework for ECB; however, it expands on the details of ECB practice
and of the organization's sustainable evaluation practice.
Figure 2.4 Multidisciplinary Model of ECB (Preskill & Boyle, 2008)
Three-Component Framework and Logic Model for ECB Theory of Change
Taylor-Powell and Boyd (2008) presented the Three-Component Framework
for ECB (Table 2.3) and a logic model for evaluating ECB (Figure 2.5). This
framework is set in the context of complex organizations, specifically the case of a state
education extension organization. The three components for ECB are identified as
professional development, resources and support, and organizational environment.
This is a framework where ECB is viewed as a form of professional development,
emphasizing that learning could take place in the workplace or during formal training
in educational institutions. It also recognizes the significance of resource support and
organizational environment in undertaking ECB. The proponents believe that when
these key components and elements are present in an organization, they set up the system
for ECB to diffuse in the organization, as represented by the logic of the ECB theory of
change in Figure 2.5.
Table 2.3 Three-Component Framework for ECB (Taylor-Powell & Boyd, 2008)
Component | Elements
Professional development | Training; Technical assistance; Collaborative evaluation projects; Mentoring and coaching; Communities of practice
Resources and supports | Evaluation and ECB expertise; Evaluation materials; Evaluation champions; Organizational assets; Financing; Technology; Time
Organizational environment | Leadership; Demand; Incentives; Structures; Policies and procedures
In this theory of change, the ECB components are considered as activity
inputs and the outcomes include individual change with respect to the four learning
domains. This shows implicitly the 'teaching-learning' assumption that is required for
professional development, whether through in-house or formal training. The model
also assumes that changes in the team and program could be simultaneous and that
organizational change and social betterment are cumulative effects of individual, team
and program improvements.
Figure 2.5 Logic Model for ECB Theory of Change (Taylor-Powell & Boyd,
2008)
This model is similar to the IECB but emphasizes the idea of
cumulative effects. The ECB component of the model is not as detailed as that presented
in the Multidisciplinary Model of ECB. As with the first two models presented, this
model does not provide any reference to ECB evaluation or measurement, although
Preskill and Boyle (2008) suggested that it could be expected in practice.
Synthesis Model of Evaluation Capacity
The Synthesis Model of Evaluation Capacity described by Taylor-Ritzler, et
al. (2013) was produced through a systematic review of existing ECB conceptual
models, principles and factors in the context of non-profit organizations. It identifies
individual and organizational factors that are believed to predict evaluation capacity
outcomes. The individual factors are: awareness of the benefits of evaluation;
motivation to conduct evaluation; and competence (knowledge and skills) to engage
in evaluation practices. The organizational factors include: leadership for evaluation;
a learning climate that fosters evaluative thinking; and resources that support
evaluation. The synthesis model also identifies the evaluation capacity outcomes,
which are: mainstreaming evaluation into work processes; and use of evaluation
findings. The model also emphasizes that organizational factors and organizational
learning capacity mediate ECB outcomes. This
model is consistent with Preskill and Boyle's Multidisciplinary model and with
Labin's IECB. The contribution of this model includes empirical
validation of the factor relationships in the context of non-profit organizations. This
model has also become the basis for the development of an evaluation capacity
assessment instrument by the same team of ECB scholars.
ECB Supply and Demand Model
One existing ECB model that provides an operational framework for
measurement is that of Nielsen, Lemire and Skov (2011). The idea is modelled on
the supply and demand concept from economics. It identifies developing human
capital, tools and resources as ECB supply, while ECB demand comprises
organizational policies, plans, structures, processes and culture. The main idea is that
for any organization to be able to define the scope and objectives for ECB, it should
be able to determine the relative quantitative measures of these two sides of ECB.
This model, shown in Figures 2.6 and 2.7, identifies the components of ECB supply
and demand. It is important to mention that this model is assumed to cut across
three levels: macro (societal level), meso (organizational level), and micro (individual
level).
Figure 2.6 Model for Measuring Evaluation Capacity (Nielsen, et al., 2011)
Figure 2.7 Evaluation Capacity Index (Nielsen, et al., 2011)
This conceptualization is entirely different from the program theory based
models presented earlier. The model provides a way of measuring the existing demand
for and supply of ECB and could be useful for a needs assessment when planning ECB
teaching and learning. The assumption here is that ECB is not necessarily an organizational
intervention but part of the inherent qualities of the human capital, derived from
the evaluation training, experience and education that staff receive. The point of this
model is that supply and demand scores need to be matched
for sustainable evaluation practice to take place in an
organization. The significant contribution of this line of thinking is that it could
serve as a way to assess the organizational context for setting the scope and objectives
of ECB. This concept could be made operational through the quantitative measurement
approach presented in Figure 2.7.
Getting to Outcomes
The final ECB framework included in this survey is the Getting to Outcomes
(GTO) framework proposed by Wandersman (2014). This is the latest framework
following the IECB proposed by Labin and colleagues in 2012. The purpose of GTO
is to provide an operational framework for the IECB model. Grounded in the
principles of empowerment evaluation, this framework addresses the dissatisfaction
that came with evaluations showing a lack of outcomes. The idea of GTO is to
provide key stakeholders of a program initiative with outcomes "up-front". That is, "if
key stakeholders including program staff had the capacity to use the knowledge and
tools of evaluation to help them plan more systematically, implement with quality, self
evaluate, and use the information for continuous quality improvement, then they
would more likely to achieve outcomes". The framework then provides a 10-step
approach to guide ECB practice matched to the science of ECB, as provided by the
IECB model. This operational framework shows that GTO in itself is an ECB
approach targeting outcomes-oriented evaluation planning, implementation and
evaluation. The contribution of this framework includes practical steps to follow for
ECB training.
In conclusion, three key points can be identified from the collective
contributions of these models. First, a range of principles, components, factors and
relationships are thought to be important in ECB. Second, the prominence of the
program theory thinking is common to some ECB models. Lastly, the models all
provide frameworks from which ECB can be evaluated. Perhaps one of the reasons
why IECB stands out among the models is its clear identification with a program
theory. This ensures the possibility of program evaluation which, in turn, means ECB
evaluation is essentially program evaluation. The idea of measuring ECB outcomes
takes a central role. The following sections examine the concepts of measurement in
the area of evaluation and ECB.
Measurement in Evaluation and ECB
Measurement in evaluation could refer to several things. It could mean the
process of identifying indicators, setting of standards and development of assessment
tools to assist the process of evidence building in evaluation. It could also refer to the
range of quantitative methodological processes of data collection, management,
analysis and interpretation that would provide some means to answer the evaluation
questions. Within this range, perhaps, the most important factor is providing evidence
that is quantifiable and verifiable. Thus, in this thesis the idea of measurement cannot
be separated from a quantitative paradigm. This section will discuss the major
thoughts relating measurement to evaluation and then to ECB.
In a dialogue on measurement in evaluation, Braverman (2013) proposed a
strong case for measurement in evaluation and brought attention to the realities and
challenges it involves. He argued that the convincing power of
evaluation, when used by stakeholders, is only as good as its credibility to all
stakeholders. This view regarding measurement builds on the view of Patton (2002) in
his seminal and influential work on Utilization-Focused Evaluation. Braverman
(2013) argued that utilization of evaluation information only gains credibility when
methodological rigor is established well enough to convince evaluation users. The
most feasible way to attain methodological rigor is to consider the whole gamut of
validity issues of an evaluation activity: measurement decisions, standards for
strengths of evidence, alternative measurement options, measurement requirements
and the like. Central to his argument is the critical role that measurement holds in
evaluation and the contextual issues that surround measurement decision-making in
evaluation:
The technical aspects of an evaluation study that are associated with
methodological rigor are directly linked to the quality of evidence that
the study is able to produce… An evaluation’s measurement-related
planning decisions and implementation activities, that is, its
measurement-related rigor, will influence the quality of evidence
(Braverman, 2013, p. 101).
This position on the significance of measurement is not a shallow
afterthought. Braverman (2013) draws the theoretical underpinning of this claim from
social science theory. The sociologist Hubert Blalock (Blalock, 1979, 1982) made a
case on the relationship between theory and measurement. He noted that social
science theories, whether explicit or not, are accompanied by “auxiliary measurement
theories” that underlie the use of whatever specific measures have been chosen.
Quoting from Blalock (1982) as cited by Braverman (2013):
In short, we must become more attentive to the need for stating explicit
auxiliary measurement theories and for examining comparability of
measurement, just as we must also be concerned about the
generalizability of our substantive theories (p.31).
This case on the necessity of measurement is straightforward: methodological rigor
requires an underlying measurement rigor. The extent to which measurement rigor is
valid and acceptable among intended users determines the quality of methodological
rigor of any evaluation activity.
While arguing for the significance of rigorous measures for evaluation,
Braverman (2013) considered the important issues of feasibility in carrying out these
measurements. He emphasized that evaluators, at the negotiation stage of evaluation
planning, need to be upfront with stakeholders about the tradeoffs between the
demands of rigorous measurements and feasibility. This means that the negotiation
stage of planning for evaluation is critical for considering the feasibility-rigor
dynamics of measurement. Measurement rigor refers to the quality of
measures with respect to the psychometric properties of the measurement instruments
while feasibility refers to the resources needed (such as time, finances and expertise in
developing and carrying out the measurements). Running measurements is relatively
easy, while developing a valid and reliable measure requires a substantial investment
of resources, including expertise.
This concept of rigor-feasibility dynamics in evaluation measurement
is significant for ECB measurement. The dynamic could readily
answer the question, “What might prevent rigorous measurement of ECB?” A
straightforward answer would be that it depends on the feasibility of measuring given
the context at hand. If we assume that the rigor-feasibility dynamic exists, then the nature and quality
of measurement of completed ECBs in the published reports would be products of this
dynamic. This means that whatever level of measurement rigor we observe in the reports
had already been decided by the stakeholders at the time and in the context of the ECB
initiative. This leads to one of the research questions for this study, “What is the rigor
of measurement practice in published ECB reports?” An answer to this question could
provide information on the status of measurement practice in the field, and possibly
on how the rigor-feasibility dynamic plays out in the empirical
world. Observing low levels of measurement rigor in practice would then possibly
imply that priorities other than measurement dominate the evaluation of ECB.
It may appear that the focus of measurement rigor discussed here is limited
to quantitative primary data and types of data that lend themselves to reliability and validity
standards. It may also seem that there is no recognition of the contributions of
qualitative evaluation (that is, when methodological rigor is only associated with
measurement rigor). This concern is at the heart of Weitzman and Silver’s (2013) critique
of Braverman’s (2013) position in the dialogue. Weitzman and Silver (2013)
challenge evaluation practitioners to a perspective shift. They argue that while they
believe rigorous measurement is ideal, this is seen through the lens of measurement
experts and psychologists where a potential bias towards measurement perfection
appears to be the only primary goal. They further argue that evaluators recognize that
in the other disciplines (where most evaluation demands are made), rigorous
measurements are not as popular as measurement-oriented practitioners think.
Weitzman and Silver (2013) proposed that what are needed are not ‘perfect’ measures
but ‘good’ measures that are timely, relevant and good enough for the stakeholders.
In effect, Weitzman and Silver (2013) state that in evaluation measurement,
what matters is what is measured, rather than spending energy and resources on trying
to perfect rigorous measurement approaches. They submit that in evaluation
measurement it would be a good practice to take stock of all available and relevant
data that can be feasibly measured (much like low hanging fruit), and invest ‘thickly’
in the most important aspects of the program. They are not arguing against rigorous
measurement but for the idea of the rigor-feasibility dynamics. This line of thought is
also addressed by Braverman (2013). He emphasizes the consideration of “alternative
approaches for generating evidence in support to causal claims”. In an earlier article,
Braverman and Arnold (2008) emphasized “context-dependent decision making in
levels of methodological rigor,” and “the importance of relevance and feasibility”. In
the light of this exchange about evaluation measurement, the concepts and arguments
for approaches to measurement in evaluation also apply to ECB measurement. This
is because ECB measurement is an essential aspect of ECB evaluation.
Looking now at this study on ECB measurement practices, it is imperative to
examine what practitioners in the field are measuring in ECB. Identifying what is
being measured will reveal the scope of variables that are measured in ECB, and
possibly see them against the backdrop of a bigger question of what really matters in
ECB measurement. Once what matters in ECB has been identified, that is where ECB
practitioners should be investing thickly when it comes to developing measures for
ECB. The next section will provide a survey of the studies that deal with
measurement in ECB.
Measurement in ECB
There are several existing studies on measurement in ECB. These are
reviewed to examine the focus of ECB measurement these studies have carried out.
The studies can be grouped into three categories: (1) those studies that developed
measurement tools for various components or elements of ECB; (2) studies that
critique measurement approaches; and (3) research that validates ECB models.
The group with most studies on measurement in ECB comprises those that
developed measurement tools for various components or elements of ECB. For
example, Taylor-Ritzler et al. (2013) documented studies with existing ECB
assessment instruments. Table 2.4 shows this list of published ECB measurement
tools, updated for the present study. It can be observed that from among the published
ECB measurement tools, the majority deal with organizational evaluation capacity
measurements. Three of the seven ECB assessment instruments measure
organizational evaluation “readiness”. Other areas include organizational learning
culture, leadership, systems and structures, motivation, organizational contexts and
other relevant ECB variables. The last two entries of Table 2.4 are new additions
to the published ECB assessment instruments.
Table 2.4 ECB Assessment Instruments
Name of Instrument | Author and Year | Components Measured by the Instrument
Readiness for Organizational Learning and Evaluation (ROLE) | Preskill and Torres (2000) | Culture (organizational); Leadership; Systems and structures; Communication of information; Teams (working as a team)
Assessing Learning Culture | Botcheva, White, and Huffman (2002) | Outcome measurement practices; Learning culture
Organizational readiness for change (TCU-ORC) | TCU Institute of Behavioral Research (2005) | Motivation for change (program needs, training needs, pressure for change); Resources; Staff attributes; Organizational climate
Evaluation process use measure | Taut (2007) | Evaluation; Section 1: Views of evaluation, decision making, expectations, sharing knowledge, and learning culture; Section 2: Opinions and experiences with evaluation, available resources, internal and external monitoring and reporting; Section 3: Previous experience with evaluation
A checklist for building organizational evaluation capacity | Volkov and King (2007) | Organizational Context; ECB Structures; Resources
Evaluation and organizational capacity | Cousins, Goh, Elliot, and Aubry (2008) | Organizational Learning Capacity; Organizational support systems; Capacity to do evaluation; Specific types of evaluation activities; Stakeholder participation in evaluation; Use of evaluation findings; Use of evaluation process; Conditions mediating evaluation use
Readiness assessment tool for evaluation capacity building | Danseco, Halsall, and Kasparzak (2009) | Experience with evaluation; Leadership and collaboration; Systems and structures; Evaluation practice
Cultural Competence of Program Evaluators | Dunaway, Morrow and Porter (2012) | Cultural Competence in Program Evaluation
Systems Evaluation Protocol | Urban, Burgermaster, Archibald, and Byrne (2013) | Quality of evaluation plans and models
The second group of ECB studies are those that critique ECB measurement
approaches. For example, the most common measurement tools used after ECB
activities are self-assessments to detect workshop success (D'Eon, Sadownik, Harrison, &
Nation, 2008). Lam (2009) refuted the argument that the measure could work. He
pointed out that, among other things, self-assessments – even with balanced over and
underestimates – remain biased and should not be used to evaluate workshops. He
argued that participants’ performance should not be attributed directly to training even
if the self-assessments are psychometrically valid and obtained prior to the workshop.
He further cautioned that self-assessment findings should not be generalized to other
situations without further analysis. The study concluded with principles in using
assessments for evaluating training such as ECB. Another example for this group of
ECB measurement literature is Braverman’s (2013) concern for program evaluations
(possibly a large number relative to practice) that are outside the rigor requirements of
journal editorial boards and professional peer review mechanisms. The study pointed
out that for small scale program settings, rigorous measurement strategies are often
not given attention. He argued that sound evaluation planning requires numerous
decisions about how constructs in a program theory will be translated into measures
and instruments that produce evaluation data. The study concludes with a suggestion
that in making measurement decisions, standards for strength of evidence that a given
measure produces must be established, alternative measurement options weighed and
measurement requirements carefully communicated to clients and stakeholders.
The third set of ECB studies is comprised of studies that validate ECB models.
Often, evaluation theorists seek empirical data to validate a proposed model of a
phenomenon. The same is true for ECB conceptualizations. Researchers turn to
empirical measurements to validate, refine and test conceptualized ECB models. For
example, in a Danish study of public sectors, Nielsen et al. (2011) proposed the
Supply and Demand model for ECB. Along with this conceptualization was the
inclusion of a strategy to measure the Evaluation Capacity Index to map Danish public
sector organizations. The study concludes with results that support the validity of the
proposed model. To test their proposed Synthesis Model of Evaluation Capacity,
Taylor-Ritzler et al. (2013) developed the Evaluation Capacity Assessment
Instrument (ECAI). The instrument was tested on 169 staff of not-for-profit
organizations. The 68-item measure assessed participants’ perceptions of individual
and organizational predictors of two ECB outcomes, the mainstreaming and
use of evaluation, and the study demonstrated that the instrument met internal consistency
criteria. The study also concluded that the ECAI validated the synthesis model and its
depiction of relationships between the evaluation capacity predictors and outcomes.
Among this third set of ECB studies is the Integrative Evaluation Capacity
Building (IECB) model, proposed by Labin et al. in 2012 and updated in 2014. This team
of researchers turned to quantitative measures to describe and validate the proposed
ECB program theory.
Labin’s (2014) latest work on ECB, as of this writing, provides a stock-take
of the existing ECB measurement tools documented in the literature. She identifies
the links between these measurement tools and the IECB model. The mapping of
these measurement tools to the constructs and indicators shows a convergence of ECB
concepts, affirming the validity of the framework and at the same time providing for
the expansion of the details of the model. Labin et al.’s (2012) synthesis study and the
follow-up study (2014) appear to unify the concepts and ideas on ECB using mapping
of these ECB measurement tools. For example, in the Needs/Reasons component of
the ECB framework on Motivation, a measurement tool was identified that provides
specific items that measure the indicator and construct for Motivation. In this case, the
measurement tool used by Botcheva, White and Huffman (2002) addressed the
indicator “Internal (motivation): incentives, rewards, and recognition”. As to the
overall content of the measurement tools, Labin (2012) noted that specifics on training
and technical assistance appear to be more widely included compared with specifics
on leadership and collaborative skills.
Labin’s (2014) findings have also shown that while the primary reason for
doing ECB is to improve program outcomes among organizations, there are only two
measurement tools that focus on program outcomes. This suggests that
developers of ECB measures may perceive a distinction
between measures for ECB outcomes and measures for program outcomes. Program
outcomes and ECB outcomes could have different mediating factors and contexts.
ECB outcomes are usually mediated by contexts of the organization while program
outcomes are mediated by contexts of the organization and the environment of the
intervention target.
The findings presented by Labin (2014), in the light of the analysis of the
research problem in this study, establish the need for the first research question of this
study: “What are the content and implementation approaches of ECBs found in
published ECB reports?” There is a need to investigate independently the content and
implementation factors of ECB as practiced. This is important because in
teaching-learning environments, one can examine the competency learned from the content
delivered, and for that matter assess or measure the competency taught in a particular
learning environment. Hence, measures can focus on the things that ECB activities
have delivered and this delineates the difference between ECB outcomes and program
outcomes.
In summary, the studies on measurement in ECB focus on developing ECB
measurement tools, analysis and critiques of measurement approaches and
confirmation of proposed models. In the light of ECB as a learning intervention, ECB
outcomes could be perceived as learning outcomes of the ECB initiative. This differs
from program outcomes that organizations expect from program design and
implementation. These two sets of outcomes might help clearly define what needs to
be measured in ECB.
Knowledge Gaps: The Case for Investigating ECB Measurement Practice
At this point, this chapter has presented the perspective adopted in this study.
From this viewpoint, ECB is regarded as a learning intervention in a complex
organizational setting. The conceptualizations and definitions of ECB were presented
favouring the view that ECB outcomes include mainstreaming and use of evaluation
from learning interventions directed at individual, team or organizational level
targets, with the ultimate goal of improving organizational outcomes. A brief overview
of the evaluation timeline showed that ECB was formally recognized
by the body of professionals in the most recent decade. The chapter proceeded with the
examination of the influence of program theory and evaluation approaches in the
practice of ECB. It then covered the different conceptualizations of ECB models and
frameworks, investigated how measurement in evaluation relates to ECB, and
presented a brief survey of ECB measurement. This chapter is therefore able to
position the present study in the evaluation field timeline under ECB studies in
furthering discussions on ECB measurements.
Some knowledge gaps were documented in this literature review, warranting
examination of ECB measurement practice in the field. First, while there are ECB
models that determine the factors that relate ECB determinants to ECB outcomes,
there is no existing study on the structure of the ECB content delivered. The
researcher’s argument here is that under the assumption that ECB is a learning
intervention, ECB outcomes can be considered as learning outcomes. Content
structure is critical because it provides a basis for measurement of learning outcomes.
In the survey of the ECB measurement studies, most of the existing studies were
focused on developing ECB assessment tools in the various components and elements
of proposed ECB models, critiques on measurement approaches and validation studies
on ECB models. There were no studies that investigated how the measurement
practices were carried out regarding rigor of measurement. Lastly, while the idea of
developing measurement tools and mapping of these tools is addressed in the overall
program theory for ECB, this does not answer what ECB really seeks to measure. An
investigation of what empirically happens in the field might be able to further
knowledge in this area of the evaluation field. The next chapter details the research
design of this study.
CHAPTER 3
RESEARCH DESIGN
Overview of the Chapter
A review of the literature in the previous chapter positioned the research
topic in the broader field of evaluation, evaluation capacity building (ECB),
assessment of learning, and measurement in ECB. This chapter presents the actual
mechanics of the research, the procedures and instruments adopted in this study. The
research problem is restated at the beginning of the chapter to refresh focus on the
main concerns of this research. The conceptual framework of the study is then
presented to provide the background understanding of the assumptions and set the
direction of the inquiry of this research. This is followed by a description of the data
management and statistical analysis applied in the study and concludes with notes on
the role of the researcher and ethical concerns.
The Problem and Research Questions
The overarching objective of this study is to look at the measurement
practices in ECB to draw empirical evidence that could possibly explain the level of
attention practitioners give to evaluation of ECB outcomes. This is approached by
investigating what has been happening in the field as can be examined from
completed and published ECB reports.
The main questions for this research are:
Research Question 1: How can ECB measurement practice be described
from empirical evidence?
Research Question 2: Is there evidence to demonstrate that ECB content
follows a unified learning construct and possibly a progressive structure?
These questions may be broken down into the following sub-questions:
Research Question 1:
What are the contexts, implementation approaches, and content of
ECBs delivered in published ECB reports?
What is the rigor of measurement practice in published ECB reports?
What determines practice of measuring ECB outcomes?
Research Question 2:
Does ECB content demonstrate a unified construct and progressive
structure?
Does ECB content group together in specific ways?
Research Design
This section elaborates the way the research is conducted. It presents,
explains, and justifies the approach adopted to address the research questions of the
study.
Broad-based Research Synthesis Method
This study employed an adaptation of the broad-based research synthesis
method proposed by Labin (2008). It is a type of research synthesis that aggregates
findings from primary research as a secondary data analysis. Its main aim is to
establish a base of current knowledge of the subject of interest. Research syntheses
are useful tools to clarify and direct future research. Broad-based research synthesis is
different from the traditional research synthesis approach that usually has restrictive
inclusion criteria, often based on randomized controlled trial (RCT) designs. Labin
(2008) argued that research synthesis like meta-analysis is highly restrictive in its
tendency to use RCT data and pooling of quantitative results from similar designs.
Broad-based research synthesis is explicitly systematic but has the characteristics of a
qualitative review. It emphasizes systematic decision rules, uses qualitative and
quantitative means to summarize findings, and integrates qualitative and quantitative
data from various sources and designs, thus the term “broad”. In employing this
method, the researcher takes the view that “random assignment”, the basis of most
RCT and meta-analysis synthesis, cannot stand alone. In addition, researchers and
policymakers need to find other approaches to synthesis studies that go beyond the
experimental versus non-experimental debate (Labin, 2008).
This method is deemed suitable for this study for several reasons. First, the
nature of the main research questions demands a reasonable scope of information if it
is to describe the ECB measurement practice in the field. To do this, a research
synthesis approach is necessary since this allows the methods of systematic review
that would ensure a certain level of information adequacy. Second, Labin‘s notion of
―broadbased‖, which means not restricting samples on the basis of a particular
statistical design, is applicable to the situation of ECB studies. ECBs studies are often
carried out and reported on a wide range of research designs and approaches. If the
systematic review of ECB cases is limited to specific evaluation approaches, then it
may defeat the purpose of the synthesis to provide a reasonable scope of information
with regard to ECB measurement practice in the field. Third, this research is
intended as an extension of the work of Labin et al. (2012). The study hopes to deepen
the understanding in the area of ECB measurements following the successful and
well-accepted presentation of the synthesis program theory for ECB. Finally, this
approach is feasible for a single researcher with limited resources. The survey of
literature and ECB cases using this approach could reasonably provide breadth and
depth of information on ECB measurement practices in a relatively short period of
time. This approach is more efficient than actual immersion in ECBs, which is
costly and time demanding. This means that, given the student researcher context, the
broad-based synthesis approach provides the best alternative to respond to the
problem at hand, affording a wide range of ECB perspectives, approaches and
contexts which cannot be captured by single actual immersion case studies.
The method stresses the importance of the documentation of the decision
rules. It addresses the questions: What databases were searched? What key words
were used? What design or quality features were used as selection criteria for
inclusion? What coding criteria were used for outcomes or effect sizes? What level of
reliability was obtained by coders of results?
The following steps guide the conduct of broad-based research synthesis
(Labin, 2008):
1. Define the research question
2. Collect information sources
3. Select information sources based on inclusion criteria
4. Extract and code data
5. Analyze data
6. Present findings
Study Units and Selection Procedure
This research is based on completed and published ECB reports, therefore
the units of analysis of the study are the ECB reports that made it through the
selection procedure and criteria. Table 3.1 provides the information sources searched
to collect the case units. The initial listing was adapted from Labin et al. (2008) and
was supplemented with more recent sources. The search protocol follows a
systematic approach using a selection procedure adapted from Miller and Campbell’s
(2006) multistage literature selection design (Figure 3.1).
Table 3.1 Information Sources Search
Databases Searched: Dissertations and Theses (ProQuest)
Academic Search Premier (EBSCO)
Education Research Abstracts
ERIC
Expanded Academic ASAP
Informit, Informaworld, ProQuest (CSA)
International Bibliography of the Social Sciences
JSTOR
PsychInfo
Sociological Abstracts
Social Work Abstracts
Web of Science
Search Terms Used: Developing evaluation capacity
Empowerment evaluation
Evaluation capacity building
Evaluation capacity development
Evaluation skill building
Evaluation technical assistance
Evaluation training
Evaluative inquiry
Mainstreaming evaluation
Participatory evaluation
Evaluation capacity measurement
Journals: The American Journal of Evaluation
The Canadian Journal of Evaluation
The Evaluation Journal of Australasia
Evaluation
Evaluation and Program Planning
Evaluation Review
Evaluation and the Health Professions
The Journal of Multidisciplinary Evaluation
Journal of Development Effectiveness
New Directions for Evaluation
Educational Evaluation and Policy Analysis
Evaluation and Research in Education
Journal of Educational Evaluation for Health Professions
The Journal of Evaluation in Clinical Practice
Electronic Journal of Information Systems Evaluation
The Journal of Nondestructive Evaluation
The Journal of Personnel Evaluation in Education
Language Resources in Evaluation
Educational Research and Evaluation
Measurement and Evaluation in Counselling and Development
Practical Assessment, Research and Evaluation
Studies in Educational Evaluation
Figure 3.1 Multistage Selection Process
Stage 1: Searched N0 databases, evaluation journals and Google Scholar; yielded N1=102 articles, chapters and book reviews.
Stage 2: Reviewed references and cited works in the N1 articles, chapters and book reviews; yielded N2=130 articles, chapters and book reviews.
Stage 3: Reviewed the N2 articles, chapters and book reviews against the inclusion criteria; yielded N3=70 case examples.
Stage 4: Reviewed the N3 case examples, refined coding criteria and removed cases with insufficient information; final selected case units, N=63.
The selection process (Figure 3.1) commenced by using the identified search
terms and information sources listed in Table 3.1. The Stage 1 search of articles,
chapters and book reviews yielded the initial sample (N1=102). This sample was
expanded in Stage 2 by reviewing its references and ‘cited by’ works (N2=130). The
inclusion criteria reduced the case samples to N3=70 in Stage 3 and this was further
reduced (N=63) after cases with insufficient information were removed.
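The arithmetic of this selection process can be checked with a short sketch. The stage counts are those reported in Figure 3.1; the data layout, function name and labels are illustrative assumptions and are not part of the original study procedure.

```python
# Stage counts as reported for the multistage selection process (Figure 3.1).
# The list layout and the labels are illustrative assumptions.
stage_counts = [
    ("Stage 1: initial search", 102),
    ("Stage 2: references and cited-by works reviewed", 130),
    ("Stage 3: inclusion criteria applied", 70),
    ("Stage 4: insufficient-information cases removed", 63),
]


def stage_changes(counts):
    """Return (label, n, change from previous stage) for each stage."""
    rows = []
    previous = None
    for label, n in counts:
        change = None if previous is None else n - previous
        rows.append((label, n, change))
        previous = n
    return rows


changes = stage_changes(stage_counts)
for label, n, change in changes:
    suffix = "" if change is None else f" ({change:+d})"
    print(f"{label}: N={n}{suffix}")

# Share of the expanded Stage 2 pool retained as final case units.
retained_share = stage_counts[-1][1] / stage_counts[1][1]
print(f"Retained from Stage 2 pool: {retained_share:.1%}")
```

Under these counts, Stage 2 added 28 sources to the initial sample, the inclusion criteria removed 60, and a further 7 cases were dropped for insufficient information.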
Broad-based Research Synthesis Method Inclusion Criteria
The critical aspect of the Broad-based Research Synthesis method is the
inclusion criteria. The inclusion criteria will determine which case units will be
included in the study. They are also an integral part of the systematic sampling
procedure applied for the search. This helps focus: (1) the parameters of searching the
literature; (2) the parameters for selecting items from the literature and reference lists
through their titles and abstracts; and (3) the final selection of the case units.
The searched articles need to satisfy each of the following criteria to qualify as case
examples:
1. An ECB report published in the listed databases and journals.
2. Published in the period 1970 to present. This selection of time frame
corresponds to the years included in the Labin et al. ECB literature synthesis.
3. Articles that include a report on the measurement process and description of
the measurement models or measurement tools used. However, all ECB
reports included by the sampling procedure will have to be examined to
determine the relative percentage of reports that used measurement practice.
4. The ECB has to be in the context of an organization, a government sector or
agency. This includes ECB reports on programs that indicate involvement of
an organization for the mainstreaming of evaluation. Case units in the context
of formal education for evaluation training such as university courses will be
excluded from the selection.
Conceptual Framework of the Study
The conceptual framework of this study is diagrammatically represented in
Figure 3.2. It summarizes the main idea of the research, the relationships of variables,
the research questions, and the methods of analysis. This framework was developed
from the concepts drawn from the literature review of this study and from the research
questions raised in the previous chapter. Table 3.2 summarizes the ECB variables
identified in the literature which form the first component of the framework.
In Figure 3.2, the left-hand box labelled “Evaluation Capacity Building”,
consists of three variables: content, implementation and context (details in Table 3.2).
Content refers to the ECB topics delivered at ECB initiatives; these topics are often
categorized as individual-focused or organizational-focused. They range,
for example, from basic evaluation knowledge and skills to the strengthening of
evaluation systems within the organization. The implementation
variables refer to the teaching or training approaches of the ECB. They may include
strategies such as direct or indirect training, or participatory evaluation
approaches that cater to adult learners in work settings. Context refers to the broad
range of organizational and environmental characteristics that describe the setting of
the ECB initiative. This includes, for example, characteristics such as domain of the
ECB and type of organization. Research Question 1A, “What are the implementation
approaches and content of ECBs delivered in published ECB reports?”, seeks to
document and describe these three variables. It should be noted that the “context”
variables are intentionally omitted from the wording of the research question, as they will
be represented among the independent variables in the analysis. The box at the far
right side of Figure 3.2, labelled “Measurement Practice”, refers to the variables “ECB
Learning Structure”, “Rigor of Measurement” and “Decision to Measure”.
Generally, ECB learning outcomes should correspond to the ECB
content topics. “Rigor of Measurement” refers to the score level of a reported
ECB, assigned using scoring rubrics developed for this study. An ECB study
has a high rigor score if it scores high on most of the criteria. For example, on
the “scope of variables measured” criterion, it receives a high score if it covers both individual
and organizational capacities (Appendix B). The variable “Decision to Measure” is a
binary variable that categorizes ECB initiatives according to whether or not they measured or reported
an evaluation of their ECBs. This dependent variable is included along with the
independent variables in the regression analysis to determine the likelihood
that an ECB initiative would measure its ECB outcomes. The “Measurement Practice”
box is addressed by Research Questions 1B and 1C.
Figure 3.2 Analysis Diagram and Research Questions Map
Main Research Question: How do we describe the measurement practice in Evaluation Capacity Building?
RQ1: What is the description of ECB measurement practice from empirical evidence?
RQ1A: What are the implementation approaches and content of ECBs delivered in published ECB reports?
RQ1B: What is the rigor of the reported measurements in ECB?
RQ1C: What determines the practice of measurement in ECB?
RQ2: Is there evidence to demonstrate that:
RQ2A: ECB content follows a unified learning construct and a possible progressive structure?
RQ2B: ECB content could be grouped in specific ways?
Diagram elements (labelled in the figure with RQ1A, RQ1B, RQ1C, RQ2A and RQ2B): Evaluation Capacity Building (Content, Implementation, Context); Unidimensional Assumption (IRT Analysis); Multidimensional Assumption (Factor Analysis); ECB Construct and Progression; ECB Content Sub-domains; Levels or categories of content, implementation and context variables; Predictive Influence (Logistic Regression); Measurement Practice (ECB Learning Structure, Rigor of Measurement, Decision to Measure).
Table 3.2 Evaluation Capacity Building Content, Implementation and Context Variables
ECB Components    Variables
Content           Individual-focused topics
                    Evaluation awareness and attitude
                    Evaluation terms, approaches, or methods
                    Logic models
                    Evaluation plan
                    How to do an evaluation
                    Data management, analysis, interpretation or use
                    Program planning
                    Program implementation
                  Organizational-focused topics
                    Organization evaluation practices
                    Evaluation readiness and willingness
                    Building leadership support
                    Building culture for evaluation
                    Creating/strengthening evaluation policy requirements
                    Creating/strengthening evaluation structures
                    Creating/strengthening evaluation systems
                    Creating/strengthening support for evaluation resources
                    Improving organizational evaluation social network
Implementation    Teaching strategy
                  Mode of strategy
                  Contact duration
                  Intended target change
                  Participant focus
Context           ECB domain
                  Type of organization
                  Type of program delivery
                  Number of organizations
                  Number of programs
                  Number of sites
                  Affiliation of ECB facilitators
                  Methodological paradigm
The boxes in the center of the diagram identify the main analytical tools used
in the study. The Item Response Theory analysis is an analytical tool adapted in this
study to determine whether the ECB content topics tap into a single learning construct
(Hambleton & Swaminathan, 1985). In learning theory, it is important that the content
of learning material holds together as an entity that is organized in such a way that it
exhibits a developmental structure or stages of competency. IRT analysis can be used
to determine whether this construct for ECB exists and whether it exhibits a
developmental structure. In this study, this structure will be referred to as ECB
developmental proficiency. The Factor Analysis organizes the ECB content topics
into possible dimensions that may help in the understanding of the underlying
conceptual construct of ECB (Thompson, 2004). The results of both the IRT and Factor
Analysis are subsequently used as input variables to determine their influence on the
“Decision to Measure” using Binary Logistic Regression.
Research Instrument Development
The research instrument (Appendix B) developed for this study aimed to
document the ECB measurement practice. The instrument is a query format that
allows the researcher to code the ECB variables identified from the literature review
using the conceptual framework of the study. The content of the coding form was
mostly adapted from Labin et al. (2012) (See Appendix B for details). It consists of
three parts: (1) the ECB profile section, (2) the checklist for ECB content and
implementation, and (3) the scoring rubrics for the rating scale of rigor of
measurement practice. The following discussion details the development and
validation process of the instrument.
The checklist in the coding form for ECB content and implementation was
developed and peer reviewed by three evaluation colleagues to ensure the face
validity of the instrument. In the first review, several comments helped the validation
and trial of the initial phase of the instrument development. A second round of review
was conducted when the revised instrument was presented in a PhD collective
research forum. While most of the content of the checklist was taken from the ECB
synthesis report of Labin et al. (2012), several details and modifications
were added (Appendix B). Variables were distinguished from one another
in the instrument: some were allocated to the ECB profile section and others to the
ECB content and implementation list. The researcher established the operational
definition of profile variables. These are the demographics of the context in which the
reported ECB was conducted. For example, the domain or country of the ECB project
are profile or demographic variables and are not determined by ECB practitioners or
ECB stakeholders. The variables that fall in the ECB content and implementation are
those in which the values are decided upon by the practitioners or ECB stakeholders.
These are variables that could be a result of the ECB negotiation process. For
example, the ECB content or mode of delivery is determined during the negotiation
phase of the ECB.
After expert validation of the content of the instrument, the draft instrument
was pretested, item analysed and checked for internal consistency to verify its
reliability. The measure employed for internal consistency was Cronbach’s Alpha
(0.83, indicating high reliability). This was carried out on the assumption that the
items of the instrument measure a single construct tentatively based on the experts’
validation of the content. These results were reviewed and discussed several times
with the supervisors.
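The internal consistency check described above can be illustrated with a minimal sketch of Cronbach's Alpha. The function name `cronbach_alpha` and the `pretest` scores are hypothetical and are not the study's data; the sketch assumes each row is one pretested case and each column one instrument item.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's Alpha for a list of rows (cases), each a list of item scores."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("Cronbach's Alpha needs at least two items")
    # Sum of the variances of each item (column).
    item_variance_sum = sum(pvariance(column) for column in zip(*scores))
    # Variance of the total score of each case (row).
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical pretest scores (illustration only, NOT the thesis data):
# five cases rated on four items, each scored 0 to 3.
pretest = [
    [2, 3, 3, 2],
    [1, 1, 2, 1],
    [3, 3, 3, 3],
    [0, 1, 1, 0],
    [2, 2, 3, 2],
]
alpha = cronbach_alpha(pretest)
print(f"Cronbach's Alpha = {alpha:.2f}")
```

Population variances are used for both the item and total-score terms; using sample variances throughout would give the same value, because the correction factor cancels in the ratio.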
Rigor of ECB Measurement Practice: Developmental Model of Proficiency
In developing the assessment tool to measure rigor of ECB measurement
practice, the Developmental Model of Proficiency set out by the Center on
Continuous Instructional Improvement (Corcoran, Mosher, & Rogat, 2009) was
utilized as a guide to construct the rubric contents. The rubric construction adapted
the basic rubric development using the three-level structure and rubric-making
principles from Huba and Freed (2000). While this approach focuses on the links
between assessment and instructional development, the principles are applicable to
the context of ECB. The measurement practice progression scale is not a measure of
the practitioners’ measurement competency but rather a description of the level of
measurement practitioners have applied in their ECB contexts.
The elements of the Developmental Model of Proficiency were adapted (Table
3.3) in the following manner to fit this study’s context:
Learning targets: these describe the mastery criteria of a given ECB
measurement practice component.
Progress variables: the progress variables are themes of the construct to
be assessed by the instrument. In the case of measurement practice in
ECB these are the criterion variables to be measured, for example the
“scope of variables”, “obtaining evidence”, etc. (Appendix B).
Levels of achievement: these refer to the steps within each of the
progress variables that would serve as pathways for developmental
progression. In ECB measurement practice the levels could progress
from evidence that describes “no application” and “low” to “high”. Each
level would be described in such a way that it is exhaustive and
mutually exclusive.
Learning performance: this refers to the evidence described in a
particular level of the progress variable along the progression scale.
Table 3.3 Developmental Model of Proficiency as Applied to Rigor of ECB Measurement Instrument
Components              In Corcoran et al. (2009)                                            In this study
Learning Targets        Mastery criteria of learning targets in classrooms                   Rigor criteria of ECB measurement practice
Progress Variables      Dimensions of the learning construct                                 Criterion variables to be measured
Levels of Achievement   Progression levels within a dimension                                Progression levels within a criterion
Learning Performance    Description of the progression level within a particular dimension   Description of evidence of the progression level within a criterion
The assessment tool consists of the following items. The following variables
were identified and adapted from Braverman (2013) as components of the
measurement rigor rubrics:
ECB Construct Variables: Individuals, Systems, Structure
Measurement approaches: Self-report, Observations, Multiple Sources
Scope of measures: Single Measure, Multiple Items, Multiple Measures
The following variables are additions in this study to the list of Braverman (2013):
Utilization: Design ECB, Guide ECB, Evaluate ECB
Representativeness: Non-probability samples, Probability samples, Census
Timing: During ECB, Short Time After ECB, Extended Time After ECB
Level of generalization: Anecdotal, Descriptive, Inferential
Design: Observational, Quasi-experimental, Experimental
Reliability measures
Sources of Potential Error
Since the study units of this research are existing ECB reports written for
various publication purposes and standards, they may not truly reflect or emphasize
the rigor of measurement practice. The emphasis could be on ECB approach or any
other aspect of ECB. There is also an issue with the perspective taken in this
study. Due to the focus on ECB measurement practice and measurement rigor, there
may be reports that do not suit this approach, such as an ECB report that adopts a
qualitative approach to evaluating an ECB. Inclusion of such a report in this study would
naturally give the report a low level of measurement rigor when it should have been
examined on a different set of criteria for qualitative approaches. For example, these
kinds of reports may be examined using rubrics drawn from what Patton (2002) called
the ‘quality and credibility of qualitative evidence’. However, this is beyond the scope
of this research.
Another possible area of contention is the fact that the researcher developed
and applied the assessment instrument as the only rater of the ECB case units. To
address this weakness, the standard practice of validating the instrument
was followed, using peer review of a panel of ECB practitioners and assessment
experts. The internal consistency and reliability of the tool were checked by pretesting
and item analysis. While inter-rater reliability is not established, this aspect is of little
concern because there is only one rater and it could be assumed that with the help of
the rubrics, some level of objectivity is achieved. This issue is presented in detail in
the section on research instrument development. There is a risk that fatigue and time
factors during the rating may affect the scores. This is addressed by the researcher
limiting case unit assessments to only 2 to 3 papers per day. Also, the
relative position of each paper in the sequence is noted so that the variability of
scoring can be examined across the rating time and adjusted if
significant variance is found.
Data Management
The coding procedure was conducted by examining each ECB report and
was carried out in two steps. The first coding was for the categories of ECB
context, content and implementation variables and the second coding for the
measurement of ECB measurement rigor using the established rubrics. A qualitative
software package called ATLAS.ti ("ATLAS.ti," 2014) was used for management of
ECB report textual data to categorical codes. This provides a tracking reference that
can link sections of the report to codes. In other words, the procedure is essentially
like assessing a piece of academic work, such as an essay, but using a well-developed
set of criteria, checklists and rubrics. A descriptive summary of the measurement
practice is written as an annotation to each case to qualitatively examine the piece of
work. The purpose of this step is to discern emerging themes and patterns that could
possibly validate, explain and provide a description of the quantitative values
produced in the quantitative analysis.
The scores generated by the scoring rubrics and some numerical data were
coded quantitatively to allow quantitative aggregation and statistical analysis
procedures. These data were organized in a spreadsheet format suitable for
statistical analysis using IBM SPSS Statistics ("IBM SPSS Statistics," 2011). The
statistical software generated descriptive statistics such as frequency distributions and
descriptive summaries. Statistical procedures such as comparisons, data reduction and
regression analysis were also carried out.
Statistical Analysis
The statistical analyses applied in this study were selected on the
basis of the nature of the research questions. The analysis diagram presented earlier in
Figure 3.2 shows the main statistical analyses that are suitable to answer the posed
research questions. The following statistical tools are used in this study:
Descriptive Statistics
These are the summary statistics in the form of frequencies, averages and
standard deviations used to provide descriptions of the quantitative and
categorical data sets. These tools will be used to generate answers for the first
research question and its sub-questions.
Binary Logistic Regression Analysis
Binary logistic regression analysis is a statistical modeling technique used to
predict the outcome of a categorical dependent variable based on one or more
predictors. The predictor variables could be numerical or categorical. This
regression estimates the odds that the dependent variable is a success
(Freedman, 2009). In this study, this analysis is applied to determine what
influences the decision to measure ECB, where the decision to measure is
taken as a binary categorical variable. The binary logistic regression analysis
in this study will be carried out using the IBM SPSS statistical software ("IBM
SPSS Statistics," 2011).
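As a rough illustration of how a binary logistic regression relates a predictor to the odds of a binary outcome, the sketch below fits a one-predictor model by plain gradient ascent on the log-likelihood. It is not the SPSS procedure used in the study. The predictor (a count of content topics covered), the outcome coding and all data values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data (illustration only, NOT the thesis data):
# x = number of ECB content topics covered by a report,
# y = 1 if the report measured its ECB outcomes, 0 otherwise.
topics_covered = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
measured = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(topics_covered, measured)
odds_ratio = math.exp(b1)  # multiplicative change in odds per extra topic
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, odds ratio = {odds_ratio:.2f}")
```

The exponentiated slope is read as an odds ratio: a value above 1 means that each additional unit of the predictor multiplies the odds of a "success" (here, a decision to measure) by that factor.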
IRT Analysis
Item Response Theory (IRT) analysis, also known as latent trait theory
analysis, is an approach for the design, analysis and scoring of tests,
questionnaires, and similar instruments that measure ability, attitude, or other
variables. Central to the concept of IRT is a modeling technique that would
determine whether items of a test or content of a learning material (as in the
case of ECB) constitute a unified construct. This analysis also provides
information about whether the construct examined forms a hierarchical
structure of difficulty (Hambleton & Swaminathan, 1985). In this study, this
analytical technique is applied to address the second research question, that is,
to determine whether there is evidence to demonstrate that ECB content has a
unified learning construct. The application of the analysis will be carried out
using ConQuest, a statistical software for IRT (Wu, Adams, & Haldane, 2006).
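The idea of a hierarchical structure of difficulty can be sketched with the Rasch (one-parameter logistic) model, one common IRT model. The source does not state which model was fitted in ConQuest, so the choice of model here is an assumption, and the topic labels and difficulty values are hypothetical.

```python
import math


def rasch_probability(theta, delta):
    """Rasch model: probability of a positive response for a unit with
    proficiency `theta` on an item with difficulty `delta` (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# Hypothetical item difficulties in logits (illustration only,
# NOT estimates from the thesis).
topic_difficulty = {
    "topic_B": 0.0,
    "topic_C": 1.5,
    "topic_A": -1.2,
}

# Ordering items from easiest to hardest gives the difficulty hierarchy.
progression = sorted(topic_difficulty, key=topic_difficulty.get)

for topic in progression:
    p = rasch_probability(theta=0.0, delta=topic_difficulty[topic])
    print(f"{topic}: difficulty {topic_difficulty[topic]:+.1f}, P = {p:.2f}")
```

Under this model, items that share a single scale and can be ordered by difficulty are what the text calls a unified construct with a hierarchical structure; items that do not fit the model would signal that the content does not form one construct.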
Factor Analysis
Factor analysis is a statistical method used to describe observed, correlated
variables in terms of a potentially lower number of unobserved variables, called
factors. It attempts to represent a set of observed variables in terms of a
number of common factors plus a factor which is unique to each variable. The
factors (also called latent variables) are hypothetical variables which explain
why a number of variables are correlated with each other. This will be used in
this study to determine whether the ECB content can be grouped into
categories or factors of similar traits (Thompson, 2004). The factor analysis
will be carried out using the IBM SPSS statistical software ("IBM SPSS
Statistics," 2011).
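The description above corresponds to the standard common factor model. As a sketch in generic notation (the symbols are illustrative and not taken from the thesis), each observed variable $x_i$, here an ECB content topic, is written as a weighted sum of $m$ common factors $f_1, \dots, f_m$ plus a unique factor $u_i$, where $\lambda_{ij}$ is the loading of variable $i$ on factor $j$:

```latex
x_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{im} f_m + u_i,
\qquad i = 1, \dots, p, \quad m < p
```

Variables that load highly on the same factor are the candidates for being grouped into one category of similar traits.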
Role of the Researcher
A critical aspect of this research is the role of the researcher as the developer
and implementer of the research instrument. It is important that although the
development of the research instrument is carried out by a single researcher, peer
review is undertaken in the process. This approach ensured that the researcher
received critical feedback and diversity of views to examine and address possible
viewpoint biases. In the research instrument development, this was achieved first by
considering what the literature had to say about the construct to be measured. This
incorporated the views of authorities on ECB concepts and models, measurement in
evaluation and assessment principles. After drafting the instrument, the next step was
organizing a small group of colleagues in the field of ECB and assessment through
one-on-one meetings, small group discussions, online chat and emails on its content.
The comments and suggestions were carefully noted and considered in the revision of
the instrument draft. After a trial run and basic psychometric procedures, the final
instrument was examined and approved by the research supervisors, thus ensuring an
acceptable level of face validity of the research instrument.
Regarding the researcher’s role as the only rater of ECB reports in the
study, the sources of assessment errors are identified and minimized. For example, to
make sure that fatigue does not affect scoring, only a limited number of papers are
examined in a day, depending on the length of the report: at most two long papers
or three shorter reports per day. The variability of
scores is examined with respect to the relative position of the papers in the assessment
sequence as well as the classification of papers with respect to length of report, that is,
to see whether score variability differs, for example, between the first half and the second
half of the set.
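The check on rating position can be sketched as a simple comparison of the two halves of the rating sequence. The function name `split_half_summary` and the scores are hypothetical, and the sketch only summarizes the two halves; it is not the procedure reported in the thesis.

```python
from statistics import mean, pvariance


def split_half_summary(scores_in_rating_order):
    """Return (mean, variance) of rigor scores for the first and second
    halves of the rating sequence."""
    if len(scores_in_rating_order) < 2:
        raise ValueError("need at least two scores to form two halves")
    midpoint = len(scores_in_rating_order) // 2
    first = scores_in_rating_order[:midpoint]
    second = scores_in_rating_order[midpoint:]
    return {
        "first_half": (mean(first), pvariance(first)),
        "second_half": (mean(second), pvariance(second)),
    }


# Hypothetical rigor scores listed in the order the papers were rated
# (illustration only, NOT the thesis data).
rigor_scores = [12, 9, 15, 11, 14, 10, 13, 12]
summary = split_half_summary(rigor_scores)
for half, (avg, var) in summary.items():
    print(f"{half}: mean = {avg:.2f}, variance = {var:.2f}")
```

A marked difference in mean or variance between the halves would be a prompt to look for rater drift or fatigue; a formal test of equal variances could follow if the difference appears large.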
The approaches taken support the credibility and trustworthiness of the
study despite the fact that the development and implementation of the research
instrument to assess ECB reports is conducted by a single researcher.
Ethical Concerns
This study used non-human subjects. This means that this falls into the
category of a low risk study which did not require ethics approval from the
University. However, the researcher used correct referencing of the case units
included in the study to ensure their representation and to disguise them. Even though
the ultimate paradigm used is quantitative, it was intended that the descriptive aspects
would minimize the bias and would clearly present the point of view of the study.
Conclusion
In summary, this chapter has presented the steps undertaken to
answer the research questions. The study uses the Broad-based Research Synthesis
approach to ensure the scope and systematic identification of the case units
included in the study. The chapter has described the research conceptual
framework that was constructed to identify the major components of the study, to
show relationships of the variables to be investigated and to outline the analytical
approaches to be carried out. The study’s research instrument, the
data management plan and the selected statistical analyses were also discussed.
CHAPTER 4
RESULTS AND ANALYSIS
Overview of the Chapter
The investigation plan outlined in Chapter 3 is reported in this chapter.
The results are presented in the same order as that in which the main research
questions are posed. The chapter outlines the story of ECB measurement practice, and
how ECB practitioners deal with finding evidence of learning when they teach
evaluation to individuals and organizations. An analysis diagram is reproduced in
Figure 4.1 to facilitate understanding of the progressive presentation of the results.
The first research question is about the description of the content and
implementation of ECBs. Therefore, the sample and contextual profiles of the ECB
case samples included in this study are presented first in the first two sections. This
provides a picture of what has been taught in ECB practice. ECB content takes
prominence as this is the basis for examining what has been measured. The
researcher answers the second research question by examining in detail the
characteristics of ECB content topics to determine whether there is evidence that the ECB content
follows a unified construct with the possibility of progressive structure and whether
these content topics could be grouped in specific ways. The second part stems from
the notion that ECB as a learning intervention implies possible existence of the
learning content construct as a basis for measurements.
Figure 4.1 Analysis Diagram and Research Questions Map
Main Research Question: How do we describe the measurement practice in Evaluation Capacity Building?
RQ1: What is the description of ECB measurement practice from empirical evidence?
RQ1A: What are the implementation approaches and content of ECBs delivered in published ECB reports?
RQ1B: What is the rigor of the reported measurements in ECB?
RQ1C: What determines the practice of measurement in ECB?
RQ2: Is there evidence to demonstrate that:
RQ2A: ECB content follows a unified learning construct and a possible progressive structure?
RQ2B: ECB content could be grouped in specific ways?
Diagram elements (labelled in the figure with RQ1A, RQ1B, RQ1C, RQ2A and RQ2B): Evaluation Capacity Building (Content, Implementation, Context); Unidimensional Assumption (IRT Analysis); Multidimensional Assumption (Factor Analysis); ECB Construct and Progression; ECB Content Sub-domains; Levels or categories of content, implementation and context variables; Predictive Influence (Logistic Regression); Measurement Practice (ECB Learning Structure, Rigor of Measurement, Decision to Measure).
The Sample Profile
This study includes 63 ECB cases published in various journals in the
field of evaluation. These cases follow the inclusion criteria presented in detail in
Chapter 3. The ECBs considered in this investigation pertain only to ECBs that
engaged in evaluation capacity building in relation to organizations. This means that
formal trainings in evaluation, for example in universities, with no organizational
involvement, are excluded. Published ECB reports that focus on research skills
building but did not explicitly target evaluation capacity were also excluded even
though they target similar technical competencies in ECB among individuals and
organizations.
Since Labin et al. (2012) conducted the definitive work to date on
ECB cases, Table 4.1 was produced to compare the case sample of this study with that of
the Labin report. As their work has been recognized as an important benchmark for
ECB studies, it is imperative to relate the study sample to their work. This study
accounts for only 79 percent (48 of the 61) of the cases in Labin’s synthesis study. This is
because some of the Labin sample did not satisfy the selection criteria for this study.
Also, some articles could not be located online, while the full text of other
studies was not available. The online search for articles in this study was extended to
include the most recent published report using the same search terms, journals and
databases as applied in the Labin study. The scope of this investigation is limited to
published ECB reports from 1978 to 2013 and restricted to ECB exercises in
organizations. The sample in this investigation does not represent ECB practices in
general, as many organizations do not publish their work and the search execution has
limitations. This means that statistical conclusions are limited only to the population
of ECB reports represented by the sample and cannot be generalized to the entirety of
ECB practice in the field. However, the results and findings may be sufficient to
create a picture of ECB measurement.
Table 4.1 Case Sample of this Study and the Labin et al. (2012) Sample
Case Reference              Years Published   Number of Cases
Labin et al. (2012)         1978 - 2008       48
Recent search               2008 - 2013       15
Total cases in this study   1978 - 2013       63
Note: Total number of studies in Labin et al. (2012) = 61
Table 4.2 presents the distribution of ECB case reports according to the
journals in which they are published. Five journals published the highest number of
ECB case reports: (1) New Directions for Evaluation; (2) Evaluation and Program
Planning; (3) American Journal of Evaluation; (4) The Canadian Journal of Program
Evaluation; and (5) Evaluation. These are among the leading journals in the
evaluation field. These five journals published 70 percent of the case reports
considered in this study. The remaining journals that published ECB case reports
represented the multidisciplinary domain of evaluation; these are the primary users of
evaluation such as the fields of education, health and social interventions.
Table 4.2 Journals that Published ECB Case Reports in the Sample
Name of Journal    Number of ECB Reports
New Directions for Evaluation 12
Evaluation and Program Planning 11
American Journal of Evaluation 10
The Canadian Journal of Program Evaluation 6
Evaluation 5
Journal of Prevention and Intervention in the Community 3
AIDS Education and Prevention 2
American Journal of Community Psychology 2
Professional School and Counseling 2
Evaluation Journal of Australasia 1
Evaluation Review 1
Gifted Child Quarterly 1
Health and Social Work 1
Journal of Community Practice 1
Journal of Health Care for the Poor and Underserved 1
Journal of Organizational Behavior Management 1
R&D Management 1
The Journal of Experiential Education 1
Total 63
Figure 4.2 displays the timeline plot of these case reports. While the range
appears to be wide, only six cases are within the period prior to year 2000. The graph
appears skewed to the left through time with several peaks after the year 2000. This
may reflect the interest generated by the AEA conferences conducted in 2000 and 2001,
which respectively highlighted ECB and evaluation mainstreaming (Leviton, 2001).
The decade after 2000 seems to have been a period of empirical ECB reporting, with a
total of 49 cases (78%) in that span of time alone. Although there seems to be a
declining trend after 2010, it should be noted that this graph pertains only to ECB
case reports and not to ECB literature in general or to ECB activity, which cannot be
tracked in this study.
Figure 4.2 Timeline and Distribution of Published ECB Case Reports (horizontal axis: Year, 1975 to 2015; vertical axis: Number of Published ECB Reports, 0 to 9)
The publication on ECB literature in general is a different story. Search
results produced a total of 58 publications on ECB related literature from 2008 to
2014. After applying the selection criteria, only 15 cases were included in the sample.
Thus, there are more ECB conceptual and theoretical conversations but fewer ECB
field case reports appearing in the literature. Conceptual and theoretical discussions
on ECB are possibly on the rise but fieldwork reports such as those included in this
study appear to be declining. This comparison of publications of theoretical and
conceptual ECB versus empirical ECB appears to support the claim that empirical and
evidence-based conceptualizations of ECB are still wanting. Also, it could be possible
that journals stopped publishing ECB case reports because they are not new or unique
contributions anymore.
The United States of America is the leading country when it comes to
publishing ECB reports, accounting for 67% of the total case reports (Table 4.3).
Canada and Australia follow, but pale in comparison with the U.S., which published
about six times more than Canada and ten times more compared with Australia. Of
the four ECBs conducted in Australia, three were published elsewhere; only one
appeared in the Evaluation Journal of Australasia. This may indicate that the U.S. is
still the leading country when it comes to publishing evaluation research and activities,
consistent with its role in the history of evaluation; or it may only indicate that U.S.
constituents were quick to publish what they have been doing.
The ECB reports examined here were mostly (75%) from countries that
are members of the Organization for Economic Co-operation and Development
(OECD), which are economically advanced countries. The published ECBs from non-
OECD countries (Afghanistan, Ghana, and Latin America and the Caribbean) were
conducted in partnership with an agency from an OECD country; for example, the
Ghana report was conducted in partnership with a United Kingdom development
agency. Therefore, the sample of ECB cases in this study does not represent the
developing world. It is highly probable that international development
agencies conduct ECB among recipient developing countries, but the results may not
have found their way to publication, or they may be published only within the
agencies’ own circulation. The sample for this study fails to capture the global situation of
ECB, as there is a growing movement of adoption and adaptation of evaluation in
Southeast Asia and other parts of the world (Grob, 2010). The implication of this
particular background is that the results of this study can only be attributed to the
population of ECBs conducted in developed countries as represented by the sample.
Table 4.3 Countries where ECB Case Reports were Conducted
Country/Region                       Number of ECB Reports
United States of America             42
Canada                               7
Australia                            4
Afghanistan                          1
Denmark                              1
Ghana                                1
Japan                                1
Latin America and the Caribbean      1
Mexico                               1
New Zealand                          1
Spain                                1
Sweden                               1
Total                                63
To summarize, this study mostly includes ECB cases conducted in developed
countries, in particular the United States, and predominantly published in evaluation
journals. Although the period covered runs from 1978 to 2013, the majority of the
cases were published in the decade between 2000 and 2010. A further limitation is
that ECB cases which may have occurred in international development domains
outside developed countries are not captured in this report. These characteristics
define the scope within which the conclusions of this investigation can be drawn.
ECB Contextual Profile
The ECB contextual profile provides a general description of the
environments in which these ECB cases in the sample were conducted. This section
will detail where these ECBs have occurred: the domain of disciplines; the type of
organizations; the number of organizations and programs; the kinds of program
delivery; and even the affiliation of ECB practitioners and their methodological
paradigms.
These ECBs occurred mostly in organizations, institutions or agencies that
deal with social interventions. These domains include health, education, community
development, child and youth development, and research and policy (Table 4.4).
These domains are broad classifications of the fields where the ECBs were conducted.
For example, Fetterman and Bowman’s (2002) experiential education and
empowerment evaluation of the Mars Rover Educational Program is classified under
the domain ‘Education’.
Seventy-six percent of the ECB cases were in the domains of health and
education. These include, for example, organizations that provided services for HIV
prevention or schools that tested a new guidance and development program. Agencies
that perform social work were less common: community development programs
comprised 19 percent of the cases and child and youth development programs eight
percent. Examples included violence prevention, adult education and afterschool
care programs. ECBs also appear to have low occurrence in agencies that focus on
research and policy (10%).
Table 4.4 Distribution of ECB Domain
ECB Domain                       Number of ECB Reports    Percentage
Health                           28                       44%
Education                        20                       32%
Community Development            12                       19%
Research and Policy              6                        10%
Child and Youth Development      5                        8%
Note: An ECB case may have multiple domains
Table 4.5 shows the type of organizations (also institutions or agencies)
where these ECBs have occurred. About half of the ECB initiatives were undertaken
by non-profit organizations (51%), followed by government agencies (29%). Schools
and school districts comprise only 16 percent of the share. This may signify the active
role of non-profit organizations in social development programs. In terms of ECB,
the results indicate that non-profit organizations and government agencies are the
primary consumers, or sources of ECB demand. Evaluators that specialize in ECB may
find practice niches in non-profit organizations, government agencies and education
institutions.
Table 4.5 Type of Organization
Type of Organization          Number of ECB Reports    Percentage
Non-profit                    32                       51%
Government                    18                       29%
School or School District     10                       16%
University                    2                        3%
For-profit                    1                        2%
The types of program summarized in Table 4.6 describe the primary
purpose for which ECBs occurred. For example, “services” refers to organizations
that specialize in the provision of health care or education; “education or capacity
building” refers to organizations that provide training programs; “advocacy” refers to
organizations that specialize in campaigns such as anti-smoking and tobacco bans;
and “research and policy” refers to organizations that focus on policy development, for
example, the Food Program for Policy in Ghana. Note that some organizations have
multiple program delivery priorities. For example, one may provide health care
services and advocacy at the same time. Hence, totals in Table 4.6 exceed the total
number of ECB cases in the sample. Findings show that most of the ECBs were
conducted in organizations that deliver “services” and “education or capacity
building” programs to their beneficiaries.
Table 4.6 Type of Program Delivered
Program Delivered                  Number of ECB Reports    Percentage
Services                           47                       75%
Education or Capacity Building     26                       41%
Advocacy                           12                       19%
Research                           9                        14%
Note: An ECB case may have multiple programs delivered
ECBs were conducted in single or multiple organizations (Table 4.7).
Multiple-organization ECBs occur, for example, when a funding agency gathers all
region-wide organizations that receive its funds for training on program
evaluation. This is also the case for some national agencies operating from a head
office with independent satellite organizations nationwide, for example, a federal
health agency calling on its direct line and allied agencies to engage in an ECB.
Single-organization ECBs were carried out within one organization. The data show
that the two forms were almost equally represented in the sample: 52 percent of ECBs
involved a single organization and 48 percent involved multiple organizations.
Table 4.7 Number of Organizations in an ECB Activity
Number of Organizations     Number of ECB Reports    Percentage
Single Organization         33                       52%
Multiple Organizations      30                       48%
Most ECBs were carried out by organizations that run multiple programs
(Table 4.8), that is, an organization may carry out many intervention programs at
once. Only about a third (32%) of ECB cases involved organizations running a single
program. This has possible implications for ECB demand: organizations with multiple
programs may tend to engage more in ECB. As to program sites, almost
all programs were implemented in parallel at multiple sites (Table 4.9). For example,
in a school district, a new guidance program was simultaneously implemented in
several locations. Multiple-site programs account for 87 percent of the programs
mentioned in the ECB reports.
Table 4.8 Number of Programs in an ECB Activity
Number of Programs     Number of ECB Reports    Percentage
Single Program         20                       32%
Multiple Programs      43                       68%
Table 4.9 Number of Program Sites
Number of Sites        Number of ECB Reports    Percentage
One-site               8                        13%
Multi-site             55                       87%
Considering those who reported and published these ECBs,
it appears that most ECB facilitators were university affiliated (65%). This is not
surprising, as those who are connected with universities are typically expected to
publish their work. Authors affiliated with the internal evaluation units of
organizations were often co-authors of university-based ECB facilitators. The sample
included very few private consultancy evaluation practitioners who publish their
work. From the evidence at hand, there was no way of determining how
many private practice evaluators conduct ECBs in the field. However, this
information indicates the significant role of universities (1) in publishing
on ECB, and (2) in promoting and advocating evaluation to organizations,
often through partnership.
Table 4.10 presents the affiliation of ECB facilitators that reported ECB
practice. Thirty percent of the ECB reports were published by internal evaluation
units, suggesting some interest in self-reflection or self-analysis.
Table 4.10 Affiliation of ECB Facilitators
Affiliation                  Number of ECB Reports    Percentage
University                   41                       65%
Private Consultancy          6                        10%
Internal Evaluation Unit     19                       30%
Note: There is an overlap of categories as some university-affiliated evaluators were also involved in
some organizations’ internal evaluation units.
The final contextual factor in the ECB implementation profile was the
methodological paradigm used to report the results of ECB efforts. Table 4.11 reveals
that the majority of the ECB facilitators (close to 90 percent of the ECB reports) made
use of qualitative methods. This was more than double the number that used
quantitative methods. About a fifth of the cases (22%) used both qualitative and
quantitative methods (multiple methods). In this group, there were no explicit
statements in the reports about specific research or reporting paradigms used.
Therefore, it could not be determined whether any of these multiple-methods reports
intentionally followed a mixed methodology, that is, systematically integrating both
methods in the analysis (as opposed to the eclectic use of methods) to give a more
robust picture of the phenomenon. The distribution of methodological paradigms is
best depicted relative to their sizes using a Venn diagram (Figure 4.3). This is an
important finding in relation to the researcher’s main query regarding measurement
practices in ECB. The figure demonstrates that quantitative methods are much less
common in ECB publications. This will have implications for the subsequent analysis
and the inferences of this study.
Table 4.11 ECB Case Report Methodological Paradigm
Methodological Paradigm     Number of ECB Reports    Percentage
Qualitative                 55                       87%
Quantitative                22                       35%
Multiple Methods            14                       22%
Note: Categories are not mutually exclusive; qualitative and quantitative counts include multiple
methods counts
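Because the qualitative and quantitative counts in Table 4.11 both include the multiple-methods cases, the number of distinct reports is recovered by inclusion-exclusion: the two counts are added and the overlap is subtracted once. The following Python sketch works through that calculation; the function name `venn_regions` is hypothetical and is introduced only for illustration.

```python
def venn_regions(n_a: int, n_b: int, n_both: int) -> dict:
    """Split two overlapping category counts into disjoint Venn regions."""
    if n_both > min(n_a, n_b):
        raise ValueError("overlap cannot exceed either category count")
    return {
        "a_only": n_a - n_both,
        "both": n_both,
        "b_only": n_b - n_both,
        "union": n_a + n_b - n_both,  # inclusion-exclusion
    }


# Table 4.11: qualitative = 55, quantitative = 22, multiple methods = 14.
paradigms = venn_regions(n_a=55, n_b=22, n_both=14)
print(paradigms)
```

Here `a_only` corresponds to reports using qualitative methods alone, `b_only` to quantitative methods alone, and `union` to the number of distinct reports.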
Figure 4.3 Venn-Diagram of the Methodological Paradigms of ECB Reports
[Venn diagram: Qualitative, N1 = 55; Quantitative, N2 = 22; Multiple, N1 ∩ N2 = 14.
Regions: qualitative only = 41, multiple = 14, quantitative only = 8.
$N = N_1 + N_2 - (N_1 \cap N_2) = 55 + 22 - 14 = 63$]
In summary, the contextual profile of published ECBs included in the sample
reveals that the majority of the ECB cases were focused on social intervention
domains with priorities on health, education, and community development areas. It
also shows that about half of these were conducted by non-profit organizations,
followed by government agencies and schools, that delivered social
services and education or capacity building activities. ECBs were conducted in
similar proportions in single organizations and in clusters of multiple organizations,
mostly on multiple programs delivered at multiple sites. University-affiliated ECB
practitioners largely authored the ECB publications, often in partnership with internal
evaluation units of the organizations. The results from ECBs were mostly
reported in a qualitative paradigm. This summary of the contextual profile sets out the
scope and limitations of the analysis, inferences and conclusions that follow.
Bearing in mind this background information, the profile of the sample and
the ECB contextual descriptions, the subsequent sections set out to answer the
stated research questions. The first research question is stated as follows:
Research Question 1: How can ECB measurement practice be described
from empirical evidence?
This research question is dealt with by considering the three related sub-
questions. Answers to these sub-questions will form the composite response to
Research Question 1.
Research Question 1A: What are the content and implementation
approaches of ECBs found in published ECB reports?
This first research sub-question covers the content and implementation
strategies of ECBs included in the sample for this study. This describes what is taught
and what occurs in ECB activities. It also seeks to validate and affirm the descriptions
offered by earlier ECB systematic reviews and syntheses (Labin, 2014; Labin et al.,
2012; Suarez-Balcazar & Taylor-Ritzler, 2014). The presentations here are essentially
descriptive, providing an overall background picture of the content and
implementation of ECB activities in the field. More importantly, these results will be
related later in the investigation to measurement practices. This section is
divided into two parts: content and implementation. The contextual profile is
presented in the preceding section, ECB Contextual Profile.
ECB Content
This section documents the ECB content topics delivered
by the ECB initiatives in the sample. Of the 63 cases examined
in this study, only 57 reported ECB content topics. An initial checklist
was developed in the coding form to determine the ECB content, where content is
also referred to as ECB “topic” throughout this report. After the coding some topics
were added but, essentially, there was little variation from the Labin et al. (2012)
report (see Appendix B for a comparison with the Labin et al. instrument). Most of the
topics in the checklist focused on evaluation awareness and attitudes, evaluation
terms, approaches or methods, logic models and the like. However, when reviewing
the case samples, it was noted that some cases documented several topics, such as
ECBs that focused on both program planning and program implementation. There
were also several cases that referred to an ECB activity as “training in evaluation” but
which comprised topics that seemed to focus more on research technical skills such as
data management and analysis. The final checklist of ECB content based on topical
themes generated 17 topic categories. These content categories were initially
grouped broadly into two groups based on Labin et al. (2012): ECB topics that
target individual capacities and those that target organizational capacities.
Figure 4.4 presents the ranked frequency distribution of the topics found in
the ECB case samples which target individual evaluation capacities. The bar graph
reveals the counts relative to each topic. As can be observed from the graph, an ECB
may have multiple related topics on evaluation, depending on the learning needs of
individuals and organizations that might have been negotiated prior to the conduct of
the ECB initiative.
The counts on ECB content show that the topic with the greatest frequency
was the creation of an “evaluation plan”. In some ECB cases, the topic “evaluation
plan” co-occurred with the topics “program planning” and “program implementation”.
Along with the topic “evaluation plan”, more than 50 percent of the ECB cases
covered the topic of data management, the “how to” of conducting an evaluation, and
evaluation basics such as evaluation terms, approaches or methods. Nearly half of the
cases included logic models. While these contents related mostly to evaluation, some
essential skills in program management, as well as research skills (such as data
management, analysis, interpretation or use) were also included.
“Evaluation awareness and attitudes” was the least frequent topic (14%)
in all the ECB cases. This does not mean that the theme of “evaluation awareness and
attitudes” is not popular or important. It may signify that the organizations requesting
the ECBs already had a strong conviction of the importance of evaluation; hence,
there was no need to prime a positive attitude towards evaluation or awareness of the
significance of evaluation. However, this count may also represent organizations that
still struggle to convince their constituents of the relevance of evaluation for them.
Figure 4.4 ECB Content Targeting Individual Level Capacity (N=63)
[Bar chart: ECB Content: Individual Level, by Number of ECBs (horizontal axis, 0 to 60)]
Evaluation awareness and attitudes          9
Program implementation                      15
Program planning                            20
Logic Models                                30
Evaluation terms, approaches or methods     33
How to do an evaluation                     37
Data management, analysis,…                 37
Evaluation Plan                             42
Figure 4.5 ECB Content Targeting Organizational Level Capacity (N=63)
[Bar chart: ECB Content: Organizational Level, by Number of ECBs (horizontal axis, 0 to 60)]
Evaluation readiness and willingness                        3
Creating/strengthening evaluation policy requirements       4
Organization evaluation practices                           5
Building leadership support                                 5
Creating/strengthening support for evaluation resources     6
Improving organizational evaluation social network          7
Building culture for evaluation                             10
Creating/strengthening evaluation structures                10
Creating/strengthening evaluation systems                   13
The results of organizational level content topics of ECB are presented in
Figure 4.5. Nearly 20 percent of the cases covered topics on creating, strengthening
or building evaluation systems, evaluation structures, or culture for evaluation.
Evaluation Systems here refers to the establishment of organizational flow of
evaluative information, feedback loops and utilization. Evaluation Structures refers
to the establishment of evaluation units or networks as well as the management
capability for data and information; and Evaluation Culture refers to topics on
beliefs and behaviors relating to evaluation as an organization.
Approximately 10 percent of the cases covered “improving the social
network” within and outside the organization for evaluation, and less than 10 percent
of the cases touched on “evaluation support” such as leadership, resources, and
policies. As was the case with “evaluation awareness and attitudes” for individual
evaluation capacity content, the low number of ECB cases that discuss “evaluation
readiness and willingness” at the organizational level does not necessarily reflect the
unpopularity of evaluation; rather, it could indicate a reduced need for ECB
motivation, as these organizations could already be convinced of the significance of
evaluation and the need to improve the organization’s individual and organizational
evaluation capacities.
Overall, about 40 percent (23 of 57 cases) of the ECBs reported content
areas that were related to organizational evaluation capacity (Figure 4.6). This group
of topics is broadly categorized as ‘organizational level’ content because they each
address the organizational capacities for evaluation. Ninety-eight percent reported
content areas pertaining to individual evaluation capacity, while only one case
conducted an entire systems work for organizational evaluation capacity.
Figure 4.6 Venn-diagram of the Capacity Change Target of ECB Reports
[Venn diagram: Individual, N1 = 56; Organizational, N2 = 23; Blended, N1 ∩ N2 = 22.
Regions: individual only = 34, blended = 22, organizational only = 1.
$N = N_1 + N_2 - (N_1 \cap N_2) = 56 + 23 - 22 = 57$*
(* Six cases with no content report)]
Of the 23 cases that reported ECB at the organizational level, 56 percent (13
of 23 cases) completed evaluation systems and 43 percent (10 of 23) focused on
evaluation structures and cultures. This reveals two critical points: (1) there is a need
to measure evaluation systems, structures and cultures to determine the effectiveness
of these evaluation training inputs; and (2) there has to be a way of measuring them.
This result is consistent with the observation that the majority of the existing ECB
measurement tools are centered on individual evaluation capacities (Labin, 2014).
At this point, there is a clear picture of what ECB content had been delivered
to organizations. ECB topics covered mostly involved creating an evaluation plan,
improving data management and analysis skills, learning about the basics of
evaluation, evaluation implementation and logic models. Few organizations touched
on organizational level content such as creating systems and structures for evaluation
and developing a culture for evaluation in the organization.
This description of content that has been delivered during ECBs has strong
implications as to what outcomes should be sought in ECBs. Clearly, one cannot
expect learning outcomes from content that has not been delivered or organized. If
most ECBs are focused on improving individual evaluation capacities, then improved
organizational evaluation capacities cannot be expected. Thus, there is a need to
examine the level of expectations with regard to program outcomes.
ECB Implementation
Evaluators recognize that they have multiple roles in the
implementation of ECB. Evaluators are not only evaluators, but also teachers,
trainers, facilitators, managers, critical friends, coaches, and policy advisers. In ECB
implementation, these roles usually occur in combination, sometimes in a conflicting
manner. Primarily, however, the main role of an evaluator in an ECB is to teach about
evaluation. The approach could vary from informal demonstration and
participatory techniques to formal full-time in-house training sessions. These roles
ensure that ECB activities are delivered using appropriate strategies and address the
intended target capacity change. This section looks at ECB
implementation with respect to the intended ECB target, the focus participants and the
strategies of implementation.
The Integrated ECB Synthesis Model (Labin, 2014) shown in Figure 4.7
could provide a reference framework for where these implementation factors could be
located. In the model, ECBs are intended to improve the evaluation capacities of an
organization, at both the individual and organizational levels. This is carried out
through the ECB “Activities” component of the model.
Figure 4.7 Integrated Evaluation Capacity Building Model (Labin, 2014)
Intended Target of ECB
ECBs aim to improve individual or organizational evaluation capacities.
Table 4.12 presents the intended target capacities of ECBs included in this study. This
categorization differs from the ECB content target categorization presented earlier in
that it records the intended target explicitly stated in each report. Sixty-four percent
targeted individual capacity change only. A third of the cases targeted both individual
and organizational evaluation capacities. Combining the categories, 97 percent of
ECBs targeted improvement of individual evaluation capacities and 36 percent
targeted organizational evaluation capacities. These findings suggest that most ECBs
are directed at improving individual rather than organizational evaluation capacity.
Table 4.12 Intended Target of ECB
Target Change                                 Number of ECB Reports    Percentage
Individual capacities only                    40                       64%
Organizational capacities only                2                        3%
Individual and organizational capacities      21                       33%
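The combined figures for individual and organizational targets are obtained by adding the "both" row of Table 4.12 to each of the "only" rows. A short Python sketch of that step follows; the dictionary keys are illustrative labels rather than terms from the coding form, and percentages are printed to one decimal place so that rounding choices remain visible.

```python
# Intended targets of the 63 ECB reports, transcribed from Table 4.12.
targets = {"individual_only": 40, "organizational_only": 2, "both": 21}
total = sum(targets.values())

# A report counts toward a capacity level if it targets that level alone
# or in combination with the other level.
any_individual = targets["individual_only"] + targets["both"]
any_organizational = targets["organizational_only"] + targets["both"]

for label, count in [("individual", any_individual),
                     ("organizational", any_organizational)]:
    print(f"{label}: {count} of {total} ({100 * count / total:.1f}%)")
```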
Table 4.13 presents the participant focus of ECB. Counts show that 90
percent of the cases focused on evaluation training of the program staff and a little
over half (59%) focused on program managers. Results also show that only about
one-third (30%) involved the leadership in the evaluation capacity training and a few
included program beneficiaries (14%). These findings imply that if ECBs are to target
organizational change, then ECBs might need to improve on engaging the leadership.
In Labin’s IECB model (Figure 4.7), it is assumed that “program outcomes” are the
ultimate outcome of an ECB and a function of both the individual and
organizational evaluation capacities. However, it appears from these findings that the
ECB targets in practice were skewed towards individual evaluation capacities and
often omitted the leadership from the process.
Table 4.13 Participant Focus of ECB
Participant       Number of ECB Reports    Percentage
Staff             57                       90%
Managers          37                       59%
Leadership        19                       30%
Beneficiaries     9                        14%
Note: ECBs may have multiple participant focus
ECB Implementation Strategies
ECB implementation strategies refer to the collective ECB delivery
approaches. These are the ECB teaching strategies, the delivery mode and the contact
duration during the ECB activity.
Table 4.14 Type of ECB Teaching Strategies
Strategies                                                  Number of ECB Reports    Percentage
Direct Training                                             39                       62%
Technical Assistance: Coaching, Mentoring, Consultations    39                       62%
Participatory Evaluation                                    40                       63%
Learning Materials (Print or Online)                        13                       21%
Note: ECBs may use multiple teaching strategies
Table 4.14 shows that three ECB teaching strategies were employed in similar
proportions (62 to 63 percent): teaching evaluation through direct training;
supplementing it with various modes of technical assistance; and involving
participants in an actual evaluation, or participatory evaluation. Only about 20
percent of the reports mentioned the use of evaluation manuals, templates and web
resources in conjunction with direct training, technical assistance or participatory
evaluation. Since each ECB could use multiple teaching strategies, the Venn diagrams
in Figure 4.8 show the overlaps and complements among these strategies for a clearer
picture of how they were combined in practice.
The six pair-wise diagrams compare four strategies two at a time (4C2 = 6), to
reveal which strategies tend to be used together. Although there are cases where more
than two strategies were used, grouping them in pairs enables simple categorization
of subgroups for subsequent comparative analysis. Approximately one-third of all the
cases (32%) combine the direct training approach and participatory ECB (Venn
diagram A). This diagram shows that practitioner preferences in ECB strategy are
almost equally divided into three groups: (1) those that combine direct training and
the participatory approach, and those that conducted ECBs either as (2) exclusive
direct training or (3) participatory strategy only. This finding is important in the
sense that ECBs can be categorized by strategy for comparison, for instance with
respect to their measurement practices. This categorization may provide some basis
for determining whether the choice of strategy is related to measurement practices.
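The pairing of four strategies two at a time, and the split of each pair into "only", "both" and "neither" groups, can be sketched in Python as follows. This is a minimal illustration rather than the procedure used in the study: the strategy totals are taken from Table 4.14, the overlaps are limited to the three pairs whose joint counts are stated in the text (20, 25 and 28 cases), and the names `stated_overlaps` and `pair_breakdown` are hypothetical.

```python
from itertools import combinations

TOTAL_CASES = 63

# Number of reports using each teaching strategy (Table 4.14).
strategy_counts = {
    "Direct Training": 39,
    "Technical Assistance": 39,
    "Participatory Evaluation": 40,
    "Learning Materials": 13,
}

# Joint counts stated in the text; 20 of 63 is the 32% quoted for diagram A.
stated_overlaps = {
    frozenset({"Direct Training", "Participatory Evaluation"}): 20,
    frozenset({"Direct Training", "Technical Assistance"}): 25,
    frozenset({"Participatory Evaluation", "Technical Assistance"}): 28,
}

# Four strategies taken two at a time give 4C2 = 6 pairs.
pairs = list(combinations(strategy_counts, 2))


def pair_breakdown(a: str, b: str, both: int) -> dict:
    """Split the sample into the four regions of a two-circle Venn diagram."""
    only_a = strategy_counts[a] - both
    only_b = strategy_counts[b] - both
    neither = TOTAL_CASES - (only_a + only_b + both)
    return {"only_a": only_a, "only_b": only_b, "both": both, "neither": neither}


for a, b in pairs:
    both = stated_overlaps.get(frozenset({a, b}))
    if both is None:
        print(f"{a} + {b}: joint count not stated in the text")
    else:
        print(f"{a} + {b}: {pair_breakdown(a, b, both)}")
```

Grouping cases by these regions is what allows subgroups such as "direct training only" to be compared later with respect to measurement practice.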
Only 4 of the cases (as shown outside the two circles in Venn diagram A) do
not use direct training or participatory ECB. These cases regarded ECB as
establishing evaluation systems and processes within an organization, rather than
as teaching.
Figure 4.8 Pairwise Combination of ECB Strategies
[Six Venn diagrams; the count outside both circles is labelled as the complement of the union]
(A) Direct Training only = 19; both = 20; Participatory only = 20; (Direct Training ∪ Participatory)′ = 4
(B) Direct Training only = 14; both = 25; Technical Assistance only = 14; (Direct Training ∪ Technical Assistance)′ = 10
(C) Direct Training only = 31; both = 8; Learning Materials only = 5; (Direct Training ∪ Learning Materials)′ = 19
(D) Participatory only = 12; both = 28; Technical Assistance only = 11; (Participatory ∪ Technical Assistance)′ = 12
(E) Participatory only = 33; both = 7; Learning Materials only = 6; (Participatory ∪ Learning Materials)′ = 17
(F) Technical Assistance only = 30; both = 9; Learning Materials only = 4; (Technical Assistance ∪ Learning Materials)′ = 12
For example, these are the organizations that believe in the diffusion of evaluation
culture through the influence of evaluation champions or evaluation learning groups.
This is an approach that adheres to a belief that evaluation capacities are not
necessarily products of direct training or participatory evaluation.
The ECB cases that used the participatory approach only, without direct
training, can be described as “opportunistic ECB”. The term “opportunistic” is used in
a positive sense here, connoting the desire of those in the evaluation profession to
teach about evaluation whenever there is an opportunity to do so. Often, these
opportunities arise during an evaluation activity. ECBs of this type are “add-ons” to
evaluation activities and are generally not reflected in the evaluation contracts.
Rather, they are a form of professional obligation that any evaluator might typically
undertake voluntarily.
About 44 percent, that is 28 of the 63 cases, used technical assistance together
with the participatory approach for ECB (Venn diagram D). This is closely followed by
technical assistance combined with direct training, with 25 of the 63 cases (Venn
diagram B). This suggests that technical assistance is a basic feature of ECBs
that practice direct training or participatory approaches. These forms of technical
assistance include consultations, coaching or mentoring using face-to-face, telephone
or online communication media.
The data on the use of supplementary learning materials may not give an
accurate picture of this strategy (Venn diagrams C, E and F). These data only record
cases that explicitly mention the use of training manuals, activity guides, templates or
web-based interactive resources. Learning materials may be used but not mentioned
during direct training or participatory strategies.
The mode of strategies refers to the media of communication used during
the conduct of ECB. Table 4.15 shows the distribution of these modes. The
face-to-face mode appears to be the most preferred approach. It accounts for 95
percent of the ECB activities, that is, face-to-face only plus face-to-face combined
with other modes. Other modes include telephone and online meetings or conferences.
Only three cases of ECB used remote delivery modes exclusively. This result tells
us that ECB delivery is far from fully adopting distance learning methodologies. The
modern virtual communication technologies do not yet appear to replicate the
interaction needed for learning purposes, at least within this sample.
Table 4.15 Mode of Strategies Reported
ECB Delivery Mode                            Number of ECB Reports    Percentage
Face-to-face only                            34                       54%
Face-to-face combined with other modes       26                       41%
Other modes not including face-to-face       3                        5%
Assuming quality, content and strategies were equal, teaching “dosage” may
have a significant impact on learning. More time devoted to ECB may translate to
better ECB outcomes. In this investigation, ECBs appear to involve
training of substantial duration (Table 4.16).
Table 4.16 ECB Contact Duration
Contact Duration                                        Number of ECB Reports    Percentage
One day or less engagement                              1                        2%
Single 2-3 day engagement                               8                        13%
Multiple times a year or multiple years engagement      51                       81%
Note: Three (3) cases did not give an indication of ECB duration
The majority of the published ECBs (81%) conducted ECB activities
multiple times in a year or once a year for multiple years. Only about 15 percent of
the cases involved three days or less of engagement. This suggests that most ECB
negotiations in this sample proceeded on the understanding that ECB requires
sustained engagement and is a long process. The distribution of duration indicates a
recognized need for multiple engagements over a longer period of time. Most of these
engagements were periodic and coupled with technical assistance in between.
In summary, ECBs were implemented as direct training, as indirect
participatory evaluation learning, or as a combination of both, with follow-through
technical assistance engagements. The majority used the face-to-face delivery mode
and engaged multiple times a year or over multiple years. Only
about one-third focused on improving organizational evaluation capacities, compared
with the majority that focused on improving individual evaluation capacities. These
descriptions and categorizations of ECB content and implementation, along with the
ECB contextual profiles presented earlier in this section, will provide a basis for
subsequent statistical analysis in this report.
Answer to Research Question 1A
What are the content and implementation approaches of ECBs found in
published ECB reports?
In the field practice of ECB, as represented by the sample of this
investigation, ECB content and implementation tended to focus more on building
individual evaluation capacity than on building organizational evaluation
capacity. ECB content and implementation include building fundamental knowledge
and skills in carrying out evaluations at the program level of organizations. Only a
few ECB cases focused on organizational evaluation capacity including systems,
processes, support, and evaluation culture. The majority of the reports appear to limit
ECB to individual evaluation improvement. ECB implementation is almost equally
divided among direct training, indirect participatory learning and a
combination of both. Most ECB efforts incorporated technical assistance and the
majority favoured face-to-face engagement over a longer training duration. Most ECBs
involved program staff and managers; fewer involved the
organization leadership and program beneficiaries.
Research Question 1B
What is the rigor of measurement practice in published ECB reports?
In this study, rigor is defined operationally as how well and how
robustly measurements were carried out in ECBs. An instrument was developed
(Appendix B) to determine the rigor of ECB measurement practice based on
measurement criteria suggested by Braverman and Arnold (2008) and Braverman (2013).
However, this research question does not apply to all ECB cases, as only 22
percent (14 of 63) of the ECB cases reported measurement of ECB outcomes (Figure
4.9). It follows that the answers to this research question refer only to this subset of ECB
cases. This section examines how these measurements were carried out with respect
to rigor standards of measurement.
Figure 4.9 ECB Outcomes Measurement
[Pie chart: Measured ECB outcomes, 22%; Did not measure ECB outcomes, 78%]
Table 4.17 outlines the criteria on which ECB measurement practice can be
examined. The criteria include those outlined by Braverman and Arnold
(2008) and Braverman (2013), such as the scope of the variables measured, how
evidence was obtained, the reliability characteristics of the measurement tools used,
the intended utilization of ECB measures, representativeness, timing of measurement,
validity of inference and the measurement design used. The table provides two
percentage columns for comparison: one relative to the small subset of cases that
measured their ECB outcomes (N = 14), and one relative to the total number of ECB cases
examined (N = 63). The comparison shows that ECB
outcomes measurement has not been common practice relative to the number of ECB
cases in this study sample. Before the details of the measurement
rigor criteria for ECB are presented, the validity and reliability of the instrument used for this
assessment are discussed in the following section.
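The relationship between the case counts and the two percentage columns of Table 4.17 can be illustrated with a minimal sketch. It assumes only the two denominators stated in the text (14 and 63); the function name is hypothetical, and rounding to whole percentages is an assumption about how the table was prepared.

```python
# Minimal sketch: deriving the two percentage columns of Table 4.17 from a count.
N_MEASURED = 14  # ECB cases that reported outcomes measurement
N_ALL = 63       # all ECB cases examined


def percentage_columns(count):
    """Return (percent of measuring cases, percent of all cases), rounded."""
    pct_measured = round(100 * count / N_MEASURED)
    pct_all = round(100 * count / N_ALL)
    return pct_measured, pct_all


# Row "Measured individual's evaluation capacity": 10 cases.
print(percentage_columns(10))
```

The same count therefore reads as a majority of the measuring subset but as a small share of the full sample, which is the contrast the comparison columns are meant to show.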
Validity and Reliability of the Rubrics
The content validity of the rubrics for rating the rigor of measurement in
ECB cases is based on the criteria that Braverman (2013) outlined in considering what
constitutes good measurement practice. The process of developing this tool also
included a review of each rubric item by a panel of practitioners in assessment.
Details of the development of the rubrics to establish their content validity were
discussed in Chapter 3. Cronbach's Alpha, a measure of the internal consistency
reliability of the items, is 0.83, which indicates high reliability. This provides
confidence that the rubric items are internally consistent, that is, that they vary
together closely enough to be treated as measuring a common characteristic, the
rigor of ECB measurement practice, across the ECB cases rated.
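For reference, the reliability coefficient reported above can be computed from a cases-by-items matrix of rubric scores. The following is a minimal sketch of the standard Cronbach's Alpha formula; the function name and the ratings shown are hypothetical illustrations, not the thesis data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's Alpha for a list of rows (one row per rated case,
    one score per rubric item in each row)."""
    k = len(ratings[0])  # number of items
    # Sample variance of each item across cases.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of the total score per case.
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical rubric scores for four cases on three items (assumed values).
example = [
    [1, 2, 1],
    [3, 3, 2],
    [4, 5, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(example), 2))
```

Alpha approaches 1 when the items rise and fall together across cases, which is why it is read as a coefficient of internal consistency.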
Table 4.17 Rigor of ECB Measurement Practice
Criteria and Items | Number of ECB Cases | Percentage Relative to ECBs with Outcomes Measurement (N = 14) | Percentage Relative to All ECB Cases (N = 63)
Scope of variables measured
Measured individual's evaluation capacity (which may include awareness, knowledge, skills or attitudes). | 10 | 71% | 16%
Measured the organizational evaluation capacity that includes evaluation leadership, policies, systems, resources or structures. | 2 | 14% | 3%
Measured the organization's contextual measures such as social climate, learning capacity, culture or social network. | 1 | 7% | 1.5%
Obtaining Evidence
Indirect measurement: testing such as self-report or self-rating only. | 7 | 50% | 11%
Direct measurement: obtained by direct testing or observation only. | 0 | 0% | 0%
Combination of indirect and direct measurements. | 7 | 50% | 11%
Reliability of Measurement Tools
Uses tools with unreported or unmeasured reliability. | 10 | 71% | 16%
Uses tools with reported reliability and within acceptable values. | 3 | 21% | 5%
Uses standardized or validated measurement instruments. | 0 | 0% | 0%
Representativeness
The measurement used non-probability sample. | 0 | 0% | 0%
The measurement used probability sample with random sampling techniques. | 0 | 0% | 0%
The measurement used all case units or most cases of the population of interest. | 13 | 93% | 21%
Intended Utilization of ECB Measures
ECB measures are used to establish baseline information to inform ECB design. | 0 | 0% | 0%
ECB measures guide ECB implementation. | 0 | 0% | 0%
ECB measures are used to evaluate ECB impact. | 12 | 86% | 19%
Timing of Measurement
Measurements were only made once at the beginning or at the end of ECB project. | 2 | 14% | 3%
Measurements were made at the beginning and at the end of ECB; may include measures during ECB. | 9 | 64% | 14%
Measurements were made over an extended period of time after ECB to see changes in the long term. | 3 | 21% | 5%
Validity of Inference from Obtained Measures
The conclusions are at best anecdotal with descriptions of evaluation capacities but no measures to back up claims. | 0 | 0% | 0%
Descriptions of evaluation capacities were made and backed up by figures from measures; may also extend to comparing measures. | 4 | 28% | 6%
Conclusions were carried out with sound measures and statistical procedures that warrant statistical inference, e.g. hypothesis testing or modeling. | 10 | 71% | 16%
Measurement Design Used
The measurement design used simple observational method (no control or comparison groups and no randomization of case units made). | 2 | 14% | 3%
The measurement design used comparison groups but lacks random assignment. | 11 | 79% | 17%
The elements of experimental design are present with control and comparison groups and random assignments made. | 1 | 7% | 1.5%
Scope of Variables Measured
Considering the individual items of the rubrics, with regard to the scope of
variables measured, the majority (71%) of the ECB cases that measured ECB
outcomes focused on individual evaluation capacity such as evaluation awareness,
knowledge, skills and attitudes. Less than 30 percent of the cases measured outcomes
that pertain to organizational evaluation capacities. In this rubric item, the researcher
made an error in assuming that ECB "awareness" is in the same category as
evaluation "knowledge, skills and attitudes". In the following analysis, particularly in
the analysis of the ECB content topics for Research Question 2, it is shown that
"evaluation readiness and awareness" appears to be a separate sub-domain, distinct
from knowledge, skills and attitude. This reveals an error in the rubric item
development, wherein a single item description covers multiple characteristics. In
this case, there should have been a separate item, so that each item in the rubric
criterion carries a single description or idea. Nevertheless, the scope of
variables being measured in practice shows an inclination to measure individual
evaluation capacity variables.
Obtaining Evidence, Reliability of Measures and Representativeness
Of the ECB cases that reported their ECB measurement outcomes, 50
percent relied solely on indirect measurement approaches, such as self-reports
and self-ratings, to obtain evidence. The remaining 50 percent used a combination of
indirect measurement and direct testing or observation. Frequently, the
scenario was as follows: before or after an ECB workshop, participants were asked to
complete survey questionnaires. The surveys conducted prior to an ECB activity
usually focused on obtaining baseline information regarding the evaluation knowledge,
skills and attitudes of the participants. Immediately after the ECB activity, the
participants were asked to rate the positive "change" they believed they had gained as
a result of the ECB activity. Descriptions of these "evaluations" of ECB included
feedback with regard to the training implementation, such as the degree of
satisfaction of the attendees, the relevance of the topics to their evaluation needs, and
the quality or effectiveness of the ECB facilitators. This implies that this approach to
evaluating an ECB is limited to the ECB workshop, training or
engagement that was carried out. While this approach has its merits, it appears to be
myopic: it focuses only on the ECB activity, rather than evaluating the
programmatic concept of an ECB as an intervention.
Only 21 percent of the ECBs that reported measurements provided reliability
measures for their measurement tools. The remainder did not report the reliability of
their measures. Often, these measurement tools were developed by the ECB
practitioners. This indicates that the quality of the measurement tool, often a
survey questionnaire, has not been given emphasis in measurement practice.
Nearly all (93 percent) of the ECB cases that reported ECB outcomes measurement reported
their data using all or most of the responses of the ECB participants, based on surveys
conducted immediately after the ECB training or workshop. There were no cases that
used a sampling approach to gather data. In this respect, the ECBs performed
relatively well with regard to representativeness of the population of interest in their
reports.
Intended Utilization of ECB Measures and Timing of Measurements
Concerning the intended utilization of ECB measures, 86 percent sought to
measure the impacts of ECB, although "impact" in this case refers only to the "before
and after" effects of the ECB training. The information provided by the "timing of
measurement" data also supports this observation. Most measurements (64%) were
carried out before and after the training. Few cases (21%) extended measurement
activities beyond the training to ascertain the long-term effects. These extended
measurements are most likely arranged at the negotiation stage of the ECB engagement,
as they entail resource use and cost. Most ECBs end their outcomes
measurements as soon as the ECB activities cease.
Measurement Design and Validity of Inference
The reports also indicate high competence among ECB practitioners with
regard to the proper use of statistical inference and measurement design. In most
cases, ECB measurements were limited to quasi-experimental and observational designs
as opposed to a fully experimental design (although one case managed to do this).
Most of the reports (71%, Table 4.17) drew conclusions from sound measures and statistical
procedures that warranted statistical inference. The remainder provided
descriptions of evaluation capacities backed up by figures, in some cases extending to
comparisons of measures.
Overall Rigor of Measurement
Table 4.18 provides a summary of scores of the rigor of measurement
practice in ECB using the criteria presented above. Using the rubrics developed in this
study to rate the performance of these ECB outcomes measurements, the weighted
means of the scores were calculated. To aid interpretation, the score bands are defined
as Low: 1 – 2.33; Moderate: 2.34 – 3.66; High: 3.67 – 5. A potential
problem here is the arbitrariness of the cut-off scores and their assigned qualitative
descriptions. However, the purpose is to use the rubrics to assess the relative
performance of the ECBs objectively, by placing the weighted scores
on a logical progression from low to high. The weighted means reveal that
rigor is at the moderate level for most of the identified criteria. Only
the "representativeness" of the measurements scored high. Performance is low
for "scope of variables measured" and "reliability of measurement tools". These
results imply that, aside from the fact that few ECB practitioners reported ECB
outcomes measurement, the quality of how measurements are carried out needs to
improve.
Table 4.18 Rigor of ECB Measurement Practice
Criteria | Mean Score (Scale: 1 to 5) | Level of Rigor
Representativeness | 3.67 | High
Validity of inference | 3.61 | Moderate
Utilization of measures | 3.42 | Moderate
Timing of measurement | 2.76 | Moderate
Obtaining evidence | 2.67 | Moderate
Measurement design | 2.57 | Moderate
Scope of variables measured | 1.61 | Low
Reliability of measurement tools | 1.52 | Low
Scale: Low: 1 – 2.33; Moderate: 2.34 – 3.66; High: 3.67 – 5
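The banding rule stated under Table 4.18 can be expressed as a short function. This is a minimal sketch with a hypothetical function name; because the published bands leave small gaps (for example between 2.33 and 2.34), the sketch assumes that a score falling in a gap belongs to the higher band.

```python
def rigor_level(mean_score):
    """Map a weighted mean score on the 1-to-5 scale to the qualitative band
    used in Table 4.18 (Low: 1-2.33, Moderate: 2.34-3.66, High: 3.67-5)."""
    if not 1 <= mean_score <= 5:
        raise ValueError("mean score must lie on the 1-to-5 scale")
    if mean_score <= 2.33:
        return "Low"
    if mean_score <= 3.66:
        # Assumption: scores in the gap above 2.33 are treated as Moderate.
        return "Moderate"
    return "High"


# Mean scores taken from Table 4.18.
for criterion, score in [("Representativeness", 3.67),
                         ("Validity of inference", 3.61),
                         ("Reliability of measurement tools", 1.52)]:
    print(criterion, rigor_level(score))
```

Representativeness sits exactly on the lower bound of the High band, which illustrates how sensitive the qualitative labels are to the chosen cut-off scores.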
Answer to Research Question 1B
What is the rigor of measurement practice in published ECB reports?
Rigor of measurement practice is defined in terms of the criteria established
by good measurement practices. The approach to answering this research question was to
develop rubric items for each of the criteria and then subject the resulting
instrument to a process of ensuring content validity and acceptable reliability
(Cronbach's Alpha = 0.83).
Measurement results reveal that for the subset of ECBs (22%) that measured
their ECB outcomes, the variables measured were mostly individual
capacity level outcomes. Results further show that in this small subset of ECB cases,
there is a need for greater transparency about the psychometric properties of the instruments
being used. Representativeness ranked high in that most ECB measures involved most
of the participants in ECB. In terms of validity of inference and utilization of
measures, the ECB initiatives performed at the upper end of the moderate level. Overall,
there is some level of acceptable rigor in how the ECB outcomes measurements were carried
out. However, there is room for improvement in identifying the scope of the variables
being measured, which is strongly linked to the ECB content construct, as well as in
establishing the psychometric properties of the measurement tools used.
Research Question 1C
What determines practice of measuring ECB outcomes?
The ECB cases investigated in this study can be categorized into two groups: those
that measured ECB outcomes and those that did not. More than
three-quarters (78 percent) of published ECBs did not report outcomes measurement. To answer the
question, "Why are ECB outcomes not measured?" one approach is to explore
which possible predictor variables influence measurement practice.
With ECB profile characteristics such as content,
implementation and context variables taken as prospective independent variables, the
decision to measure or not to measure ECB outcomes can be considered the
dependent variable. This dependent variable, the practice of measurement,
is a dichotomy: ECB cases either measured ECB outcomes or did not. In this case,
Binary Logistic Regression analysis allows the examination of
relationships between a binary dependent variable and numerical or categorical predictor
variables. The aim of the analysis is to determine which variables influence the
decision to measure ECB outcomes.
The Binary Logistic Regression results on predictor variables are shown in
Table 4.19 for publication profile variables, Table 4.20 for the ECB context variables
and Table 4.21 for implementation variables. Preliminary inspection showed that
these independent variables were correlated. To remedy this situation, a
simple binary logistic regression was fitted separately for each variable
to identify which variables showed significant influence. Binary Logistic Regression
needs a sufficiently large sample size to predict well; this study did not meet that
requirement. Thus, the resulting predictions may not be robust. The intent of this
analysis is to identify potential influencing variables rather than to provide a
robust prediction model. The predictive aspect of the model could be improved as
additional sample cases are added in the future.
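To make the one-variable-at-a-time approach concrete, the sketch below shows how a simple binary logistic regression with a single yes/no predictor reduces to a two-by-two table of counts. The cell counts are hypothetical (chosen only so that the totals match the 14 measuring and 49 non-measuring cases), and the function names are illustrative rather than taken from the software used in the study.

```python
import math


def simple_logit_2x2(a, b, c, d):
    """Closed-form estimates for a logistic regression with one binary predictor.

    a, b: cases that measured / did not measure when the predictor is present
    c, d: cases that measured / did not measure when the predictor is absent
    All four cells must be non-zero for the estimates to be finite.
    """
    b_coef = math.log((a * d) / (b * c))          # B: log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of B
    wald = (b_coef / se) ** 2                      # Wald chi-square, 1 df
    return {"B": b_coef, "SE": se, "Wald": wald, "Exp(B)": math.exp(b_coef)}


def base_rate_accuracy(n_measured, n_not_measured):
    """Percent correct when every case is assigned to the larger group."""
    total = n_measured + n_not_measured
    return 100 * max(n_measured, n_not_measured) / total


# Hypothetical split of the 63 cases by some binary characteristic.
result = simple_logit_2x2(a=9, b=12, c=5, d=37)
print({name: round(value, 3) for name, value in result.items()})
print(round(base_rate_accuracy(14, 49), 1))
```

When one cell of the table is empty, the odds ratio is undefined in this sketch; statistical software typically reports a very large coefficient with a very large standard error and a Wald statistic near zero in that situation. With 14 of 63 cases measuring outcomes, assigning every case to the larger group is already correct for 77.8 percent of cases, which corresponds to the base percentage shown in the tables.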
From the publication profile variables (Table 4.19), the analysis reveals that
the propensity to measure ECB outcomes is independent of the journal in which these
reports are published. The journals were grouped into two
categories: those that focus on evaluation and those from other disciplines that
reported ECBs. Journal type is not significantly related to
reports on ECB outcomes measurement, so there is no evidence of bias in where an
ECB is published, whether in an evaluation or a non-evaluation journal. The other
publication variables, the year and country, were
included in the analysis to detect possible temporal and geographical trends that
could influence the decision to measure.
Similar conclusions can be made for the year of publication and the country in
which the ECBs are published. There is no evidence that earlier or later published
ECBs were more likely to measure ECB outcomes. Thus, there is no discernible trend
with respect to time regarding ECB measurement practice: although ECB
practice has risen over time, measuring ECB outcomes appears to be independent of that
trend. Likewise, ECBs published in the U.S. and those published outside the U.S. do not
differ with respect to the prospect of measuring ECB outcomes.
Table 4.19 Simple Logistic Regression Analysis: Publication Profile and Decision to Measure
Predictor Variable | B | Wald χ² | P | Exp(B) | Sig. | Prediction % (Base %: 77.8)
Journal (Evaluation/Non-evaluation) | 0.592 | 2.217 | 0.137 | 2.591 | NS | 77.8
Year (1978 to 2013) | -0.097 | 3.506 | 0.061 | 0.908 | NS | 81.0
Country (US/Outside US) | -1.159 | 1.996 | 0.158 | 0.314 | NS | 77.8
NS: not significant
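As a reading aid for the columns of Table 4.19, the sketch below shows two standard relationships: Exp(B) is the exponential of B (the odds ratio), and the P value is the upper-tail probability of the Wald statistic under a chi-square distribution with one degree of freedom. The function names are illustrative; the input values are the Year and Country rows of the table.

```python
import math


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per unit change in the predictor."""
    return math.exp(b)


def wald_p_value(wald_chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > w) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald_chi2 / 2))


# (B, Wald) pairs as reported in Table 4.19.
rows = {"Year": (-0.097, 3.506), "Country": (-1.159, 1.996)}
for name, (b, wald) in rows.items():
    print(name, round(odds_ratio(b), 3), round(wald_p_value(wald), 3))
```

An Exp(B) below 1 indicates lower odds of measuring as the predictor increases; neither of these two predictors reaches significance at the 0.05 level.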
Table 4.20 Simple Logistic Regression Analysis: ECB Context Profile and Decision to Measure
Predictor Variable | B | Wald χ² | P | Exp(B) | Sig. | Prediction % (Base %: 77.8)
ECB Domain [1]
Education | 0.231 | 0.131 | 0.718 | 1.259 | NS | 77.8
Health | 1.045 | 2.748 | 0.097 | 2.842 | NS | 77.8
Community Development | -1.325 | 1.471 | 0.225 | 0.266 | NS | 77.8
Research | -20.08 | 0.000 | 0.999 | 0.000 | NS | 77.8
Child and Youth Development | 0.938 | 0.938 | 0.333 | 2.556 | NS | 77.8
Type of Organization [2]
Non-profit | -1.273 | 0.742 | 0.389 | 0.280 | NS | 77.8
Government | -1.609 | 1.079 | 0.299 | 0.200 | NS | 77.8
School or School District | -0.847 | 0.290 | 0.590 | 0.429 | NS | 77.8
University [3] | - | 1.429 | 0.839 | - | | 77.8
For-profit | -21.20 | 0.000 | 1.000 | 0.000 | NS | 77.8
Type of Program Delivery
Services | 0.875 | 1.122 | 0.290 | 2.400 | NS | 77.8
Education/Capacity Building | -0.300 | 0.228 | 0.633 | 0.741 | NS | 77.8
Advocacy | 0.192 | 0.066 | 0.797 | 1.212 | NS | 77.8
Research | -0.931 | 0.707 | 0.401 | 0.394 | NS | 77.8
Number of organizations | 0.875 | 1.943 | 0.163 | 2.400 | NS | 77.8
Number of programs | 0.192 | 0.084 | 0.773 | 1.212 | NS | 77.8
Number of sites | -0.875 | 1.185 | 0.276 | 0.417 | NS | 77.8
Affiliation of facilitators
University | 0.463 | 0.401 | 0.527 | 1.589 | NS | 77.8
Private Consultancy | 0.486 | 0.274 | 0.600 | 1.625 | NS | 77.8
Internal evaluation unit | 0.140 | 0.047 | 0.828 | 1.151 | NS | 77.8
Methodological paradigm
Qualitative | -21.89 | 0.000 | 0.356 | 0.997 | NS | 88.9
Quantitative | 3.157 | 12.350 | 0.000 | 23.50 | ** | 85.7
Combined Methods | 2.175 | 9.302 | 0.002 | 8.800 | ** | 81.0
[1]: Multiple response, analyzed separately; [2]: Mutually exclusive: single analysis, categorical case;
[3]: parameter base of a single categorical model, no estimate output; NS: Not significant;
**: significant at 0.01 level of significance.
In a similar manner, Table 4.20 presents the simple logistic regression
analysis of the decision to measure ECB outcomes with ECB context
variables as regressors. ECB characteristics such as domain, type of organization, type
of program delivery, number of organizations, number of programs, number of sites
and affiliation of facilitators were not found to influence the likelihood of the
decision to measure ECB outcomes. Not surprisingly, the methodological paradigm of
an ECB report influences this likelihood. Upon closer inspection, reports using quantitative
methods and combined methods are the most likely to provide ECB outcomes
measurement. The binary regression models correctly predict 85.7 percent and 81
percent of the cases, respectively. The qualitative paradigm, although not a significant
predictor, yields correct predictions 88.9 percent of the time in the opposite
direction (negative B, -21.89), which suggests that qualitative reports are the most likely
not to report ECB measurements.
Table 4.21 Simple Logistic Regression Analysis: Implementation and Decision to Measure
ECB Delivery | B | Wald χ² | P | Exp(B) | Sig. | Prediction % (Base %: 77.8)
Teaching strategy
Direct Teaching | 0.731 | 1.348 | 0.246 | 2.077 | NS | 77.8
Participatory | -20.47 | 0.000 | 0.998 | 0.000 | NS | 77.8
Combined Methods | 1.527 | 5.705 | 0.017 | 4.606 | * | 77.8
Mode of strategy
Face-to-face | -0.575 | 0.883 | 0.347 | 0.563 | NS | 77.8
Face-to-face and other modes | 0.831 | 1.821 | 0.177 | 2.296 | NS | 77.8
Other modes | -20.01 | 0.000 | 0.999 | 0.000 | NS | 77.8
Contact duration | -0.333 | 0.237 | 0.626 | 0.717 | NS | 76.7
Intended target change
Individual change | -0.724 | 1.385 | 0.239 | 0.485 | NS | 77.8
Organizational change | -19.99 | 0.000 | 0.999 | 0.000 | NS | 77.8
Combined | 0.916 | 2.177 | 0.140 | 2.500 | NS | 77.8
Participant focus
Staff | -2.241 | 5.763 | 0.016 | 0.106 | * | 81.0
Managers | 0.300 | 0.228 | 0.663 | 1.350 | NS | 77.8
Leadership | 0.670 | 0.731 | 0.392 | 1.955 | NS | 77.8
Beneficiaries | 0.329 | 0.262 | 0.608 | 1.389 | NS | 77.8
NS: Not significant; *: significant at 0.05 level of significance.
The implementation variables of ECB reveal some significant findings.
Teaching strategy matters when it comes to predicting the likelihood of measuring
ECB outcomes. Recall that when the ECB cases were grouped into three mutually
exclusive subgroups with respect to teaching strategy, they divided almost equally
(Figure 4.10). The binary logistic regression analysis (Table 4.21) reveals that
ECBs that combine direct
training and participatory approaches (the intersection of the circles in the Venn diagram)
are more likely to measure ECB outcomes than those using participatory or direct
training approaches alone.
Figure 4.10 ECB Teaching Strategies
[Venn diagram: circles labelled Direct Training and Participatory, with a label for the region outside both circles; counts shown: 19, 20, 20]
Participant focus is another implementation characteristic associated with
the measurement of ECB outcomes. ECBs that focus on "staff" evaluation capacity
development are more likely to measure ECB outcomes. ECBs with other participant
focus categories such as "program managers", "leaders" and "beneficiaries" do not
appear to influence the decision to measure ECB outcomes. Thus, ECB cases that
have the program staff as a participant focus of ECB teaching are most likely to
present measurement of ECB outcomes, compared with programs that include
leadership, managers and beneficiaries.
Answer to Research Question 1C
What determines the practice of measurement in ECB?
The decision to measure ECB outcomes is binary, distinguishing ECB
cases that measured from those that did not, which allows binary logistic regression
analysis to examine what variables determine this practice. Findings reveal that the
"methodological paradigm of the ECB report", "teaching strategy" and "participant focus"
are likely determinants that influence measurement of ECB outcomes. In particular, those ECB
practices that use a combined teaching strategy and focus on teaching program staff have a
higher likelihood of measuring their ECB outcomes. Methodological paradigm as a
determinant suggests that ECB cases using combined quantitative and
qualitative methods, and those that use solely quantitative methods, have a high propensity to
measure ECB outcomes.
Answer to Research Question 1: How can ECB measurement practice be
described from empirical evidence?
ECB measurement practice can be described by first understanding the
context, content and implementation characteristics of the ECB initiatives. ECB
content and implementation tended to focus more on individual evaluation capacity
building compared with building organizational evaluation capacity. Only a few ECB
cases focused on organizational evaluation capacity building that would include
building systems, processes, support, and evaluation culture. ECB implementation
approaches used direct training, indirect participatory learning or a combination of
both. Most ECBs targeted individual evaluation capacity change compared with
organizational evaluation capacity change, and mostly involved program staff and
managers compared with a few that involved the organization leadership and program
beneficiaries.
Overall, ECBs tended to overlook the measurement of ECB outcomes. Only
22 percent of the examined cases measured ECB outcomes. Of those that measured
ECB outcomes, there was some level of acceptable rigor with regard to how the
outcomes measurements were conducted. However, there is room for improvement in
identifying the scope of the variables being measured which is strongly linked to the
ECB content construct as well as establishing the psychometric properties of the
measurement tools used. Those that measured ECB appeared to be influenced by the
methodological paradigms to which the practitioners adhere.
For ECB and evaluation practice, these results mean that when it comes to
ECB measurement, not much groundwork has been laid. ECB practice has to
move from an individual and program level focus to an organizational level focus if the aim
of ECB is to effect organizational change by making evaluation
mainstream and improving organizational outcomes. While ECB program theory
assumes both the individual level and the organizational level as targets of the
learning intervention, in practice, evidence suggests that ECBs are skewed toward an
individual level focus.
Research Question 2
Is there evidence to demonstrate that:
Research Question 2A: ECB content follows a unified learning construct
and a possible progressive structure?
Research Question 2B: ECB content could be grouped in specific ways?
In order to understand how ECB should be assessed as a learning
intervention, it is important to explore the construct that is ECB. That is, if ECB is a
single construct then all the ECB content topics delivered during an ECB initiative tap
into this construct. The term construct in this study refers to the concept of evaluation
capacity that could not be directly observed but can be explained by observable
phenomena – such as the ECB content topics that practitioners assumed to be the
components that demonstrate evaluation capacity. It could also be possible that this
ECB construct is a higher order construct which suggests that it may have several
domains that constitute it.
Therefore, this study attempts to examine two aspects of ECB
learning content. First, it investigates whether the ECB content topics as delivered in
practice follow a unified learning construct. This is carried out using Item
Response Theory (IRT) analysis, which will quantitatively reveal whether the ECB
topics identified from the ECB sample data adhere together as one latent entity, that
is, the unobservable unified concept that can be inferred from observed indicators.
The IRT analysis could further reveal whether these content topics form a hierarchical
or progressive structure in terms of "difficulty". Second, it is possible that the ECB
topics could be grouped in specific ways such that they constitute sub-domains of the
ECB construct. This can be revealed by employing exploratory factor analysis. This
quantitative technique can assist in determining whether the ECB content topics
examined can be organized into several factors or components.
Item Response Theory (IRT) Analysis
Central to the idea of IRT is the assumption of unidimensionality. The
analysis proceeds by examining the empirical data to ascertain whether they fit this
assumption. Hence, data in the form of items are examined statistically to determine whether
they fit the assumed model. In this study, the items are the ECB content topics
documented and coded for the analysis. Another important notion in IRT
is the concept of "difficulty". The analysis can organize the items that fit a
latent construct into a hierarchical order of difficulty (Hambleton & Swaminathan,
1985). In this research context, however, difficulty is used to describe the ECB content topic
"level" in terms of what is most likely to be delivered in an ECB activity, given the
ECB practitioner's ability, among other factors, to deliver such a content topic. For ease
of reference, item difficulty, the common term in IRT, will be referred
to as "ECB developmental proficiency" in this study. The Item Response Analysis in
this study uses the Rasch Model, against which empirical data can be examined to
confirm whether each ECB content topic fits a single latent construct.
The approach to this analysis begins with the identification of ECB topics
that were delivered in an ECB activity as reported by the ECB cases examined in this
study. The checklist recording which ECB cases
delivered which ECB content topics forms a binary data matrix. A binary data set is simply a
set whose entries take the value of either zero or one: one represents 'yes' while
zero represents 'no'. The resulting record of zeros and ones can be considered
the 'response matrix' of ECB cases with respect to ECB content topics delivered. This
structure is precisely what allows the use of Item Response Analysis. The analysis can
yield two things: (1) a determination of whether the content topics constitute a single
construct, through an analysis of the fit statistics, and (2) the ordered ranking of the
ECB topics on a developmental continuum, through the IRT "person-item" map,
referred to here as the "ability-proficiency" map.
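The response matrix described above can be sketched as follows. The matrix and topic selection are hypothetical, and the ordering shown is only a crude first approximation (the centred logit of the proportion of cases that did not deliver each topic); it is not the Rasch estimation procedure used in this study, which also produces fit statistics.

```python
import math

# Hypothetical response matrix: one row per ECB case, one column per topic.
# 1 = the case reported delivering the topic, 0 = it did not.
topics = ["Logic models", "Program planning", "Evaluation policy requirements"]
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]


def crude_topic_logits(matrix):
    """Centred logit of non-delivery per topic: higher means delivered less often.

    A rough starting point only; every topic must be delivered by at least one
    case and not by all cases, otherwise the logit is undefined.
    """
    n_cases = len(matrix)
    logits = []
    for j in range(len(matrix[0])):
        p = sum(row[j] for row in matrix) / n_cases
        if p in (0, 1):
            raise ValueError("topic delivered by no case or by every case")
        logits.append(math.log((1 - p) / p))
    mean = sum(logits) / len(logits)
    return [value - mean for value in logits]


for topic, logit in zip(topics, crude_topic_logits(responses)):
    print(f"{topic}: {logit:+.2f}")
```

Topics delivered by most cases receive low (negative) values and topics delivered by few cases receive high values, which is the sense in which the items can be placed on a continuum.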
The analysis yields estimates of an ordered ranking of ECB content topics by
levels of "ECB developmental proficiency" and levels of "ability" for an ECB case,
representing practitioners' ability to deliver the topic. This estimation approach is
considered to be an advancement over traditional item difficulty and population
ability estimation in that the estimates are mapped onto the same scale and units
(Hambleton & Swaminathan, 1985). The "ability-proficiency" graph provides
information as to which ECB topics are of low or high order on a progression scale with
respect to ECB practitioners' ability to deliver. Thus, the analysis results provide a
picture of how overall ECB content delivery performs with respect to these ECB
content topics.
Figure 4.11 shows the IRT analysis ability-proficiency graph. This graph is
conventionally termed the "ability-difficulty" distribution on a logistic scale. The "ECB
content topics" are the "items", while the population displaying "ability" on
the scale consists of the ECB cases in the sample, representing ECB practice. This
graph provides a picture of how ECB cases perform (ability) in relation to an ordered
level of "developmental proficiency" of the ECB topics (items). The items are
positioned on the right side of the graph. The left side of the graph shows
horizontal bars formed by Xs, which represent the ECB cases.
Figure 4.11 ECB Cases and ECB Developmental Proficiency
[Ability-proficiency map with topic bands labelled Low Level Topics, Moderate Level Topics and High Level Topics]
The graph provides information not only with regards to how the content
topics are scaled in levels of proficiency, but also with regards to the relative count of
ECB cases that have the abilities to perform at this level. It should be noted that the
scale ranges from -3 to +2 for both content topics and ability score: those topics that
are easiest to deliver are those approaching -3, while those that are most difficult to
deliver are approaching +2.
To interpret the graph, the position at a particular score
indicates which topics in the ECB cases have a 50 percent chance of being delivered.
For example, ECB cases with an ability score of "0" are "50 percent likely
to be able to deliver ECB content topic numbers 8, 7, 3, 2, 5, 6 and 4" (see Table 4.22 for
item numbering reference). Put another way, ECB cases have
less than a 50 percent chance of delivering ECB content topics positioned above their
ability score, while ECB practitioners, for that ECB case, have a high probability of
delivering content topics positioned below that ability score. To simplify
interpretation, those ECB cases that fall within the bands
"high level topics", "moderate level topics" or "low level topics" are the ECB cases
that have about a 50 percent chance of attaining such levels of ECB developmental
proficiency.
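The 50 percent reading of the map follows from the form of the Rasch Model, in which the probability of a positive response depends only on the difference between the case's ability and the item's difficulty on the logit scale. A minimal sketch, with an illustrative function name and illustrative input values:

```python
import math


def rasch_probability(ability, proficiency):
    """Rasch Model probability that a case at `ability` delivers a topic located
    at `proficiency` (item difficulty), both expressed in logits."""
    return 1 / (1 + math.exp(-(ability - proficiency)))


# A case at ability 0 facing topics located below, at and above its position.
for location in (-1.0, 0.0, 1.0):
    print(location, round(rasch_probability(0.0, location), 2))
```

When ability and topic location coincide the probability is exactly one half; topics located above the case's ability fall below 50 percent and topics located below it rise above 50 percent.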
In Figure 4.11, the ECB content topics are
ordered in terms of progressive proficiency level, independent of the ECB cases'
performance levels. This means that the topics can be grouped
independently of the population distribution on the left side of the graph, although any
such grouping is to some degree arbitrary. There is no
hard and fast rule for these groupings; however, taking the middle interval of the
scale, from -1 to +1, as the moderate group, the items or ECB topics can be
organized into three groups of "high level", "moderate level" and "low level"
proficiency ECB content topics. This is not an indication of popularity or of the most
common topics, which frequency counts can readily provide. Instead, it indicates the
estimated "intrinsic" proficiency characteristics of the topics, independent of whether
or not they are popular. Table 4.22 lists the ECB topics clustered by developmental
proficiency level.
Table 4.22 Topic Number List and Levels of Developmental Proficiency
Topic    Topic Number Reference
High Level
Evaluation readiness and willingness 10
Creating or strengthening evaluation policy requirements 13
Organization evaluation practices 9
Building leadership support 11
Creating or strengthening support for evaluation resources 16
Moderate Level
Improving organizational evaluation social network 17
Evaluation awareness and attitudes 1
Building culture for evaluation 12
Creating or strengthening evaluation structure 14
Creating or strengthening evaluation systems 15
Program implementation 8
Program planning 7
Low Level
Logic models 3
Evaluation terms, approaches or methods 2
How to do an evaluation 5
Data management, analysis, interpretation or use 6
Evaluation Plan 4
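The banding rule behind Table 4.22 (the interval from -1 to +1 treated as the moderate group) can be sketched as a small function. The cut-offs come from the text; the function name and the example difficulties are hypothetical, since the estimated item difficulties are not listed in this section.

```python
def proficiency_band(difficulty: float, lower: float = -1.0, upper: float = 1.0) -> str:
    """Classify a topic difficulty (in logits) into one of three proficiency bands."""
    if difficulty < lower:
        return "Low Level"
    if difficulty > upper:
        return "High Level"
    return "Moderate Level"


# Hypothetical difficulties, for illustration only (not the study's estimates).
for example in (-2.1, 0.3, 1.6):
    print(example, proficiency_band(example))
```

Because the text notes there is no hard and fast rule for these groupings, the cut-offs are left as parameters rather than fixed constants.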
This simple ordering of ECB content developmental proficiency with respect
to ECB practitioners' ability to deliver may provide an important developmental
progression structure for the ECB content being delivered. This is significant in
the sense that the expected ECB learning outcomes and the competencies to be gained
by ECB participants can be similarly structured. In this respect, ECB measurement
practice could be structured in a similar manner. This result may imply that there
is a developmental progression on which existing ECB outcome measurement tools can
be positioned. This would allow practitioners to examine whether what was delivered
and measured in ECB is well represented across this developmental progression scale.
The shape of the distribution of the ECB cases along the ECB developmental
proficiency continuum in Figure 4.11 reveals some notable findings. When the figure
is rotated so that the scale runs horizontally, the plot of the Xs representing the
ECB cases included in the study approximates the classic bell shape, but with a
slight positive skew. Judging from its position along the scale, the majority of ECB
cases fall at logit scores of -1 and below, which corresponds to the low level ECB
topics in the developmental proficiency continuum. This means that most of the ECB
cases delivered content at this low level only. The distribution appears to be
positively skewed; however, with sufficient data it could possibly extend to the
high level position in the scale, filling the distribution at the positive end. It
is also discernible that this distribution is possibly bimodal, as indicated by the
rise at the positive end. This could be a superimposition of two normal curves: a
bigger one straddling the low level part of the scale, and a smaller one positioned
between the moderate and high level parts of the scale. This indicates that a small
sub-group of ECB cases have focused on topics in the upper level of the ECB
developmental proficiency continuum.
The empirical data fit the Rasch Model well in this Item Response Analysis.
Table 4.23 presents the weighted mean square (MNSQ) fit statistics for each of the
ECB content topics. Results show that all items (ECB content topics) have MNSQ
values within their confidence intervals, with T-values below 2 in absolute value.
Since the ECB content topics are a combination of individual and organizational
evaluation capacities, this result seems to suggest that this divide blurs when
developmental progression is considered.
Table 4.23 Item Mean Square Fit Statistics
Item (topic number)    Weighted Fit: MNSQ    CI    T
High Level
Evaluation readiness and willingness (10) 1.07 (0.02,1.98) 0.3
Creating or strengthening evaluation policy requirements (13) 0.90 (0.19,1.81) -0.1
Organization evaluation practices (9) 0.89 (0.31,1.69) -0.2
Building leadership support (11) 0.86 (0.30,1.70) -0.3
Creating or strengthening support for evaluation resources (16) 0.94 (0.39,1.61) -0.1
Moderate Level
Improving organizational evaluation social network (17) 0.95 (0.45,1.55) -0.1
Evaluation awareness and attitudes (1) 1.11 (0.54,1.46) 0.5
Building culture for evaluation (12) 0.89 (0.58,1.42) -0.5
Creating or strengthening evaluation structure (14) 0.83 (0.58,1.42) -0.8
Creating or strengthening evaluation systems (15) 1.18 (0.65,1.35) 1.0
Program implementation (8) 1.04 (0.68,1.32) 0.3
Program planning (7) 1.14 (0.74,1.26) 1.0
Low Level
Logic models (3) 0.80 (0.78,1.22) -1.9
Evaluation terms, approaches or methods (2) 0.98 (0.77,1.23) -0.2
How to do an evaluation (5) 1.14 (0.75,1.25) 1.0
Data management, analysis, interpretation or use (6) 1.04 (0.75,1.25) 0.3
Evaluation Plan (4) 0.96 (0.69,1.31) -0.2
Note: Estimation using ConQuest Software: Chi-square Test of Parameter Equality = 468.71, df=16,
Sig level =0.000; Separation reliability = 0.965.
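The fit criterion stated in the text (MNSQ inside its confidence interval and an absolute T-value below 2) can be checked mechanically against the rows of Table 4.23. The sketch below transcribes those rows; the helper name `item_fits` is an assumption introduced for illustration and is not part of the ConQuest output.

```python
# (topic number, MNSQ, CI lower, CI upper, T), transcribed from Table 4.23.
FIT_ROWS = [
    (10, 1.07, 0.02, 1.98, 0.3),
    (13, 0.90, 0.19, 1.81, -0.1),
    (9, 0.89, 0.31, 1.69, -0.2),
    (11, 0.86, 0.30, 1.70, -0.3),
    (16, 0.94, 0.39, 1.61, -0.1),
    (17, 0.95, 0.45, 1.55, -0.1),
    (1, 1.11, 0.54, 1.46, 0.5),
    (12, 0.89, 0.58, 1.42, -0.5),
    (14, 0.83, 0.58, 1.42, -0.8),
    (15, 1.18, 0.65, 1.35, 1.0),
    (8, 1.04, 0.68, 1.32, 0.3),
    (7, 1.14, 0.74, 1.26, 1.0),
    (3, 0.80, 0.78, 1.22, -1.9),
    (2, 0.98, 0.77, 1.23, -0.2),
    (5, 1.14, 0.75, 1.25, 1.0),
    (6, 1.04, 0.75, 1.25, 0.3),
    (4, 0.96, 0.69, 1.31, -0.2),
]


def item_fits(mnsq: float, lower: float, upper: float, t: float, t_limit: float = 2.0) -> bool:
    """True when MNSQ lies inside its confidence interval and |T| is below the limit."""
    return lower <= mnsq <= upper and abs(t) < t_limit


misfitting = [topic for topic, mnsq, lower, upper, t in FIT_ROWS
              if not item_fits(mnsq, lower, upper, t)]
print("Misfitting topics:", misfitting)
```

Run on the transcribed values, the list of misfitting topics is empty, which agrees with the statement that all items fit. Logic models (topic 3) is the closest to the boundary, with an MNSQ of 0.80 against a lower limit of 0.78 and a T-value of -1.9.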
What is being measured in ECB?
As stated earlier, most of the ECB cases in the study sample did not measure
ECB outcomes (Figure 4.12). This means that the answer to the question, "What is
being measured in ECB?" refers to only a subset of the total cases under
investigation. In the first research question, the ECB content and implementation
were described, with the answers derived from the majority of cases. The details of
ECB content and the way ECBs perform in delivering that content were also examined.
The primary reason for clarifying content details is to connect "what has been
delivered" with what "learning outcomes" should be measured.
Figure 4.12 ECB Outcomes Measurement
Learning outcomes are referred to as ECB outcomes, as distinct from the
program outcomes of organizations. It is imperative to distinguish between these two
programmatic outcomes. ECB outcomes refer to outcomes of ECB as an intervention,
as opposed to program outcomes from various intervention programs that
organizations deliver. Ultimately, the improvement of organization program outcomes
is the end goal for which ECB intervention is believed to be a solution. It is
unfortunate that data is meager when it comes to reports of measurements of ECB
outcomes in this study sample. Nevertheless, making use of the available information,
it is possible to describe what is being measured in ECB, based on the small sample.
Table 4.24 documents the reported measurement of ECB outcomes. ECB
practitioners provide these reports as an evaluation of their ECB initiatives. In the
sample, there were 14 ECB cases (22 percent of the 63 cases) that reported their ECB
outcomes; the remaining 78 percent did not measure ECB outcomes (Figure 4.12). For
definitional clarity, measurement here refers to the "changes" in various metrics
defined and observed by those who reported their ECB activities. Often, these
measures are self-reports by participants carried out almost immediately after an
ECB training or workshop.
Table 4.24 is a "presence-absence" matrix of reported ECB outcomes with
respect to the ECB content topics organized in the Item Response Analysis
(Assessment and Learning Partnerships, 2012). It can be recalled from the previous
section that these content topics are organized according to levels of developmental
proficiency. The matrix marks with a "1" each ECB outcome reported to be measured in
relation to the organized ECB content topics. The first column of the matrix, which
contains the ECB content topics, serves as the reference against which the ECB
outcomes reports are matched. In a sense, this matrix confirms the validity of ECB
outcomes measurement; that is, it provides evidence that what was taught or
delivered has been measured. There would be a dissonance if ECB practitioners taught
or delivered one thing and measured another.
The findings reveal that the pattern of what is being measured in ECB
follows the pattern of the ranked ECB content developmental proficiency
(Table 4.22). That is, reported ECB measurements correspond to the difficulty pattern
produced by the Item Response Analysis of the ECB content topics. This is verified
by employing a tool called the Guttman chart (Table 4.25). The Guttman chart is
readily produced by ranking the ECB cases by their total scores. In this study
sample, the Guttman chart produced the classic "diagonal pattern" when the ECB
content developmental proficiency ordering was used as a reference. This implies
that the reported ECB outcomes measurements had a similar developmental progression
structure. It shows that easy topics are most likely to be measured in practice,
while difficult ones are less likely to be measured.
Table 4.24 Presence-Absence Matrix of Reported ECB Outcomes with Reference to ECB Content
Rows: ECB Content Delivery (note 1). Columns: Measured by Case Units during or after ECB (note 2) (N=14, 22% of Cases)
Case unit number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14; last column: Total
High Difficulty Level
Evaluation readiness and willingness 0
Creating or strengthening evaluation policy requirements 1 1
Organization evaluation practices 1 1 2
Building leadership support 1 1 2
Creating or strengthening support for evaluation resources 1 1 1 1 4
Moderate Difficulty Level
Improving organizational evaluation social network 1 1 1 3
Evaluation awareness and attitudes 1 1 1 1 1 5
Building culture for evaluation 0
Creating or strengthening evaluation structure 0
Creating or strengthening evaluation systems 1 1 1 3
Program implementation 1 1 1 3
Program planning 1 1 1 3
Low Difficulty Level
Logic models 1 1 1 1 1 1 1 7
Evaluation terms, approaches or methods 1 1 1 3
How to do an evaluation 1 1 1 1 1 1 6
Data management, analysis, interpretation or use 1 1 1 1 1 1 1 1 1 9
Evaluation Plan 1 1 1 1 1 1 1 1 1 1 10
Total 7 4 5 0 6 2 2 0 10 5 3 9 4 4
1: Overall ECB content delivered; organized using IRT analysis from 63 cases.
2: Using 14 cases that reported ECB outcomes measurement.
The matrix also shows that proficient ECB cases, that is, ECBs with high
total scores, tend to measure more outcomes. It needs to be mentioned at this point
that the ECB content developmental proficiency was estimated from the entire sample
of ECB cases used in this study, while the Guttman ordering uses only the subset of
the sample that reported ECB measurement. The outcomes reporting is coded
independently of reported content. The Guttman chart result confirms two things.
First, it provides some validation of the Item Response Analysis organization of
the content topics. Note that the set of ECB content topics is shown to be a single
construct and that the ECB outcomes measurement is independent data. Second, it
shows that there is a match between what is measured and what is being delivered.
The conclusion that can be drawn from this is that practitioners who measured ECB
outcomes know what they are measuring. Most importantly, the results established
the notion that there is a possible developmental progression structure in ECB
content delivery and, hence, that measurement practice should follow this
developmental progression.
Additionally, in the Guttman chart (Table 4.25) it can be noted that there are
ECB cases with zero or very low total scores. This does not mean that these cases
did not evaluate their ECBs. Rather, it means that the measurement metrics used in
their evaluation reports of ECB outcomes differ from what was assumed in this
investigation. The assumption was that "changes" would be measured with respect to
the competencies identified through the content topics delivered. The zeros and low
total scores reveal that other methods are used in some ECB cases to evaluate ECB
outcomes.
Table 4.25 Guttman Ordering of Reported ECB Outcomes with Reference to ECB Content
Rows: ECB Content Delivery (note 1). Columns: Measured by Case Units during or after ECB (note 2) (N=14, 22% of Cases)
Case unit number (ordered by total score): 9 12 1 5 3 10 2 13 14 11 6 7 4 8; last column: Total
High Difficulty Level
Evaluation readiness and willingness 0
Creating or strengthening evaluation policy requirements 1 1
Organization evaluation practices 1 1 2
Building leadership support 1 1 2
Creating or strengthening support for evaluation resources 1 1 1 1 4
Moderate Difficulty Level
Improving organizational evaluation social network 1 1 1 3
Evaluation awareness and attitudes 1 1 1 1 1 5
Building culture for evaluation 0
Creating or strengthening evaluation structure 0
Creating or strengthening evaluation systems 1 1 1 3
Program implementation 1 1 1 3
Program planning 1 1 1 3
Low Difficulty Level
Logic models 1 1 1 1 1 1 1 7
Evaluation terms, approaches or methods 1 1 1 3
How to do an evaluation 1 1 1 1 1 1 6
Data management, analysis, interpretation or use 1 1 1 1 1 1 1 1 1 9
Evaluation Plan 1 1 1 1 1 1 1 1 1 1 10
Total 10 9 7 6 5 5 4 4 4 3 2 2 0 0
1: Overall ECB content delivered; organized using IRT analysis from 63 cases.
2: Using 14 cases that reported ECB outcomes measurement.
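The column order of the Guttman chart can be reproduced from the case totals alone. The sketch below takes the Total row of Table 4.24 and ranks the 14 case units from highest to lowest total. It is a minimal illustration, assuming that tied cases keep their original case-number order; the function name is hypothetical.

```python
# Total number of outcomes measured by each case unit (Total row of Table 4.24).
CASE_TOTALS = {
    1: 7, 2: 4, 3: 5, 4: 0, 5: 6, 6: 2, 7: 2,
    8: 0, 9: 10, 10: 5, 11: 3, 12: 9, 13: 4, 14: 4,
}


def guttman_case_order(case_totals: dict) -> list:
    """Rank case units by descending total score.

    sorted() is stable, so cases with equal totals stay in their original
    (ascending case number) order.
    """
    ranked = sorted(case_totals.items(), key=lambda pair: -pair[1])
    return [case for case, _total in ranked]


order = guttman_case_order(CASE_TOTALS)
print("Case order:", order)
print("Sorted totals:", [CASE_TOTALS[case] for case in order])
```

The resulting order matches the column headings of Table 4.25. The rows of the chart are not re-sorted in this sketch because they follow the developmental proficiency ordering from the Item Response Analysis, which is what allows the diagonal pattern to serve as a check on that ordering.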
As an example of the other methods used, some cases focused on the evaluation
of outputs produced and evaluation systems established, rather than measuring the
individual evaluation capacity changes that ECB purports to bring. This finding
implies that some ECBs in practice evaluated ECB outcomes through holistic product
outputs, such as the portfolio products in authentic assessment approaches, rather
than looking at the individual or organizational competency "change" targeted by the
ECB activity (Table 4.26).
Table 4.26 Some Process and Outcome Areas Measured in Reported ECBs
Intention to use learning at the workplace
Level of exposure to training and materials
Number of evaluations carried out after the training
Quality of evaluation reports after the training
Quality of technical assistance
Quality of topics and learning tasks
Training satisfaction of participants
Utilization of evaluation outputs
These other ways of considering ECB outcomes, and some processes, reveal
that ECB outcomes measurement is not limited to the view of documenting individual
and organizational change as defined by the ECB content being delivered (Table
4.26). These include other factors such as the technical aspects and details of the
training, the performance of the ECB implementation, and the delivery of teaching. In
addition, there are ECB measurements that focus on the evaluation products and the
quality of these products. Some cases did not report individual or organizational
changes but rather reported on the improvement in the number and quality of
evaluation outputs, and the utilization of these evaluation products. These findings
show that measuring ECB outcomes is as diverse as the ECB conceptualization.
Factor Analysis: Multidimensional Assumption
Factor analysis allows for the exploration of the possible dimensions that
may provide further information regarding ECB content construct. This analysis looks
into the possible independent ECB content sub-domains rather than simply grouping
them into individual or organizational content topics. This exploration has important
implications for the pedagogical aspects of delivering ECB. For example, being able
to know what aspects of ECB have to be delivered is as important as the approach of
delivering them progressively.
The first step in factor analysis is to determine whether the data are suitable
for data reduction and how many factors or dimensions can be extracted from the ECB
content construct. The scree plot shown in Figure 4.13 is a technique that helps
determine how many latent factors or dimensions underlie this set of ECB topics. The
"elbow" of the scree plot, where the plot begins to taper off, suggests the number
of suitable dimension groupings. The result identifies four factors corresponding to
the elbow of the scree plot.
Figure 4.13 Scree Plot for ECB Content
The factor loadings are shown in Table 4.27. Factor loadings less than 0.40
are not displayed in the table to facilitate the identification of topic groupings.
The values of these factor loadings mapped onto the factors provide the
reorganization of the ECB content topics into groups. The factor analysis details
are shown in the note to Table 4.27; they satisfy the sampling adequacy index
(Kaiser-Meyer-Olkin = 0.592) and the factorability criterion (Bartlett's Test of
Sphericity p-value < 0.001). This means that even though the sample size was small,
it satisfied the statistical sample size criterion for factor analysis. The note
also shows that 35 percent of the correlation residuals are greater than 0.05. The
Maximum Likelihood extraction method was employed with Direct Oblimin rotation. The
Direct Oblimin rotation was used with the assumption that the factors (or
sub-dimensions) in the ECB construct could be correlated (de Winter & Dodou, 2012).
Table 4.27 Structure Matrix of the ECB Content Using Maximum Likelihood Method of Extraction and Direct Oblimin Rotation
Preliminary Factor Groupings and ECB Topics                 Factor Loadings
                                                            1       2       3       4
Teaching Individual Evaluation Capacities
  Evaluation awareness and attitude                         -       -       -       -
  Evaluation terms, approaches, or methods                  -       -       -       0.466
  Logic models                                              0.403   -       -       0.638
  Evaluation plan                                           -       -       -       0.538
  How to do an evaluation                                   -       -       -       -
  Data management, analysis, interpretation or use          -       -       -       0.486
  Program planning                                          -       0.967   -       -
  Program implementation                                    -       0.844   -       -
Addressing Organizational Evaluation Capacities
  Organization evaluation practices                         0.686   -       -       -
  Evaluation readiness and willingness                      -       -       -       0.450
  Building leadership support                               0.525   -       0.557   -
  Building culture for evaluation                           0.995   -       -       -
  Creating/strengthening evaluation policy requirements     -       -       0.595   -
  Creating/strengthening evaluation structures              0.532   -       0.639   -
  Creating/strengthening evaluation systems                 -       -       0.588   -
  Creating/strengthening support for evaluation resources   -       -       0.472   -
  Improving organizational evaluation social network        -       -       0.922   -
Note: Cells marked "-" are coefficients less than 0.400; Factor analysis using Maximum Likelihood Method and Oblimin Rotation; Kaiser-Meyer-Olkin: 0.592; Bartlett's
Test of Sphericity p-value: 0.000; Correlation residuals greater than 0.05: 35%; Cumulative % of Variance: 55.44%
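The grouping of topics into sub-domains from the displayed loadings can be sketched as follows. The loading values are transcribed from Table 4.27; the assignment of each value to a factor number is inferred from the discussion of the four sub-domains in the text and from Table 4.29, so it should be read as an assumption. The function and variable names are hypothetical.

```python
# Loadings of 0.40 or above from Table 4.27, keyed by factor number.
# The factor assignment is inferred from the text and Table 4.29.
LOADINGS = {
    "Evaluation awareness and attitude": {},
    "Evaluation terms, approaches, or methods": {4: 0.466},
    "Logic models": {1: 0.403, 4: 0.638},
    "Evaluation plan": {4: 0.538},
    "How to do an evaluation": {},
    "Data management, analysis, interpretation or use": {4: 0.486},
    "Program planning": {2: 0.967},
    "Program implementation": {2: 0.844},
    "Organization evaluation practices": {1: 0.686},
    "Evaluation readiness and willingness": {4: 0.450},
    "Building leadership support": {1: 0.525, 3: 0.557},
    "Building culture for evaluation": {1: 0.995},
    "Creating/strengthening evaluation policy requirements": {3: 0.595},
    "Creating/strengthening evaluation structures": {1: 0.532, 3: 0.639},
    "Creating/strengthening evaluation systems": {3: 0.588},
    "Creating/strengthening support for evaluation resources": {3: 0.472},
    "Improving organizational evaluation social network": {3: 0.922},
}


def group_by_factor(loadings: dict, cutoff: float = 0.40) -> dict:
    """Collect (topic, loading) pairs per factor, strongest loading first."""
    groups = {}
    for topic, by_factor in loadings.items():
        for factor, value in by_factor.items():
            if abs(value) >= cutoff:
                groups.setdefault(factor, []).append((topic, value))
    for members in groups.values():
        members.sort(key=lambda pair: -abs(pair[1]))
    return groups


groups = group_by_factor(LOADINGS)
overlapping = sorted(topic for topic, by_factor in LOADINGS.items() if len(by_factor) > 1)
unassigned = sorted(topic for topic, by_factor in LOADINGS.items() if not by_factor)

for factor in sorted(groups):
    print(factor, [topic for topic, _value in groups[factor]])
print("Overlapping:", overlapping)
print("No loading at or above the cut-off:", unassigned)
```

Under these assumptions the sketch yields five topics on the first factor, two on the second, six on the third and five on the fourth, with three topics shared between factors and two topics left without a displayed loading.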
The set of items on which the first factor has strong loadings can be
identified as "Developing Evaluation Culture and Practices". These are the topics
that address organizational evaluation capacity building in terms of "building
culture for evaluation" and "organization evaluation practices". These are the
leading items, with high factor loadings of 0.995 and 0.686 respectively.
The ECB content items could thus be organized into recognizable sub-domains,
as shown by the extracted factors. The factors will be referred to as sub-domains,
or simply domains, to highlight the notion that these are ECB content topics
delivered in practice. Although the first factor also loads on other items, these
are shared with other domains. This is expected, as the correlation matrix among the
domains in Table 4.28 shows.
Table 4.28 Factor Correlation Matrix
Factor    1       2       3       4
1         1.000   .082    .325    .339
2         .082    1.000   .004    .146
3         .325    .004    1.000   .039
4         .339    .146    .039    1.000
Extraction Method: Maximum Likelihood.
Rotation Method: Oblimin with Kaiser Normalization.
The second domain could be named "Managing Programs". These are topics
that pertain to "Program planning" and "Program implementation", with very high
factor loadings (0.967 and 0.844 respectively). Although this sub-domain is a
couplet, that is, a factor with only two representative items, it appears to be
independent, having low correlations with all other sub-domains (Table 4.28). It can
be deduced from this result that, in practice, topics dealing with the way
organizations plan and run their programs have been an integral part of evaluation
capacity building initiatives. This supports the notion that program design,
development and implementation are all considered necessary aspects of the field of
evaluation, in addition to program evaluation topics.
The third sub-domain could be identified as "Institutionalizing Evaluation".
This includes topics such as "Improving organizational evaluation social network"
(0.922), "Creating or strengthening evaluation structures" (0.639), "Creating or
strengthening evaluation policy requirements" (0.595), "Creating or strengthening
evaluation systems" (0.588), "Building leadership support" (0.557) and "Creating or
strengthening support for evaluation resources" (0.472). It has to be noted that two
of these topics also have moderate factor loadings on the first sub-domain,
"Developing Evaluation Culture and Practice". These are "Building leadership
support" and "Creating or strengthening evaluation structures". This shows that
leadership and evaluation structure in an organization are indicators both of
evaluation culture and practice and of the organizational social context for
evaluation.
The final sub-domain could be considered "Building Evaluation Knowledge,
Skills and Readiness". This is the group of ECB topics that includes basic
evaluation knowledge such as "Logic models" (0.638), "Evaluation Plan" (0.538),
"Data management, analysis, interpretation or use" (0.486) and "Evaluation terms,
approaches or methods" (0.466). "Evaluation readiness and willingness" (0.450) is an
indicator of this sub-domain as well. There was an expectation that "Evaluation
awareness and attitude" could belong in this cluster; however, this item has no
factor loading of 0.40 or above on any of the identified sub-domains. This is also
true for "How to do an evaluation". A possible explanation might be that these ECB
topics are present in most ECB initiatives and therefore do not give much
information. From this, it could be interpreted that ECB initiatives commonly
recognize the need for ECB and treat learning how to do an evaluation as the purpose
of ECB.
These dimension groupings of ECB content topics are somewhat
subjective. Although this data set satisfies the sampling adequacy criteria, the
sample size is still a limitation compared with the ideal sample size (at least 15
cases for each item, 17 x 15 = 255). Factor analysis can be highly sensitive to
sample size; that is, factor loadings could change easily as new samples are added.
There is also subjectivity in the choice of factor extraction and rotation
techniques, which can yield quite different results. For this study, the factor
analysis procedure fortunately yielded an interpretable result. However, this does
not mean that these dimension groupings are fixed and final. The usefulness of this
analysis is that it provides a basis for comparing the frequency levels of the topic
contents being delivered during ECBs.
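The size of the sample-size gap noted above can be made concrete with a short calculation. It uses the 15-cases-per-item rule and the 17 items from the text, and assumes that the factor analysis used all 63 ECB cases in the study sample; the constant names are hypothetical.

```python
N_ITEMS = 17               # ECB content topics entered into the factor analysis
N_CASES = 63               # assumed: the full study sample of ECB cases
CASES_PER_ITEM_RULE = 15   # rule of thumb quoted in the text

ideal_sample = N_ITEMS * CASES_PER_ITEM_RULE
cases_per_item = N_CASES / N_ITEMS
shortfall = ideal_sample - N_CASES

print(f"Ideal sample size: {ideal_sample}")
print(f"Observed cases per item: {cases_per_item:.1f}")
print(f"Shortfall against the rule of thumb: {shortfall}")
```

On these figures the sample provides fewer than four cases per item, which is why the groupings are treated as provisional rather than fixed.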
Table 4.29 provides a summary list of these ECB content construct
sub-domains. While the Item Response Analysis has provided a strong case that these
topics tap into a single higher order construct and has shown that ECB content topic
delivery could be structured as a learning progression, exploratory factor analysis
demonstrated that there are discernible sub-domains. These findings show that
sub-domains could exist within a higher order construct, such as ECB content, that
fits a Rasch model. Table 4.29 also maps these topics to the IRT hierarchy,
indicating that some patterns of domain progression can be observed. For example, in
the sub-domain "Developing Evaluation Culture and Practices", there is a good
distribution of low level to high level topics. The topics in the couplet "Managing
Programs" are both at the moderate level, while the "Building Evaluation Knowledge,
Skills and Readiness" domain has mostly low level topics, apart from evaluation
readiness and willingness at the high level. "Institutionalizing Evaluation" has
topics from the moderate to the high level.
Table 4.29 Sub-domain Groupings for ECB Content and IRT Hierarchy Classification
ECB Content Sub-domains, Topics and IRT Hierarchy Classification
Domain 1: Developing Evaluation Culture and Practices
Building culture for evaluation (Moderate Level)
Organization evaluation practices (High Level)
Creating or strengthening evaluation structures* (Moderate Level)
Building leadership support* (High Level)
Logic models* (Low Level)
Domain 2: Managing Programs
Program implementation (Moderate Level)
Program planning (Moderate Level)
Domain 3: Institutionalizing Evaluation
Improving organizational evaluation social network (Moderate Level)
Creating or strengthening evaluation structures* (Moderate Level)
Creating or strengthening evaluation policy requirements (High Level)
Creating or strengthening evaluation systems (Moderate Level)
Building leadership support* (High Level)
Creating or strengthening support for evaluation resources (High Level)
Domain 4: Building Evaluation Knowledge, Skills and Readiness
Logic models* (Low Level)
Evaluation plan (Low Level)
Data management, analysis, interpretation or use (Low Level)
Evaluation terms, approaches, or methods (Low Level)
Evaluation readiness and willingness (High Level)
* Topics that overlap between sub-domains. Two topics have no factor loadings from any of the
sub-domains: “Evaluation awareness and attitudes” and “How to do an evaluation”.
Using these ECB sub-domains as new categories for grouping ECB topics,
Figure 4.14 shows the relative performance of the reported ECBs in terms of delivery
frequencies in these categories, as weighted means scaled 1 to 5. It should be
recalled that the initial groupings of these content topics were the broad
individual and organizational evaluation capacity categories. A factor analysis of
the ECB topics constrained to two factors does not confirm this divide between
individual and organizational ECB. This means that the ECB content construct
organizes itself around the four identified sub-domains rather than around the
commonly held distinction between individual and organizational evaluation capacity
building content and approaches.
Previous results have shown higher frequencies among ECB initiatives for
topics on developing individual evaluation capacities than for topics targeting
organizational evaluation capacities. In practice, however, there is ambiguity with
regard to how these two divide or work together for an organization. The
reorganization of topics using the factor analysis groupings reveals another
perspective. Here, the ECBs can be evaluated with respect to the four identified
dimensions. Results show that ECBs deliver "Building Evaluation Knowledge, Skills
and Readiness" with relatively high frequency. This is followed by moderate delivery
of "Managing Programs". ECB delivery on the sub-domains "Developing Evaluation
Culture and Practice" and "Institutionalizing Evaluation" has low frequency.
Figure 4.14 may again appear to show the divide between the individual level content
constructs and the organizational level domains, but a closer view shows that the
topics are spread across the four domains, although Domain 3 consists entirely of
organizational level topics.
What could explain this distribution pattern of topics across these
sub-domains is not addressed in this investigation; rather, the purpose is to use
these findings in relation to understanding measurement practices in ECB. However,
one possible explanation is the nature of demand or need for ECB. It could be that
most organizations that engaged in ECB contracts perceived individual evaluation
capacity building needs, in the form of evaluation management and evaluation skills
development, as a higher priority than restructuring organizations for evaluation
mainstreaming. Another possible reason could be the belief that individual
improvement in evaluation management and skills would consequently translate into
improvement of organizational evaluation capacity, with organizational mechanisms
passively left to make this translation by themselves. Still another possible reason
is that those engaged with ECBs were not really viewing ECB as a progressive
learning intervention with several domains. The findings imply that ECB, in practice
and content, is highly dependent on the perceptions of both the organizations and
the ECB practitioners.
Figure 4.14 ECB Content Sub-Domains Frequencies (bar chart; axis: Frequency of Reported ECBs, scaled 1 to 5 from Low through Moderate to High; categories: ECB Content Sub-domains)
Institutionalizing Evaluation: 1.65
Developing Evaluation Culture and Practice: 2.05
Managing Programs: 2.53
Building Evaluation Knowledge, Skills and Readiness: 3.54
ECB Content and Decision to Measure
Research Question 1C, "What determines practice of measuring ECB outcomes?",
is revisited here to include the results from the IRT analysis and the factor
analysis. Table 4.30 shows the simple logistic regression analysis of ECB content
characteristics and the decision to measure.
Table 4.30 Simple Logistic Regression Analysis: ECB Content and Decision to Measure
Predictor Variable                                            B        Wald χ²   P       Exp(B)      Prediction % (Base %: 75.4)
ECB Content Developmental Proficiency
  High Level Topics                                           -0.097   0.060     0.806   0.907 NS    75.4
  Moderate Level Topics                                       0.161    0.550     0.458   1.174 NS    75.4
  Low Level Topics                                            0.463    3.865     0.049   1.589*      75.4
Construct Dimensions
  Developing Evaluation Culture and Practice                  -0.209   0.030     0.862   0.811 NS    75.4
  Skills in Program Management                                0.339    0.243     0.622   1.403 NS    75.4
  Creating Evaluation Social Network, Structures and Systems  -0.653   0.202     0.653   0.520 NS    75.4
  Building Evaluation Knowledge, Skills and Readiness         2.849    4.711     0.030   17.274*     77.2
NS: Not significant; *: significant at the 0.05 level of significance.
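The Exp(B) column in Table 4.30 is the odds ratio obtained by exponentiating the logistic regression coefficient B. The sketch below recomputes it from the reported B values as a consistency check on the table; the row labels are shortened, and the function name is hypothetical.

```python
import math

# (predictor, B, reported Exp(B)), transcribed from Table 4.30.
ROWS = [
    ("High Level Topics", -0.097, 0.907),
    ("Moderate Level Topics", 0.161, 1.174),
    ("Low Level Topics", 0.463, 1.589),
    ("Developing Evaluation Culture and Practice", -0.209, 0.811),
    ("Skills in Program Management", 0.339, 1.403),
    ("Creating Evaluation Social Network, Structures and Systems", -0.653, 0.520),
    ("Building Evaluation Knowledge, Skills and Readiness", 2.849, 17.274),
]


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


# Largest gap between the recomputed and the reported odds ratio.
max_gap = max(abs(odds_ratio(b) - reported) for _name, b, reported in ROWS)

for name, b, reported in ROWS:
    print(f"{name}: exp({b}) = {odds_ratio(b):.3f} (reported {reported})")
print(f"Largest difference: {max_gap:.4f}")
```

The recomputed values agree with the reported ones to within rounding of B to three decimals. An odds ratio above 1 means the odds of measuring ECB outcomes rise with the predictor, and a ratio below 1 means they fall.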
The decision to measure ECB outcomes also appears to be influenced by
ECB content developmental proficiency. ECBs focusing on low level topics are more
likely to measure ECB outcomes (Exp(B) = 1.589, p = 0.049) than those that deliver
high level topics. With respect to ECB content sub-domains, the analysis also shows
that these influence the likelihood of measuring ECB outcomes: the dimension
"Evaluation Readiness" (Building Evaluation Knowledge, Skills and Readiness in
Table 4.30) shows a significant influence (p = 0.030). This means that those ECB
cases that include content topics relating to "Evaluation Readiness" are more likely
to measure ECB outcomes than ECBs that include all the other topics. It has to be
noted that very few cases (3 of 63) include this specific topic in their ECB
contents. From this, it could be interpreted that those who are aware of the
readiness and willingness aspects of the organization may have more acute awareness
of the measurement aspects of the ECB outcomes.
Answer to Research Question 2
Is there evidence to demonstrate that:
Research Question 2A: ECB content follows a unified learning construct
and a possible progressive structure?
Research Question 2B: ECB content could be grouped in specific ways?
The IRT result showed that ECB content demonstrates a unified learning
construct. Item Response Analysis also enabled the scaling of ECB content topics into
a continuum of developmental proficiency levels. The ability-proficiency plot for the
ECB content versus ECB case samples reveals that most ECB cases were distributed
in the lower end of the developmental proficiency continuum of ECB content topics.
An additional way to view the construct of ECB content topics was the
possibility of sub-domains within this construct. For this data set, factor analysis
revealed that ECB content topics can be organized into four sub-domains: "Building
Evaluation Knowledge, Skills and Readiness", "Managing Programs", "Developing
Evaluation Culture and Practice", and "Institutionalizing Evaluation". ECBs have
relatively high frequency in delivering topics in "Building Evaluation Knowledge,
Skills and Readiness", moderate frequency in "Managing Programs", and low frequency
in the areas of "Developing Evaluation Culture and Practice" and "Institutionalizing
Evaluation".
Overall, evidence suggests that ECB content topics delivered in practice fit a
developmental progression continuum. This result provides an additional framework
to anchor existing and future ECB measurement tools. There is also evidence to show
that ECB is a higher-order construct with sub-domains, and that a pattern of topic
hierarchy can be discerned within each sub-domain. The sub-domain with the most
ECB content topics at the high level of ECB developmental proficiency is
“Institutionalizing Evaluation” and the sub-domain with topics mostly at the low
level of ECB developmental proficiency is “Building Evaluation Knowledge, Skills
and Readiness”.
Finally, findings also reveal that alternative ways of measuring ECBs were
used, including documenting the qualitative changes in evaluation process and the
outputs of evaluation products, as well as documenting improvements in evaluation
systems, rather than looking at change in capacities as the result of ECB activities.
There is also an indication that in practice there is a clear distinction between ECB
outcomes and program outcomes. The measurements are clearly of ECB outcomes as
a result of the ECB intervention, rather than of the program outcomes that the ECB
intervention hopes to influence.
Chapter Conclusion
This chapter has presented answers to the posed research questions based
on the analyses of the data results and on evidence relevant to the research questions.
For published ECB cases included in this study, noteworthy findings are as follows:
• ECB content and implementation tended to be more focused on individual
evaluation capacity development compared with building organizational
evaluation capacity;
• Teaching strategies are equally distributed among direct training, participatory
approaches, and a combination of both;
• ECBs mostly targeted individual capacity change rather than organizational
capacity change and mostly focused on program staff and managers but
excluded leadership;
• The majority of cases did not report ECB outcomes measurement;
• ECB content topics can be viewed as a developmental progression for
learning; there is a good fit to the Rasch Model, confirming that the content
topics tap a single construct on a progressive scale;
• Four sub-domains for ECB content were identified: Building evaluation
knowledge, skills and readiness; Skills in program management; Developing
evaluation culture and practice; and Creating evaluation social network,
structures and systems.
An important caveat is that conclusions drawn for these research questions pertain
only to the characteristics of the sample and contexts described. In the concluding
chapter, a synthesis of these findings will be presented to discuss what these
findings mean for ECB and ECB measurement practice.
CHAPTER 5
SYNTHESIS AND CONCLUSION
Overview of the Chapter
This chapter aims to show the connection and relevance of the findings,
beyond data and evidence, to what have been identified as knowledge gaps in ECB in
the literature and, it is hoped, to what they mean for the practice of ECB and
measurement in ECB. The synthesis begins by recalling these gaps and how
evidence from the findings of the research questions addresses these gaps. The
contribution of this study is demonstrated by showing the relevance of these findings
in the practice of evaluation capacity building, providing possible applications of the
study results. The chapter concludes with the limitations of the study and some
possible future research directions.
Contribution of the Study
The knowledge gaps identified in the literature were distilled into the two main
research questions: (1) how can ECB measurement practice be described from
empirical evidence? and (2) is there evidence to demonstrate that ECB content
exhibits a unified learning construct and possibly follows a progressive structure? The
first question aimed to provide evidence of what measurement practice has occurred
in ECB, which in turn provided background and evidence for the second question.
From the perspective of this study, which is that ECB is a learning intervention,
several things need to be established: Is ECB a learning construct? Does ECB follow
a progressive structure? Does the ECB construct exhibit learning sub-domains? These
knowledge gaps are addressed by the study findings, the way they relate back to
literature and what they mean in practice.
ECB Practice and Measurement
In ECB practice and measurement, Labin et al. (2012) and Labin (2014)
pointed out that ECB practitioners need to improve in the area of measuring ECB
outcomes. Also, Labin (2014) identified in her work the existing ECB measurement
tools that could be mapped to her proposed and validated Integrative Evaluation
Capacity Building (IECB) model. The IECB model has been a breakthrough in the
ECB literature. It conceptualized the programmatic nature of ECB from a strong
empirical research base. In this sense, the model has been grounded strongly in both
practice and evaluation program theory. However, measurement in ECB
has to be viewed beyond measurement tools that are used and mapped in the model
components. This way of seeing measures has its own utility (i.e. usefulness) and
importance, but to look at measurement of ECB from a program theory perspective
requires looking at measures that focus on the outcomes of what has been delivered in
ECB. This is perhaps the reason for the perplexing observation that ECBs have not
adequately measured ECB outcomes: ECBs have not yet clearly defined what
outcomes to look for and measure. While the IECB has explicated the ECB program
theory components, it has not clearly defined what is being measured in ECB. For
example, the outcomes component of the IECB model correctly identified the
individual, organizational and program outcomes that are linked to the needs/reasons
and ECB activities. However, it failed to identify whether these outcomes comprise a
unified learning construct or whether they span several learning domains. That is, the
characteristics of these ECB outcomes were not clearly defined.
In the literature with regard to measurement in evaluation and measurement
in ECB, Braverman and Arnold (2008) and Braverman (2013) argued that any good
program theory for evaluation must recognize its accompanying implicit measurement
theory if evaluation is to demonstrate methodological rigor. Findings of this study
revealed that, for ECBs that measured their ECB outcomes, this need for
methodological rigor was understood by ECB practitioners. This is also supported by
the findings in the assessment of “Rigor of Measurement Practice in ECB” performed
in this study. However, the area that needs to be sorted out in ECB measurement
practice is that of clarifying what is to be measured in ECB outcomes. This means that
ECB practitioners need to be clear about the outcomes measured that are linked with
the ECB intervention. While the ultimate goal of ECB intervention is improved
organizational outcome, program evaluation theory and measurement in evaluation
literature would dictate that what needs to be measured in the ECB interventions are
the ECB learning outcomes. This distinction has not been made clear in practice.
In this study, the findings have shown that ECB content and implementation
tended to focus more on individual evaluation capacity building compared with
building organizational evaluation capacity. These included building capacities in
fundamental knowledge and skills for carrying out evaluations at the program level of
organizations. Only a few cases focused on organizational evaluation capacity
building that would include building systems, processes, support, and evaluation
culture. Most ECBs targeted individual evaluation capacity change compared with
organizational evaluation capacity change and mostly involved program staff and
managers compared with a few that involved the organization leadership and program
beneficiaries.
With regard to ECB measurement, the findings revealed that only a relatively
small group of ECB practitioners measure ECB outcomes. These are mostly ECB
cases in the quantitative paradigm, delivered through combined direct and indirect
training approaches. The findings also show that the propensity to measure ECB
outcomes is likely to be influenced by the methodological paradigm of the ECB
implementation, ECB content proficiency level, construct sub-domains and
participant focus. In carrying out the measurement, there was some level of
acceptable rigor but there is a need for improvement in identifying the scope of the
variables being measured and in establishing the properties of the measurement tools
used. These findings affirmed the need to understand the ECB content construct so as
to be able to identify what needs to be measured in ECB.
These findings in ECB practice and measurement confirmed empirically the
gaps found in the theoretical literature. There is a need to define what ECB outcomes
to measure and to understand the characteristics of the ECB outcomes to be measured.
One way to understand them is to borrow ideas from learning and measurement theory
in the field of education. This study addressed this gap by defining what is to be
measured in ECB and describing its characteristics. This is the distinctive contribution
of this study.
ECB Content Construct and ECB Developmental Proficiency Structure
Item Response Theory Analysis and Factor Analysis have provided evidence
to suggest that ECB content topics delivered in ECB initiatives fit a developmental
progression continuum and that sub-domains exist for this higher-order learning
construct. This means that two significant findings are established in this study:
the confirmation that ECB is a learning construct and that it has a progressive
structure across several domains. This study produced two important outputs:
(1) the ECB developmental proficiency scale; and (2) sub-domain components of
ECB. These findings suggest that the ideas in educational measurement apply in ECB
and that developmental constructs of learning actually exist in ECB. This means that
the assessment of learning and the developmental approach to assessment proposed by
Griffin (2007) are measurement principles that could fit in ECB measurement
practice.
The survey of ECB measurement studies in Chapter 2 showed that in the
literature examined, ECB measurement has never been conceived this way. ECB
measurement studies were focused on the development of ECB measurement
instruments. These developed ECB measurements were particularly focused on the
different components and elements of the ECB program theory. As mentioned earlier,
the published ECB assessment tools have been mapped into the components of the
Integrative Evaluation Capacity Building (IECB) model (Labin, 2008, 2014; Labin et
al., 2012). These existing ECB assessment tools, when applicable, can also be mapped
onto this ECB model of developmental proficiency.
This idea of progression and developmental models in ECB is not new. For
example, in a study that examined the link of individual and collective attributes of an
organization to build evaluation capacity and utilize outcome evaluation, Brown and
Reed (2002) documented that traditional training in evaluation tended to focus on
individual change, while efforts that focused on organizational change tended to
incorporate individual change as a necessary sub-component of organizational
change. The study further found that these approaches did not adequately
incorporate a developmental context within the evaluative framework. The study
presented an integral and developmental approach that links these individual and
organizational attributes. In a second study from an unpublished dissertation,
Gullickson (2010) generated a descriptive theory of evaluation mainstreaming by
examining four cases of the National Science Foundation’s Advanced Technological
Program. The study’s findings suggest the existence of developmental stages of
evaluation within an organization. While these studies are limited to the recognition
of the significance of progression and developmental models in ECB, the present
study has contributed to this existing knowledge by demonstrating and establishing
clearly how this idea of developmental progression can be applied using ECB
empirical data.
Implications to ECB Measurement Practice
These results are important in the delivery of ECB efforts in three significant
ways. First, they provide a clear basis to determine what level of ECB content will be
considered in an ECB effort at any level in the hierarchy, depending on the assessed
needs of the organization. Needs assessment is often unbridled and could possibly
show a full range of learning needs. The ECB hierarchy provides order to this range
of learning needs, directing stakeholders to those they should prioritize and those
which are realistic given the available time frame and resources. Second, the
identified sub-domains of the ECB content topics also identify the cluster of topics on
which the ECB initiative can focus. Lastly, the hierarchy and sub-domains of the
ECB could be the basis of evaluation for measuring ECB learning outcomes.
Consider, for example, a scenario where the results of a needs assessment for
ECB planning in an organization reveal various evaluation capacity building needs.
These needs could include developing skills in evaluation data management, program
delivery, evaluation reporting, and the like. Without the ECB developmental
proficiency and ECB learning domain references, it would be difficult for
practitioners to identify which activities to implement and prioritize. Thus, the results
of this study have practical value for ECB planners and practitioners. Furthermore, the
study results show that ECB outcomes have to be defined in terms of the ECB
learning outcomes. To illustrate this point, suppose there is an ECB initiative. The
ECB program theory model calls for investigation of the motivational factors and
reasons for conducting the ECB. One of the approaches to this proactive evaluation
phase for ECB is to conduct needs assessment and determine learning gaps. The ECB
facilitators then negotiate with the stakeholders for the design, implementation plan
and provision of the resources to carry out the ECB learning activities. If one
supposes that the competency gaps identified in the assessment phase are the basics of
evaluation knowledge and skills, and the learning activities were delivered to address
this need, then the expected ECB learning outcomes can only be limited to this level
of ECB content hierarchy. There should be no expectations to see changes beyond
what has been delivered in the ECB effort, although they may occur. For example,
when only the basics of evaluation knowledge and skills were delivered, there should
be no expectations to see changes in the evaluation system, strengthening of the
resources to support evaluation activities in the organization or improvement of the
evaluation social network if these learning outcomes were not addressed in the ECB
effort. To do so would be to set up the ECB effort for failure. It becomes further
complicated and unreasonable to expect improvement in the program outcomes
delivered by the organization after ECB training on evaluation basics. The point of
this illustration is that ECB assessment can only measure what has been delivered in
the ECB effort. The hierarchy of ECB content to be delivered provides these target
ECB learning outcomes.
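The principle in this illustration can be sketched in a few lines of code. This is a minimal, hypothetical sketch only: the level labels in `HIERARCHY`, their ordering, and the function name `measurable_outcomes` are assumptions made for illustration and are not the developmental proficiency scale produced in this study. It shows only that the outcomes targeted for measurement are restricted to the content that was actually delivered.

```python
# Hypothetical content hierarchy, ordered from lower to higher developmental
# proficiency. The labels and their order are illustrative assumptions only.
HIERARCHY = [
    "evaluation basics",
    "evaluation culture and practice",
    "evaluation systems and resources",
    "evaluation social network",
]


def measurable_outcomes(delivered, candidates):
    """Return the candidate outcomes that match content actually delivered.

    Outcomes tied to content that was not delivered are excluded, since the
    ECB assessment can only measure what the ECB effort addressed.
    """
    delivered_set = set(delivered)
    unknown = delivered_set.union(candidates) - set(HIERARCHY)
    if unknown:
        raise ValueError(f"not in hierarchy: {sorted(unknown)}")
    # Keep hierarchy order so results read from low to high proficiency.
    return [level for level in HIERARCHY
            if level in delivered_set and level in candidates]


if __name__ == "__main__":
    delivered = ["evaluation basics"]
    expected_by_stakeholders = [
        "evaluation basics",
        "evaluation systems and resources",
    ]
    # Only the delivered level remains a legitimate measurement target.
    print(measurable_outcomes(delivered, expected_by_stakeholders))
```

In this sketch, an expectation of change in evaluation systems is filtered out when only the basics were delivered, which mirrors the argument that expecting such change would set the ECB effort up for failure.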
In the IECB model, a prominent divide in the outcomes component is that between
individual-level and organizational-level outcomes as a consequence of individual
or organizational focused ECB. There has been some investigation into how
individual evaluation capacities translate to organizational evaluation capacities
(Brown & Reed, 2002). However, the situation could be seen as a “chicken or egg”
scenario of what comes first. It has been advocated that evaluation systems and
structures need to be set up first to influence culture and practice, which in turn are
influenced by collective individual evaluation skills and vice-versa. Evidence from
this study shows that while this divide could be intuitively deduced, factor analysis of
the content construct of ECBs delivered in practice does not support this assumption.
The evidence suggests that, when the ECB content construct is examined, there are
underlying sub-dimensions with a defensible hierarchy, but the individual versus
organizational divide is not evident.
Practitioners have long emphasized that the ultimate goal of ECB is improved
program outcomes. This means that “ECB learning outcomes” could be an
intermediary step of the ECB outcomes hierarchy. There has to be a clear distinction
between the two kinds of ECB outcomes: the ECB learning outcomes and the ECB
program outcomes (also referred to as organizational outcomes). It is important to be
clear about this distinction because organizational outcomes or program outcomes
have their own set of key and mediating factors and are most likely independent of
those of the ECB learning outcomes. For example, “Improved health outcomes” of an
organization’s program recipients will have different factors and mediators
influencing this outcome compared with “Improved health program delivery”,
an ECB learning outcome of an organization’s program staff.
In summary, the findings of this study suggest that while the IECB is
successful in defining the programmatic perspective of ECB, the ECB outcomes can
be re-configured into ECB learning outcomes as intermediate outcomes influencing the
ultimate organizational outcomes or program outcomes. Findings further suggest that
to understand ECB learning outcomes one must take into account the hierarchical
characteristics of the ECB content construct that has been delivered in actual practice.
Limitations of the Study
The limitations of the present study should be noted. This endeavour was
carried out by a single researcher, which means there may have been possible sources
of errors in the coding and interpretation of the ECB reports. In the cases that did not
explicitly report the information needed for the study, the researcher was forced to use
his best judgment in assigning the data codes. In addition, the researcher did not
examine the organizational theories that may have underpinned the findings and their
interpretation. The discussions were narrowly focused on the areas of program theory
and learning assessment. It is possible, therefore, that further implications could have
been identified if the background literature had been expanded to organizational
theories.
Future Research Directions
The findings of this study introduced the notion of ECB developmental
proficiency. It opens a door to rethink measurement in ECB beyond psychometric
properties and development of assessment tools by component of the ECB program
theory. It challenges ECB practitioners to think of measuring ECB with respect to the
progress defined by the ECB hierarchical construct. It also challenges ECB
practitioners to delineate ECB evaluation between ECB learning outcomes and
program outcomes of an organization when it operates at different levels of evaluation
capacity. For example, the mapping of the ECB assessment tools could be done
according to ECB developmental proficiency and along with its sub-domains in
addition to the ECB assessment tools that have been mapped to ECB program theory.
In addition, this idea of ECB developmental proficiency can be examined through the
lens of organizational and management theories. Thus, future research directions in
ECB can explore this perspective of ECB developmental proficiency.
Conclusion
The primary aims of this study were to document and describe the
measurement practice that has occurred in ECB initiatives as reported in published
ECBs, and to investigate whether empirical evidence supports the notion of ECB
developmental proficiency that follows from the learning intervention perspective of
ECB. These aims were achieved and the results have affirmed the hypothesis of the
study: “Evaluation capacity building as a learning intervention would call for a
progressive approach to content delivery and outcomes measurement.”
The main contribution of this study to the field of Evaluation, in particular in
the area of measurement in Evaluation Capacity Building, is the introduction of the
notion that ECB is a learning intervention. Through this assumption, it was
demonstrated that ECB follows a developmental proficiency construct. This study has
clearly established that ECB can be viewed as a learning progression. It has made a
case for reframing ECB content, implementation and measurement practice according
to this point of view.
REFERENCES
Adams, J., & Dickinson, P. (2010). Evaluation training to build capability in the
community and public health workforce. American Journal of Evaluation,
31(3), 421-433.
Andrews, A. B., Motes, P. S., Floyd, A., Flerx, V. C., & Fede, A. L. (2005). Building
evaluation capacity in community-based organizations: reflections of an
empowerment evaluation team. Journal of Community Practice, 13(4), 85-
104.
Arnold, M. E. (2006). Developing evaluation capacity in extension 4-H field faculty:
A framework for success. American Journal of Evaluation, 27(2), 257 - 269.
Assessment and learning partnerships: A short course for school leaders. (2012). In
U. o. M. Assessment Research Centre (Ed.).
Atkinson, D. D., Wilson, M., & Avula, D. (2005). A participatory approach to building
capacity of treatment programs to engage in evaluation. Evaluation and
Program Planning, 28(3), 329.
ATLAS.ti (Version 7.1.7). (2014) [Computer Software]. Berlin: Cincom Systems,
Inc.
Beere, D. (2005). Evaluation capacity building: A tale of value adding. Evaluation
Journal of Australasia, 5(2), 41-47.
Blalock, H. M. (1979). The presidential address: Measurement and conceptualization
problems: The major obstacle to integrating theory and research. American
Sociological Review, 44(6), 881-894.
Blalock, H. M. (1982). Conceptualization and measurement in the social sciences.
Beverly Hills: Sage Publications.
Botcheva, L., White, C. R., & Huffman, L. C. (2002). Learning culture and outcomes
measurement practices in community agencies. American Journal of
Evaluation, 23, 421 - 434.
Braverman, M. T. (2013). Negotiating Measurement: Methodological and
Interpersonal Considerations in the Choice and Interpretation of Instruments.
American Journal of Evaluation, 34(1), 99-114.
Braverman, M. T., & Arnold, M. E. (2008). An evaluator's balancing act: Making
decisions about methodological rigor. New Directions for Evaluation(120), 71-
86.
Brisolara, S. (1998). The history of participatory evaluation and current debates in the
field. New Directions for Evaluation, 1998(80), 25-41.
Brown, R. E., & Reed, C. S. (2002). An integral approach to evaluating outcome
evaluation training. American Journal of Evaluation, 23(1), 1 - 17.
Chouinard, J. A. (2013). The case for participatory evaluation in an era of
accountability. American Journal of Evaluation, 34(2), 237-253.
Clinton, J. (2014). The true impact of evaluation: Motivation for ECB. American
Journal of Evaluation, 35(1), 120-127.
Compton, D., Baizerman, M., Preskill, H., Rieker, P., & Miner, K. (2001). Developing
evaluation capacity while improving evaluation training in public health: the
American Cancer Society's Collaborative Evaluation Fellows Project.
Evaluation and Program Planning, 24, 33 - 40.
Compton, D., Baizerman, M., & Stockdill, S. H. (Eds.). (2002). The art, craft, and
science of evaluation capacity building. San Francisco: Jossey-Bass.
Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning Progressions in Science:
An Evidence-Based Approach to Reform: Consortium for Policy Research in
Education.
Cousins, J. B., Goh, S. C., Elliott, C., & Aubry, T. (2008). Government and voluntary
sector differences in organizational capacity to do and use evaluation. Paper
presented at the Annual meeting of the Canadian Evaluation Society, Quebec,
Canada. http://evaluation.ca/site.cgi?s=1
D'Eon, M., Sadownik, L., Harrison, A., & Nation, J. (2008). Using self-assessments to
detect workshop success: Do they work? American Journal of Evaluation,
29(1), 92-98.
Danseco, E., Halsall, T., & Kasprzak, S. (2009). Readiness assessment tool for
evaluation capacity building: The Provincial Centre for Excellence for Child
and Youth Mental Health at CHEO, Ottawa, Canada.
de Winter, J.C.F., & Dodou, D. (2012). Factor recovery by principal axis factoring
and maximum likelihood factor analysis as a function of factor pattern and
sample size. Journal of Applied Statistics. 29(4), 695-710.
Diaz-Puente, J. M., Yague, J. L., & Afonso, A. (2008). Building evaluation capacity in
Spain: A case study of rural development and empowerment in the European
Union. Evaluation Review, 32(5), 478-506.
Dunaway, K. E., Morrow, J. A., & Porter, B. E. (2012). Development and validation
of the Cultural Competence of Program Evaluators (CCPE) self-report scale.
American Journal of Evaluation, 33(4), 496-514.
Fetterman, D., Rodriguez-Campos, L., Wandersman, A., & O'Sullivan, R. G. (2014).
Collaborative, participatory, and empowerment evaluation: Building a strong
conceptual foundation for stakeholder involvement approaches to evaluation
(A response to Cousins, Whitmore, and Shulha, 2013). American Journal of
Evaluation, 35(1), 144 - 148.
Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation:
alternative approaches and practical guidelines (4th ed.). Upper Saddle River,
N.J.: Pearson Education.
Freedman, D. A. (2009). Statistical models: Theory and practice. Cambridge,
England: Cambridge University Press.
Griffin, P. (2007). The comfort of competence and the uncertainty of assessment.
Studies In Educational Evaluation, 33, 87-99.
Grob, G. F. (2010). Evaluation field building in South Asia: insights from the rear
view mirror. American Journal of Evaluation, 31(2), 241-245.
Gullickson, A. (2010). Mainstreaming evaluation: Four case studies of systematic
evaluation integrated into organizational culture and practices. Western
Michigan University.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: principles and
applications. Boston: Kluwer-Nijhoff Publishing.
Henry, G. T., & Mark, M. M. (2003). Beyond use: Understanding evaluation's
influence on attitudes and actions. American Journal of Evaluation, 24(3),
293-314.
Holvoet, N., & Dewachter, S. (2013). Building national M&E systems in the context
of changing aid modalities: The underexplored potential of National
Evaluation Societies. Evaluation and Program Planning, 41, 47 - 57.
IBM SPSS Statistics (Version 20). (2011). International Business Machines
Corporation.
Huba, M.E., & Freed, J.E. (2000). Learner-centered assessment on college campuses:
Shifting the focus from teaching to learning. Boston: Allyn and Bacon.
King, J. A. (2010). Response to evaluation field building in South Asia: Reflections,
anecdotes, and questions. American Journal of Evaluation, 31(2), 232-237.
Kuzmin, A. (2012). Participatory Training Evaluation Method (PATEM) as a
collaborative evaluation capacity building strategy. Evaluation and Program
Planning, 35(4), 543-546.
Labin, S. N. (2008). Research synthesis: Toward broad-based evidence. In N. L.
Smith & P. R. Brandon (Eds.), Fundamental issues in evaluation. (pp. 89-110).
New York, NY US: Guilford Press.
Labin, S. N. (2014). Developing common measures in evaluation capacity building:
An iterative science and practice process. American Journal of Evaluation,
35(1), 107-115.
Labin, S. N., Duffy, J. L., Meyers, D. C., Wandersman, A., & Lesesne, C. A. (2012). A
research synthesis of the evaluation capacity building literature. American
Journal of Evaluation, 33(3), 307-338.
Lam, T. C. M. (2009). Do self-assessments work to detect workshop success? An
analysis of argument and recommendation by D'Eon et al. American Journal
of Evaluation, 30(1), 93-105.
Leviton, L. C. (2001). Presidential address: building evaluation's collective capacity.
American Journal of Evaluation, 22(1), 1.
Leviton, L. C. (2014). Some underexamined aspects of evaluation capacity building.
American Journal of Evaluation, 35(1), 90-94.
McDonald, B., Rogers, P., & Kefford, B. (2003). Teaching people to fish? Building
the evaluation capability of public sector organizations. Evaluation, 9(1), 9.
McGeary, J. (2009). A critique of using the Delphi Technique for assessing evaluation
capability-building needs. Evaluation Journal of Australasia, 9(1), 31.
Miller, R. L., & Campbell, R. (2006). Taking Stock of Empowerment Evaluation: An
Empirical Review. American Journal of Evaluation, 27(3), 296-319.
Milstein, B., Chapel, T. J., Wetterhall, S. F., & Cotton, D. A. (2002). Building
Capacity for Program Evaluation at the Centers for Disease Control and
Prevention. New Directions for Evaluation(93), 27-46.
Nielsen, S. B., Lemire, S., & Skov, M. (2011). Measuring evaluation capacity: Results
and implications of a Danish study. American Journal of Evaluation, 32(3),
324-344.
O'Sullivan, R. G. (2012). Collaborative evaluation within a framework of stakeholder-
oriented evaluation approaches. Evaluation and Program Planning, 35(4),
518-522.
Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks,
California: Sage Publications.
Preskill, H. (2008). Evaluation's second act: A spotlight on Learning. American
Journal of Evaluation, 29(2), 127-138.
Preskill, H., & Boyle, S. (2008). A Multidisciplinary Model of Evaluation Capacity
Building. American Journal of Evaluation, 29(4), 443-459.
Preskill, H., & Russ-Eft, D. F. (2005). Building evaluation capacity: 72 activities for
teaching and training. Thousand Oaks, Calif.: Sage Publications.
Preskill, H., & Torres, R. T. (2000). Readiness for organizational learning and
evaluation instrument. Retrieved from [email protected]
Sanders, J. R. (2002). Presidential address: On mainstreaming evaluation. American
Journal of Evaluation, 23(3), 253 - 259.
Stockdill, S. H., Baizerman, M., & Compton, D. W. (2002). Toward a Definition of
the ECB Process: A Conversation with the ECB Literature. New Directions for
Evaluation(93), 1-25.
Stufflebeam, D. L., & Shinkfield, A. J. (2007). Evaluation theory, models, and
applications. San Francisco: Jossey-Bass.
Suarez-Balcazar, Y., & Taylor-Ritzler, T. (2014). Moving from science to practice in
evaluation capacity building. American Journal of Evaluation, 35(1), 95-99.
Taut, S. (2007). Studying self-evaluation capacity building in a large international
development organization. American Journal of Evaluation, 28(1), 45-59.
Taylor-Powell, E., & Boyd, H. H. (2008). Evaluation capacity building in complex
organizations. New Directions for Evaluation(120), 55-69.
Taylor-Ritzler, T., Suarez-Balcazar, Y., Garcia-Iriarte, E., Henry, D. B., & Balcazar, F.
E. (2013). Understanding and measuring evaluation capacity: A model and
instrument validation study. American Journal of Evaluation, 34(2), 190-206.
TCU. (2005). Organizational readiness for change (TCU ORC). Retrieved from
http://www.ibr.tcu.edu/evidence/evi-orc.html
Thompson, B. (2004). Exploratory and confirmatory factor analysis. Washington,
DC: American Psychological Association.
Urban, J. B., Burgermaster, M., Archibald, T., & Byrne, A. (2013). Relationships
between quantitative measures of evaluation plan and program model quality
and a qualitative measure of participant perceptions of an evaluation capacity
building approach. Journal of Mixed Methods Research, 201(X)(XX(X)), 1 -
24
Volkov, B., & King, J. (2007). A checklist for building organizational evaluation
capacity. Retrieved from http://www.wmich.edu/evalctr/checklists/ecb.pdf
Wandersman, A. (2014). Getting to outcomes: an evaluation capacity building
example of rationale, science, and practice. American Journal of Evaluation,
35(1), 100-106.
Weitzman, B. C., & Silver, D. (2013). Good evaluation measures: More than their
psychometric properties. American Journal of Evaluation, 34(1), 115-119.
Wholey, J. S. (1987). Evaluability assessment: Developing program theory. New
Directions for Program Evaluation(33), 77-92.
Wu, M., Adams, R., & Haldane, S. (2006). ConQuest: Multi-aspect test software
[Computer software]. Melbourne: Australian Council for Educational
Research.
APPENDIX A
List of Published ECB Cases in the Study Sample
Case Number    Reference
1_CU1 Andrews, A.B., Motes, P.H., Floyd, A.G., Flerx, V.C., and Lopez-De
Fede, A. (2005). Building evaluation capacity in community-
based organizations: Reflections of an empowerment
evaluation team. Journal of Community Practice, Vol. 13(4).
2_CU2 Arnold, M.E. (2006). Developing evaluation capacity in extension 4-H
field faculty: A framework for success. American Journal of
Evaluation, Vol. 27(2).
3_CU3 Randall, A.L. (2005). Training school counselors in program evaluation.
Professional School Counseling, Vol. 9(1).
4_CU4 Atkinson, D.D., Wilson, M., and Avula, D. (2005). A participatory
approach to building capacity of treatment programs to
engage in evaluation. Evaluation and Program Planning, Vol.
28.
5_CU6 Brandon, P.R., and Higa, T.A.F. (2004). An empirical study of building
the evaluation capacity of K-12 site-managed project
personnel. The Canadian Journal of Program Evaluation,
Vol. 19(1).
6_CU7 Brown, N.L., Luna, V., Ramirez, M.H., Vail, K.A., and Williams, C.A.
(2005). Developing an effective intervention for IDU women:
A harm reduction approach to collaboration. AIDS Education
and Prevention, 17(4), 317-333.
7_CU8 Brown, R.E., and Reed, C.S. (2002). An integral approach to outcome
evaluation training. American Journal of Evaluation, 23(1), 1-
17.
8_CU9 Campbell, R., Dorey, H., Naegeli, M., Grubstein, L.K., Bennett, K.K.,
Bonter, F., Smith, P.K., Grzywacz, J., Baker, P.K, and
Davidson, W.S. II. (2004). An empowerment evaluation
model for sexual assault programs: Empirical evidence of
effectiveness. American Journal of Community Psychology,
34 (3/4), 251-262.
9_CU10 Carden, F., and Earl, S. (2007). Infusing evaluative thinking as a
process use: The case of the International Development
Research Center (IDRC). New Directions for Evaluation, 116,
61 – 73.
10_CU11 Chinman, M., Hunter, S.B., Ebener, P., Paddock, S.M., Stillman, L.,
Imm, P., and Wandersman, A. (2008). American Journal of
Community Psychology, 41, 206 – 224.
11_CU12 Cohen, C. (2006). Evaluation learning circles: a sole proprietor’s
evaluation capacity-building strategies. New Directions for
Evaluation, 111, 85 – 93.
12_CU13 Compton, D., Baizerman, M., Preskill, H., Rieker, P., and Miner, K.
(2001). Developing evaluation capacity while improving
evaluation training in public health: the American Cancer
Society’s Collaborative Evaluation Fellow’s Project.
Evaluation and Program Planning, 24, 33 – 40.
13_CU14 Diaz-Puente, J.M., Yague, J.L., and Afonso, A. (2008). Building
evaluation capacity in Spain: a case study of rural
development and empowerment in the European Union.
Evaluation Review, 32(5), 478 – 506.
14_CU17 Fetterman, D., and Bowman, C. (2002). Experiential education and
empowerment evaluation: Mars Rover Educational Program
case example. The Journal of Experiential Education, 25(2),
286 – 295.
15_CU18 Fetterman, D. (2001). Empowerment evaluation and self-determination:
A practical approach toward program improvement and
capacity building. In N. Schneiderman, M.A. Speers, J.M.
Silva, H. Tomes, & J.H. Gentry (Eds.), Integrating
behavioural and social sciences with public health. (pp. 321 –
350). Washington, D.C. US: American Psychological
Association.
16_CU21 Flaspohler, P., Wandersman, A., Keener, D., Maxwell, K.N., Ace, A.,
Andrews, A., & Holmes, B. (2003). Promoting program
success and fulfilling accountability requirements in a state-
wide community-based initiative. Journal of Prevention and
Intervention in the Community, 26(2), 37 – 52.
17_CU22 Harper, G.W., Contreras, R., Bangi, A., & Pedraza, A. (2003).
Collaborative process evaluation. Journal of Prevention and
Intervention in the Community, 26(2), 53 – 69.
18_CU23, 19_CU24, 20_CU25, 21_CU25ab Hoole, E., & Patterson, T.E. (2008).
Voices from the field: Evaluation
as part of a learning culture. In J.G. Carman & K.A.
Fredericks (Eds.), Non-profits and evaluation. New
Directions for Evaluation, 119, 93 – 113.
22_CU27 Katz, S., Sutherland, S., & Earl, L. (2002). Developing an evaluation
habit of mind. The Canadian Journal of Program Evaluation,
17(2), 103 – 119.
23_CU30 King, J. (2002). Building the evaluation capacity of a school district.
New Directions for Evaluation, 93, 63 – 80.
24_CU32 Lennie, J. (2005). An evaluation capacity-building process for
sustainable community IT initiatives: Empowering and
disempowering impacts. Evaluation, 11(4), 390 – 414.
25_CU34 MacLellan-Wright, M.F., Patten, S., dela Cruz, A.M., & Flaherty, A.
(2007). A participatory approach to the development of an
evaluation framework: Process, pitfalls, and payoffs. The
Canadian Journal of Program Evaluation, 22(1), 99 – 124.
26_CU35 Maher, C.A. (1981). Training of managers in program planning and
evaluation. Journal of Organizational Behavior Management,
3(1), 45 – 56.
27_CU36 Mathews, M., & Lynch, A. (2007). Increasing research skills in rural
health boards: An evaluation of a training program for
Western Newfoundland. The Canadian Journal of Program
Evaluation, 22(2), 41 – 56.
28_CU37 McDonald, B., Rogers, P., & Kefford, B. (2003). Teaching people to
fish? Building the evaluation capability of public sector
organizations. Evaluation, 9(1), 9 – 29.
29_CU39 Milstein, B., Chapel, T.J., Wetterhall, S.F., & Cotton, D.A. (2002).
Building capacity for program evaluation at the Centers for
Disease Control and Prevention. New Directions for
Evaluation, 93, 27 – 46.
30_CU40 Moon, S.M. (1996). Using the Purdue three-stage model to facilitate
local program evaluations. Gifted Child Quarterly, 40(3), 121
– 128.
31_CU41 Myrick, R., Lemell, A., Aoki, B., Truax, S., & Lemp, G. (2005). Best
practices for community collaborative research. AIDS
Education and Prevention, 17(4), 400 – 404.
32_CU42 Naccarella, L., Pirkis, J., Kohn, F., Morley, B., Burgess, P., & Blashki,
G. (2007). Building evaluation capacity: Definitional and
practical implications from an Australian case study.
Evaluation and Program Planning, 30, 231 – 236.
33_CU43 Nagao, M., Kuji-Shikatani, K., & Love, A.J. (2005). Preparing school
evaluators: Hiroshima pilot test of Japan Evaluation Society’s
accreditation project. The Canadian Journal of Program
Evaluation, 20(2), 125 – 155.
34_CU44a O’Sullivan, R.G., & D’Agostino, A. (2002). Promoting evaluation
through collaboration: Findings from community-based
programs for young children and their families. Evaluation,
8(3), 372 – 387.
35_CU46 Ploeg, J., de Witt, L., Hutchison, B., Hayward, L., & Grayson, K.
(2008). Evaluation of a research mentorship program in
community care. Evaluation and Program Planning, 31, 22 –
33.
36_CU47 Porteous, N.L., Sheldrick, B.J., & Stewart, P.J. (1999). Enhancing
managers’ evaluation capacity: A case study from Ontario
public health. The Canadian Journal of Program Evaluation,
Special Issue, 137 – 154.
37_CU48 Ryan, K.E., Geissler, B., & Knell, S. (1996). Progress and
accountability in family literacy: Lessons from collaborative
approach. Evaluation and Program Planning, 19(3), 263 –
272.
38_CU49 Schnoes, C.J., Murphy-Berman, V., & Chambers, J.M. (2000).
Empowerment evaluation applied. American Journal of
Evaluation, 21(1), 53 – 64.
39_CU50 Secret, M., Jordan, A., & Ford, J. (1999). Empowerment evaluation as a
social work strategy. Health and Social Work, 24(2), 120 –
127.
40_CU52 Stevenson, J.F., Florin, P., Mills, D.S., & Andrade, M. (2002). Building
evaluation capacity in human services organizations: A case
study. Evaluation and Program Planning, 25, 233 – 243.
41_CU53 Suarez-Balcazar, Y., Orellana-Damacela, L., Portillo, N., Sharma, A.,
& Lanum, M. (2003). Implementing an outcomes model in
the participatory evaluation community initiatives. Journal of
Prevention & Intervention in the Community, 26(2), 5 – 20.
42_CU54 Sullins, C.D. (2003). Adapting the empowerment evaluation model: A
mental health drop-in center case example. American Journal
of Evaluation, 24(3), 387 – 398.
43_CU55 Tang, H., Cowling, D.W., Koumjian, D.W., Roeseler, A., Lloyd, J., &
Rogers, T. (2002). Building local program evaluation
capacity toward a comprehensive evaluation. New Directions
for Evaluation, 95, 39 – 56.
44_CU56 Taut, S. (2007). Studying self-evaluation capacity building in a large
international development organization. American Journal of
Evaluation, 28(1), 45 – 59.
45_CU57 Trevisan, M. (2001). Implementing comprehensive guidance program
evaluation support: Lessons learned. Professional School
Counseling, 4(3), 225 – 229.
46_CU59 Valery, R., & Shakir, S. (2005). Evaluation capacity building and
humanitarian organization. Journal of Multidisciplinary
Evaluation, 3, 78 – 112.
47_CU60 Willer, B.S., Bartlett, D.P., & Northman, J.E. (1978). Simulation as a
method for teaching program evaluation. Evaluation and
Program Planning, 1, 221 – 228.
48_CU61 Yawson, R.M., Amoa-Awua, W.K., Sutherland, A.J., Smith, D.R., &
Noamesi, S.K. (2006). Developing a performance
measurement framework to enhance the impact orientation of
the Food Research Institute, Ghana. R&D Management,
36(2), 161 – 172.
49_P58 Mayberry, R.M., Daniels, P., Yancey, E.M., Akintobi, T.H., Berry, J.,
Clark, N., & Dawaghreh, A. (2009). Enhancing community-
based organizations’ capacity for HIV/AIDS education and
prevention. Evaluation and Program Planning, 32, 213 –
220.
50_P59 Fleming, M.L., & Easton, J. (2010). Building environmental educators’
evaluation capacity through distance education. Evaluation
and Program Planning, 33, 172 – 177.
51_P60 Bourgeois, I., Hart, R.E., Townsend, S.H., & Gagne, M. (2011). Using
hybrid models to support the development of organizational
evaluation capacity: A case narrative. Evaluation and
Program Planning, 34, 228 – 235.
52_P61 Kapucu, N., Healy, B.F., & Arslan, T. (2011). Survival of the fittest:
Capacity building for small nonprofit organizations.
Evaluation and Program Planning, 34, 236 – 245.
53_P63 Satterlund, T.D., Treiber, J., Kipke, R., Kwon, N., & Cassady, D.
(2013). Accommodating diverse clients’ needs in evaluation
capacity building: A case study of the Tobacco Control
Evaluation Center. Evaluation and Program Planning, 36, 49
– 55.
54_P64 Akintobi, T.H., Yancey, E.M., Daniels, P., Mayberry, R.M., Jacobs, D.,
& Berry, J. (2012). Using evaluability assessment and
evaluation capacity-building to strengthen community-based
prevention initiatives. Journal of Health Care for the Poor
and Underserved, 23(2), 33 – 48.
55_P65 Compton, D.W. (2009). Managing studies versus managing for
evaluation capacity-building. In D.W. Compton & M.
Baizerman (Eds.), Managing program evaluation: Towards
explicating a professional practice. New Directions for
Evaluation, 121, 55 – 69.
56_P66 Baron, M.E. (2011). Designing internal evaluation for small
organization with limited resources. In B.B. Volkov and M.E.
Baron (Eds.), Internal evaluation in the 21st century. New
Directions for Evaluation, 132, 87 – 99.
57_P67 Rotondo, E. (2012). Lessons learned from evaluation capacity building.
In S. Kushner & E. Rotondo (Eds.), Evaluation voices from
Latin America. New Directions for Evaluation, 134, 93 – 101.
58_P68 Adams, J. & Dickinson, P. (2010). Evaluation training to build capacity
in the community and public health workforce. American
Journal of Evaluation, 31(3), 421 – 433.
59_P69 Rogers, S.J., Ahmed, M., Hamdallah, M., & Little, S. (2010). Garnering
grantee buy-in on a national cross-site evaluation: The case of
ConnectHIV. American Journal of Evaluation, 31(4), 447 –
462.
60_P70 Garcia-Iriarte, E., Suarez-Balcazar, Y., Taylor-Ritzler, T., & Luna, M.
(2011). A catalyst-for-change approach to evaluation capacity
building. American Journal of Evaluation, 32(2), 168 – 182.
61_P72 Anderson, C., Chase, M., Johnson, J., Mekiana, D., McIntyre, D.,
Ruerup, A., & Kerr, S. (2012). It is only new because it has
been missing for so long: Indigenous evaluation capacity
building. American Journal of Evaluation, 33(4), 566 – 582.
62_P75 Hanwright, J., & Makinson, S. (2008). Promoting evaluation culture.
Evaluation Journal of Australasia, 8(1), 20 – 25.
63_P76 Karlsson, P., & Beijer, E. (2008). Evaluation workshops for capacity
building in welfare work. Evaluation, 14(4), 483 – 498.
APPENDIX B
Coding Form
Instrument for ECB Context, Content and Implementation
and Assessment Tool for Rigor of ECB Quantitative Measurement Practice
This coding form assesses Evaluation Capacity Building (ECB) practices among
organizations using published ECB reports. It has two primary aims:
Document ECB context, content and implementation variables. This is a checklist
on which the descriptive characteristics of ECB practice are coded. The items were developed
mostly from the Integrative ECB Model of Labin, Duffy, Meyers, Wandersman and Lesesne (2012).
Measure the rigor of ECB quantitative measurement practice. This is a rating scale for
determining the level of ECB quantitative measurement practice. It is applied to reports that
used a quantitative approach. The items were developed partly from Braverman (2013).
This instrument is divided into three parts. The first part documents the profile of the ECB
being reported, recording the contextual facts of the implemented ECB as well as those of
the organization and its programs. The second part documents the context, content and
implementation variables of ECB. The third part measures the rigor of ECB quantitative
measurement practice for reports that use a quantitative approach.
Scoring
The ECB, organization and program profile is not scored but determines the characteristics
and typologies of ECB.
The Rigor of ECB Quantitative Measurement Practice and the Quality and Credibility of ECB
Qualitative Evidence Practice are scored using the rubrics provided.
Note: The content items were mostly adapted from Labin et al. (2012). Additions
were: Part 1, participant focus (Item 7), ECB contact duration (Item 8),
outcomes expectations (Item 9), leadership collaboration (Item 10) and Items 19 -
23. Part 2, Item 2 (F to K). Part 3 was developed by the researcher.
Instruction: Write in the answer or tick the boxes, as appropriate.
PART 1 Evaluation Capacity Building, Organization and Program Profile
Reference Number
(For Atlasti Code Tracking)
1. Title of ECB/Article
2. Authors
3. Journal/Publisher
4. Country
5. Year the report was published
6. Year ECB was completed
7. ECB Domain □ Education
□ Health
□ Child/youth development
□ Community/rural development
□ Policy research
8. Type of organization □ Non-profit
□ For-profit
□ Government
□ School/school district
□ University only
□ Multiple types
□ Other:
9. Technological capability
□ Presence of IT infrastructure (computers)
□ Availability of IT communications (internet)
□ Availability of IT skilled personnel
□ Other:
10. Paradigm of ECB report □ Quantitative
□ Qualitative
If both: □ Mixed methods
□ Multiple methods
11. ECB report or evaluation made by □ ECB practitioner/facilitator
□ External or independent evaluator
□ Recipient organization internal evaluator
□ Other:
12. Evaluator Affiliation □ University
□ Private consultancy
□ Internal evaluation unit
□ Other:
13. Report on □ Single organization
□ Multiple organizations
14. ECB Initiator □ Funder/Grant maker
□ Organization/Grantee
□ Government
□ University
□ Research facility
15. Purpose of Measurement □ Establish baseline information or after action information
□ Inform ECB Design
□ Guide implementation for adjustments
□ Evaluate ECB impact
16. Program Implementation □ Single program
□ Multiple programs
17. Program Site □ One-site
□ Multi-site
18. Program delivery □ Services
□ Education or capability building
□ Advocacy
□ Research
19. ECB Intervention Description
20. ECB Stakeholders
21. Program Description
22. ECB View □ Training
□ Non-training
23. ECB Data Collection Approach □ Questionnaire/Survey
□ Individual Interview
□ Focus groups
PART 2 ECB Content and Implementation Checklist
1. ECB content (Individual-level) A. □ Awareness/Attitudes
B. □ Terms, approaches or methods
C. □ Logic models
D. □ Evaluation plan
E. □ How to do an evaluation
F. □ Management, analysis, interpretation or use of data
G. □ Program planning
H. □ Program implementation
2. ECB content (Organization-level) A. □ Organization evaluation practices
B. □ Evaluation Readiness/Willingness
C. □ Building leadership support
D. □ Building culture for evaluation
E. □ Mainstreaming evaluation
F. □ Creating/Strengthening evaluation policy requirements
G. □ Creating/Strengthening evaluation structures (teams, job roles and responsibilities, evaluation units)
H. □ Creating/Strengthening evaluation systems (databases, shared measurement tools, common metrics like KPIs, monitoring systems and tools)
I. □ Creating/Strengthening support for evaluation resources
J. □ Improving organizational evaluation social context
K. □ Other:
3. Type of strategies reported A. □ Training/Teaching
B. □ Technical Assistance/Coaching/Support/Consultations
C. □ Involvement in evaluation
D. □ Printed materials
4. Mode of strategies reported A. □ Face-to-face only
B. □ Face-to-face combined with other modes
C. □ Combination not including face-to-face
D. □ Other:
5. Intended target of ECB A. □ Individual only
B. □ Organization only
C. □ Individuals and organizations
D. □ Not reported
6. Evaluation of ECB work □ Any evaluation reported
□ Not reported
7. Participant Focus A. □ Program staff or ground staff
B. □ Program managers
C. □ Program beneficiaries
D. □ Organization top management and leadership
8. ECB contact duration A. □ One day or less engagement (teaching, training, workshop or TA)
B. □ More than one day engagement
C. □ Multiple times a year or once a year for multiple years
9. Intervention design (outcomes expectation) A. □ ECB as teaching/training but no ECB program design
B. □ ECB as teaching/training component with explicit ECB program design
10. Leadership collaboration A. □ ECB process did not involve leadership
B. □ ECB process involved leadership
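The ticked boxes of a checklist item can be stored as a row of 0/1 indicators, one row per coded report. The Python sketch below does this for Part 2, Item 1 (individual-level ECB content). It is illustrative only and is not part of the original instrument: the names `INDIVIDUAL_CONTENT` and `encode_ticks` are hypothetical, and the 0/1 layout is an assumption about how the coded data might be arranged for analysis.

```python
# Option letters and labels copied from Part 2, Item 1 of the coding form.
INDIVIDUAL_CONTENT = {
    "A": "Awareness/Attitudes",
    "B": "Terms, approaches or methods",
    "C": "Logic models",
    "D": "Evaluation plan",
    "E": "How to do an evaluation",
    "F": "Management, analysis, interpretation or use of data",
    "G": "Program planning",
    "H": "Program implementation",
}


def encode_ticks(ticked):
    """Return a 0/1 list in option order A..H (hypothetical helper).

    `ticked` is the set of option letters ticked for one report.
    """
    unknown = set(ticked) - set(INDIVIDUAL_CONTENT)
    if unknown:
        raise ValueError(f"unknown option letters: {sorted(unknown)}")
    # Dictionaries keep insertion order, so the columns follow A..H.
    return [1 if letter in ticked else 0 for letter in INDIVIDUAL_CONTENT]


# A report that covered awareness/attitudes and logic models only.
row = encode_ticks({"A", "C"})
```

Stacking one such row per report gives a reports-by-topics matrix of indicators; rejecting unknown letters guards against coding slips.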
PART 3 Rigor of ECB Measurement Practice
Each criterion (progress variable) is rated against the outcome space below
(descriptions of evidence across progress variables), using four levels:
Cannot be determined (0), Low (1), Moderate (2) and High (3).

1. Scope of variables measured
Cannot be determined (0): Not reported.
Low (1): Measured individuals’ evaluation capacity (which may include, for example, awareness, knowledge, skills or attitudes).
Moderate (2): Measured individuals’ evaluation capacity and the organizational evaluation capacity that includes evaluation leadership, policies, systems, resources or structures.
High (3): Measured individuals’ and organizational capacities as well as the organization’s contextual measures such as social climate, learning capacity, culture or social network.

2. Obtaining evidence
Cannot be determined (0): Not reported.
Low (1): Indirect measurement or testing like self-report or self-rating.
Moderate (2): Direct measurement such as obtained by observation and direct testing.
High (3): Combination of direct and indirect measurement.

3. Reliability of measurement tools
Cannot be determined (0): Not reported.
Low (1): Uses tools with unreported/unmeasured reliability.
Moderate (2): Uses tools with reported reliability and within acceptable values.
High (3): Standardized or validated measurement tools are used with justification of their appropriate contextual use. [Lack of justification is equivalent to low].

4. Utilization of ECB measures
Cannot be determined (0): Not reported.
Low (1): ECB measures are used to establish baseline information to inform ECB design.
Moderate (2): ECB measures did not only inform ECB design but were also used to guide ECB implementation, for example, adjustments in ECB approach.
High (3): ECB measures are ultimately used to evaluate ECB impact at the end of the program on top of informing ECB design and guiding implementation.

5. Representativeness
Cannot be determined (0): Not reported.
Low (1): The measurement used a non-probability sample like purposive sampling, e.g. key informants only.
Moderate (2): The measurement used a probability sample with the use of some form of random sampling technique.
High (3): The measurement used every case unit of the population of interest, e.g. all members of the organization.

6. Timing of measurement
Cannot be determined (0): Not reported.
Low (1): Measurements were only made once at the beginning or at the end of the ECB project.
Moderate (2): Measurements were made at the beginning and the end of ECB; may also include measures during ECB.
High (3): Measurements were made over an extended period of time after ECB to see its changes in the long term.

7. Validity of inference from obtained measures
Cannot be determined (0): Not reported.
Low (1): The conclusions are at best anecdotal with descriptions of evaluation capacities but no measures to back up claims.
Moderate (2): Descriptions of evaluation capacities were made and backed up by figures from measures; may also extend to comparing measures.
High (3): Conclusions were carried out with sound measures and statistical procedures that warrant statistical inference, e.g. hypothesis testing or modelling.

8. Measurement design used
Cannot be determined (0): Not reported.
Low (1): The measurement design used a simple observational method (no control or comparison groups and no randomization of case units made).
Moderate (2): The measurement design used comparison groups but lacks random assignment.
High (3): The elements of experimental design are present with control and comparison groups and random assignments made.
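
The Part 3 rubric can be recorded for each report as eight ratings, each from 0 to 3. The Python sketch below is illustrative only and is not part of the original coding form. The function name `summarize_rigor` is hypothetical, and adding the eight ratings into a single total is an assumption made for the example, since this appendix does not state how the ratings are combined.

```python
# Criterion names and level labels follow PART 3 of the coding form.
CRITERIA = (
    "Scope of variables measured",
    "Obtaining evidence",
    "Reliability of measurement tools",
    "Utilization of ECB measures",
    "Representativeness",
    "Timing of measurement",
    "Validity of inference from obtained measures",
    "Measurement design used",
)

LEVELS = {0: "Cannot be determined", 1: "Low", 2: "Moderate", 3: "High"}


def summarize_rigor(ratings):
    """Validate one report's ratings and return (total, labels).

    `ratings` maps every criterion name to a level from 0 to 3.
    The total is a plain sum, an assumption made for illustration.
    """
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if ratings[name] not in LEVELS:
            raise ValueError(f"{name}: rating must be 0, 1, 2 or 3")
    labels = {name: LEVELS[ratings[name]] for name in CRITERIA}
    total = sum(ratings[name] for name in CRITERIA)
    return total, labels


# Hypothetical report rated Moderate (2) on every criterion.
example = {name: 2 for name in CRITERIA}
total, labels = summarize_rigor(example)
```

Keeping the per-criterion labels alongside the total keeps a rating of 0 visible, which matters because 0 means the practice could not be determined from the report, not that it was absent.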
END
Minerva Access is the Institutional Repository of The University of Melbourne
Author/s:
PONCE, ROY
Title:
Measurement practice in Evaluation Capacity Building
Date:
2014
Persistent Link:
http://hdl.handle.net/11343/56512
File Description:
Measurement Practice in Evaluation Capacity Building