Implications of complication and complexity for evaluation
Patricia J. Rogers, CIRCLE (Collaboration for Interdisciplinary Research, Consulting and Learning in Evaluation), Royal Melbourne Institute of Technology, Australia. [email protected]
Evaluation Revisited: Improving the quality of evaluative practice by embracing complexity
Utrecht, the Netherlands May 20-21 2010
2
The naïve experimentalism view of evaluation and evidence-based policy and practice
(Flow diagram: researchers, through a single study and then several studies, find that thing ‘A’ works; policymakers decide to do thing ‘A’; practitioners do thing ‘A’; intended beneficiaries benefit as expected.)
3
But things are often more complicated or complex than this …
4
What can (and does) go wrong with naïve experimentalism
(The same flow diagram, annotated with failure points: narrow studies that ignore important evidence; random error; misrepresentation of results; differential effects – thing ‘A’ only works in some contexts; not feasible in other locations; not scaleable; negative effects ignored.)
5
An alternative view of knowledge-building
6
An approach to evaluation and evidence-based policy and practice that recognizes the complicated and complex aspects of situations and interventions
Questions: What is needed? What is possible? What works? What works for whom in what situations? What is working?
Stakeholders: researchers and evaluators; community and civil society; practitioners and managers; policymakers
7
Advocacy for RCTs (Randomised Controlled Trials) in development evaluation
2003 – “J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology”
2006 – Advocated more use of RCTs; argued that experimental and quasi-experimental designs had a comparative advantage because they provide an unbiased numeric estimate of impact
2009–2010 – TED talk used leeches to illustrate the alternative to using RCTs as evidence
8
Distinguishing between RCTs and naïve experimentalism
RCT (Randomised Controlled Trial)
– one of many research designs that can be suitable
– involves randomly assigning (truly randomly, not ad hoc) potential participants either to receive the treatment (or one of several versions of the treatment) or to be in the control group (who might receive nothing or the current standard treatment) – see the assignment sketch after this slide
– in ‘double blind’ RCTs neither the participants nor the researchers know who is in the treatment group (eg the control group gets pills that look the same, and the details of the group are kept secret until after the results are recorded)
Naïve experimentalism
– believes that RCTs always provide the best evidence (the ‘gold standard’ approach)
– ignores (or is ignorant of) the potential risks in using RCTs and the other approaches that can be appropriate
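To make the contrast concrete, here is a minimal sketch (my illustration, not from the slides; the participant names are hypothetical) of what ‘truly random, not ad hoc’ assignment means in practice, using only Python’s standard library:

    import random

    def randomize(participants, arms=("treatment", "control"), seed=None):
        # Shuffle the pooled list and split it: every participant has the
        # same chance of landing in each arm ('truly random'), unlike ad hoc
        # schemes such as alternating names or taking volunteers first.
        rng = random.Random(seed)
        pool = list(participants)
        rng.shuffle(pool)
        k = len(pool) // len(arms)
        # Any remainder after an even split is left unassigned in this sketch.
        return {arm: pool[i * k:(i + 1) * k] for i, arm in enumerate(arms)}

    # Six hypothetical participants, two arms:
    print(randomize(["P1", "P2", "P3", "P4", "P5", "P6"], seed=1))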
9
Exploring complication and complexity in evaluation
(Timeline of publications: 1997, 2006, 2008, 2008, 2009, 2010.)
10
Some unhelpful ways ‘complex’ is used
• Difficult – eg little available data, hard to get additional data
• Beyond scrutiny – eg too technical for others to understand or challenge
• Ad hoc – eg too overwhelmed with implementation to think about planning or follow through
11
Two framings of simple, complicated and complex
Simple
– Glouberman and Zimmerman 2002: tested ‘recipes’ assure replicability; expertise is not needed
– Kurtz and Snowden 2003: the domain of the ‘known’; cause and effect are well understood; best practices can be confidently recommended
Complicated
– Glouberman and Zimmerman 2002: success requires a high level of expertise in many specialized fields, plus coordination
– Kurtz and Snowden 2003: the domain of the ‘knowable’; expert knowledge is required
Complex
– Glouberman and Zimmerman 2002: every situation is unique – previous success does not guarantee success; expertise can help but is not sufficient; relationships are key
– Kurtz and Snowden 2003: the domain of the ‘unknowable’; patterns are only evident in retrospect
12
Using the framework
– Can be used to refer to a situation or to an intervention
– Not useful as a way of classifying the whole situation or intervention; most useful for considering aspects of interventions
– Not normative: complex is not better than simple, and simple interventions can still be difficult to do well, or to get good data about
13
Simple can sometimes be appropriate
“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”
Albert Einstein, Oxford University, 1933
“Everything should be made as simple as possible, but no simpler.”
14
Implications of complicated and complex situations and interventions for evaluation
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
15
1. Focus - implications for evaluation?
Simple: Single set of objectives
Complicated: Different objectives valued by different stakeholders; multiple, competing imperatives; objectives at multiple levels of a system
Complex: Emergent objectives
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
16
Focus – objectives at multiple levels of a system
(Diagram: the intervention drives activities at client, site and system level; these produce shorter-term outcomes at client, site and system level, which lead to longer-term outcomes.)
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
17
2. Governance - implications for evaluation?
Simple: Single organization
Complicated: Specific organizations with formalized requirements
Complex: Emergent organizations working together in flexible ways
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
18
3. Consistency - implications for evaluation?
Simple: Standardized
Complicated: Adapted
Complex: Adaptive
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
19
What interventions look like – teaching reading
Simple – best practice
Teachers select a reading program which has been shown in RCTs to be effective (eg the Reading First program – $1 billion p.a.)
Complicated – adapted
Teachers identify children’s learning stage and provide exercises to match it (eg the Victorian Catholic Education System’s Literacy Assessment Project)
Griffin, P. (2009) ‘Ambitious new project to raise literacy and numeracy levels in Victorian schools’. http://newsroom.melbourne.edu/studio/ep-29
Griffin, P., Murray, L., Care, E., Thomas, A., & Perri, P. (2009). Developmental Assessment: Lifting literacy through Professional Learning Teams. Assessment in Education. In press.
20
What interventions look like – supporting small businesses
Complicated – what are the ‘active ingredients’?
An RCT compares the effect on small businesses of providing (see the sketch after this slide):
(i) business training
(ii) savings incentive
(iii) wages support
(iv) business training and savings incentive
(v) business training and wages support
(vi) savings incentive and wages support (McKenzie, 2010)
Complex – adaptive
A program works with small businesses to iteratively identify what they need, and to meet those needs.
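A small sketch (my addition) of the factorial structure behind those six arms: every single component and every pair of the three components, which is what lets the analysis separate the ‘active ingredients’ rather than testing one bundled package.

    from itertools import combinations

    components = ("business training", "savings incentive", "wages support")

    # The six arms above are every single component plus every pair
    # (no arm combines all three, and the control arm is implicit).
    arms = [c for r in (1, 2) for c in combinations(components, r)]
    for i, arm in enumerate(arms, 1):
        print(f"({i}) " + " + ".join(arm))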
21
4. Necessariness - implications for evaluation?
Simple: The only way to achieve the intended impacts
Complicated: One of several ways to achieve the intended impacts – which can be identified in advance
Complex: One of several ways to achieve the intended impacts – which are only evident in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
22
Necessariness – with/without comparisons
A US program to assist poor families through linking them to services found that families receiving the program experienced improvements in welfare — but so did the families that were randomly assigned to a control group that did not receive the visits (St. Pierre and Layzer 1999).
‘[As this case shows], a good study helps avoid spending funds on ineffective programs and redirects attention to improving designs or to more promising alternatives.’ (When Will We Ever Learn?)
But families in the control group had also accessed services.
The appropriate comparison would have been to compare the costs incurred in the different groups.
St. Pierre et al. (1996) Report on the National Evaluation of the Comprehensive Child Development Program. Summary and links to reports available at http://www.researchforum.org/project_abstract_166.html
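A toy illustration (the figures are entirely hypothetical, not taken from the St. Pierre evaluation) of why the cost comparison is the informative one here: when both groups reach roughly the same outcomes, the groups differ in what it cost to get there.

    # Hypothetical figures: both groups improved about equally, so
    # outcomes alone cannot distinguish the program from the status quo.
    program = {"families": 2000, "outcome_gain": 1.0, "cost_usd": 4_000_000}
    control = {"families": 2000, "outcome_gain": 1.0, "cost_usd": 1_500_000}  # services reached elsewhere

    for name, group in (("program", program), ("control", control)):
        per_family = group["cost_usd"] / group["families"]
        print(f"{name}: ${per_family:.0f} per family for the same outcome gain")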
23
5. Sufficiency - implications for evaluation?
Simple: Sufficient to produce the intended impacts; works the same for everyone
Complicated: Only works in conjunction with other interventions (previously, concurrently, or subsequently), and/or only works for some people, and/or only works in some circumstances – which can be identified in advance
Complex: Only works in conjunction with other interventions (previously, concurrently, or subsequently), and/or only works for some people, and/or only works in some circumstances – which is only evident in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
24
False negatives – the potted plant thought experiment
If 200 potted plants are randomly assigned either to a treatment group that receives daily water or to a control group that receives none, and both groups are placed in a dark cupboard, the treatment group does not have better outcomes than the control.
Possible conclusion: Watering plants is ineffective in making them grow.
Better conclusion: Water is not sufficient.
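A toy simulation (my addition, with made-up growth numbers) of the thought experiment: growth in this model requires both water and light, so a trial run entirely in a dark cupboard reports no treatment effect even though water genuinely matters.

    import random

    def growth(watered, in_dark_cupboard):
        # Toy model: a plant grows only if it gets BOTH water and light.
        if watered and not in_dark_cupboard:
            return random.gauss(5.0, 1.0)
        return 0.0

    plants = list(range(200))
    random.shuffle(plants)
    treatment, control = plants[:100], plants[100:]

    # Every plant sits in the dark cupboard, so light is absent for all.
    t_mean = sum(growth(True, True) for _ in treatment) / len(treatment)
    c_mean = sum(growth(False, True) for _ in control) / len(control)
    print(t_mean - c_mean)  # ~0: a false negative for 'does water work?'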
25
False positives – Early Head Start
• Early Head Start program – on average effective; listed as an ‘evidence-based program’
• But unfavourable outcomes for children in families with high levels of demographic risk factors (Mathematica Policy Research Inc 2002; Westhorp 2008)
Westhorp, G. (2008) Development of Realist Evaluation Methods for Small Scale Community Based Settings. Unpublished PhD thesis, Nottingham Trent University.
Mathematica Policy Research Inc (2002). Making a Difference in the Lives of Infants and Toddlers and Their Families: The Impacts of Early Head Start, Vol 1. US Department of Health and Human Services.
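A toy calculation (the numbers are hypothetical, not the Mathematica results) showing how a favourable average effect can coexist with harm in a subgroup – the pattern that makes ‘on average effective’ a potential false positive:

    # Hypothetical subgroup results: a weighted average can look
    # favourable even though one subgroup experiences unfavourable outcomes.
    subgroups = {
        "low demographic risk":  {"n": 800, "avg_effect": +0.30},
        "high demographic risk": {"n": 200, "avg_effect": -0.20},
    }

    total_n = sum(g["n"] for g in subgroups.values())
    overall = sum(g["n"] * g["avg_effect"] for g in subgroups.values()) / total_n
    print(f"overall: {overall:+.2f}")             # +0.20 -> 'evidence-based'
    for name, g in subgroups.items():
        print(f"{name}: {g['avg_effect']:+.2f}")  # harm visible only by subgroup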
26
6. Change trajectory - implications for evaluation?
Simple: Simple relationship – readily understood
Complicated: Complicated relationship – needs expertise to understand and predict
Complex: Complex relationship (including tipping points) – cannot be predicted, only understood in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
27
Complicated dose-response relationship – does stress improve performance?
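The slide presumably illustrates this with a curve; as a stand-in, here is a toy inverted-U dose-response (coefficients invented) in which moderate stress helps and high stress hurts, so a linear ‘more is better’ reading of any one segment of the curve would mislead:

    # Toy inverted-U dose-response (hypothetical coefficients):
    # performance peaks at a moderate stress level, then declines.
    def performance(stress: float) -> float:
        return 25.0 - (stress - 5.0) ** 2

    for dose in (0, 2, 5, 8, 10):
        print(f"stress={dose:2d} -> performance={performance(dose):.0f}")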
28
7. Unintended outcomes - implications for evaluation?
Simple: Unintended outcomes can be anticipated and monitored
Complicated: Different unintended outcomes are likely in particular combinations of circumstances – expertise is needed to anticipate them and identify them
Complex: Unintended outcomes cannot be anticipated, but only identified (and addressed) as they emerge or in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
29
Some thoughts on how evaluation might help us to understand the complicated and the complex
Issues that may need to be addressed
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes
Possible evaluation methods, approaches and methodologies
• Emergent evaluation design that can accommodate emergent program objectives and emergent evaluation issues
• Collaborative evaluation across different stakeholders and organisations
• Non-experimental approaches to causal attribution/contribution that don’t rely on a standardized ‘treatment’
• Realist evaluation that pays attention to the contexts in which causal mechanisms operate
• Realist synthesis that can integrate diverse evidence (including credible single case studies) in different contexts
• ‘Butterfly nets’ to catch unanticipated results
30
Looking forward to hearing about your approaches to addressing these issues in evaluation