Complexities of measuring change in psychotherapy

Chris Evans

Acknowledgements

Phil Richardson, Kevin Jones & others
Jan Lees, Mark Freestone, Nick Manning & others
Michael Barkham and many others
Mark Ashworth, Mel Shepherd, Susan Robinson, Maria Kordowicz & others
Susan McPherson
Jo-anne Carlyle

Classical psychometric model

We all have a position on an unmeasurable (“latent”) dimension of interest and of change (“true” value).

Quality of measurement a function of two issues:
Reliability
Validity

“No validity without reliability”

Reliability

Extent to which the measure is uncontaminated by random noise

Example 1 (“hard measurement”): working with obesity and using poor scales to measure people’s weight, the readings may fluctuate a lot, entirely randomly.

Example 2 (“our measurement”): measuring depression using a visual analogue rating scale, the measurement may be contaminated by imprecision in where the person places their mark (and potentially by much else).

Validity

Extent to which the measure measures what it is supposed to, and is uncontaminated by systematic (non-random) influences from other things it ends up measuring

Example 1: obesity – measuring people’s weight is pretty useless unless you also measure height as obesity is (largely) a function of weight and height so measuring the one without the other leaves your measure systematically biased: invalid.

Example 2: in a multi-item measure of depression an item asking about weight loss will be systematically affected by recent deliberate dieting, drinking alcohol rather than eating wisely or by serious physical illness causing weight loss (or famine but rarely in the western world).
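To make Example 1 concrete, here is a minimal sketch using body mass index (weight in kg divided by height in metres squared) as the usual weight-and-height summary. The two people and their heights are invented for illustration, and 30 is the conventional obesity cut-off rather than anything taken from these slides.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# Two hypothetical people with exactly the same weight
same_weight = 80.0
short_person = bmi(same_weight, 1.60)
tall_person = bmi(same_weight, 1.90)

# Weight alone cannot separate them; weight with height can
print(round(short_person, 1), round(tall_person, 1))
```

The same 80 kg sits above the cut-off of 30 for the shorter person and well below it for the taller one, which is the systematic bias the example describes.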

Reliability: a graphical model

circles are “latent”, unmeasurable, variables
squares are measurables
straight arrows show directional influence
everything is nomothetic, i.e. something on which each person has a value

Psychometrics: classical model

assume one source of common variance...
... the latent trait to measure
… only source of covariation between items
each item is also affected by “error”
errors are independent, and...
... uncorrelated with the latent trait of interest

Cronbach’s alpha

Reliability: proportion of the measured variation (sum of the boxes)
… that comes from the latent trait
... and not from the sources of error
estimated as coefficient alpha, the proportion of covariance to variance
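As a minimal sketch of how coefficient alpha is usually computed: alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), for k items. The function name and the small data set below are invented for illustration and are not from the studies in these slides.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha from a list of per-person lists of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two people")
    # Sample variance of each item across people
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each person's total (summed) score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: five people, three items that rise and fall together
scores = [
    [0, 1, 1],
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 4],
    [4, 4, 3],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Because the items here covary strongly, most of the variance of the total score is shared covariance and alpha comes out high.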

Challenges of “our measurement”

We have little time or money for measuring

What we measure is often either complex (“quality of life”) or idiosyncratic (“recovering from death of partner bringing back abuse in childhood and early death of abusing parent”).

Recent measures

Format: short(ish), multi-item, self-report measures

Intention: not so much designed to provide strong measurement of a unidimensional latent variable but … to provide rapid coverage of a broad range of issues likely to cover many clients’ likely change.

Typical e.g.s: Brief Symptom Inventory, CORE-OM, OQ-45.

Typical measures

Multiple items, e.g. “I have felt terribly alone and isolated”
Time focus: “Over the last week”
Use rating anchors by frequency: “Not at all”, “Only occasionally”, “Sometimes”, “Often”, “Most or all the time”
Or intensity: “Not at all” to “Extremely”

Issues about items

I have felt I have someone to turn to for support when needed
  What does “turning to” involve? What is “support”? How much does “when needed” limit applicability?
I have felt O.K. about myself
  How OK is OK?!
I have felt able to cope when things go wrong
  How wrong is wrong? What is coping? (Quite a few European languages don’t have a verb “to cope”)
Tension and anxiety have prevented me doing important things
  What if it was only tension? Or only anxiety? How important do things have to be to be important?

Issues about time frame

“Over the last week”
Do people really anchor to that?
Could it mean:
  since Sunday?
  since Monday?
  the last seven days?

Issues about anchors

Not at all
Only occasionally
Sometimes
Often
Most or all the time

Is my “Only occasionally” your “Sometimes”?

“Panel” change model

[Figure: path diagram with the measurement model shown at two occasions, t1 and t2.]

Simple change variance model

Instead of modelling each occasion separately
… look at the variance of the differences between observed scores for each individual

Get internal reliability of item change
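The idea above can be sketched as: take each person's item-by-item differences between two occasions, then compute coefficient alpha on those difference scores exactly as one would on the raw items. This is a minimal illustration, not the analysis code behind the results in these slides; the helper names and the toy data are invented, and the data are deliberately built so that change is mostly item-level noise.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha from a list of per-person lists of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def item_changes(t1_rows, t2_rows):
    """Item-by-item change (occasion 2 minus occasion 1) for each person."""
    return [[b - a for a, b in zip(r1, r2)]
            for r1, r2 in zip(t1_rows, t2_rows)]

# Hypothetical data: five people, three items, two occasions
t1 = [[0, 1, 1], [1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 3]]
t2 = [[1, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4], [3, 4, 4]]

alpha_t1 = cronbach_alpha(t1)
alpha_t2 = cronbach_alpha(t2)
alpha_change = cronbach_alpha(item_changes(t1, t2))
print(round(alpha_t1, 2), round(alpha_t2, 2), round(alpha_change, 2))
```

In this toy example both occasions look highly reliable on their own, yet the alpha for item change is negative, because the item changes do not covary positively across people.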

Item change

Binary, Y/N item now has three possible change scores: -1, 0, +1
Three level item: five scores: -2, -1, 0, +1, +2
Four level item: seven scores: -3, -2, -1, 0, +1, +2, +3
n-level item: always 2n – 1 differences
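A quick check of the 2n – 1 rule, as a small sketch. It assumes each item is scored with consecutive integers from 0 to n - 1; the function name is invented for illustration.

```python
def possible_change_scores(n_levels):
    """All distinct differences between two ratings on an item scored 0..n_levels-1."""
    levels = range(n_levels)
    return sorted({later - earlier for earlier in levels for later in levels})

for n in (2, 3, 4):
    diffs = possible_change_scores(n)
    # Number of distinct change scores should equal 2n - 1
    print(n, len(diffs), diffs)
```

So a change score on a short rating scale is itself only a coarse scale, with few distinct values.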

Real data for the simple model

Exploratory, pragmatic RCT
“Slim” paradigm RCT: twelve weeks of group based art therapy (AT) cf. treatment as usual
Design was N = 120 (60 per arm)
Minimisation randomisation

Richardson, Jones, Evans, Stevens & Rowe (2007) An exploratory randomised trial of group based art therapy as an adjunctive treatment in severe mental illness. Journal of Mental Health 16(4): 483-491.

Test: BrSI (k = 53)

        n   Low     α    Up  |   n   Low     α    Up
T1     43   .97   .98   .99  |  38   .92   .95   .97
T2     36   .96   .97   .98  |  34   .89   .93   .96
T3     22   .93   .96   .98  |  17   .90   .95   .98
1-2    34   .90   .94   .96  |  31   .87   .92   .95
1-3    22   .83   .90   .95  |  15   .76   .87   .95

Test 2: SANS (k = 24)

        n   Low     α    Up  |   n   Low     α    Up
1      46   .90   .93   .96  |  42   .81   .87   .92
2      38   .89   .93   .96  |  35   .81   .88   .93
3      22   .93   .96   .98  |  18   .48   .71   .87
1-2    38   .81   .88   .93  |  35   .78   .86   .92
1-3    22   .86   .92   .96  |  18   .62   .79   .91

But … IIP (k = 32)

        n   Low     α    Up  |   n   Low     α    Up
T1     44   .86   .90   .94  |  42   .80   .87   .92
T2     37   .87   .90   .95  |  35   .84   .90   .94
T3     22   .78   .87   .94  |  18   .82   .90   .96
1-2    36   .64   .76   .86  |  34   .29   .54   .74
1-3    21   .47   .69   .85  |  17   .60   .79   .91

Rating? BPRS (k = 19)

        n   Low     α    Up  |   n   Low     α    Up
1      46   .64   .75   .85  |  43   .53   .68   .81
2      38   .62   .75   .85  |  35   .40   .62   .78
3      22   .62   .78   .89  |  18   .47   .70   .87
1-2    38   .60   .75   .85  |  35   .20   .48   .70
1-3    22   .66   .80   .90  |  18  -.10   .39   .73

Ratings: HoNOS (k = 12)

        n    Low     α    Up  |   n    Low     α    Up
1      46    .58   .72   .83  |  43    .45   .64   .78
2      38    .46   .65   .80  |  35    .45   .65   .80
3      22    .44   .68   .85  |  18    .23   .58   .82
1-2    38   -.20   .22   .54  |  35  -1.24  -.43   .18
1-3    22    .07   .47   .74  |  18  -1.15  -.16   .49

Routine test-retest (CORE, k = 34, students)

        n   Low     α    Up
1      53   .92   .94   .96
2      41   .94   .96   .98
1-2    40   .65   .77   .86

Diversity & complexity of change

Naturalistic study of Therapeutic Communities in the UK
Borderline Syndrome Index (BoSI)

Lees, Evans, et al. (2006) Who comes into therapeutic communities? A description of the characteristics of a sequential sample of client members admitted to 17 therapeutic communities. Therapeutic Communities 27(3): 411-433.

Lees, Evans, et al. (2005) A cross-sectional snapshot of therapeutic community client members. Therapeutic Communities 26(3): 295-314.

Change boxplots: men

[Figure: boxplots of BoSI change to 90 days by TC group, men, sequential sample. x-axis: TC group; y-axis: Change (about -50 to 10). Drug TC n = 34, Prison TC n = 9, Residential TC n = 13, Day TC n = 7.]

Change boxplots: women

[Figure: boxplots of BoSI change to 90 days by TC group, women, sequential sample. x-axis: TC group; y-axis: Change (about -30 to 10). Private sector n = 10, Drug TC n = 1, Residential TC n = 32, Day TC n = 19.]

Jacobson plot: men

[Figure: scatterplot of BoSI score at 90 days (y-axis, 0 to 50) against initial BoSI score (x-axis, 0 to 50). Legend: Drug TC, Prison TC, Residential TC, Day TC.]

Jacobson plot: women

[Figure: scatterplot of BoSI score at 90 days (y-axis, 0 to 50) against initial BoSI score (x-axis, 0 to 50). Legend: Private sector, Drug TC, Residential TC, Day TC.]
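Jacobson plots are conventionally read against a reliable change criterion (Jacobson & Truax): a change counts as reliable if it exceeds 1.96 × SD × √2 × √(1 - reliability). The sketch below shows that calculation only. The SD, the reliability and the scores are hypothetical, not BoSI values from this study, and it assumes that lower scores mean fewer problems.

```python
import math

def reliable_change_criterion(sd_baseline, reliability, z=1.96):
    """Smallest change unlikely (at the chosen z) to be measurement error alone."""
    se_measurement = sd_baseline * math.sqrt(1 - reliability)
    se_difference = math.sqrt(2) * se_measurement
    return z * se_difference

def classify(initial, later, criterion):
    """Classify one person's change, assuming lower scores are better."""
    change = later - initial
    if change < -criterion:
        return "reliable improvement"
    if change > criterion:
        return "reliable deterioration"
    return "no reliable change"

# Hypothetical values: baseline SD of 10 and reliability of .90
criterion = reliable_change_criterion(sd_baseline=10.0, reliability=0.90)
print(round(criterion, 2))
print(classify(30, 20, criterion))
print(classify(30, 25, criterion))
```

On a plot of later score against initial score, this criterion corresponds to two lines parallel to the diagonal of no change; points between them show no reliable change.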

Cat’s cradle plots: men

[Figures: BoSI score (y-axis, 0 to 50) against days (x-axis, 0 to 500), male sequential data, one panel each for drug TCs only, residential TCs only and day TCs only.]

Cat’s cradle plots: women

[Figures: BoSI score (y-axis, 0 to 50) against days (x-axis, 0 to 500), female sequential data, one panel each for private sector TCs only, drug TCs only, residential TCs only and day TCs only.]

Idiographic & hybrid measures

“Patient generated” measures:
Problem rating & target rating
Personal questionnaire
PSYCHLOPS (from MYMOPS) www.psychlops.org

Ashworth, Robinson, et al. (2005) Measuring mental health outcomes in primary care: the psychometric properties of a new patient-generated outcome measure, 'Psychlops' ('Psychological Outcome Profiles'). Primary Care Mental Health 3: 261-270.

Ashworth, Evans, et al. (2009) Measuring psychological outcomes after cognitive behaviour therapy in primary care: a comparison between a new patient-generated measure ‘PSYCHLOPS’ (Psychological Outcome Profiles) and ‘HADS’ (Hospital Anxiety and Depression Scale). Journal of Mental Health 18(2): 169-177.

Conventional psychometrics

110 pre and post PSYCHLOPS, from primary care, largely CBT interventions

Cronbach alpha t1 .79 and t2 .87 (cf. usual .94/.95 for CORE-OM)

Change effect size large 1.53 cf. 1.06 for CORE-OM (p <.001)

Correlations with CORE-OM .48 to .61
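For readers unfamiliar with change effect sizes, here is a minimal sketch of one common definition: mean pre to post improvement divided by the standard deviation of the pre scores. The slides do not say which standardiser was used for the figures above, so that choice is an assumption, and the scores below are invented.

```python
from statistics import mean, stdev

def pre_post_effect_size(pre, post):
    """Mean improvement (pre minus post) in units of the pre-treatment SD."""
    return (mean(pre) - mean(post)) / stdev(pre)

# Hypothetical scores for five clients; lower is better
pre = [14, 16, 12, 18, 15]
post = [9, 12, 8, 11, 10]

effect_size = pre_post_effect_size(pre, post)
print(round(effect_size, 2))
```

With this definition, a measure whose baseline scores are less spread out gives a larger effect size for the same raw mean change.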

Conclusions

Applying cross-sectional psychometric models (same for IRT/Rasch) is hiding complexity in our change data
Group summaries are hiding non-linearity and diversity in change profiles
Nomothetic questionnaires should be complemented with patient generated measures (PSYCHLOPS/PQ)
We need to stop hiding the complexity of our therapies!
… but we need a paradigm shift if we’re to manage the organisational anxieties that provokes
… and we need money and time to explore complexity
… and we won’t get money/time without a paradigm shift that answers questions

Thanks!

