Complexities of measuring change in psychotherapy
Chris Evans
Acknowledgements
Phil Richardson, Kevin Jones & others
Jan Lees, Mark Freestone, Nick Manning & others
Michael Barkham and many others
Mark Ashworth, Mel Shepherd, Susan Robinson, Maria Kordowicz & others
Susan McPherson
Jo-anne Carlyle
Classical psychometric model
We all have a position on an unmeasurable (“latent”) dimension of interest and of change (“true” value).
Quality of measurement is a function of two issues:
Reliability
Validity
“No validity without reliability”
Reliability
Extent to which the measure is uncontaminated by random noise.
Example 1 (“hard measurement”): when working with obesity and using poor scales to measure people’s weight, the readings may fluctuate a lot, entirely at random.
Example 2 (“our measurement”): when measuring depression with a visual analogue rating scale, the measurement may be contaminated by imprecision in where the person places their mark (and potentially much else).
Validity
Extent to which the measure measures what it is supposed to, and is uncontaminated by systematic (non-random) influences from other things.
Example 1: obesity. Measuring people’s weight is pretty useless unless you also measure height, as obesity is (largely) a function of weight and height, so measuring the one without the other leaves your measure systematically biased: invalid.
Example 2: in a multi-item measure of depression, an item asking about weight loss will be systematically affected by recent deliberate dieting, by drinking alcohol rather than eating wisely, or by serious physical illness causing weight loss (or by famine, but rarely in the western world).
Reliability: a graphical model
Circles are “latent”, unmeasurable, variables
Squares are measurables
Straight arrows show directional influence
Everything is nomothetic, i.e. something on which each person has a value
Psychometrics: classical model
Assume one source of common variance ...
... the latent trait to measure ...
... the only source of covariation between items
Each item is also affected by “error”
Errors are independent, and ...
... uncorrelated with the latent trait of interest
Cronbach’s alpha
Reliability: the proportion of the measured variation (sum of the boxes) ...
... that comes from the latent trait ...
... and not from the sources of error
Estimated as coefficient alpha, the proportion of covariance to variance
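A minimal sketch of how coefficient alpha can be computed from an items-by-respondents table, using the usual formula alpha = k/(k-1) × (1 − sum of item variances / variance of total score). The function name and the tiny data set are hypothetical, for illustration only; they are not from the studies reported here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of k item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total (summed) score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents, 3 items scored 0-4
scores = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 3],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

The more the items covary relative to their own variances, the closer the value gets to 1.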
Challenges of “our measurement”
We have little time or money for measuring
What we measure is often either complex (“quality of life”) or idiosyncratic (“recovering from death of partner bringing back abuse in childhood and early death of abusing parent”).
Recent measures
Format: short(ish), multi-item, self-report measures
Intention: not so much designed to provide strong measurement of a unidimensional latent variable but … to provide rapid coverage of a broad range of issues likely to cover many clients’ likely change.
Typical examples: Brief Symptom Inventory, CORE-OM, OQ-45.
Typical measures
Multiple items, e.g. “I have felt terribly alone and isolated”
Time focus: “Over the last week”
Rating anchors by frequency: “Not at all”, “Only occasionally”, “Sometimes”, “Often”, “Most or all the time”
Or by intensity: “Not at all” to “Extremely”
Issues about items
“I have felt I have someone to turn to for support when needed”
What does “turning to” involve? What is “support”? How much does “when needed” limit applicability?
“I have felt O.K. about myself”
How OK is OK?!
“I have felt able to cope when things go wrong”
How wrong is wrong? What is coping? (Quite a few European languages don’t have a verb “to cope”.)
“Tension and anxiety have prevented me doing important things”
What if it was only tension? Or only anxiety? How important do things have to be to be important?
Issues about time frame
“Over the last week”
Do people really anchor to that?
Could it mean:
since Sunday?
since Monday?
the last seven days?
Issues about anchors
“Not at all”, “Only occasionally”, “Sometimes”, “Often”, “Most or all the time”
Is my “Only occasionally” your “Sometimes”?
“Panel” change model
[Figure: panel model diagram; measurement occasions labelled t1 and t2]
Simple change variance model
Instead of modelling each occasion separately
… look at the variance of the differences between
… observed scores
… for each individual
Get internal reliability of item change
Item change
Binary (Y/N) item: three possible change scores: -1, 0, +1
Three-level item: five scores: -2, -1, 0, +1, +2
Four-level item: seven scores: -3, -2, -1, 0, +1, +2, +3
n-level item: always 2n – 1 possible differences
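A sketch of the simple change variance model in code, assuming the same respondents complete the same items on two occasions: take each person’s item-level difference (t2 minus t1), then compute coefficient alpha on those differences. All names and the data below are hypothetical and only illustrate the arithmetic.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def item_changes(t1, t2):
    """Per-person, per-item change score (t2 minus t1)."""
    return [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]

def n_change_scores(levels):
    """Number of distinct differences for an item with `levels` response levels."""
    return 2 * levels - 1

# Hypothetical data: 5 respondents, 3 items scored 0-4, two occasions
t1 = [[4, 3, 4], [3, 3, 2], [2, 3, 2], [4, 4, 3], [1, 2, 1]]
t2 = [[2, 1, 2], [3, 2, 2], [1, 1, 0], [1, 2, 1], [1, 1, 1]]

changes = item_changes(t1, t2)
alpha_change = cronbach_alpha(changes)  # internal reliability of item change
print(changes[0], round(alpha_change, 2))
```

Here each five-level item can produce nine distinct change scores (-4 to +4), in line with the 2n – 1 rule.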
Real data for the simple model
Exploratory, pragmatic RCT
“Slim” paradigm RCT: twelve weeks of group-based AT cf. treatment as usual
Design was N = 120 (60 per arm)
Minimisation randomisation
Richardson, Jones, Evans, Stevens & Rowe (2007) An exploratory randomised trial of group based art therapy as an adjunctive treatment in severe mental illness. Journal of Mental Health 16(4): 483-491.
Test: BrSI (k=53)
        n   Low    α    Up      n   Low    α    Up
T1     43   .97  .98   .99     38   .92  .95   .97
T2     36   .96  .97   .98     34   .89  .93   .96
T3     22   .93  .96   .98     17   .90  .95   .98
1-2    34   .90  .94   .96     31   .87  .92   .95
1-3    22   .83  .90   .95     15   .76  .87   .95
Test 2: SANS (k=24)
        n   Low    α    Up      n   Low    α    Up
1      46   .90  .93   .96     42   .81  .87   .92
2      38   .89  .93   .96     35   .81  .88   .93
3      22   .93  .96   .98     18   .48  .71   .87
1-2    38   .81  .88   .93     35   .78  .86   .92
1-3    22   .86  .92   .96     18   .62  .79   .91
But … IIP (k=32)
        n   Low    α    Up      n   Low    α    Up
T1     44   .86  .90   .94     42   .80  .87   .92
T2     37   .87  .90   .95     35   .84  .90   .94
T3     22   .78  .87   .94     18   .82  .90   .96
1-2    36   .64  .76   .86     34   .29  .54   .74
1-3    21   .47  .69   .85     17   .60  .79   .91
Rating? BPRS (k=19)
        n   Low    α    Up      n   Low    α    Up
1      46   .64  .75   .85     43   .53  .68   .81
2      38   .62  .75   .85     35   .40  .62   .78
3      22   .62  .78   .89     18   .47  .70   .87
1-2    38   .60  .75   .85     35   .20  .48   .70
1-3    22   .66  .80   .90     18  -.10  .39   .73
Ratings: HoNOS (k=12)
        n   Low    α    Up      n    Low     α    Up
1      46   .58  .72   .83     43    .45   .64   .78
2      38   .46  .65   .80     35    .45   .65   .80
3      22   .44  .68   .85     18    .23   .58   .82
1-2    38  -.20  .22   .54     35  -1.24  -.43   .18
1-3    22   .07  .47   .74     18  -1.15  -.16   .49
Routine test-retest (CORE, k=34, students)
        n   Low    α    Up
1      53   .92  .94   .96
2      41   .94  .96   .98
1-2    40   .65  .77   .86
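The tables above give lower and upper limits around each alpha, but the slides do not say how those limits were obtained. As one possible approach (an assumption, not necessarily the method used here), a percentile bootstrap resamples respondents with replacement and recomputes alpha each time. The function names and the simulated data are hypothetical.

```python
import random
from statistics import variance

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def bootstrap_alpha_ci(rows, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap limits for alpha, resampling whole respondents."""
    rng = random.Random(seed)
    alphas = []
    while len(alphas) < n_boot:
        sample = [rng.choice(rows) for _ in rows]
        try:
            alphas.append(cronbach_alpha(sample))
        except ZeroDivisionError:
            continue  # skip a degenerate resample with zero total variance
    alphas.sort()
    lower = alphas[int((1 - level) / 2 * n_boot)]
    upper = alphas[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Simulated (not real) data: 40 respondents, 6 items = shared trait + noise
sim = random.Random(0)
data = []
for _ in range(40):
    trait = sim.gauss(0, 1)
    data.append([trait + sim.gauss(0, 1) for _ in range(6)])

low, up = bootstrap_alpha_ci(data)
print(round(low, 2), round(cronbach_alpha(data), 2), round(up, 2))
```

With small n, as in the later occasions above, such intervals become wide, which is visible in the tables.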
Diversity & complexity of change
Naturalistic study of Therapeutic Communities in the UK
Borderline Syndrome Index
Lees, Evans, et al. (2006) Who comes into therapeutic communities? A description of the characteristics of a sequential sample of client members admitted to 17 therapeutic communities. Therapeutic Communities 27(3): 411-433.
Lees, Evans, et al. (2005) A cross-sectional snapshot of therapeutic community client members. Therapeutic Communities 26(3): 295-314.
Change boxplots: men
[Figure: boxplots of BoSI change to 90 days by TC group, men, sequential sample; y-axis: Change (-50 to 10). Groups and n: Drug TC (34), Prison TC (9), Residential TC (13), Day TC (7).]
Change boxplots: women
[Figure: boxplots of BoSI change to 90 days by TC group, women, sequential sample; y-axis: Change (-30 to 10). Groups and n: Private sector (10), Drug TC (1), Residential TC (32), Day TC (19).]
Jacobson plot: men
[Figure: Jacobson plot; x-axis: Initial BoSI score (0 to 50); y-axis: BoSI score at 90 days (0 to 50); legend: Drug TC, Prison TC, Residential TC, Day TC.]
Jacobson, women
[Figure: Jacobson plot; x-axis: Initial BoSI score (0 to 50); y-axis: BoSI score at 90 days (0 to 50); legend: Private sector, Drug TC, Residential TC, Day TC.]
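A Jacobson plot sets each person’s later score against their initial score, usually with “tramlines” either side of the no-change diagonal marking the reliable change criterion. A minimal sketch of that criterion, using the standard formula SE_diff = SD × √2 × √(1 − reliability) and a 1.96 multiplier. The SD, the reliability and the scores below are hypothetical, not the BoSI values from this study, and lower scores are assumed to mean improvement.

```python
import math

def reliable_change_criterion(sd_baseline, reliability, z=1.96):
    """Smallest change unlikely to be due to measurement error alone."""
    se_diff = sd_baseline * math.sqrt(2) * math.sqrt(1 - reliability)
    return z * se_diff

def classify(pre, post, criterion):
    """Classify one person's change; assumes lower scores are better."""
    change = post - pre
    if change <= -criterion:
        return "reliable improvement"
    if change >= criterion:
        return "reliable deterioration"
    return "no reliable change"

# Hypothetical values for illustration only
criterion = reliable_change_criterion(sd_baseline=10.0, reliability=0.90)
for pre, post in [(40, 25), (20, 22), (10, 30)]:
    print(pre, post, classify(pre, post, criterion))
```

Points between the tramlines are the “no reliable change” cases, and the lower the reliability, the wider the tramlines.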
Cat’s cradle plot: men
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Male, sequential data, drug TCs only
Cat’s cradle: men
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Male, sequential data, residential TCs only
Cat’s cradle, men
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Male, sequential data, day TCs only
Cat’s cradle, women
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Female, sequential data, private sector TCs only
Cat’s cradle, women
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Female, sequential data, drug TCs only
Cat’s cradle, women
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Female, sequential data, residential TCs only
Cat’s cradle, women
[Figure: cat’s cradle plot; x-axis: Days (0 to 500); y-axis: BoSI score (0 to 50).]
BoSI scores: Female, sequential data, day TCs only
Idiographic & hybrid measures
“Patient generated” measures:
Problem rating & target rating
Personal questionnaire
PSYCHLOPS (from MYMOPS) www.psychlops.org
Ashworth, Robinson, et al. (2005) Measuring mental health outcomes in primary care: the psychometric properties of a new patient-generated outcome measure, ‘PSYCHLOPS’ (‘Psychological Outcome Profiles’). Primary Care Mental Health 3: 261-270.
Ashworth, Evans, et al. (2009) Measuring psychological outcomes after cognitive behaviour therapy in primary care: a comparison between a new patient-generated measure ‘PSYCHLOPS’ (Psychological Outcome Profiles) and ‘HADS’ (Hospital Anxiety and Depression Scale). Journal of Mental Health 18(2): 169-177.
Conventional psychometrics
110 pre and post PSYCHLOPS, from primary care, largely CBT interventions
Cronbach’s alpha: .79 at t1 and .87 at t2 (cf. the usual .94/.95 for CORE-OM)
Change effect size large: 1.53 cf. 1.06 for CORE-OM (p < .001)
Correlations with CORE-OM .48 to .61
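The slide does not state how the change effect size was defined. One common definition for pre/post data, assumed here purely for illustration, is the mean change divided by the standard deviation of the baseline scores. The function name and the scores below are hypothetical.

```python
from statistics import mean, stdev

def pre_post_effect_size(pre, post):
    """Mean improvement (pre minus post) in units of the baseline SD."""
    return (mean(pre) - mean(post)) / stdev(pre)

# Hypothetical scores for 5 clients; lower scores mean fewer problems
pre = [14, 16, 12, 18, 15]
post = [9, 12, 8, 11, 10]

es = pre_post_effect_size(pre, post)
print(round(es, 2))
```

On this definition, a measure whose baseline scores vary little between clients will show a larger effect size for the same raw change.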
Conclusions
Applying cross-sectional psychometric models (same for IRT/Rasch) is hiding complexity in our change data
Group summaries are hiding non-linearity and diversity in change profiles
Nomothetic questionnaires should be complemented with patient generated measures (PSYCHLOPS/PQ)
We need to stop hiding the complexity of our therapies!
… but we need a paradigm shift if we’re to manage the organisational anxieties that provokes
… and we need money and time to explore complexity
… and we won’t get money/time without a paradigm shift that answers questions
Thanks!