TRANSCRIPT
Risk Evaluation: Maximizing Risk Accuracy
Presentation to
Special Commission to Reduce the Recidivism of Sex Offenders
10/8/2014
Overview of Presentation
• Brief history of risk assessment and the different kinds of assessment that have been developed;
• Indication of where MA SORB Classification fits –
  • in this historical context, and
  • in the context of current state strategies;
• Summary of the criteria for how one should evaluate risk instruments;
• Quick overview of the recent empirical evaluations of risk instruments;
• Suggestions of two strategies for improving classification in MA.
BRIEF HISTORY OF RISK ASSESSMENT
Brief History
• First generation – Unstructured clinical judgment, including structured clinical guidelines (SCG).
• Second generation – Actuarial risk scales comprising static, historical factors.
• Third generation – the assessment of “criminogenic needs” or dynamic risk factors.
Bonta, 1996
Fixed or historical factors that cannot be changed (such as age at first offense).
Potentially changeable factors, including both stable (but potentially changeable) risk traits and acute, rapidly changing factors.
Brief History
• Characteristics of Unstructured Clinical Judgments –
  • No items specified for considering risk level;
  • Method for combining items is not specified.
(Hanson & Morton-Bourgon, 2009)
First Generation
Brief History
• Characteristics of SCGs –
  • They identify items to use in the decision and typically provide numerical values for each item;
  • Although they also usually provide a method for combining the items into a total score, they do not specify a priori how the clinician should integrate the items;
  • No tables linking the summary scores to recidivism rates.
(Hanson & Morton-Bourgon, 2009)
First Generation
Brief History
• Requirements of Empirical Actuarials –
  • Provide specific items to make the decision, with quantitative anchors derived from empirical investigation;
  • Method for combining the items into an overall score is specified;
  • Tables linking the summary scores to recidivism rates are provided.
(Hanson & Morton-Bourgon, 2009)
Second Generation
Brief History
• Requirements of Mechanical Actuarials –
  • They provide specific items for the decision, with numeric values for each item derived from a review of literature and theory;
  • Method for combining the items into an overall score is specified;
  • Tables linking the summary scores to recidivism rates are not provided.
(Hanson & Morton-Bourgon, 2009)
Second Generation
Brief History
• Additional conditions for Adjusted Actuarials –
  • Use appropriate actuarials (empirical or mechanical);
  • The clinician adjusts the score (and the recommendation) using factors external to the actuarial.
(Hanson & Morton-Bourgon, 2009)
Second Generation
MA SORB CLASSIFICATION FACTORS
Where Does It Fit?
MA SORB Classification Factors
• Somewhere between an unstructured judgment and an SCG –
Where Does It Fit?
[Figure: predictive validity continuum — AWA crime-based < clinical judgment < SCG < empirical actuarial < empirical actuarial + dynamic; MA SORB placed between clinical judgment and SCG]
MA SORB Classification Factors
• Somewhere between an unstructured judgment and an SCG –
  • It specifies a set of factors to be considered; but
  • It does not provide any quantification of these factors (i.e., numeric item scores);
  • For many items it does not clearly specify where the cutoff for “presence” or “absence” of a factor would be.
• Thus, it provides limited guidance both on the presence of items and on the combining of items.
Why Does It Fit Here?
Item 3. Psychopathy
Code this by reference to the PCL-R. Code PCL-R scores of 30 or above as “Y,” scores of 21-29 as “?,” and scores of 20 or lower as “N.”
Y = 2
? = 1
N = 0
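Because this SCG item provides numeric anchors and explicit cutoffs, it can be scored mechanically. A minimal illustrative sketch (not an official scoring tool; the function name is ours):

```python
def score_psychopathy_item(pcl_r: int) -> int:
    """Score SCG Item 3 (Psychopathy) from a PCL-R total.

    Per the item's anchors: PCL-R >= 30 -> "Y" (2 points),
    21-29 -> "?" (1 point), <= 20 -> "N" (0 points).
    """
    if pcl_r >= 30:
        return 2  # "Y"
    elif pcl_r >= 21:
        return 1  # "?"
    else:
        return 0  # "N"
```

This is exactly the property the MA SORB factors lack: two raters given the same PCL-R score must produce the same item score.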
Example of SCG
MA SORB Classification Factors
SVR-20
Item 2. Repetitive and Compulsive Behavior
Example of SORB Factors
(Charges, convictions, or self-report?)
(Includes both impulsive and compulsive behavior?)
MA SORB Classification Factors
Could be either — no score, vague criteria, and no cutoff.
• So the MA SORB criteria neither –
  • provide a metric for each item, so it is not known which items an expert is depending on and no item improvement can be attempted, nor
  • specify the cutoff criteria necessary for two raters to judge items present or absent, so agreement and reliability cannot be assessed.
• Moreover, there are no rules on how to combine or weight items in reaching a decision.
MA SORB Classification Factors
• Relative to other states?
Where Does It Fit?
[Chart: Identified “Tiering” — states with tiering vs. no tiering: 39% / 61%]
[Chart: De Facto “Tiering” — categories: tiering, no tiering, one level, > one level; values: 2%, 14%, 84%]
[Pie chart: Criteria for De Facto “Tiering” — categories: No Tiering, Unspecified, Crime, SCG, State Actuarial, Standard Actuarial; values: 2%, 17%, 62%, 6%, 6%, 6%]
Criteria for De Facto “Tiering” — the 6% using a State Actuarial (MN)
MN Leveling Criteria: actuarial leveling criteria, but clinical judgment trumps for –
• Hx of gratuitous violence
• Unsuccessful treatment
• Predatory offense behavior
• Supervision failures
HOW DO WE EVALUATE RISK TOOLS?
Evaluating Reliability and Validity
Reliability
Reliability is –
• Accuracy
• Freedom from variable error
• Consistency
  • Across raters
  • Across items
  • Across different measures of the same construct
  • Across time
Reliability
• Interrater
Interrater Reliability
[Diagram: ratings by Rater 1 and Rater 2 — agreement = high reliability; disagreement = low reliability]
Reliability
• Interrater
• Internal Consistency
Internal Consistency
Agreement or Correlation Among Items = High Reliability
• Allows one to calculate various forms of reliability –
  • Item reliability
  • Reliability of subscales (e.g., sexual deviance, criminality, etc.)
  • Internal consistency of items in the instrument
• Thus, quantification allows us to restructure items and their anchors to improve reliability.
Advantages of Quantification
Allows Reliability Checks
Gives us the Power of Being on the Same Page
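As an illustration of the kind of reliability check that quantification makes possible, here is a minimal sketch of Cohen's kappa — chance-corrected interrater agreement between two raters' item codes. (Illustrative only; not part of the presentation's materials.)

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Interrater agreement corrected for chance agreement.

    kappa = (p_observed - p_expected) / (1 - p_expected),
    where p_expected comes from each rater's marginal category rates.
    """
    assert len(rater1) == len(rater2) and len(rater1) > 0
    n = len(rater1)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from the raters' marginal distributions
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (observed - expected) / (1 - expected)
```

Without numeric item scores and clear presence/absence cutoffs, a statistic like this simply cannot be computed — which is the point being made about the MA SORB factors.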
• Most popular SCGs and actuarials assessed in the comparative literature have acceptable reliability.
• Unstructured judgments have poor reliability.
• The reliability of the MA SORB Classification Factors has not been and cannot be assessed.
SCGs and Actuarials
Reliability Results
HOW DO WE EVALUATE RISK TOOLS?
Evaluating Reliability and Validity
Validity
Validity Answers the Question
• Does a test measure what it is supposed to measure?
• What does a test measure?• What can one do with the test? • What does a test score predict?
Predicting Sexual Recidivism
Instrument Type          d (95% CI)
Empirical Actuarial      .67 (.63–.72)
Mechanical Actuarial     .66 (.58–.74)
SCG                      .46 (.29–.62)
Unstructured Judgment    .42 (.32–.51)
(Hanson & Morton-Bourgon, 2009)
• Overall, controlling for a large number of study variables, Empirical and Mechanical actuarials were significantly better predictors of recidivism;
• SCGs using clinical judgment and SCGs that calculate total scores did not differ.
• In all studies examined, clinicians’ adjustment of actuarial scores consistently lowered predictive accuracy.
Predicting Sexual Recidivism
(Hanson & Morton-Bourgon, 2009)
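For readers more accustomed to AUC than to Cohen's d, the effect sizes in the table above can be converted under the usual equal-variance normal assumption, AUC = Φ(d/√2). A minimal sketch (our own illustration, not from the cited meta-analysis):

```python
import math


def d_to_auc(d: float) -> float:
    """Convert Cohen's d to AUC assuming equal-variance normal
    score distributions for recidivists and non-recidivists:
    AUC = Phi(d / sqrt(2)), written via the error function."""
    return 0.5 * (1 + math.erf(d / 2))
```

Under this assumption, the empirical-actuarial d of .67 corresponds to an AUC of roughly .68, versus roughly .62 for unstructured judgment (d = .42).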
• Across multiple areas of prediction, mechanical actuarial prediction (statistical prediction rules [SPRs]) has been shown to be superior to clinical judgment.
• A recent meta-analysis summarizes the results of years of research (Grove et al., 2000).
Why Is Clinical Judgment Inferior?
• All studies published in English from the 1920s to the mid-1990s.
• 136 studies on the prediction of health-related phenomena or human behavior.
(Grove et al., 2000)
[Chart: Accuracy comparison — SPR > Clinical: 47%; SPR = Clinical: 47%; Clinical > SPR: 6%]
• A large body of research has documented the reasons for the cognitive errors that clinicians make.
• For instance, clinicians are great at making observations and rating items, but they, like all humans, are worse than a formula at adding the items together and combining them.
Why Is Clinical Judgment Inferior?
• Allows one to use various strategies for improving the validity of a measure –
  • Assess item correlation with outcome;
  • Adjust item cutoffs to maximize prediction;
  • Assess the validity of subscales (e.g., sexual deviance, criminality, etc.);
  • Optimize item weights for decision-making and predicting.
• Thus, one can restructure items, their anchors, cutoffs, and combinations to improve validity.
Advantages of Quantification
Allows Validity Checks
STRATEGIES FOR IMPROVING MA SORB CLASSIFICATION
Examples from Two States
New Jersey
Oregon
New Jersey
New Jersey: State Generated Actuarial
RRAS Items
Scoring: highest possible total score = 111
  Low Range: 0–36
  Moderate Range: 37–73
  High Range: 74–111
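Because the RRAS ties levels to explicit score ranges, the leveling rule can be expressed mechanically. An illustrative sketch of the ranges above (not an official RRAS implementation; the function name is ours):

```python
def rras_range(total: int) -> str:
    """Map an RRAS total score (0-111) to its tier range:
    0-36 Low, 37-73 Moderate, 74-111 High."""
    if not 0 <= total <= 111:
        raise ValueError("RRAS total must be between 0 and 111")
    if total <= 36:
        return "Low"
    elif total <= 73:
        return "Moderate"
    return "High"
```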
• Focuses on the current empirical literature to generate items and a scale.
• Each item is quantified and anchored cutoffs are provided.
• Method of combining items to generate a score is specified.
• Levels are tied to specific scores.
New Jersey: State Generated Actuarial
Advantages
• Reliability is an iterative process that takes time to develop.
• Baserates of scores not initially available;
• No follow-up data are available;
• No reoffense probabilities available until a prospective study is completed.
New Jersey: State Generated Actuarial
Disadvantages
[Chart: Re-offense rates by state risk levels — 5-year and 10-year follow-up, low risk vs. high risk (y-axis: 0–20%)]
MN & NJ: 3 Level System
FL & SC: Offender / Predator
(χ²(1) = 3.37, p = .066) (AUCs = .493–.569, ns)
(Zgoba et al., 2014)
STRATEGIES FOR IMPROVING MA SORB CLASSIFICATION
Examples from Two States
New Jersey
Oregon
Oregon
Oregon: Standard Actuarial
Oregon: Standard Actuarial
The Static-99R is the chosen risk assessment scale for Oregon, with the following level cutoffs recommended:
Level I: Score -3 to 3 (Low)
Level II: Score 4 to 5 (Moderate)
Level III: Score of 6+
Override and downward departure factors are taken into consideration:
• Aggravating factors that result in override to a higher level:
  1. Deviant Sexual Preference (by STABLE-2007 definition);
  2. Emotional Identification with Children (STABLE-2007 definition);
  3. High level of psychopathic traits as identified by a validated assessment;
  4. Individual articulates to officials/treatment professionals an unwillingness to control future sexually assaultive behaviors and/or plans to reoffend violently or sexually.
• Mitigating factors that result in downward departure to a lower level:
  1. Debilitating illness and/or permanent incapacitation;
  2. 10+ years clean record within the community.
• Assessments for aggravating and mitigating factors must be completed by a trained professional.
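Because Oregon's cutoffs and adjustments are explicit, the leveling logic can be sketched mechanically. Illustrative only — in the actual policy, the aggravating and mitigating factors must be assessed by a trained professional, and the function name and boolean flags are our simplification:

```python
def oregon_level(static99r: int,
                 aggravating: bool = False,
                 mitigating: bool = False) -> int:
    """Map a Static-99R score to Oregon's recommended level,
    then apply the override / downward-departure adjustments.

    Cutoffs per the slide: -3 to 3 -> Level I, 4 to 5 -> Level II,
    6+ -> Level III; aggravating factors move the level up one,
    mitigating factors move it down one.
    """
    if static99r <= 3:
        level = 1
    elif static99r <= 5:
        level = 2
    else:
        level = 3
    if aggravating:
        level = min(3, level + 1)  # override to a higher level
    if mitigating:
        level = max(1, level - 1)  # downward departure
    return level
```

The contrast with the MA SORB criteria is that every adjustment here has a quantitatively specified effect, so two evaluators applying the same findings must reach the same level.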
Static-99R Items
• Focuses on the current empirical literature to generate items and a scale.
• Each item is quantified and anchored cutoffs are provided.
• Method of combining items to generate a score is specified.
• Levels are tied to specific scores.
Oregon: Standard Actuarial
Advantages
• Extensive follow-up data have already been gathered.
• There are existing estimates of the probabilities of recidivism for score levels.
Oregon: Standard Actuarial
Advantages
• Actuarial not made specifically for the local state environment.
• Tied to a standardized instrument that the state is less likely to reassess for continuous improvement.
Disadvantages
Oregon: Standard Actuarial
APPLYING THE TWO STRATEGIES TO THE MA SORB CRITERIA
General Issues
• Creation of separate adult and juvenile actuarials;
• Creation of separate male and female actuarials;
• Dealing with the issues of Mental Illness and Intellectual Disabilities.
Improving the Current MA SORB Criteria
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Divide instrument into static and dynamic item subsets;
• Use recent meta-analytic literature to purge items that are not likely to have predictive validity;
Examples of Poor Predictors
• Released from civil commitment vs. not committed (Knight & Thornton, 2007)
• Maximum term of incarceration;
• Current home situation (vague and unspecified?);
• Physical condition;
• Documentation from a licensed mental health professional specifically indicating that the offender poses no risk to reoffend;
• Recent threats;
• Supplemental material;
• Victim impact statement.
Examples of Poor Predictors
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Divide instrument into static and dynamic item subsets;
• Use recent meta-analytic literature to purge items that are not likely to have predictive validity;
• Transform remaining items into a quantifiable format with clear cutoffs;
• Do a small study on a subset of offenders to establish reliability.
(Add items to capture predictive domains not adequately sampled?)
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Adjust items with the reliability data;
• Do a preliminary check on the predictive validity of revised items using existing databases;
• Revise items as a function of the predictive study and establish preliminary leveling cutoffs;
• Use the revised instrument, requiring item and total scores from raters for future validation studies.
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Follow all offenders and prospectively assess the instrument’s predictive validity of recidivism;
• Continually adjust instrument to improve predictive accuracy.
Strategy 2: OR Solution
• Use the Static-99R to determine leveling;
• Any “aggravating” or “mitigating” criteria should be operationally defined (e.g., STABLE-2007; PCL-R), and their adjustment contribution should be quantitatively specified;
• SORB has been doing Static-99Rs for a while, so use the ones that they have done;
• Have a team of trained graduate-student raters (cheap and accurate) do Static-99Rs on remaining offenders.
ESTIMATING LEVEL 3 FREQUENCY
[Chart: MTC Committed — Static-99 scores < 6 vs. ≥ 6: 56% / 44%]
[Chart: MTC Not Committed — Static-99 scores < 6 vs. ≥ 6: 77% / 23%]
[Histogram: STATIC-99R scores (n = 1,312), range -3 to 11; highlighted proportion = 21.2% (Zgoba et al., 2014)]
[Chart: MA % of registered sex offenders classified Level 3 (2010) — Level 3 vs. not Level 3: 75% / 25% (as cited in Harris, Levenson, &amp; Ackerman, 2012)]
• Moving forward, use existing dynamic instruments to create profiles for the treatment and management of offenders and for future adjustments.
Strategy 2: OR Solution