TRANSCRIPT
Risk Evaluation: Maximizing Risk Accuracy
Presentation to
Special Commission to Reduce the Recidivism of Sex Offenders
10/8/2014
Overview of Presentation
• Brief history of risk assessment and the different kinds of assessment that have been developed;
• Indication of where MA SORB Classification fits –
  • in this historical context, and
  • in the context of current state strategies;
• Summary of the criteria for how one should evaluate risk instruments;
• Quick overview of the recent empirical evaluations of risk instruments;
• Suggestions of two strategies for improving classification in MA.
BRIEF HISTORY OF RISK ASSESSMENT
Brief History
• First generation – Unstructured clinical judgment, including structured clinical guidelines (SCG).
• Second generation – Actuarial risk scales comprising static, historical factors.
• Third generation – the assessment of “criminogenic needs” or dynamic risk factors.
Bonta, 1996
Fixed or historical factors that cannot be changed (such as age at first offense).
Potentially changeable factors, including both stable (but potentially changeable) risk traits and acute, rapidly changing factors.
Brief History
• Characteristics of Unstructured Clinical Judgments –
  • No items specified for considering risk level;
  • Method for combining items is not specified.
(Hanson & Morton-Bourgon, 2009)
First Generation
Brief History
• Characteristics of SCGs –
  • They identify items to use in the decision and typically provide numerical values for each item;
  • Although they also usually provide a method for combining the items into a total score, they do not specify a priori how the clinician should integrate the items;
  • No tables linking the summary scores to recidivism rates.
(Hanson & Morton-Bourgon, 2009)
First Generation
Brief History
• Requirements of Empirical Actuarials –
  • Provide specific items to make the decision, with quantitative anchors derived from empirical investigation;
  • Method for combining the items into an overall score is specified;
  • Tables linking the summary scores to recidivism rates are provided.
(Hanson & Morton-Bourgon, 2009)
Second Generation
Brief History
• Requirements of Mechanical Actuarials –
  • They provide specific items for the decision, with numeric values for each item derived from a review of literature and theory;
  • Method for combining the items into an overall score is specified;
  • Tables linking the summary scores to recidivism rates are not provided.
(Hanson & Morton-Bourgon, 2009)
Second Generation
Brief History
• Additional conditions for Adjusted Actuarials –
  • Use appropriate actuarials (empirical or mechanical);
  • The clinician adjusts the score (and the recommendation) using factors external to the actuarial.
(Hanson & Morton-Bourgon, 2009)
Second Generation
MA SORB CLASSIFICATION FACTORS
Where Does It Fit?
MA SORB Classification Factors
• Somewhere between an unstructured judgment and an SCG –
Where Does It Fit?
[Figure: predictive validity continuum — AWA crime-based < clinical judgment < SCG < empirical actuarial < empirical actuarial + dynamic; MA SORB placed between clinical judgment and SCG]
MA SORB Classification Factors
• Somewhere between an unstructured judgment and an SCG –
  • It specifies a set of factors to be considered; but
  • It does not provide any quantification of these factors (i.e., numeric item scores);
  • For many items it does not clearly specify where the cutoff for “presence” or “absence” of a factor would be.
• Thus, it provides limited guidance both on the presence of items and on the combining of items.
Why Does It Fit Here?
Item 3. Psychopathy
Code this by reference to the PCL-R. Code PCL-R scores of 30 or above as “Y,” scores of 21-29 as “?,” and scores of 20 or lower as “N.”
Y = 2
? = 1
N = 0
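Because this SCG item provides numeric anchors and explicit cutoffs, it can be scored mechanically. A minimal illustrative sketch (not an official scoring tool; the function name is ours):

```python
def score_psychopathy_item(pcl_r: int) -> int:
    """Score SCG Item 3 (Psychopathy) from a PCL-R total.

    Per the item's anchors: PCL-R >= 30 -> "Y" (2 points),
    21-29 -> "?" (1 point), <= 20 -> "N" (0 points).
    """
    if pcl_r >= 30:
        return 2  # "Y"
    elif pcl_r >= 21:
        return 1  # "?"
    else:
        return 0  # "N"
```

This is exactly the property the MA SORB factors lack: two raters given the same PCL-R score must produce the same item score.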
Example of SCG
MA SORB Classification Factors
SVR-20
Item 2. Repetitive and Compulsive Behavior
Example of SORB Factors
(Charges, convictions, or self-report?)
(Includes both impulsive and compulsive behavior?)
MA SORB Classification Factors
Could be either — no score, vague criteria, and no cutoff.
• So the MA SORB criteria neither –
  • provide a metric for each item, so it is not known which items an expert is depending on and no item improvement can be attempted, nor
  • specify the cutoff criteria necessary for two raters to judge items present or absent, so agreement and reliability cannot be assessed.
• Moreover, there are no rules on how to combine or weight items in reaching a decision.
MA SORB Classification Factors
• Relative to other states?
Where Does It Fit?
[Chart: Identified “Tiering” — states with tiering vs. no tiering: 39% / 61%]
[Chart: De Facto “Tiering” — categories: tiering, no tiering, one level, > one level; values: 2%, 14%, 84%]
[Pie chart: Criteria for De Facto “Tiering” — categories: No Tiering, Unspecified, Crime, SCG, State Actuarial, Standard Actuarial; values: 2%, 17%, 62%, 6%, 6%, 6%]
Criteria for De Facto “Tiering” — the 6% using a State Actuarial (MN)
MN Leveling Criteria: actuarial leveling criteria, but clinical judgment trumps for –
• Hx of gratuitous violence
• Unsuccessful treatment
• Predatory offense behavior
• Supervision failures
HOW DO WE EVALUATE RISK TOOLS?
Evaluating Reliability and Validity
Reliability
Reliability is –
• Accuracy
• Freedom from variable error
• Consistency
  • Across raters
  • Across items
  • Across different measures of the same construct
  • Across time
Reliability
• Interrater
Interrater Reliability
[Diagram: ratings by Rater 1 and Rater 2 — agreement = high reliability; disagreement = low reliability]
Reliability
• Interrater
• Internal Consistency
Internal Consistency
Agreement or Correlation Among Items = High Reliability
• Allows one to calculate various forms of reliability –
  • Item reliability
  • Reliability of subscales (e.g., sexual deviance, criminality, etc.)
  • Internal consistency of items in the instrument
• Thus, quantification allows us to restructure items and their anchors to improve reliability.
Advantages of Quantification
Allows Reliability Checks
Gives us the Power of Being on the Same Page
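As an illustration of the kind of reliability check that quantification makes possible, here is a minimal sketch of Cohen's kappa — chance-corrected interrater agreement between two raters' item codes. (Illustrative only; not part of the presentation's materials.)

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Interrater agreement corrected for chance agreement.

    kappa = (p_observed - p_expected) / (1 - p_expected),
    where p_expected comes from each rater's marginal category rates.
    """
    assert len(rater1) == len(rater2) and len(rater1) > 0
    n = len(rater1)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from the raters' marginal distributions
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (observed - expected) / (1 - expected)
```

Without numeric item scores and clear presence/absence cutoffs, a statistic like this simply cannot be computed — which is the point being made about the MA SORB factors.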
• Most popular SCGs and actuarials assessed in the comparative literature have acceptable reliability.
• Unstructured judgments have poor reliability.
• The reliability of the MA SORB Classification Factors has not been and cannot be assessed.
SCGs and Actuarials
Reliability Results
HOW DO WE EVALUATE RISK TOOLS?
Evaluating Reliability and Validity
Validity
Validity Answers the Question
• Does a test measure what it is supposed to measure?
• What does a test measure?• What can one do with the test? • What does a test score predict?
Predicting Sexual Recidivism
Instrument Type          d (95% CI)
Empirical Actuarial      .67 (.63–.72)
Mechanical Actuarial     .66 (.58–.74)
SCG                      .46 (.29–.62)
Unstructured Judgment    .42 (.32–.51)
(Hanson & Morton-Bourgon, 2009)
• Overall, controlling for a large number of study variables, Empirical and Mechanical actuarials were significantly better predictors of recidivism;
• SCGs using clinical judgment and SCGs that calculate total scores did not differ.
• In all studies examined, clinicians’ adjustment of actuarial scores consistently lowered predictive accuracy.
Predicting Sexual Recidivism
(Hanson & Morton-Bourgon, 2009)
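For readers more accustomed to AUC than to Cohen's d, the effect sizes in the table above can be converted under the usual equal-variance normal assumption, AUC = Φ(d/√2). A minimal sketch (our own illustration, not from the cited meta-analysis):

```python
import math


def d_to_auc(d: float) -> float:
    """Convert Cohen's d to AUC assuming equal-variance normal
    score distributions for recidivists and non-recidivists:
    AUC = Phi(d / sqrt(2)), written via the error function."""
    return 0.5 * (1 + math.erf(d / 2))
```

Under this assumption, the empirical-actuarial d of .67 corresponds to an AUC of roughly .68, versus roughly .62 for unstructured judgment (d = .42).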
• Across multiple areas of prediction, mechanical actuarial prediction (statistical prediction rules [SPRs]) has been shown to be superior to clinical judgment.
• A recent meta-analysis summarizes the results of years of research (Grove et al., 2000).
Why Is Clinical Judgment Inferior?
• All studies published in English from the 1920s to the mid-1990s.
• 136 studies on the prediction of health-related phenomena or human behavior.
(Grove et al., 2000)
[Chart: Accuracy comparison — SPR > Clinical: 47%; SPR = Clinical: 47%; Clinical > SPR: 6%]
• A large body of research has documented the reasons for the cognitive errors that clinicians make.
• For instance, clinicians are great at making observations and rating items, but they, like all humans, are worse than a formula at adding the items together and combining them.
Why Is Clinical Judgment Inferior?
• Allows one to use various strategies for improving the validity of a measure –
  • Assess item correlation with outcome;
  • Adjust item cutoffs to maximize prediction;
  • Assess the validity of subscales (e.g., sexual deviance, criminality, etc.);
  • Optimize item weights for decision-making and predicting.
• Thus, one can restructure items, their anchors, cutoffs, and combinations to improve validity.
Advantages of Quantification
Allows Validity Checks
STRATEGIES FOR IMPROVING MA SORB CLASSIFICATION
Examples from Two States
New Jersey
Oregon
New Jersey
New Jersey: State Generated Actuarial
RRAS Items
Scoring: highest possible total score = 111
  Low Range: 0–36
  Moderate Range: 37–73
  High Range: 74–111
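Because the RRAS ties levels to explicit score ranges, the leveling rule can be expressed mechanically. An illustrative sketch of the ranges above (not an official RRAS implementation; the function name is ours):

```python
def rras_range(total: int) -> str:
    """Map an RRAS total score (0-111) to its tier range:
    0-36 Low, 37-73 Moderate, 74-111 High."""
    if not 0 <= total <= 111:
        raise ValueError("RRAS total must be between 0 and 111")
    if total <= 36:
        return "Low"
    elif total <= 73:
        return "Moderate"
    return "High"
```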
• Focuses on the current empirical literature to generate items and a scale.
• Each item is quantified and anchored cutoffs are provided.
• Method of combining items to generate a score is specified.
• Levels are tied to specific scores.
New Jersey: State Generated Actuarial
Advantages
• Reliability is an iterative process that takes time to develop.
• Baserates of scores not initially available;
• No follow-up data are available;
• No reoffense probabilities available until a prospective study is completed.
New Jersey: State Generated Actuarial
Disadvantages
[Chart: Re-offense rates by state risk levels — 5-year and 10-year follow-up, low risk vs. high risk (y-axis: 0–20%)]
MN & NJ: 3 Level System
FL & SC: Offender / Predator
(χ²(1) = 3.37, p = .066) (AUCs = .493–.569, ns)
(Zgoba et al., 2014)
STRATEGIES FOR IMPROVING MA SORB CLASSIFICATION
Examples from Two States
New Jersey
Oregon
Oregon
Oregon: Standard Actuarial
Oregon: Standard Actuarial
The Static-99R is the chosen risk assessment scale for Oregon, with the following level cutoffs recommended:
Level I: Score -3 to 3 (Low)
Level II: Score 4 to 5 (Moderate)
Level III: Score of 6+
Override and downward departure factors are taken into consideration:
• Aggravating factors that result in override to a higher level:
  1. Deviant Sexual Preference (by STABLE-2007 definition);
  2. Emotional Identification with Children (STABLE-2007 definition);
  3. High level of psychopathic traits as identified by a validated assessment;
  4. Individual articulates to officials/treatment professionals an unwillingness to control future sexually assaultive behaviors and/or plans to reoffend violently or sexually.
• Mitigating factors that result in downward departure to a lower level:
  1. Debilitating illness and/or permanent incapacitation;
  2. 10+ years clean record within the community.
• Assessments for aggravating and mitigating factors must be completed by a trained professional.
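Because Oregon's cutoffs and adjustments are explicit, the leveling logic can be sketched mechanically. Illustrative only — in the actual policy, the aggravating and mitigating factors must be assessed by a trained professional, and the function name and boolean flags are our simplification:

```python
def oregon_level(static99r: int,
                 aggravating: bool = False,
                 mitigating: bool = False) -> int:
    """Map a Static-99R score to Oregon's recommended level,
    then apply the override / downward-departure adjustments.

    Cutoffs per the slide: -3 to 3 -> Level I, 4 to 5 -> Level II,
    6+ -> Level III; aggravating factors move the level up one,
    mitigating factors move it down one.
    """
    if static99r <= 3:
        level = 1
    elif static99r <= 5:
        level = 2
    else:
        level = 3
    if aggravating:
        level = min(3, level + 1)  # override to a higher level
    if mitigating:
        level = max(1, level - 1)  # downward departure
    return level
```

The contrast with the MA SORB criteria is that every adjustment here has a quantitatively specified effect, so two evaluators applying the same findings must reach the same level.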
Static-99R Items
• Focuses on the current empirical literature to generate items and a scale.
• Each item is quantified and anchored cutoffs are provided.
• Method of combining items to generate a score is specified.
• Levels are tied to specific scores.
Oregon: Standard Actuarial
Advantages
• Extensive follow-up data have already been gathered.
• There are existing estimates of the probabilities of recidivism for score levels.
Oregon: Standard Actuarial
Advantages
• Actuarial not made specifically for the local state environment.
• Tied to a standardized instrument that the state is less likely to reassess for continuous improvement.
Disadvantages
Oregon: Standard Actuarial
APPLYING THE TWO STRATEGIES TO THE MA SORB CRITERIA
General Issues
• Creation of separate adult and juvenile actuarials;
• Creation of separate male and female actuarials;
• Dealing with the issues of Mental Illness and Intellectual Disabilities.
Improving the Current MA SORB Criteria
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Divide instrument into static and dynamic item subsets;
• Use recent meta-analytic literature to purge items that are not likely to have predictive validity;
Examples of Poor Predictors
• Released from civil commitment vs. not committed (Knight & Thornton, 2007)
• Maximum term of incarceration;
• Current home situation (vague and unspecified?);
• Physical condition;
• Documentation from a licensed mental health professional specifically indicating that the offender poses no risk to reoffend;
• Recent threats;
• Supplemental material;
• Victim impact statement.
Examples of Poor Predictors
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Divide instrument into static and dynamic item subsets;
• Use recent meta-analytic literature to purge items that are not likely to have predictive validity;
• Transform remaining items into a quantifiable format with clear cutoffs;
• Do a small study on a subset of offenders to establish reliability.
(Add items to capture predictive domains not adequately sampled?)
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Adjust items with the reliability data;
• Do a preliminary check on the predictive validity of revised items using existing databases;
• Revise items as a function of the predictive study and establish preliminary leveling cutoffs;
• Use the revised instrument, requiring item and total scores from raters for future validation studies.
Strategy 1: NJ Solution
Fix the Current MA SORB Criteria for Adult Males
• Follow all offenders and prospectively assess the instrument’s predictive validity of recidivism;
• Continually adjust instrument to improve predictive accuracy.
Strategy 2: OR Solution
• Use the Static-99R to determine leveling;
• Any “aggravating” or “mitigating” criteria should be operationally defined (e.g., STABLE-2007; PCL-R), and their adjustment contribution should be quantitatively specified;
• SORB has been doing Static-99Rs for a while, so use the ones that they have done;
• Have a team of trained graduate-student raters (cheap and accurate) do Static-99Rs on remaining offenders.
ESTIMATING LEVEL 3 FREQUENCY
[Chart: MTC Committed — Static-99 scores < 6 vs. ≥ 6: 56% / 44%]
[Chart: MTC Not Committed — Static-99 scores < 6 vs. ≥ 6: 77% / 23%]
[Histogram: STATIC-99R scores (n = 1,312), range -3 to 11; highlighted proportion = 21.2% (Zgoba et al., 2014)]
[Chart: MA % of registered sex offenders classified Level 3 (2010) — Level 3 vs. not Level 3: 75% / 25% (as cited in Harris, Levenson, &amp; Ackerman, 2012)]
• Moving forward, use existing dynamic instruments to create profiles for the treatment and management of offenders and for future adjustments.
Strategy 2: OR Solution