TRANSCRIPT
Utilizing Predictive Modeling to Improve Policy through Improved Targeting of Agency Resources: A Case Study on Placement Instability among Foster Children

Dallas J. Elgin, Ph.D., IMPAQ International
Randi Walters, Ph.D., Casey Family Programs

2016 APPAM Fall Research Conference

Image Credit: The Strengths Initiative
The Utility of Predictive Modeling for Government Agencies

• Challenge: Government agencies operate in an environment that increasingly requires using limited resources to meet nearly limitless demands.
• Opportunity: Advances in computing technology & administrative data can be leveraged via predictive modeling to estimate the likelihood of future events.
• Goal: To provide an improved understanding of the methodology & identify associated best practices.
What is Predictive Modeling?

• The process of selecting a model that best predicts the probability of an outcome (Geisser, 1993), or that generates an accurate prediction (Kuhn & Johnson, 2013).
• Over the past several decades, predictive modeling has been utilized in a variety of fields to predict diverse outcomes.
• Within child welfare, predictive models have been used to inform decision-making:
  – Risk assessment instruments
  – Maltreatment recurrence, future involvement, child fatalities
Case: Placement Instability

• Data: 2013 Adoption and Foster Care Analysis and Reporting System (AFCARS)
  – Publicly available dataset resembling administrative data
• Sample: 15,000 foster care children who were in care throughout 2013
• Operationalization: 3 or more moves, i.e., a total of 4 or more placements (Hartnett, Falconnier, Leathers & Testa, 1999; Webster, Barth & Needell, 2000)
  – 11,649 children with 3 or fewer placements
  – 3,351 children with 4 or more placements
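The operationalization above is a simple binary flag. A minimal Python sketch (illustrative only; the study's own analysis was done elsewhere, and the function name is ours):

```python
def placement_instability(num_placements: int) -> bool:
    """Binary outcome used in the case study: a child with 4 or more
    placement settings (i.e., 3 or more moves) is flagged as unstable."""
    return num_placements >= 4

# Sample counts reported in the deck
stable, unstable = 11_649, 3_351
unstable_share = unstable / (stable + unstable)  # ~22.3% of 15,000 children
```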
Methodological Approach: Data Partition Strategy
• The entire dataset of 15,000 children was split into 2 groups:
  – A training set used to train the models (75% of the dataset = 11,250 children)
  – A test set used to validate the models (25% of the dataset = 3,750 children)
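The 75/25 partition can be sketched in a few lines of pure Python (the study itself used R; the seed and function name here are illustrative):

```python
import random

def partition(records, train_frac=0.75, seed=42):
    """Randomly split records into a training set and a test set,
    mirroring the deck's 75/25 split (11,250 / 3,750 children)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = partition(range(15_000))
```

In practice a stratified split is preferable, so the roughly 22% share of unstable placements is preserved in both sets.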
Methodological Approach: Data Training Strategy

• Train a collection of 10 models using the training set
• Utilize ROC curves to evaluate the trade-off between:
  1. The true-positive rate (sensitivity)
  2. The false-positive rate (1 − specificity)
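ROC analysis ranks models by how well their predicted probabilities separate the two classes. A self-contained sketch of the summary statistic behind it, the area under the ROC curve (this is the standard rank-statistic formulation, not the study's actual code):

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank statistic: the share of
    positive/negative pairs in which the positive case scores higher
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that perfectly separates the classes scores 1.0;
# random guessing hovers around 0.5.
```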
Model Type                               Model                                      Interpretability   Computation Time
Linear Discriminant Analysis Models      Logistic Regression                        High               Low
                                         Partial Least Squares Discriminant         High               Low
                                           Analysis
                                         Elastic Net / Lasso                        High               Low
Non-Linear Classification Models         K-Nearest Neighbors                        Low                High
                                         Neural Networks                            Low                High
                                         Support Vector Machines                    Low                High
                                         Multivariate Adaptive Regression Splines   Moderate           Moderate
Classification Trees & Rule-Based        Classification Tree                        High               High
  Models                                 Boosted Trees                              Low                High
                                         Random Forest                              Low                High
Model Performance on the Test Set

• The 3 models with the highest ROC scores were applied to the test set (3,750 observations)
• Overall Accuracy = 87.8% - 87.9%
  – Less than 3 Moves = 90.1% - 90.2%
  – 4 or More Moves = 77.4% - 77.8%
Neural Network Model (rows = predicted, columns = actual)
                      4 or More Moves   Less than 3 Moves
4 or More Moves             535                153
Less than 3 Moves           302              2,759

Random Forest Model
                      4 or More Moves   Less than 3 Moves
4 or More Moves             537                157
Less than 3 Moves           300              2,755

Boosted Tree Model
                      4 or More Moves   Less than 3 Moves
4 or More Moves             540                158
Less than 3 Moves           297              2,754
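The headline accuracy figures can be recomputed directly from the neural network model's confusion matrix (rows are predicted classes, columns are actual classes):

```python
# Neural network confusion matrix reported in the deck
tp, fp = 535, 153    # predicted 4+ moves: correct, incorrect
fn, tn = 302, 2_759  # predicted <3 moves: incorrect, correct

total = tp + fp + fn + tn                # children scored on the test set
overall_accuracy = (tp + tn) / total     # ~87.9%
acc_4plus = tp / (tp + fp)               # ~77.8% ("4 or more moves")
acc_under3 = tn / (tn + fn)              # ~90.1% ("less than 3 moves")
sensitivity = tp / (tp + fn)             # share of unstable children caught
```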
Improving Model Accuracy

• Iterative process involving: transforming variables, 'fine-tuning' model parameters, or a combination of both
• 'Fine-tuning' the parameters of the neural network model
• Improved Overall Accuracy = 88.2%

Un-tuned Neural Network Model (rows = predicted, columns = actual)
                      4 or More Moves   Less than 3 Moves
4 or More Moves             535                153
Less than 3 Moves           302              2,759

Tuned Neural Network Model
                      4 or More Moves   Less than 3 Moves
4 or More Moves             569                176
Less than 3 Moves           268              2,736
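'Fine-tuning' typically means an exhaustive search over a small parameter grid, refitting and scoring the model at each combination (caret's nnet method, for instance, tunes a neural network's `size` and `decay`). A generic sketch of that loop, with a toy scoring function standing in for cross-validated accuracy:

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best
    (score, params) pair -- the pattern behind parameter tuning."""
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Toy stand-in for cross-validated accuracy: peaks at size=3, decay=0.1
score, params = grid_search(
    {"size": [1, 3, 5], "decay": [0.0, 0.1, 0.5]},
    lambda size, decay: -abs(size - 3) - abs(decay - 0.1),
)
```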
Improving Model Accuracy: Cost-Sensitive Tuning

• Considerable improvements in reducing false negatives, but at the expense of notable increases in the number of false positives.

Classification Tree with No Cost Penalty (rows = predicted, columns = actual)
                      4 or More Moves   Less than 3 Moves   Sensitivity   Specificity
4 or More Moves             515                181             0.615         0.938
Less than 3 Moves           322              2,731

Classification Tree with Cost Penalty of 2
                      4 or More Moves   Less than 3 Moves   Sensitivity   Specificity
4 or More Moves             620                354             0.741         0.878
Less than 3 Moves           217              2,558

Classification Tree with Cost Penalty of 5
                      4 or More Moves   Less than 3 Moves   Sensitivity   Specificity
4 or More Moves             756                758             0.903         0.740
Less than 3 Moves            81              2,154

Classification Tree with Cost Penalty of 10
                      4 or More Moves   Less than 3 Moves   Sensitivity   Specificity
4 or More Moves             790                970             0.944         0.667
Less than 3 Moves            47              1,942

Classification Tree with Cost Penalty of 20
                      4 or More Moves   Less than 3 Moves   Sensitivity   Specificity
4 or More Moves             803              1,161             0.959         0.601
Less than 3 Moves            34              1,751
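Under expected-cost minimization, penalizing false negatives is equivalent to lowering the probability threshold for predicting instability. A sketch of that arithmetic (illustrative only, not the study's actual implementation):

```python
def decision_threshold(fn_cost, fp_cost=1.0):
    """Minimizing expected cost means predicting 'unstable' whenever
    p >= fp_cost / (fp_cost + fn_cost); raising the false-negative
    penalty lowers the bar, trading specificity for sensitivity as
    the tables above show."""
    return fp_cost / (fp_cost + fn_cost)

# Penalties explored in the deck (1 = no penalty)
thresholds = {k: decision_threshold(k) for k in (1, 2, 5, 10, 20)}
# Penalty 1 -> threshold 0.50; penalty 20 -> threshold ~0.048
```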
Best Practices for Designing & Implementing Predictive Models

1. Predictive Models Can Improve Upon, but Not Replace, Traditional Decision-Making Processes within Government Agencies.
2. Government Agencies Should Clearly Articulate the Methodological Approach and the Predictive Accuracy of their Models.
3. Consider Opportunities for Incorporating Community Engagement into the Predictive Modeling Process.
~Questions & Feedback~

Dallas Elgin, Ph.D.
[email protected]

Randi Walters, Senior Director of Knowledge Management
Casey Family Programs
Placement Instability

• What is it? Occurs when a child in the care of a child welfare system experiences multiple moves to different settings
• Why does it matter? Placement instability can have significant consequences for children:
  – Greater risk for impaired development & psychosocial well-being
  – Greater uncertainty surrounding a child's future
  – Greater likelihood of re-entry and/or emancipation
• Is it a big issue? 25% of foster care children experience three or more moves while in care (Doyle, 2007)
Improving Model Accuracy: Cost-Sensitive Tuning

• False-negative predictions may be unacceptable, as a failure to correctly identify placement instability could result in unnecessary exposure to adverse events
• Cost-sensitive models impose cost penalties to minimize the likelihood of false-negative predictions
Data

• 2013 Adoption and Foster Care Analysis and Reporting System (AFCARS): Federal data provided by the states on all children in foster care
• Sample: 15,000 foster care children who were in care throughout 2013
  – 77.66% of children in the sample have 3 or fewer moves
  – 22.34% of children in the sample have 4 or more moves
3 Highest Performing Models on the Training Set

• Boosted Trees: build upon traditional classification tree models
  – Fit a sequence of shallow decision trees, each focusing on the errors of the previous trees, and then aggregate the trees to form a single predictive model
• Random Forests: build upon traditional classification tree models by utilizing bootstrapping methods to build a collection of decision trees
  – Consideration of a smaller subset of predictors minimizes the likelihood of a high degree of correlation among multiple trees
• Neural Networks: resemble the physiological structure of the human brain or nervous system
  – Use multiple layers (of algorithms) for processing pieces of information
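The two randomization devices that de-correlate a random forest's trees, bootstrap sampling and per-split predictor subsets, can be sketched as (predictor names are illustrative, not AFCARS field names):

```python
import random

def bootstrap_sample(rows, rng):
    """Draw n rows with replacement; each tree in a random forest
    is grown on its own bootstrap sample."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def predictor_subset(predictors, rng, k=None):
    """At each split only a random subset of predictors is considered
    (the square root of the predictor count is a common default),
    which keeps the trees from being highly correlated."""
    k = k or max(1, int(len(predictors) ** 0.5))
    return rng.sample(predictors, k)

rng = random.Random(0)
sample = bootstrap_sample(list(range(1_000)), rng)
subset = predictor_subset(["removal_date", "birth_date",
                           "placement_days", "diagnosis"], rng)
```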
Linear Discriminant Analysis Models

• Utilize linear functions to categorize observations into groups based on predictor characteristics
• Examples: logistic regressions, partial least squares discriminant analysis, and Elastic Net/Lasso models
• These models commonly have:
  – A high degree of interpretability
  – A low amount of computational time
Non-Linear Classification Models

• Utilize non-linear functions to categorize observations
• Examples: k-nearest neighbors, neural networks, support vector machines, and multivariate adaptive regression splines
• These models commonly have:
  – Low to moderate interpretability
  – Moderate to high computational time
Classification Trees and Rule-Based Models

• Utilize rules to partition observations into smaller homogenous groups
• Examples: classification trees, boosted trees, and random forests
• These models commonly have:
  – Low to high interpretability
  – A high degree of computational time
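The study's analysis used R's caret package; purely for orientation, a scikit-learn analogue with one illustrative estimator per family (the estimator choices and settings here are our assumptions, not the study's configuration):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# One representative per family from the taxonomy above
families = {
    "linear discriminant analysis": LogisticRegression(max_iter=1000),
    "non-linear classification": KNeighborsClassifier(n_neighbors=5),
    "trees & rule-based": RandomForestClassifier(n_estimators=500),
}
# Each estimator exposes the same fit/predict_proba interface, so a
# ten-model comparison reduces to a loop over such a dictionary.
```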
Model Performance on the Test Set

• The 3 models with the highest ROC values were applied to the test set of 3,750 children
Identifying Prominent Predictors

• The caret package's variable importance feature provides one option for characterizing the general effects of predictors within predictive models
• The feature was run on the neural network, random forest, and boosted tree models to identify the most important variables
Variable Name                                   Neural Network   Random Forest   Boosted Trees   Average
                                                Ranking          Ranking         Ranking         Ranking
Date of Latest Removal                                3                1               1            1.7
Beginning Date for Current Placement Setting          2                3               4            3.0
Date of First Removal                                 1                5               5            3.7
Child's Date of Birth                                 4                6               6            5.3
Emotionally Disturbed Diagnosis                       8               11               8            9.0
Discharge Date of Child's Previous Removal            5               12              11            9.3
Currently Placed in Non-Relative Foster Home          9               14              13           12.0
Currently Placed in an Institution                    6               20              10           12.0
Number of Days in Current Placement Setting          36                4               3           14.3
Female Child                                         16               17              18           17.0
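The per-model rankings come from caret's variable-importance feature in R; the average-ranking column is simply the mean of the three per-model ranks, e.g.:

```python
# A few per-model importance ranks from the table above
# (neural network, random forest, boosted trees)
ranks = {
    "Date of Latest Removal": (3, 1, 1),
    "Beginning Date for Current Placement Setting": (2, 3, 4),
    "Date of First Removal": (1, 5, 5),
    "Number of Days in Current Placement Setting": (36, 4, 3),
}
average_rank = {v: round(sum(r) / len(r), 1) for v, r in ranks.items()}
# Averaging smooths disagreements such as the 36 / 4 / 3 split on
# days in the current placement setting.
```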