Predictive Modeling
Spring 2005 CAMAR meeting
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc
www.data-mines.com
Objectives
• Introduce Predictive modeling
• Why use it?
• Describe some methods in depth
  • Trees
  • Neural networks
  • Clustering
• Apply to fraud data
Predictive Modeling Family
• Predictive Modeling
  • Classical Linear Models
  • GLMs
  • Data Mining
Why Predictive Modeling?
Better use of insurance data
Advanced methods for dealing with messy data now available
Major Kinds of Modeling
• Supervised learning
  • Most common situation
  • A dependent variable
    • Frequency
    • Loss ratio
    • Fraud/no fraud
  • Some methods
    • Regression
    • CART
    • Some neural networks
• Unsupervised learning
  • No dependent variable
  • Group like records together
    • A group of claims with similar characteristics might be more likely to be fraudulent
  • Some methods
    • Association rules
    • K-means clustering
    • Kohonen neural networks
Two Big Specialties in Predictive Modeling
• GLMs
  • Regression
  • Logistic Regression
  • Poisson Regression
[Figure: Distribution for Severity/B10, Mean=100047.5; x-axis: Values in Thousands (0 to 500)]
[Figure: Distribution for Claims/B3, Mean=1.001; x-axis: -1 to 7]
• Data Mining
  • Trees
  • Neural Networks
  • Clustering
Modeling Process
• Internal Data
• External Data
• Data Cleaning
• Other Preprocessing
• Build Model
• Validate Model
• Test Model
• Deploy Model
Data Complexities Affecting Insurance Data
• Nonlinear functions
• Interactions
• Missing Data
• Correlations
• Non normal data
Kinds of Applications
• Classification
• Prediction
The Fraud Study Data
• 1993 Automobile Insurers Bureau closed Personal Injury Protection claims
• Dependent Variables
  • Suspicion Score
    • Number from 0 to 10
  • Expert assessment of likelihood of fraud or abuse
    • 5 categories
    • Used to create a binary indicator
• Predictor Variables
  • Red flag indicators
  • Claim file variables
Introduction of Two Methods
• Trees
  • Sometimes known as CART (Classification and Regression Trees)
• Neural Networks
  • Will introduce backpropagation neural network
Decision Trees
• Recursively partitions the data
• Often sequentially bifurcates the data, but can split into more groups
• Applies goodness of fit to select best partition at each step
• Selects the partition which results in largest improvement to goodness of fit statistic
Goodness of Fit Statistics
• Chi Square: CHAID (Fish, Gallagher, Monroe, Discussion Paper Program, 1990)
$$\chi^2 = \sum_{i,k} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
• Deviance: CART
$$D_i = -2 \sum_k n_{ik} \log(p_{ik}) \quad \text{(categorical)}$$
$$D = \sum_{\text{cases } j} (y_j - \mu_j)^2 \quad \text{(or RSS for continuous variables)}$$
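As a rough illustration of the chi-square statistic, the sketch below sums (observed - expected)^2 / expected over the cells of a table of counts, with expected counts taken from the row and column totals. The function name `chi_square` and the counts are hypothetical and are not taken from the presentation.

```python
def chi_square(table):
    """Pearson chi-square for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[k] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are the two groups produced by a candidate
# split, columns are the two classes of the dependent variable.
print(chi_square([[20, 10], [10, 20]]))
```

A tree procedure of the CHAID type would compute this for each candidate partition and prefer the one with the strongest association.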
Goodness of Fit Statistics
• Gini Measure: CART
• i is impurity measure
$$i = 1 - \sum_k p_k^2$$
$$\Delta i(t, s) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)$$
Goodness of Fit Statistics
• Entropy: C4.5
$$I(E) = -\log_2(p_E) = -\log_2\left(\frac{E}{N}\right)$$
$$H = -\sum_k p_k \log_2(p_k)$$
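The entropy measure can be sketched in a few lines. This is a minimal illustration, assuming class probabilities are supplied directly; the function names `information` and `entropy` and the example probabilities are hypothetical.

```python
import math


def information(p_event):
    """Information content -log2(p) of an event with probability p."""
    return -math.log2(p_event)


def entropy(probs):
    """Entropy -sum(p_k * log2(p_k)); classes with p_k = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A node split evenly between two classes is maximally impure,
# while a node dominated by one class has lower entropy.
print(entropy([0.5, 0.5]))
print(entropy([0.9, 0.1]))
```

As with the other goodness of fit statistics, a split is scored by how much it reduces the impurity of the parent node.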
An Illustration from Fraud data: GINI Measure
| Legal Representation | Fraud: No | Fraud: Yes | Total |
|---|---|---|---|
| No | 626 | 80 | 706 |
| Yes | 269 | 425 | 694 |
| Total | 895 | 505 | 1400 |
| Percent | 64% | 36% | |
First Split
• All Claims: p(fraud) = 0.36
  • Legal Rep = Yes: p(fraud) = 0.612
  • Legal Rep = No: p(fraud) = 0.113
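Example cont:
Root Node: 0.461199
| Legal Representation | Fraud: No | Fraud: Yes | 1-Σp(i)^2 | Row % |
|---|---|---|---|---|
| No | 0.887 | 0.113 | 0.201 | 50.4% |
| Yes | 0.388 | 0.612 | 0.475 | 49.6% |
| Weighted | | | 33.7% | |
0.337 = 0.201*0.504 + 0.475*0.496
Improvement = 0.461 - 0.337 = 0.124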
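The Gini arithmetic for the legal representation split can be checked with a short sketch. The counts come from the fraud table in the illustration; the function names `gini` and `gini_improvement` are hypothetical.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_improvement(children):
    """Parent impurity minus the weighted impurity of the child nodes.

    `children` holds one list of class counts per child node.
    """
    parent = [sum(col) for col in zip(*children)]
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


# Counts as [no fraud, fraud] for each value of legal representation.
legal_no = [626, 80]
legal_yes = [269, 425]

root = gini([895, 505])
improvement = gini_improvement([legal_no, legal_yes])
print(f"root impurity: {root:.6f}")
print(f"improvement:   {improvement:.4f}")
```

Working from unrounded proportions, the improvement agrees with the 0.124 on the slide to within rounding of the intermediate values.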
Example of Nonlinear Function: Suspicion Score vs. 1st Provider Bill
[Figure: Neural Network Fit of SUSPICION vs Provider Bill; x-axis: Provider Bill (1000 to 7000); y-axis: netfraud1 (0.00 to 4.00)]
An Approach to Nonlinear Functions: Fit A Tree
[Figure: regression tree on mp1.bill. Root split: mp1.bill<1279.5; further splits: mp1.bill<153, mp1.bill<842.5, mp1.bill<2389; leaf values: 0.3387, 1.2850, 2.2550, 3.6430, 4.4270]
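A tree fitted on a single predictor amounts to a step function. The sketch below encodes the four split points and five leaf values shown in the tree figure; pairing each leaf with an interval in ascending order is an assumption, since the layout of the original figure is not recoverable here, and the function name `predict_suspicion` is hypothetical.

```python
import bisect

# Split points and leaf values as printed in the tree figure.
# Assumption: leaves are listed left to right, so each leaf covers
# the next interval of provider bill amounts.
SPLITS = [153.0, 842.5, 1279.5, 2389.0]
LEAVES = [0.3387, 1.2850, 2.2550, 3.6430, 4.4270]


def predict_suspicion(provider_bill):
    """Return the leaf value for the interval containing provider_bill.

    A bill strictly below a split point falls to the left of that split,
    matching conditions of the form mp1.bill < threshold.
    """
    return LEAVES[bisect.bisect_right(SPLITS, provider_bill)]


for bill in (100, 500, 1000, 2000, 5000):
    print(bill, predict_suspicion(bill))
```

Under that assumption, the prediction is flat within each interval and jumps at each split point, which is how a tree approximates a smooth nonlinear curve.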
Fitted Curve From Tree
[Figure: x-axis: Provider Bill (0 to 15000); y-axis: Fraud Score Prediction (1 to 4)]
Neural Networks
• Developed by artificial intelligence experts, but now used by statisticians also
• Based on how neurons function in brain
Neural Networks
• Fit by minimizing squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression
• Often thought of as a “black box”
  • Due to complexity of fitted model it is difficult to understand relationship between dependent and predictor variables
The Backpropagation Neural Network
Three Layer Neural Network
• Input Layer (Input Data)
• Hidden Layer (Process Data)
• Output Layer (Predicted Value)
Neural Network
Fits a nonlinear function at each node of each layer
$$h = f(X; w_0, w_1, \ldots, w_n) = f(w_0 + w_1 x_1 + \cdots + w_n x_n) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \cdots + w_n x_n)}}$$
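The computation at a single node can be sketched as a weighted sum passed through the logistic function. This is a minimal illustration; the function names `logistic` and `node_output` and the weights used are hypothetical.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def node_output(x, weights, bias):
    """Output of one node: logistic(w0 + w1*x1 + ... + wn*xn).

    `bias` plays the role of w0 and `weights` holds w1..wn.
    """
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return logistic(z)


# With a single input, a larger weight gives a steeper curve around zero.
for w1 in (1.0, 5.0, 10.0):
    print(w1, node_output([0.3], [w1], 0.0))
```

The output always lies between 0 and 1, and the sign and size of each weight control the direction and steepness of the response.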
The Logistic Function
[Figure: Logistic Function for Various Values of w1 (w1 = -10, -5, -1, 1, 5, 10); x-axis: X (-1.2 to 0.8); y-axis: 0.0 to 1.0]
Universal Function Approximator
• The backpropagation neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
Nonlinear Function Fit by Neural Network
[Figure: Neural Network Fit of SUSPICION vs Provider Bill; x-axis: Provider Bill (1000 to 7000); y-axis: netfraud1 (0.00 to 4.00)]
Interactions
Functional relationship between a predictor variable and a dependent variable depends on the value of another variable(s)
[Figure: Neural Network Predicted for Provider Bill and Injury Type; panels: inj.type 01, 02, 03, 04, 05; x-axis: Provider Bill (3000 to 18000); y-axis: Neural Net Predicted (0.00 to 6.00)]
Interactions
• Neural Networks
  • The hidden nodes play a key role in modeling the interactions
• CART partitions the data
  • Partitions capture the interactions
Simple Tree of Injury and Provider Bill
[Figure: regression tree. Root split: mp1.bill<1279.5; further splits: mp1.bill<153, injtype:abcefghi, injtype:abfgi, injtype:abcefgh, injtype:abcefgi, mp1.bill<2675.5, mp1.bill<2017.5; leaf values: 0.14, 0.30, 0.68, 1.00, 2.10, 3.20, 3.70, 4.20, 4.80]
[Figure: response vs mp1.bill (4000 to 16000), one panel per injury type: injtype 1, 2, 4, 5, 6, 7, 8, 10, 99]
Missing Data
• Occurs frequently in insurance data
• There are some sophisticated methods for addressing this (e.g., EM algorithm)
• CART finds surrogates for variables with missing values
• Neural Networks have no explicit procedure for missing values
More Complex Example
• Dependent variable: Expert’s assessment of likelihood claim is legitimate
  • A classification application
• Predictor variables: Combination of
  • claim file variables (age of claimant, legal representation)
  • red flag variables (injury is strain/sprain only, claimant has history of previous claim)
• Used an enhancement on CART known as boosting
Red Flag Predictor Variables

| Subject | Indicator Variable | Description |
|---|---|---|
| Accident | ACC01 | No report by police officer at scene |
| | ACC04 | Single vehicle accident |
| | ACC09 | No plausible explanation for accident |
| | ACC10 | Claimant in old, low valued vehicle |
| | ACC11 | Rental vehicle involved in accident |
| | ACC14 | Property Damage was inconsistent with accident |
| | ACC15 | Very minor impact collision |
| | ACC16 | Claimant vehicle stopped short |
| | ACC19 | Insured felt set up, denied fault |
| Claimant | CLT02 | Had a history of previous claims |
| | CLT04 | Was an out of state accident |
| | CLT07 | Was one of three or more claimants in vehicle |
| Injury | INJ01 | Injury consisted of strain or sprain only |
| | INJ02 | No objective evidence of injury |
| | INJ03 | Police report showed no injury or pain |
| | INJ05 | No emergency treatment was given |
| | INJ06 | Non-emergency treatment was delayed |
| | INJ11 | Unusual injury for auto accident |
| Insured | INS01 | Had history of previous claims |
| | INS03 | Readily accepted fault for accident |
| | INS06 | Was difficult to contact/uncooperative |
| | INS07 | Accident occurred soon after effective date |
| Lost Wages | LW01 | Claimant worked for self or a family member |
| | LW03 | Claimant recently started employment |
Claim File Variables
| Variable | Description |
|---|---|
| AGE | Age of claimant |
| RPTLAG | Lag from date of accident to date reported |
| TREATLAG | Lag from date of accident to earliest treatment by service provider |
| AMBUL | Ambulance charges |
| PARTDIS | The claimant partially disabled |
| TOTDIS | The claimant totally disabled |
| LEGALREP | The claimant represented by an attorney |

Claim Variables Available Early in Life of Claim
Neural Network Measure of Variable Importance
• Look at weights to hidden layer
• Compute sensitivities:
  • a measure of how much the predicted value’s error increases when the variables are excluded from the model one at a time
Variable Importance
| Rank | Variable | Importance |
|---|---|---|
| 1 | LEGALREP | 100.0 |
| 2 | TRTLAG | 69.7 |
| 3 | AGE | 54.5 |
| 4 | ACC04 | 44.4 |
| 5 | INJ01 | 42.1 |
| 6 | INJ02 | 39.4 |
| 7 | ACC14 | 35.8 |
| 8 | RPTLAG | 32.4 |
| 9 | AMBUL | 29.3 |
| 10 | CLT02 | 23.9 |
Testing: Hold Out Part of Sample
• Fit model on 1/2 to 2/3 of data
• Test fit of model on remaining data
• Need a large sample
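A hold-out split can be sketched as a random shuffle followed by a cut. This is an illustration only; the function name `holdout_split`, the default 2/3 training fraction and the fixed seed are assumptions, and the 1400 record count is borrowed from the fraud table purely as an example size.

```python
import random


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle the records and split them into (training, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = holdout_split(range(1400))
print(len(train), len(test))
```

The model is fitted on `train` only, and its fit is then measured on `test`, which it has never seen.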
Testing: Cross-Validation
• Hold out 1/n (say 1/10) of data
• Fit model to remaining data
• Test on portion of sample held out
• Do this n (say 10) times and average the results
• Used for moderate sample sizes
• Jackknifing similar to cross-validation
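The cross-validation steps above can be sketched as follows. The `fit` and `score` callables, the stand-in mean model and the data are all hypothetical placeholders for whatever model and error measure are actually used.

```python
import random


def k_fold_indices(n_records, n_folds=10, seed=0):
    """Randomly assign record indices to n_folds roughly equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(records, fit, score, n_folds=10, seed=0):
    """Hold out each fold in turn, fit on the rest, and average the scores."""
    scores = []
    for held_out in k_fold_indices(len(records), n_folds, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        test = [records[i] for i in held_out]
        scores.append(score(fit(train), test))
    return sum(scores) / len(scores)


# Stand-in model: predict the training mean, scored by mean squared error.
def mean_model(train):
    return sum(train) / len(train)


def squared_error(model, test):
    return sum((y - model) ** 2 for y in test) / len(test)


values = [float(v % 5) for v in range(50)]
print(cross_validate(values, mean_model, squared_error))
```

Every record is used for testing exactly once, which is why the approach suits moderate sample sizes where a single hold-out sample would be too small.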
Results of Classification on Test Data
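Fitted Tree

| Actual | Fitted 0 | Fitted 1 |
|---|---|---|
| 0 | 77.3% | 22.7% |
| 1 | 14.3% | 85.7% |

Fitted Neural Network

| Actual | Fitted 0 | Fitted 1 |
|---|---|---|
| 0 | 81.5% | 18.5% |
| 1 | 26.7% | 73.3% |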
Unsupervised Learning
• Common Method: Clustering
• No dependent variable: records are grouped into classes with similar values on the variables
• Start with a measure of similarity or dissimilarity
• Maximize dissimilarity between members of different clusters
Dissimilarity (Distance) Measure
• Euclidean Distance
$$d_{ij} = \left( \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right)^{1/2}, \quad i, j = \text{records}, \; k = \text{variable}$$
• Manhattan Distance
$$d_{ij} = \sum_{k=1}^{m} |x_{ik} - x_{jk}|$$
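Both distance measures take only a few lines to compute for a pair of records. In this sketch each record is a list of numeric values, one per variable; the function names and the example records are hypothetical.

```python
import math


def euclidean(x_i, x_j):
    """Square root of the sum of squared differences across variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def manhattan(x_i, x_j):
    """Sum of absolute differences across variables."""
    return sum(abs(a - b) for a, b in zip(x_i, x_j))


record_i = [0.0, 0.0]
record_j = [3.0, 4.0]
print(euclidean(record_i, record_j))
print(manhattan(record_i, record_j))
```

In practice the variables are usually put on a common scale first, since otherwise the variable with the largest units dominates either distance.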
Binary Variables
| Column Variable | Row Variable = 1 | Row Variable = 0 | Total |
|---|---|---|---|
| 1 | a | b | a+b |
| 0 | c | d | c+d |
| Total | a+c | b+d | |
Binary Variables
Simple Matching
$$d = \frac{b + c}{a + b + c + d}$$
Rogers and Tanimoto
$$d = \frac{2(b + c)}{(a + d) + 2(b + c)}$$
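For two records described by 0/1 variables, the counts a, b, c and d and both dissimilarities can be sketched as below. The function names and the example records are hypothetical; a counts variables where both records are 1, d where both are 0, and b and c the two kinds of mismatch.

```python
def match_counts(x_i, x_j):
    """Return (a, b, c, d): counts of 1/1, 1/0, 0/1 and 0/0 pairs."""
    pairs = list(zip(x_i, x_j))
    a = sum(1 for u, v in pairs if u == 1 and v == 1)
    b = sum(1 for u, v in pairs if u == 1 and v == 0)
    c = sum(1 for u, v in pairs if u == 0 and v == 1)
    d = sum(1 for u, v in pairs if u == 0 and v == 0)
    return a, b, c, d


def simple_matching(x_i, x_j):
    """Share of variables on which the two records disagree."""
    a, b, c, d = match_counts(x_i, x_j)
    return (b + c) / (a + b + c + d)


def rogers_tanimoto(x_i, x_j):
    """Like simple matching, but mismatches carry double weight."""
    a, b, c, d = match_counts(x_i, x_j)
    return 2 * (b + c) / ((a + d) + 2 * (b + c))


claim_1 = [1, 1, 0, 0]
claim_2 = [1, 0, 1, 0]
print(simple_matching(claim_1, claim_2))
print(rogers_tanimoto(claim_1, claim_2))
```

Both measures are 0 for identical records and 1 for records that disagree on every variable; they differ in how heavily partial disagreement is penalized.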
Results for 2 Clusters
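| Cluster | Lawyer | Back Claim or Sprain | Chiro or PT | Prior Claim |
|---|---|---|---|---|
| 1 | 77% | 73% | 56% | 26% |
| 2 | 3% | 29% | 14% | 1% |

| Cluster | Suspicious Claim | Average Suspicion Score |
|---|---|---|
| 1 | 56% | 2.99 |
| 2 | 3% | 0.21 |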
Beginner's Library
• Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
• Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990
• Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996