CAS Predictive Modeling Seminar
September 2005
Presented by: Rich Moncher – Bristol West
Tom Hettinger – EMB America
Vehicle Ratemaking
Vehicles Need Class Too
Vehicle Ratemaking
OUTLINE
Background
Vehicle Estimator
– Initial Estimator
– Diagnostics
– Tools
Vehicle Symbols
Symbol Relativities
Summary
PURPOSE: To discuss techniques for performing vehicle symbol analysis within the context of a multivariate framework, including proper tools and diagnostics.
Potential for Adverse Selection?
Symbol Factor Ranges: Bodily Injury / Property Damage
Insurer  Groups  Low   High  High/Low
A        22      0.70  1.99  2.84
B        NA      NA    NA    NA
C        18      0.70  1.84  2.63
D        18      0.72  1.60  2.22
E        11      0.89  1.13  1.27
F        18      0.71  1.38  1.94
G        NA      NA    NA    NA
Potential for Adverse Selection?
Symbol Factor Ranges: Personal Injury / Medical Payments
Insurer  Groups  Low   High  High/Low
A        22      0.53  1.76  3.32
B        NA      NA    NA    NA
C        20      0.48  1.74  3.63
D        20      0.53  1.56  2.94
E        11      0.78  1.26  1.62
F        20      0.56  1.49  2.66
G        NA      NA    NA    NA
Potential for Adverse Selection?
Symbol Factor Ranges: Collision
Insurer  Groups  Low   High  High/Low
A        22      0.59  6.95  11.78
B        17      0.54  6.13  11.35
C        37      0.48  3.95  8.23
D        21      0.53  2.26  4.26
E        27      0.54  4.98  9.22
F        21      0.52  2.22  4.27
G        27      0.94  3.20  3.40
Potential for Adverse Selection?
Symbol Factor Ranges: Comprehensive
Insurer  Groups  Low   High   High/Low
A        22      0.60  6.87   11.48
B        17      0.41  18.49  45.10
C        37      0.44  6.12   13.91
D        19      0.54  3.88   7.19
E        27      0.29  5.54   19.10
F        19      0.52  3.54   6.81
G        27      0.85  7.09   8.34
Vehicle Classification/Relativity Analysis
Vehicle is critical as it is a major risk driver and accounts for much of the variation in rates
Two elements:
Symbols
Relativities (Both in terms of Model Year and Symbol)
Historically, focus on relativities and vehicle age BUT not symbols
Initial symbols based on limited data, competitors, bureaus, and judgment
Regular reviews of relativities
• Model Year
• Symbols
Why a Symbol Review?
Reduce your reliance on third parties.
Produce assignments you can understand and explain internally.
Remove potential bias due to inaccurate assignments.
Customize to meet the experience of your book.
Why does one company up-charge a Ford Taurus after its initial assignment and another down-charge it?
Difference in underlying books?
Difference in methodologies?
Why a Symbol Review?
Develop better initial assignments.
For example, electronic stability control systems (ESCS):
“The safety agency credited much of the reduction in SUV rollover risk to the increasing availability of electronic stability control systems on SUVs.” – Wall Street Journal
“The systems are sometimes offered as standard equipment or, as an option, cost several hundred dollars.” – Wall Street Journal
Could two versions (one with and one without ESCS) of a new SUV that both fall into an original-cost-new symbol of 22 ($40K-$45K) be different risks?
If your experience showed ESCS-enabled vehicles cost x% less to insure, how would you initially assign the symbol?
The Analysis
Use statistically credible techniques to develop the most appropriate symbol assignments and relativities.
Utilize GLM techniques.
Utilize Smoothing, Credibility Weighting, Clustering techniques.
Allow for User Interaction.
Get the most out of the company’s own data.
How is the company’s vehicle experience different from other companies’ or rating agencies’ underlying databases?
How can known vehicle characteristics help you understand the loss potential where data is thin?
Issues with Classifying Vehicles
High-dimensionality
- Symbol analysis requires a large number of small vehicle units (VIN) as building blocks.
- VINs are the building blocks of vehicle rating, and each has little to no loss experience.
- Most companies only use two types of classifications for vehicles – model year and symbol.
Issues with Classifying Vehicles
High correlation
- Vehicle tends to be highly correlated with other rating variables (e.g., deductible, location, age, and limit).
- Multivariate framework required to handle highly correlated variables
Purpose of Predictive Modeling
To predict a response variable using a series of explanatory variables (or rating factors).
Dependent/Response: Losses, Claims, Retention
Independent/Predictors: Age, Symbols, Limit, Model Year, Territory, Credit Score
Weights: Claims, Exposures, Premium
Statistical Model
Model Results: Parameters, Validation Statistics
Predictive Modeling
Response Variable = Systematic Component + Random Component
Systematic Component (Signal): function of the rating factors/predictors
Random Component (Noise): reflects the stochastic process
Model Complexity (Number of Parameters): ranges from the Overall Mean, through the “Best” Model, to 1 parameter for each observation
Vehicle Symbol/Relativity Analysis
Vehicle symbol/relativity analysis is a multi-stage process.
How do you isolate the signal from the data?
Many techniques available.
Vehicle Level Data
- Problems
  - Noisy
  - Limited Data
Initial Vehicle Estimator
- Choices to be made
  - Raw
  - Standardized
- Isolate the Signal
  - By coverage
  - Frequency and Severity determined separately.
  - Residual corrections.
  - Dimensionally smoothed.
Final Vehicle Risk Estimator
Vehicle Symbol/Relativity Analysis
Vehicle symbol/relativity analysis is a multi-stage process.
How do you group the data?
Many techniques available.
Vehicle Symbol Assignments
Symbol Relativities
- Estimators combined.
- Overall estimators clustered to form symbols.
- Relativities calculated for each symbol.
Determining the Vehicle Risk Estimator
Variety of methods required to decipher different risk drivers by coverage.
For Example: Weight impacts BI/Med/Collision differently.
This helps us better understand and explain differences.
It can also help in creating better symbol assignments in the future.
Recommend determining the estimator at the granular level (i.e., frequency/severity by coverage).
[Diagram: Initial Vehicle Estimator leading to Final Vehicle Risk Estimator]
Determining the Vehicle Risk Estimator
Output of this stage is a risk estimator for each vehicle.
TOOLS: GLM, Dimensional Smoothing, Credibility Weighting, Residual Correction
DIAGNOSTICS: GLM Tests, Hold-Out Samples, Residual Analysis, P-Values
Determination of the final risk estimator is an iterative process.
Determining the Vehicle Risk Estimator
Where do we start?
Analyst has choices for initial estimator.
Observed Data
Given the following rating factors
– Age (a)
– Sex (s)
– Limit (l)
– Vehicle: VIN (v)
Then
\[ \hat{Y}_{v_i} = \frac{\sum_{a,s,l} \text{Claims}_{v_i,a,s,l}}{\sum_{a,s,l} \text{Exposures}_{v_i,a,s,l}} \]
where \hat{Y}_{v_i} is the initial estimate for the ith VIN.
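A minimal sketch of the observed-data estimator: claims and exposures are summed over every age, sex, and limit cell for each VIN, and the ratio is the initial estimate. The record layout, field names, and figures are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def initial_vin_estimates(records):
    """Sum claims and exposures over all age/sex/limit cells per VIN,
    then return claims / exposures as the raw estimate for each VIN."""
    claims = defaultdict(float)
    exposures = defaultdict(float)
    for rec in records:
        claims[rec["vin"]] += rec["claims"]
        exposures[rec["vin"]] += rec["exposures"]
    # VINs with no exposure are skipped to avoid dividing by zero.
    return {vin: claims[vin] / exposures[vin] for vin in claims if exposures[vin] > 0}

# Hypothetical records: one row per (VIN, age, sex, limit) cell.
records = [
    {"vin": "VIN_A", "age": "25-34", "sex": "F", "limit": "50/100", "claims": 2, "exposures": 40.0},
    {"vin": "VIN_A", "age": "35-44", "sex": "M", "limit": "100/300", "claims": 1, "exposures": 20.0},
    {"vin": "VIN_B", "age": "25-34", "sex": "M", "limit": "50/100", "claims": 3, "exposures": 30.0},
]
estimates = initial_vin_estimates(records)
for vin, freq in sorted(estimates.items()):
    print(vin, round(freq, 4))
```

Because nothing is standardized here, any difference in the age, sex, or limit mix between VINs flows straight into the estimate.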
Standardized Observed Data
Data
GLM
- Standard Policy Factors: Policyholder Age, Policyholder Sex, Limit/Deductible, Territory, …
- Vehicle Factors: Current Symbols, Make Model Categories, VIN Groupings
Residuals
Final Vehicle Factors
Include basic vehicle factors within GLM model.
Standard policy factors are captured correctly.
Standardized Fitted Data
Data
GLM
- Standard Policy Factors: Policyholder Age, Policyholder Sex, …, Vehicle Age, Vehicle Group
- Vehicle Factors: Body Data, Performance Data, Crash/Theft Data
Residuals
Final Vehicle Factors
Directly model vehicle estimators within GLM.
Standardized Fitted Data with External Data
[Figure: two panels, "Rescaled Predicted Values - ctyX" and "Rescaled Predicted Values - ctyY", each showing the Model Prediction at Base levels and the Reference Prediction at Base levels.]
Performance and body data differentiate among unique VINs.
Transmission, Curb Weight, Wheelbase, Power, Torque, Engine Size, 0-60 speed, Braking Distance, Turning Circle, etc.
As Curb Weight increases, Property Damage Severity increases.
As Curb Weight increases, Collision Severity decreases.
Differentiating a VIN
Origin
Make
Vehicle Series
Body Style
Engine
Emission
Check Figure
Year
Factory Code
Serial Number
Standardized Fitted Data
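Redefine the vehicle unit into meaningful concepts.
Rating factors: Age, Sex, Limit, VIN
VIN is redefined as: Current Symbol, VIN Cluster, Body Data, Performance Data, Crash/Theft Data
Then
\[ \hat{Y}_{v_i} = \exp\bigl(\beta_0 + \text{Symbol}_{v_i} + \text{VINCluster}_{v_i} + f(\text{Body}_{v_i}) + g(\text{Performance}_{v_i}) + h(\text{Crash/Theft}_{v_i})\bigr) \]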
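A short sketch of how a fitted estimator of this kind could be evaluated, assuming a log-link form in which the intercept and the vehicle terms are added and then exponentiated. All contributions below are hypothetical log-scale values, not figures from the presentation.

```python
import math

def fitted_vin_estimate(beta0, symbol, vin_cluster, f_body, g_performance, h_crash_theft):
    """Exponentiate the sum of the intercept and the vehicle terms.

    Each argument after beta0 is that term's contribution on the log scale
    for one VIN, so every term acts as a multiplicative relativity.
    """
    linear_predictor = beta0 + symbol + vin_cluster + f_body + g_performance + h_crash_theft
    return math.exp(linear_predictor)

# Hypothetical log-scale contributions for one VIN.
base_frequency = 0.05
estimate = fitted_vin_estimate(
    beta0=math.log(base_frequency),
    symbol=math.log(1.10),
    vin_cluster=0.0,
    f_body=math.log(0.95),
    g_performance=math.log(1.20),
    h_crash_theft=0.0,
)
print(round(estimate, 4))
```

Under this assumption a VIN with no claims of its own still receives an estimate, because the body, performance, and crash/theft terms are shared with similar vehicles.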
Determining the Vehicle Risk Estimator
Variety of diagnostics can be used.
Need to determine how well the vehicle estimator is performing.
Hold-Out Samples
Split data into “Training” and “Test”.
Create groupings/estimators with the “Training” data.
Examine “Test” data to see how well groupings perform.
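The three steps can be sketched as follows. The split fraction, the seed, the record layout, and the VIN-to-group mapping are all hypothetical; in practice the grouping would be produced from the training data alone.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def frequency_by_group(records, group_of):
    """Claims / exposures for each group, using a VIN-to-group mapping."""
    claims, exposures = {}, {}
    for rec in records:
        group = group_of[rec["vin"]]
        claims[group] = claims.get(group, 0.0) + rec["claims"]
        exposures[group] = exposures.get(group, 0.0) + rec["exposures"]
    return {g: claims[g] / exposures[g] for g in claims if exposures[g] > 0}

# Hypothetical data: 20 records spread over four VINs.
records = [{"vin": f"V{i % 4}", "claims": i % 3, "exposures": 10.0} for i in range(20)]
train, test = split_train_test(records)

# Hypothetical grouping; in practice it would be built from `train` only.
group_of = {"V0": "low", "V1": "low", "V2": "high", "V3": "high"}
train_freq = frequency_by_group(train, group_of)
test_freq = frequency_by_group(test, group_of)
print("training:", train_freq)
print("test:    ", test_freq)
```

If the ordering of groups seen in the training data does not hold up in the test data, the groupings are probably fitting noise.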
P-Values
p-value = probability, under the model, of a frequency at least as extreme as that observed.
[Figure: P-Values for freq_ThfFittedExtFwd. Observed p-values (PVals) are shown against the Theoretical Uniform distribution on axes running from 0.0 to 1.0, with regions labeled Over-fitting and Under-fitting.]
Under the null hypothesis the p-values should be uniformly spread over [0,1].
Assume the smoothed statistic is the underlying frequency in each zip code.
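A rough sketch of the p-value check, assuming claim counts are Poisson with mean equal to the modeled expected claims. The Poisson assumption and the gap statistic are illustrative choices, not something stated on the slide.

```python
import math

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for n in range(1, k + 1):
        term *= lam / n
        total += term
    return min(total, 1.0)

def upper_tail_p_value(observed_claims, expected_claims):
    """P(N >= observed) under the modeled expected claim count."""
    return 1.0 - poisson_cdf(observed_claims - 1, expected_claims)

def max_gap_from_uniform(p_values):
    """Largest gap between the sorted p-values and evenly spaced uniform quantiles."""
    ordered = sorted(p_values)
    n = len(ordered)
    return max(abs(p - (i + 0.5) / n) for i, p in enumerate(ordered))

# Hypothetical (observed claims, expected claims) pairs, one per unit.
cells = [(0, 0.8), (2, 1.5), (1, 1.1), (4, 2.0), (0, 0.3)]
p_values = [upper_tail_p_value(obs, exp) for obs, exp in cells]
print([round(p, 3) for p in p_values])
print(round(max_gap_from_uniform(p_values), 3))
```

Claim counts are discrete, so these p-values are only approximately uniform even when the model is right; a large gap is a prompt to look closer rather than a formal test.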
Residual Analysis
Standardize the data for all factors to see if there is any systematic residual variation.
Derive the residuals for each VIN
Apply Multi-dimensional smoothing methods to aid interpretation
Principal Components
Residual Scoring
Looking for systematic patterns in the residuals
Multidimensional Residual Plots using the VIN characteristics
Determining the Vehicle Risk Estimator
Tests will indicate which tools the analyst should consider.
A variety of tools are needed to handle different situations.
[Diagram: two GLM flows (Data, GLM, Standard Policy Factors, Vehicle Factors, Residuals), including the Standardized Fitted Data flow, shown against Model Complexity (Number of Parameters), which ranges from the Overall Mean (Underfit), through the “Best” Model, to 1 parameter for each observation (Overfit).]
Standardized Fitted Data
Model Complexity (Number of Parameters): ranges from the Overall Mean, through the “Best” Model, to 1 parameter for each observation
TOOLS
Underfit: Enhance GLM, Credibility-weight, Residual Correction
Overfit: Revisit GLM, Smoothing, Residual Correction
Dimensional Smoothing
Uses knowledge of similar vehicles to enhance estimates of the underlying risk.
Similarity characteristics based on the parameters from the GLM
Essentially applying dimension reduction techniques on the VIN characteristics to form a single continuous variable
Similar to scoring routines
Variates can then be applied to the scores to smooth the estimate
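One way to picture this: collapse the VIN characteristics into a single score using GLM-style coefficients, then fit a simple curve of the vehicle estimate against that score. The sketch below uses a weighted straight line as the simplest possible variate; the characteristic names, coefficients, and data are hypothetical.

```python
def vin_score(characteristics, coefficients):
    """Collapse several VIN characteristics into one continuous score."""
    return sum(coefficients[name] * value for name, value in characteristics.items())

def fit_weighted_line(scores, estimates, weights):
    """Weighted least-squares line: estimate = intercept + slope * score."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, scores)) / total
    mean_y = sum(w * y for w, y in zip(weights, estimates)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, scores))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, scores, estimates))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical coefficients and one vehicle's characteristics.
coefficients = {"curb_weight": 0.1, "power": -0.2}
example_score = vin_score({"curb_weight": 3.2, "power": 1.5}, coefficients)

# Hypothetical scores, raw estimates, and exposure weights for three VINs.
scores = [0.0, 1.0, 2.0]
raw_estimates = [1.0, 3.0, 5.0]
weights = [1.0, 2.0, 1.0]
intercept, slope = fit_weighted_line(scores, raw_estimates, weights)
smoothed = [intercept + slope * s for s in scores]
print(round(example_score, 4), intercept, slope, smoothed)
```

A thinly exposed VIN then borrows strength from vehicles with nearby scores instead of relying on its own experience.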
Residual Correction Factors
Check residuals for underlying systematic patterns.
Ideally, enhance underlying GLM to better explain data.
Alternatively
- Band the residuals via smoothing and clustering
- Estimate a correction factor
Effectively creating a new external factor to explain the vehicle residual effect
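The alternative route can be sketched as: rank VINs by actual-to-expected ratio, cut them into bands, and use each band's actual/expected as its correction factor. Equal-count bands stand in here for the smoothing and clustering step, and all names and figures are hypothetical.

```python
def residual_correction_factors(vins, n_bands):
    """Rank VINs by actual/expected, split them into bands of equal count,
    and assign each VIN its band's total actual / total expected."""
    ranked = sorted(vins, key=lambda v: v["actual"] / v["expected"])
    size = -(-len(ranked) // n_bands)  # ceiling division
    factors = {}
    for start in range(0, len(ranked), size):
        band = ranked[start:start + size]
        factor = sum(v["actual"] for v in band) / sum(v["expected"] for v in band)
        for v in band:
            factors[v["vin"]] = factor
    return factors

# Hypothetical actual and GLM-expected claims for four VINs.
vins = [
    {"vin": "A", "actual": 8.0, "expected": 10.0},
    {"vin": "B", "actual": 9.0, "expected": 10.0},
    {"vin": "C", "actual": 12.0, "expected": 10.0},
    {"vin": "D", "actual": 11.0, "expected": 10.0},
]
factors = residual_correction_factors(vins, n_bands=2)
print(factors)
```

Banding on the residual itself can chase noise, so a factor built this way is worth checking against a hold-out sample before it is used.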
Credibility Weighting
May want to control the amount of credibility weighting via max/min credibility constraints.
Can employ standard credibility weighting techniques.
Z * Primary Estimator + (1 – Z) * Secondary Estimator
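A small worked sketch of the weighting with max/min constraints on Z. The square-root rule and its 1,082-claim full-credibility standard are an assumption used for illustration; the slide does not say how Z is set.

```python
import math

def square_root_credibility(claims, full_credibility_claims=1082):
    """Classical square-root rule, capped at 1. The 1082-claim standard is
    an illustrative assumption, not a value from the presentation."""
    return min(1.0, math.sqrt(claims / full_credibility_claims))

def credibility_weighted(primary, secondary, z, z_min=0.0, z_max=1.0):
    """Z * primary + (1 - Z) * secondary, with Z clamped to [z_min, z_max]."""
    z = max(z_min, min(z_max, z))
    return z * primary + (1.0 - z) * secondary

# Hypothetical estimators for one VIN.
primary, secondary = 0.08, 0.05
z = square_root_credibility(claims=270.5)          # sqrt(0.25) = 0.5
blended = credibility_weighted(primary, secondary, z)
capped = credibility_weighted(primary, secondary, z, z_max=0.25)
print(z, round(blended, 4), round(capped, 4))
```

The cap keeps any single VIN's own experience from dominating, which matters when a handful of large claims sit behind the primary estimator.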
[Diagram: two GLM flows (Data, GLM, Standard Policy Factors, Vehicle Factors, Residuals), labeled Standardized Fitted (Underfit) and Standardized Observed.]
Determining the Vehicle Symbol Assignment
Use techniques to identify similar risk estimators to be grouped, creating a manageable number of symbol assignments.
Many choices are available to do this.
Let statistics help you choose.
Not practical to do in a GLM/Tree/Other environment.
It is impractical to have a symbol assignment for each and every vehicle.
[Diagram: Final Vehicle Risk Estimator leading to Vehicle Symbol Assignments]
Creating New Symbol
BI Frequency, BI Severity: BI Estimator
PD Frequency, PD Severity: PD Estimator
Comp Frequency, Comp Severity: Comp Estimator
Coll Frequency, Coll Severity: Coll Estimator
Vehicle Risk Estimators clustered to form symbols.
Combine component estimators to determine a risk measure for each vehicle for use in building symbols.
Coverage estimators can be further combined if one set of symbols is desired for multiple coverages.
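A sketch of the combination step, assuming frequency times severity gives each coverage estimator and that coverages are blended using premium-share weights on their relativities. The weighting scheme and all figures are hypothetical.

```python
def coverage_estimator(frequency, severity):
    """Combine the frequency and severity estimators into a loss cost."""
    return frequency * severity

def combined_estimator(coverage_relativities, weights):
    """Weighted average of coverage relativities (weights are hypothetical
    premium shares and need not sum to 1)."""
    total = sum(weights.values())
    return sum(coverage_relativities[c] * weights[c] for c in weights) / total

# Hypothetical component estimators for one vehicle.
bi = coverage_estimator(frequency=0.02, severity=10000.0)
pd = coverage_estimator(frequency=0.05, severity=4000.0)

# Hypothetical relativities to the all-vehicle average, by coverage.
relativities = {"BI": 1.10, "PD": 0.90}
weights = {"BI": 3.0, "PD": 1.0}
overall = combined_estimator(relativities, weights)
print(bi, pd, round(overall, 4))
```

Working with relativities rather than dollar loss costs keeps a high-severity coverage from swamping the others when a single symbol set is wanted.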
Clustering
Clustering is used to produce groupings that are predictive of the future:
Minimize within-group heterogeneity.
Maximize cross-group heterogeneity.
Commonly-used clustering methods:
Quantiles
Equal Weight
Similarity Methods
• Average Linkage
• Centroid
Ward's
Clustering Methodologies
Quantiles
Create groups with equal numbers of observations.
Equal Weight
Create groups which have an equal amount of weight.
Similarity Methods:
Rank the data set by the statistic you wish to cluster.
Decide on which pair of records are the ‘most similar.’
Group these records.
Repeat until left with the desired number of groups.
Ward's
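The three clustering approaches can be sketched in one dimension as below. The similarity version merges the adjacent pair whose weighted means are closest, which is a centroid-style choice made for illustration; the data and function names are hypothetical.

```python
def quantile_groups(values, n_groups):
    """Groups with (nearly) equal numbers of observations."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups

def equal_weight_groups(values, weights, n_groups):
    """Groups with roughly equal total weight (e.g., exposure)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    groups = [0] * len(values)
    running = 0.0
    for i in order:
        midpoint = running + weights[i] / 2.0  # centre of this record's weight
        groups[i] = min(n_groups - 1, int(midpoint * n_groups / total))
        running += weights[i]
    return groups

def similarity_groups(values, weights, n_groups):
    """Rank the records, then repeatedly merge the adjacent pair of clusters
    with the closest weighted means until n_groups remain."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [{"members": [i], "weight": weights[i], "mean": values[i]} for i in order]
    while len(clusters) > n_groups:
        gaps = [clusters[k + 1]["mean"] - clusters[k]["mean"]
                for k in range(len(clusters) - 1)]
        k = gaps.index(min(gaps))
        a, b = clusters[k], clusters[k + 1]
        weight = a["weight"] + b["weight"]
        mean = (a["mean"] * a["weight"] + b["mean"] * b["weight"]) / weight
        clusters[k:k + 2] = [{"members": a["members"] + b["members"],
                              "weight": weight, "mean": mean}]
    groups = [0] * len(values)
    for g, cluster in enumerate(clusters):
        for i in cluster["members"]:
            groups[i] = g
    return groups

# Hypothetical vehicle risk estimators (relativities) with equal weights.
estimators = [0.80, 0.82, 1.00, 1.05, 1.50, 1.52]
print(quantile_groups(estimators, 3))
print(equal_weight_groups([1, 2, 3, 4], [1, 1, 1, 3], 2))
print(similarity_groups(estimators, [1.0] * 6, 3))
```

Quantiles and equal weight ignore how far apart the estimators are; the similarity approach follows the natural gaps in the data instead.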
Determining New Symbol Relativities
GLM model fit using data grouped by new vehicle symbols.
Test relativities using standard GLM tests.
Predictive in GLM model
Consistent over time in GLM model
Predictive when tested against other data
Refine symbols/relativities as appropriate.
Incorporate rules-based restrictions.
Apply actuarial knowledge.
Investigate “neighbors” with very different relativities.
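A simple way to surface “neighbors” for review is to flag adjacent symbols whose fitted relativities differ by more than a chosen ratio. The threshold and the relativities below are hypothetical, and the relativities are assumed to come from a GLM fitted elsewhere.

```python
def flag_neighbor_jumps(relativities, max_ratio=1.25):
    """Return pairs of adjacent symbols whose relativities differ by more
    than max_ratio in either direction."""
    symbols = sorted(relativities)
    flagged = []
    for lower, upper in zip(symbols, symbols[1:]):
        ratio = relativities[upper] / relativities[lower]
        if ratio > max_ratio or ratio < 1.0 / max_ratio:
            flagged.append((lower, upper))
    return flagged

# Hypothetical fitted relativities keyed by symbol number.
relativities = {1: 0.80, 2: 0.85, 3: 1.20, 4: 1.25}
print(flag_neighbor_jumps(relativities))
```

A flagged pair is a prompt for actuarial judgment or a rules-based restriction, not proof that either relativity is wrong.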
Vehicle Rating - Overview
Accurate estimation of the underlying risk associated with a vehicle is a three-stage process.
Step 1
Obtain a separate estimator by claim type and by frequency and severity for each VIN building block. Combine estimators, as appropriate.
BI: Frequency Estimator, Severity Estimator, combined into the BI Estimator
PD: Frequency Estimator, Severity Estimator, combined into the PD Estimator
Comp: Frequency Estimator, Severity Estimator, combined into the Comp Estimator
Coll: Frequency Estimator, Severity Estimator, combined into the Coll Estimator
Step 2
Cluster vehicle building blocks to develop symbols separately by coverage or for several coverages combined.
Vehicle Symbols
Step 3
Determine by-coverage relativities for each symbol group.
Symbol Relativities
Summary
Vehicle is a major driver of risk, thus it is critical that companies review symbol assignments and relativities regularly.
Issues exist that create special challenges with regard to symbol analysis.
High-dimensionality
Heavily correlated
Vehicle symbol analysis requires a range of different approaches and tools (as there are different loss drivers by coverage).
Diagnostics needed to ensure best model possible