TRANSCRIPT
April 25, 2019
Tom Kolde, FCAS, MAAA
Linda Brobeck, FCAS, MAAA
Clustering 101 for Insurance Applications
1
About the Presenters
• Linda Brobeck, FCAS, MAAA
  – Director & Consulting Actuary
  – San Francisco, CA
• Tom Kolde, FCAS, MAAA
  – Consulting Actuary
  – Chicago, Illinois
2
Agenda
• Supervised vs. Unsupervised Learning
• Clustering Algorithms Overview
– Hierarchical Clustering
– K-Means
• Clustering Application Examples
3
Supervised Vs. Unsupervised Machine Learning
Machine Learning
• Supervised
  – Predictive
  – Target Variable
  – Task Driven
  – Regression, Classification
• Unsupervised
  – Descriptive
  – No Target Variable
  – Data Driven
  – Clustering, Pattern Discovery, Dimension Reduction
• Reinforcement
  – Algorithm Learns to React
4
Polling Question #1
What types of unsupervised learning have you used in the past?
A. Principal Component Analysis
B. Clustering
C. Neural Networks
D. Other
E. None… YET
SELECT ALL APPLICABLE
5
Types of Clustering
Clustering Algorithms
• Connectivity
  – Hierarchical: Agglomerative, Divisive
• Centroid
  – K-Means, Fuzzy C-Means, K-Medoids
• Distribution
  – Expectation Maximization
• Density
  – OPTICS, DBSCAN
6
• Additional types of cluster models
– Neural models
– Principal component analysis
• Hard vs. Soft (Fuzzy) clustering
• Finer distinctions
– Strict partitioning (with or without outliers)
– Overlapping
Other Clustering Options
7
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
7 Clusters
[Scatter plot: points A-G shown as 7 separate clusters]
8
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
6 Clusters
[Scatter plot: points A-G grouped into 6 clusters]
9
Euclidean Distance
A = (x1, y1)
B = (x2, y2)
d = √((x2 − x1)² + (y2 − y1)²)
10
Distance Matrix

Data Points
       x      y
a     4.0    5.0
b     6.0   10.0
c     6.3    5.2
d     6.4    4.7
e     9.0    5.4
f    10.0    5.2
g    10.2    5.0

Euclidean Distances
      a     b     c     d     e     f
b  5.39
c  2.31  4.81
d  2.42  5.32  0.51
e  5.02  5.49  2.71  2.69
f  6.00  6.25  3.70  3.63  1.02
g  6.20  6.53  3.91  3.81  1.26  0.28
11
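The distance formula and matrix above can be reproduced in a few lines (a quick sketch; the point labels and coordinates are taken from the slide):

```python
import math

# Data points a-g from the slide (x, y)
points = {
    "a": (4.0, 5.0), "b": (6.0, 10.0), "c": (6.3, 5.2), "d": (6.4, 4.7),
    "e": (9.0, 5.4), "f": (10.0, 5.2), "g": (10.2, 5.0),
}

def euclidean(p, q):
    """d = sqrt((x2 - x1)^2 + (y2 - y1)^2)"""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

# Reproduce entries of the distance matrix
print(round(euclidean(points["a"], points["b"]), 2))  # 5.39
print(round(euclidean(points["c"], points["d"]), 2))  # 0.51
print(round(euclidean(points["f"], points["g"]), 2))  # 0.28
```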
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
6 Clusters
[Scatter plot: points A-G grouped into 6 clusters]
12
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
5 Clusters
[Scatter plot: points A-G grouped into 5 clusters]
13
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
4 Clusters
[Scatter plot: points A-G grouped into 4 clusters]
14
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
3 Clusters
[Scatter plot: points A-G grouped into 3 clusters]
15
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
2 Clusters
[Scatter plot: points A-G grouped into 2 clusters]
16
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
1 Cluster
[Scatter plot: points A-G combined into a single cluster]
17
• Bottom Up - Agglomerative
Hierarchical Clustering (HCA)
[Scatter plot of points A-G alongside the resulting dendrogram, leaf order B, A, C, D, E, F, G]
18
Hierarchical Algorithm
• Advantages
– Easy to understand
– Flexible
• Disadvantages
– Not easily computable for large data sets
– Sensitive to outliers
19
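As a sketch of how the agglomerative procedure above might be run in practice, SciPy's `linkage` and `fcluster` can cluster the seven example points. Single linkage is an assumption here, since the slides do not state which linkage criterion was used:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The seven example points a-g from the distance-matrix slide
X = np.array([[4.0, 5.0], [6.0, 10.0], [6.3, 5.2], [6.4, 4.7],
              [9.0, 5.4], [10.0, 5.2], [10.2, 5.0]])

# Agglomerative clustering; single linkage assumed for illustration
Z = linkage(X, method="single", metric="euclidean")

# Cut the dendrogram at 3 clusters, as in the "3 Clusters" slide
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # a, c, d share one cluster; e, f, g another; b is alone
```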
• Partition-based clustering method
• Relatively simple to understand & program
• K-means Algorithm:
1. Start with a random set of k cluster seeds
2. For each data point, calculate the distance to each cluster seed and assign the point to the closest seed
3. Once all data points have been assigned, calculate the centroid of each cluster
4. Reassign each data point to its nearest cluster centroid
5. For each new cluster, re-calculate the centroid
6. Repeat steps 4-5 until convergence
Introduction to the K-Means Algorithm
20
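The numbered steps above can be sketched in NumPy. This is a minimal illustration, not production code; it assumes seeds are drawn from the data points and ignores the rare case of a cluster losing all of its points:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain k-means following the slide's steps (minimal sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial seeds drawn from the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest seed/centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each cluster's centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Steps 4-6: repeat until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy clusters
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = k_means(X, k=2)
print(labels)
```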
• Our example begins with 95 data points
K-Means Cluster Analysis
[Scatter plot of the 95 data points (X: 0-160, Y: 0-100)]
21
• Next we assume the data has 3 clusters and randomly generate initial seed centroids for each
K-Means Cluster Analysis
[Scatter plot with three randomly placed seeds: Seed 1, Seed 2, Seed 3]
22
• Each data point is assigned to the closest seed for its initial cluster
K-Means Cluster Analysis
[Scatter plot: points colored by initial assignment (Cluster 1, 2, 3) around Seeds 1-3]
23
• New centroids are calculated for each cluster
K-Means Cluster Analysis
[Scatter plot: Centroids 1-3 computed from the initial clusters, shown alongside Seeds 1-3]
24
• Data points are assigned to the nearest centroid
K-Means Cluster Analysis
[Scatter plot: points re-assigned to the nearest centroid (Cluster 1, 2, 3)]
25
• New centroids are calculated for the data points within each cluster
K-Means Cluster Analysis
[Scatter plot: updated centroids for Clusters 1-3]
26
• The process continues with data points being assigned to the nearest cluster centroid until convergence
K-Means Cluster Analysis
[Scatter plot: final converged Clusters 1-3]
27
Advantages/Disadvantages of K-Means Algorithm
• Advantage
– Computationally simple
• Disadvantages
– Number of clusters k must be pre-selected
– Results may not be repeatable when using randomly selected seed centroids
• K-Medians Algorithm
– Less sensitive to outliers
– More processing time (to sort dataset)
28
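One common way to mitigate the repeatability issue noted above is to run the algorithm from several random seed sets and keep the best run; scikit-learn's `KMeans` does this via its `n_init` parameter. The three blobs below are made-up data for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal((0, 0), 1, (30, 2)),
               rng.normal((8, 8), 1, (30, 2)),
               rng.normal((0, 8), 1, (30, 2))])

# n_init reruns k-means from several random seed sets and keeps the
# lowest-inertia result; fixing random_state makes the outcome repeatable
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(sorted(np.bincount(km.labels_)))
```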
• Variable reduction for modeling
• Territory analysis for ratemaking
Clustering Applications
29
• 44 Macro Economic Variables
– Unemployment (current, long-term, local, state, countrywide, changes over time)
– Housing Prices (changes over time, local, state, countrywide)
– Treasury Rates (short-term, long-term, yield curve slope, etc.)
– GDP (change over time, duration negative or positive, ratios)
• Correlation Matrix
• PROC VARCLUS in SAS (Oblique Centroid Component Cluster Analysis)
• Variable selection for each cluster
Variable Reduction Example
30
Variable Reduction Example
• Treas rate 30 yr
• UE 1prior MSA
• UE 1prior ST
• UE 1prior CW
• UE 3prior MSA
• UE 3prior ST
• UE 3prior CW
• UE rel ST
• UE rel MSA
• UE rel CW
• UE ST
• UE MSA
• UE CW
• Yield Curve Slope
• GDP current
• GDP Prior
• GDP dur neg
• GDP dur pos
• GDP recession
• GDP ratio
• GDP ratio 1YR
• GDP ratio 2YR
• UE 10 yr MSA
• UE 10 yr CW
• UE 10 yr ST
• UE Delta ST
• UE Delta MSA
• UE Delta CW
• Fixed 30 YR rate
• House Price Apprec 2YR ST
• House Price Apprec 2YR MSA
• House Price Apprec 2YR CW
• Home Price Index ST
• Home Price Index MSA
• Home Price Index CW
• Treas rate 3 mo
• Treas rate 6 mo
• Treas rate 1 yr
• Treas rate 2 yr
• Treas rate 3 yr
• Treas rate 5 yr
• Treas rate 7 yr
• Treas rate 10 yr
• Treas rate 20 yr
31
Portion of the Correlation Matrix

(columns: Treas rate 3 mo, 6 mo, 1 yr, 2 yr, 3 yr, 5 yr, 7 yr, 10 yr, 20 yr, 30 yr)
Treas rate 3 mo   1        0.99828  0.99324  0.9728   0.94592  0.88626  0.83375  0.78125  0.21737  0.64186
Treas rate 6 mo   0.99828  1        0.9976   0.97972  0.95364  0.89353  0.83958  0.78627  0.21897  0.64229
Treas rate 1 yr   0.99324  0.9976   1        0.99018  0.96911  0.91428  0.86236  0.8095   0.22962  0.66596
Treas rate 2 yr   0.9728   0.97972  0.99018  1        0.99336  0.9569   0.91471  0.86526  0.26694  0.72931
Treas rate 3 yr   0.94592  0.95364  0.96911  0.99336  1        0.98265  0.95119  0.90702  0.29029  0.78043
Treas rate 5 yr   0.88626  0.89353  0.91428  0.9569   0.98265  1        0.99115  0.96453  0.32941  0.86757
Treas rate 7 yr   0.83375  0.83958  0.86236  0.91471  0.95119  0.99115  1        0.98911  0.34906  0.91986
Treas rate 10 yr  0.78125  0.78627  0.8095   0.86526  0.90702  0.96453  0.98911  1        0.36967  0.96373
Treas rate 20 yr  0.21737  0.21897  0.22962  0.26694  0.29029  0.32941  0.34906  0.36967  1        0.39463
Treas rate 30 yr  0.64186  0.64229  0.66596  0.72931  0.78043  0.86757  0.91986  0.96373  0.39463  1
32
VARCLUS output

Number of   Total Variation   Proportion of Variation   Minimum Proportion       Minimum R-squared   Maximum 1-R**2
Clusters    Explained by      Explained by Clusters     Explained by a Cluster   for a Variable      Ratio for a Variable
            Clusters
 1            2.0213            0.0459                    0.0459                   0
 2           11.8449            0.2692                    0.1105                   0                   2.3283
 3           17.6347            0.4008                    0.1212                   0                   2.0271
 4           23.8405            0.5418                    0.16                     0.0126              1.8387
 5           27.7395            0.6304                    0.3053                   0.0825              1.5727
 6           30.1739            0.6858                    0.4645                   0.1161              1.3948
 7           31.5827            0.7178                    0.571                    0.1292              1.3087
 8           32.4495            0.7375                    0.5919                   0.1292              1.5582
 9           33.4705            0.7607                    0.6476                   0.1292              1.5582
10           35.7483            0.8125                    0.71                     0.1655              1.5582
11           36.3604            0.8264                    0.7369                   0.1655              1.5582
12           37.0867            0.8429                    0.7459                   0.1655              1.5582
13           37.9171            0.8618                    0.7898                   0.1655              1.5582
33
VARCLUS OUTPUT
10 Cluster Solution

                              R-squared with   R-squared with   1-R**2
                              Own Cluster      Next Closest     Ratio
Cluster 9   Fixed 30 YR rate      0.9003           0.7565       0.4093
            Treas rate 3 mo       0.8434           0.3789       0.2522
            Treas rate 6 mo       0.852            0.3861       0.2412
            Treas rate 1 yr       0.8793           0.4136       0.2059
            Treas rate 2 yr       0.9346           0.4757       0.1247
            Treas rate 3 yr       0.9624           0.5218       0.0787
            Treas rate 5 yr       0.9746           0.6031       0.0639
            Treas rate 7 yr       0.9512           0.6584       0.1427
            Treas rate 10 yr      0.9107           0.7231       0.3223
            Treas rate 20 yr      0.1655           0.1063       0.9338
            Treas rate 30 yr      0.7495           0.7659       1.0701
Cluster 10  UE Delta ST           0.9249           0.2696       0.1028
            UE Delta CW           0.9249           0.3729       0.1197

1-R**2 Ratio = (1 − R²own) / (1 − R²nearest)
37
• Calculated the correlation matrix to be used in VARCLUS
• Selected the number of clusters based on the proportion of variation explained by clusters and the minimum R-squared for a variable within the cluster
• Selected the variable with the smallest 1-R² ratio to represent each cluster:
  – 5 year treasury rate
  – Prior quarter countrywide unemployment rate
  – Prior quarter MSA unemployment rate
  – Ratio of current GDP to 2 years prior
  – Current GDP
  – GDP recession indicator
  – State home price index
  – MSA home price index
  – Duration of positive GDP growth
  – Change in unemployment rate by state
Summary of Variable Reduction Clustering
38
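PROC VARCLUS is SAS-specific, but a rough open-source analogue of the workflow above can be sketched by hierarchically clustering the variables on a correlation-based distance and picking one representative per cluster. The data and the representative-selection rule below are illustrative assumptions, not the VARCLUS algorithm itself:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Made-up stand-in for the macroeconomic data: nine series driven by
# three latent factors (three variables per factor, plus noise)
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.column_stack([base[:, i // 3] + rng.normal(0, 0.3, 200) for i in range(9)])

# Cluster the *variables* on a correlation-based distance
corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)          # highly correlated variables are "close"
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=3, criterion="maxclust")

# Pick one representative per cluster: the variable most correlated
# with the other members of its own cluster
for g in sorted(set(groups)):
    idx = np.where(groups == g)[0]
    rep = idx[np.argmax([np.abs(corr[i, idx]).sum() for i in idx])]
    print("cluster", g, "members", idx.tolist(), "representative", int(rep))
```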
• Deriving territory definitions is a common application of cluster analysis in ratemaking
• Goals:
– Loss experience by territory should be actuarially credible
– Balance homogeneity of loss experience within territory while producing a manageable number of territories
– Contiguous territories
• Solution = Hierarchical clustering using Ward’s method with contiguity constraint
Introduction to Territorial Clustering
39
• Each square below represents a zip code in our hypothetical State X
Introduction to Territorial Cluster Analysis
[Map of hypothetical State X: a grid of zip codes with cities West Town, North Center, Star City, Central City, South Shore City, and Old Town marked]
40
• Step 1 – Determine raw pure premium by zip code
Introduction to Territorial Cluster Analysis
[Map of State X: zip codes shaded from Lower PP to Higher PP]
41
• Not every zip code is fully credible
• Spatial smoothing allows us to obtain credible results by zip code
Engineering Credible Loss Experience by Zip Code
[Map of State X: zip codes shaded from Lower PP to Higher PP]
42
• Determine the credibility for a single zip code
Spatial Smoothing
Credibility = Z0
Pure Premium = PP0
[Map of State X: a single zip code highlighted, shading from Lower PP to Higher PP]
43
• Determine pure premium and credibility for area including surrounding zip codes
Spatial Smoothing
Credibility = Z1
Pure Premium = PP1
[Map of State X: the zip code plus its first ring of surrounding zip codes highlighted]
44
• Determine pure premium and credibility for area including surrounding zip codes
Spatial Smoothing
Credibility = Z2
Pure Premium = PP2
[Map of State X: the zip code plus two rings of surrounding zip codes highlighted]
45
• Smoothed PP = PP0 × Z0 + PP1 × (Z1 − Z0) + PP2 × (Z2 − Z1) + PPState × (1 − Z2)
Spatial Smoothing
[Map of State X: zip codes shaded from Lower PP to Higher PP]
46
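The smoothing formula above is a straight credibility-weighted blend: each widening area receives only the incremental credibility it adds, and the state average absorbs the remainder. With made-up inputs for one zip code it works out as:

```python
# Hypothetical inputs for a single zip code: its own experience (PP0, Z0),
# one ring of surrounding zips added (PP1, Z1), two rings added (PP2, Z2),
# and the statewide pure premium as the complement of credibility
PP0, Z0 = 420.0, 0.30
PP1, Z1 = 390.0, 0.65
PP2, Z2 = 370.0, 0.90
PP_state = 350.0

# Weights: 0.30 + 0.35 + 0.25 + 0.10 = 1.00
smoothed = PP0 * Z0 + PP1 * (Z1 - Z0) + PP2 * (Z2 - Z1) + PP_state * (1 - Z2)
print(smoothed)  # 390.0
```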
• Spatial smoothing helps uncover patterns hidden within the loss experience
Spatial Smoothing
[Side-by-side maps of State X: Raw Pure Premium vs. Smoothed Pure Premium, shaded from Lower PP to Higher PP]
47
• Ward’s method seeks to minimize the variance of data characteristics within each cluster
• In territorial cluster analysis this means minimizing the within cluster variance of loss experience metrics, such as frequency or pure premium
• In this case, frequency/pure premium is not viewed as a target variable but rather as a risk characteristic of a zip code
Ward’s Method
48
• The variance measure for combining clusters is the within-cluster sum of squares between a data object and the mean of the cluster:
– Within-cluster sum of squares: ESS = Σᵢ Σⱼ (X_ij − X̄_i.)²
– Between-cluster sum of squares: BSS = Σᵢ Σⱼ (X̄_i. − X̄..)²
– Total sum of squares: TSS = Σᵢ Σⱼ (X_ij − X̄..)²
– TSS = ESS + BSS
Ward’s Method
49
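The decomposition TSS = ESS + BSS above can be checked numerically on arbitrary data (a quick sketch with made-up clusters):

```python
import numpy as np

# Arbitrary data split into three clusters of ten observations each
rng = np.random.default_rng(0)
X = rng.normal(size=30)
labels = np.repeat([0, 1, 2], 10)

grand = X.mean()
# ESS: squared deviations of each observation from its own cluster mean
ess = sum(((X[labels == g] - X[labels == g].mean()) ** 2).sum() for g in range(3))
# BSS: squared deviations of each cluster mean from the grand mean,
# counted once per observation in the cluster
bss = sum((labels == g).sum() * (X[labels == g].mean() - grand) ** 2 for g in range(3))
# TSS: squared deviations of each observation from the grand mean
tss = ((X - grand) ** 2).sum()

print(np.isclose(tss, ess + bss))  # True: TSS = ESS + BSS
```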
• Begin with each zip code as its own cluster (N=600)
• Evaluate each pair of contiguous zip codes to determine the within-cluster variance
• The pair of zip codes that is most similar (i.e., produces the smallest within-cluster variance) is combined into a cluster
• Next, the clusters from the 1st iteration (N-1=599) are evaluated to find the pair with the minimum within-cluster variance; this pair is combined to form the second cluster
• The process continues until all zip codes are grouped into a single cluster
Territorial Cluster Analysis Using Ward’s Method
50
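A contiguity-constrained Ward clustering like the one described above can be sketched with scikit-learn's `AgglomerativeClustering`, whose `connectivity` argument restricts merges to neighboring units. The 5×5 grid and the pure premium values below are made up for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up stand-in for State X: a 5x5 grid of zip codes, each carrying a
# smoothed pure premium (left three columns low, right two columns high)
n = 5
pp = np.array([[1.0] * 3 + [5.0] * 2 for _ in range(n)]).reshape(-1, 1)

# Contiguity constraint: a zip code may only merge with its grid neighbours
conn = np.zeros((n * n, n * n))
for r in range(n):
    for c in range(n):
        i = r * n + c
        for dr, dc in ((1, 0), (0, 1)):
            rr, cc = r + dr, c + dc
            if rr < n and cc < n:
                conn[i, rr * n + cc] = conn[rr * n + cc, i] = 1

# Ward linkage with the connectivity matrix yields contiguous territories
model = AgglomerativeClustering(n_clusters=2, linkage="ward",
                                connectivity=conn).fit(pp)
print(model.labels_.reshape(n, n))  # splits cleanly into the two blocks
```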
• The highlighted pair of zip codes produces the smallest within-cluster variance of any pair of contiguous zip codes
Territorial Cluster Analysis Using Ward's Method
[Map of State X: the first pair of zip codes to be merged is highlighted]
51
• The process continues combining zip codes into clusters until all zip codes are combined into a single cluster
Territorial Cluster Analysis Using Ward's Method
[Map of State X: an intermediate stage of the agglomeration]
52
• The process continues combining zip codes into clusters until all zip codes are combined into a single cluster
Territorial Cluster Analysis Using Ward's Method
[Map of State X: a later stage of the agglomeration]
53
• Ward's Method does not explicitly optimize the number of territories but it can provide insight into the percentage of total variance explained by the within-cluster variance
• A common metric used for this evaluation is ESS/TSS
Determining the Number of Territories
[Line chart: ESS/TSS (percentage of total variance explained by within-cluster variance) vs. number of territories, 1-30]
54
Determining the Number of Territories
[Line chart: ESS/TSS vs. number of territories (1-30), with callouts:
  15.2% of the total variance is explained by the within-cluster variance at 12 territories;
  10.2% of the total variance is explained by the within-cluster variance at 22 territories]
55
Territorial Cluster Analysis Results
[Grid map of State X: each zip code labeled with its assigned territory number; cities West Town, North Center, Star City, Central City, South Shore City, and Old Town marked]
56
Territory Boundaries Overlaid Against Smoothed PP
[Grid map of State X: territory numbers overlaid on zip codes shaded from Lower PP to Higher PP]
57
• The results of the cluster analysis should be evaluated for:
– Reasonability
– Underwriting and competitive considerations
– Regulatory constraints
• The territories can be used in the context of GLMs or other supervised learning analyses to determine appropriate rating factors and/or further territory refinement.
Further Considerations
58
• Cluster analysis is a broad field with many possibilities for further exploration
• K-means and hierarchical clustering methods provide the practitioner with a starting point for expanding their knowledge
• Software packages offer out-of-the-box clustering procedures, but custom programming may be required for sophisticated applications that introduce actuarial considerations and constraints
Summary
59
Questions
60
Join Us for the Next APEX Webinar
61
• We’d like your feedback and suggestions
• Please complete our survey
• For copies of this APEX presentation, visit the Resource Knowledge Center at Pinnacleactuaries.com
Final notes
62
Commitment Beyond Numbers
Thank You for Your Time and Attention
Tom Kolde
Linda Brobeck