Actuaries and Consultants
2019 Spring Workshops, Chicago Actuarial Association
Predictive Analytics: A Practical Approach
Andrea Huckaba Rome, FSA, CERA, MAAA
Why Predictive Analytics Matters
Emerging area of interest
Automation will affect the work you do
Actuaries (and other experts) needed to fully benefit from Predictive Analytics
Agenda
1. Definition of Predictive Analytics
2. Applicable Business Problems
3. Available Tools
4. Step-by-step example
Definition
“Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events.”
How is this different from what we do today?
• New techniques
• Fewer assumptions*
• Less time
Examples:
• Health Risk Scoring: new techniques, fewer assumptions, less time
• Auto Fraud Detection: new techniques, fewer assumptions, less time
Business Problems Criteria
When is predictive analytics (PA) appropriate for a business problem?
1. Clearly defined problem
2. Lots of useful data
3. Prediction will drive action
4. PA is the best option
Case 1: Some customers have complained about the system they use to contact the company, file claims, and receive information. The customer experience VP wants to create a customized set of protocols for each person that cater to their preferences. We have call center data, demographic data, customer e-mails, and social media data. Is this a viable PA problem?
Case 2: Your company’s head of sales wants to develop and sell a radically new product, and wants to know what type of people might be interested in it. You have current enrollment and claims data for your members, as well as some scattered industry reports on popular products. Is this a viable PA problem?
Case 3: Your state department of insurance wants a better way to identify potentially fraudulent claims for further investigation. They have comprehensive claims, enrollment and provider data from each carrier. They also have a database of prior cases that they have investigated for fraud, with the outcomes of each case. Is this a viable PA problem?
Available PA Tools
Software
R, Python, SAS, Statistica, …
Types of Models
Categorical vs. Numerical
Exploratory vs. Result-Based
Transparent vs. Black Box
Fast vs. Slow (Execution Speed)
Fast vs. Slow (Analytical Time/Effort)
Stability
Some model types:
• Linear Models
• Generalized Linear Models
• Decision Trees
• Random Forests
• Gradient Boosting Machines
• Support Vector Machines
• Neural Networks
• Genetic Algorithms
Available PA Tools
Generalized Linear Models
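Fits the target to a common distribution (e.g., normal, Poisson, binomial, gamma) through a link function, using the most predictive data variables to shape the curve.
Like a linear regression model, but extended to non-normal distributions and link functions.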
Advantages:
• Efficient
• Interpretable
• Smooth prediction surface
Disadvantages:
• Likely to underfit
• Linear in its parameters
Variation: Stacking/Blending with other model types
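To make the GLM description concrete, here is a minimal sketch of one common GLM: a binomial distribution with a logit link (logistic regression), fit by iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm. It assumes numpy is available and a 0/1 lapse flag as the target; the function names are hypothetical and this is not the presenter's model.

```python
import numpy as np

def fit_logistic_glm(X, y, n_iter=25, tol=1e-8):
    """Binomial GLM with a logit link, fit by IRLS. Returns [intercept, coefs...]."""
    X = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                          # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))         # inverse of the logit link
        w = mu * (1.0 - mu)                     # binomial variance = IRLS weights
        z = eta + (y - mu) / w                  # working response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def predict_proba(beta, X):
    """Predicted probability of the positive class (e.g., lapse)."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(X @ beta)))
```

In practice a packaged routine does the same job; in R, for example, `glm(..., family = binomial)` fits this model. The sketch only shows what "distribution plus link function" means mechanically.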
Available PA Tools
Decision Trees
Splits the feature space into mutually exclusive and exhaustive subsets that best predict the target variable.
Advantages:
• Simple
• Non-linear
• Interpretable
• Handles missing values in data
Disadvantages:
• Likely to overfit
• Unstable
• Prediction surface not smooth
Variation: Random Forest or Stacking/Blending with other model types
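A single step of tree growing can be sketched as a search over every feature and threshold for the split whose two mutually exclusive, exhaustive subsets have the lowest weighted Gini impurity. This is a minimal pure-Python illustration assuming numeric features; the feature names in the usage are hypothetical. A real tree repeats this recursively on each subset and adds pruning and missing-value handling.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted_gini) for the best 'x <= threshold' split."""
    n = len(labels)
    best = (None, None, gini(labels))  # no split yet: impurity of the parent node
    for j, name in enumerate(feature_names):
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (name, threshold, score)
    return best
```

For example, `best_split([(25, 1), (30, 0), (40, 1), (66, 0), (70, 1)], [0, 0, 0, 1, 1], ["age", "renewal_flag"])` picks the `age <= 40` split, which separates the toy lapsers perfectly.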
Available PA Tools
Support Vector Machines
Separates groups of data points using decision boundaries. New data points can then be classified quickly by which side of these boundaries they fall on.
Advantages:
• Good for online learning
• Flexible with non-linear data
Disadvantages:
• Likely to overfit
• Do not provide probability estimates, just classifications
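A minimal sketch of a linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss. It assumes numpy and labels coded as -1/+1; the bias is folded into the weight vector (so it is also regularized, a simplification). The per-point update is what makes SVMs amenable to online learning, and `predict` returns only a class label, matching the "no probability estimates" point above.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM. y must contain -1 / +1 labels."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([X, np.ones(len(X))])  # constant column acts as the bias
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):      # one point at a time (online-style)
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * (X[i] @ w)         # computed before the update
            w *= (1.0 - eta * lam)             # shrink from the L2 penalty
            if margin < 1:                     # point is inside the margin or misclassified
                w += eta * y[i] * X[i]
    return w

def predict(w, X):
    """Return a hard class label (-1 or +1); no probability is produced."""
    X = np.column_stack([X, np.ones(len(X))])
    return np.where(X @ w >= 0, 1, -1)
```

Non-linear boundaries would need a kernel, which this linear sketch deliberately omits.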
Example PA Problem: Health Plan Lapse
The Business Problem
A self-insured group wants to know why members are lapsing.
1. Clearly Defined: Determine the drivers of lapse for this healthcare MEWA (multiple employer welfare arrangement), so we can predict potential future lapse
2. Lots of Useful Data: Claims, demographic data, membership information, and possibly unstructured data from the sales team
3. Prediction Drives Action: Characteristics of lapsers may help the MEWA prioritize changes and/or use predictions to lessen the chance of lapse
4. Most Appropriate Option: Combining multiple characteristics of lapsers and looking for patterns is easier with PA than with traditional analysis
Example PA Problem: Health Plan Lapse
Data Gathering/Exploration
Scrubbing and Common-Sense Checks
ASOP 23 (Data Quality) questions:
• Do data imperfections have a material impact on my results?
• Do I understand the definition of each data field?
• Is the data reasonable to use for this analysis?
Scrubbing:
• Reformat fields
• Create new fields
• Lower the dimensionality of some fields
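The three scrubbing steps can be sketched on a single member record. This is a minimal pure-Python illustration; the field names (`dob`, `eff_date`, `zip`), the MM/DD/YYYY date format, and the age bands are all hypothetical assumptions. The bands at 26 and 65 are chosen only because those ages appear in the lapse categories of Exhibit 8.4.

```python
from datetime import datetime

def scrub(record):
    """Return a cleaned copy of one raw member record (hypothetical fields)."""
    out = dict(record)
    # Reformat fields: turn "MM/DD/YYYY" text into real date objects
    dob = datetime.strptime(record["dob"], "%m/%d/%Y").date()
    eff = datetime.strptime(record["eff_date"], "%m/%d/%Y").date()
    out["dob"], out["eff_date"] = dob, eff
    # Create new fields: age at the coverage effective date
    had_birthday = (eff.month, eff.day) >= (dob.month, dob.day)
    out["age"] = eff.year - dob.year - (0 if had_birthday else 1)
    # Lower dimensionality: exact age -> 3 bands, 5-digit ZIP -> 3-digit prefix
    age = out["age"]
    out["age_band"] = "<26" if age < 26 else ("26-64" if age < 65 else "65+")
    out["zip3"] = record["zip"].strip()[:3]
    return out
```

For example, `scrub({"dob": "07/15/1990", "eff_date": "01/01/2017", "zip": " 60601 "})` yields an age of 26, the band `"26-64"`, and the prefix `"606"`.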
Checks:
• Lapse by month
• Lapse by age
• Other checks
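The "lapse by month" and "lapse by age" checks come down to grouping members and computing a lapse rate per group, then asking whether the pattern makes sense. This is a minimal pure-Python sketch. The field names and the 0/1 `lapsed` flag are hypothetical, and the function assumes all keys in one field share a type so they can be sorted.

```python
from collections import defaultdict

def lapse_rate_by(records, key):
    """Share of members who lapsed, grouped by one field (e.g., month or age band)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [lapsed count, member count]
    for r in records:
        c = counts[r[key]]
        c[0] += r["lapsed"]               # lapsed is assumed to be 0 or 1
        c[1] += 1
    return {k: lapsed / total for k, (lapsed, total) in sorted(counts.items())}
```

A common-sense check would then compare the output against expectations. For instance, given the renewal and age-based lapse types in Exhibit 8.4, one might look for elevated rates in December and in the oldest age band before trusting the data.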
Splitting the data:
• Training / Testing / (Validation) sets
• Avoids overfitting and gives an accurate measure of predictive power
• Random vs. stratified sampling
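The difference between random and stratified sampling can be sketched in pure Python. A stratified split samples each class separately, so the training and testing sets keep the same lapse rate. This matters when lapsers are a minority. The 30% test share, the seed, and the function names are illustrative assumptions. A validation set can be carved out by applying the same split again to the training portion.

```python
import random
from collections import defaultdict

def random_split(rows, test_frac=0.3, seed=42):
    """Shuffle everything together, then cut off a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]                       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def stratified_split(rows, label_of, test_frac=0.3, seed=42):
    """Split each class separately so train and test keep the same class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[label_of(r)].append(r)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

With 100 members of whom 20 lapsed, the stratified split always puts exactly 6 lapsers in a 30-member test set. A purely random split only does so on average.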
Relationships between variables:
• Univariate and multivariate plotting
• Other measures of relationships: correlation, Principal Component Analysis, Clustering Analysis (KNN)
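Two of those measures can be sketched with numpy: a Pearson correlation matrix, and a principal component analysis obtained from the eigen-decomposition of the standardized data's covariance matrix. The function name is hypothetical. The input is assumed to be a numeric matrix with one column per variable and no constant columns.

```python
import numpy as np

def correlation_and_pca(X):
    """Pearson correlations plus principal components of standardized columns."""
    corr = np.corrcoef(X, rowvar=False)              # pairwise correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize each column
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                # largest variance first
    explained = eigvals[order] / eigvals.sum()       # share of variance per component
    return corr, explained, eigvecs[:, order]        # components are the columns
```

If two fields are nearly duplicates (say, two overlapping measures of plan tenure), they show a correlation near 1 and load on the same leading component. That is a hint that one of them can be dropped during feature selection.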
Example PA Problem: Health Plan Lapse
Feature Selection
Not all of the raw data will be used in our analysis.
Choose and/or modify the raw data.
Two methods:
• Start with every feature, then pare down
• Start with a few features, then build up
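Both methods can be sketched as greedy loops around a scoring function: backward elimination ("every feature, then pare down") and forward selection ("a few features, then build up"). This is a minimal pure-Python illustration. `score` is a hypothetical, user-supplied function, for example the testing accuracy of a decision tree fit on that feature subset.

```python
def forward_selection(features, score, max_features=None):
    """Start small and build up: add the feature that most improves score(subset)."""
    selected, best_score = [], score([])
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        cand, cand_score = max(((f, score(selected + [f])) for f in remaining),
                               key=lambda pair: pair[1])
        if cand_score <= best_score:
            break                      # no remaining feature improves the score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score

def backward_elimination(features, score, min_features=1):
    """Start with everything and pare down: drop features whose removal doesn't hurt."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > min_features:
        drop, drop_score = max(((f, score([g for g in selected if g != f])) for f in selected),
                               key=lambda pair: pair[1])
        if drop_score < best_score:
            break                      # every removal makes the model worse
        selected.remove(drop)          # ties favor the simpler model
        best_score = drop_score
    return selected, best_score
```

The two directions can disagree on real data because each step is greedy. Running both and comparing the results is a cheap sanity check.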
Example PA Problem: Health Plan Lapse
Model Selection
What is needed: high transparency and a categorical prediction (lapse or no lapse)
Choice: Decision Tree (to start)
Example PA Problem: Health Plan Lapse
Model Selection: Decision Tree (to start)
Initial decision tree:
• Unreadable
• Too many factors = false precision
Solution: we need to prune the tree.
Example PA Problem: Health Plan Lapse
Model Training/Testing
Pruned decision tree:
• Better
• Makes sense
• Some further pruning might be needed (area-based predictions)
• Might consider other features
Finally, use this model to measure its accuracy on the training and testing datasets.
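Measuring accuracy on both datasets, and comparing the two, can be sketched as follows. The two-level `pruned_tree` is entirely hypothetical and exists only so the example runs; it is not the presenter's tree. A training accuracy well above the testing accuracy is the usual warning sign of overfitting, which is what pruning is meant to reduce.

```python
def pruned_tree(member):
    """Hypothetical two-level tree, for illustration only: 1 = predicted lapse."""
    if member["age"] >= 65:
        return 1
    return 1 if member["renewal_month"] else 0

def accuracy(model, rows, labels):
    """Share of rows where the predicted class matches the actual class."""
    correct = sum(model(r) == y for r, y in zip(rows, labels))
    return correct / len(labels)

def overfit_gap(model, train, test):
    """train and test are (rows, labels) pairs; returns (train_acc, test_acc, gap)."""
    train_acc = accuracy(model, *train)
    test_acc = accuracy(model, *test)
    return train_acc, test_acc, train_acc - test_acc
```

A gap near zero suggests the model generalizes about as well as it fits. A large positive gap suggests the tree has memorized the training data.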
Example PA Problem: Health Plan Lapse
Other Follow-ups
Remove some features to see whether others play a more important role:
• Remove renewal as a factor: aging and discontinued plans were the most prominent.
• Remove aging as a factor: tier and prior-year rate increases were the most prominent.
Example PA Problem: Health Plan Lapse
Interpreting Results
How do we convey this to the stakeholders?
How can this model be implemented?
When should this model be updated?
Accuracy
Training model: 77.2%
Testing model: 77.8%
Exhibit 8.4: Type of Lapse, by Year (% of total lapsed members for the year)

Type of Lapse                        2016   2017   Description
Age-Based Lapse                       19%    17%   Members becoming Medicare eligible, or dependents aging out at 26
Mid-Year Spouse or Dependent Lapse     5%     3%   Child covered by the other parent, spouse gains other coverage, change in living situation, etc.
Renewal Spouse or Dependent Lapse      1%     3%   Employee opts to change tier during renewal
Mid-Year Employee Lapse               43%    35%   Employee lapse during any month but December: late renewal decisions, change in employment, etc.
Renewal Employee Lapse                12%    11%   Employee lapse at the end of December (a decision not to renew their plan)
Mid-Year Group Lapse                  15%    18%   Group lapse during any month but December
Renewal Group Lapse                    5%    13%   Group lapse at the end of December (a decision not to renew group coverage)
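The categories in Exhibit 8.4 amount to a rule-based classification of each lapse, followed by a percent-of-total tally per year. The sketch below is a simplified, hypothetical reading of the Description column, not the presenter's actual logic. It assumes a relationship code, a lapse month (12 = December, treated as renewal), a whole-group flag, and age at lapse, and it approximates "Medicare eligible" as age 65 or older.

```python
from collections import Counter

def lapse_type(relationship, lapse_month, whole_group, age):
    """Map one lapse to an Exhibit 8.4 category (simplified, hypothetical rules).
    relationship: 'employee', 'spouse' or 'dependent'; lapse_month: 1-12."""
    timing = "Renewal" if lapse_month == 12 else "Mid-Year"  # December = renewal
    if whole_group:
        return f"{timing} Group Lapse"
    if age >= 65 or (relationship == "dependent" and age >= 26):
        return "Age-Based Lapse"          # Medicare eligibility or aging out at 26
    if relationship == "employee":
        return f"{timing} Employee Lapse"
    return f"{timing} Spouse or Dependent Lapse"

def share_of_total(types):
    """Percent of all lapsed members in each type (one year's column of the exhibit)."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: 100 * n / total for t, n in counts.items()}
```

Each year's column in the exhibit sums to 100%, which is a quick check that every lapse landed in exactly one category.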
Example PA Problem: Health Plan Lapse
What else?
Given time, I would also do the following:
• Incorporate more medical information and additional demographics
• Incorporate unstructured data from the sales team
• Use a Random Forest or Stacking/Blending to increase the credibility of the model
• Increase the visual appeal
Final Notes for Actuaries
Actuarial students are learning this; use their expertise.
These are just tools, albeit more sophisticated ones.
Remember the criteria for business problems.
Thank You!