Page 1: Using risk factors in insurance analytics data driven strategies …chaire-dami.fr/files/2015/12/KatrienAntonio.pdf · 2017-09-18 · evtree: Evolutionary learning of globally optimal

Using risk factors in insurance analytics
data driven strategies

Chaire DAMI workshop, Louvain-la-Neuve

Katrien Antonio
LRisk - KU Leuven and ASE - University of Amsterdam

September 2017

K. Antonio, KU Leuven & UvA 1 / 28

Motivation

Claim frequency and claim severity as a function of nominal / numeric ∼ ordinal / spatial features.

Research questions

• Generalized Linear Models (GLMs) for frequency (∼ Poisson) and severity (∼ gamma).

• How to:

    (1) select risk factors or features?

    (2) cluster (or bin or fuse) levels within a risk factor,
        e.g. age groups / postal code clusters / clusters of car models?

• The procedure should be data driven and scalable to large (big) data.

• The end product should be interpretable, within the actuarial comfort zone.

A data driven strategy for the construction of insurance tariff classes

Henckaerts, Antonio, et al., 2017 (online)

GAM as a starting point

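• Generalized Additive Model with predictor:

$$\eta_i = g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x^{d}_{ij} + \sum_{j=1}^{q} f_j(x^{c}_{ij}) + \sum_{j=1}^{r} f_j(x^{s}_{ij}, y^{s}_{ij}).$$

• Information criteria:

$$\mathrm{AIC} = -2 \cdot \log L + 2 \cdot \mathrm{EDF},$$

$$\mathrm{BIC} = -2 \cdot \log L + \log(n) \cdot \mathrm{EDF},$$

balancing goodness of fit and complexity.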
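
As a minimal sketch of the two information criteria (the function names and the numbers in the usage lines are illustrative and not taken from the talk):

```python
import math

def aic(log_lik, edf):
    """AIC = -2 * logL + 2 * EDF."""
    return -2.0 * log_lik + 2.0 * edf

def bic(log_lik, edf, n):
    """BIC = -2 * logL + log(n) * EDF."""
    return -2.0 * log_lik + math.log(n) * edf

# Hypothetical fit: log-likelihood of -100 with 5 effective degrees of freedom.
print(aic(-100.0, 5.0))           # 210.0
print(bic(-100.0, 5.0, 163231))   # uses the record count of the MTPL data set
```

With n = 163 231 records, log(n) is about 12, so BIC charges roughly six times as much per effective degree of freedom as AIC and tends to favour smaller models.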

GAM as a starting point

• MTPL data set from Denuit & Lang (2004), 163 231 records.

• Lowest BIC in an exhaustive search over 1 024 fitted models:

$$\log(E(\text{nclaims})) = \log(\text{exp}) + \beta_0 + \beta_1 \, \text{coverage}_{PO} + \beta_2 \, \text{coverage}_{FO} + \beta_3 \, \text{fuel}_{diesel} + f_1(\text{ageph}) + f_2(\text{power}) + f_3(\text{bm}) + f_4(\text{ageph}, \text{power}) + f_5(\text{long}, \text{lat}),$$

which combines an offset with categorical (∼ nominal), continuous (∼ ordinal), interaction and spatial risk factors.
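
To illustrate the role of the offset log(exp), here is a minimal sketch of a Poisson GLM with log link fitted by iteratively reweighted least squares. It is a plain GLM with a single binary covariate, not the GAM of the talk: the smooth terms f1 to f5 are left out, and the function name and the toy data are assumptions made for illustration.

```python
import numpy as np

def fit_poisson_glm(X, y, exposure, n_iter=25):
    """Poisson GLM, log link, offset log(exposure), fitted by IRLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(exposure, dtype=float))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        # Working response (offset removed); the working weights equal mu.
        z = (eta - offset) + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Hypothetical toy portfolio: intercept plus a diesel indicator.
diesel = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones(6), diesel])
nclaims = np.array([0, 1, 0, 2, 0, 1])
expo = np.array([1.0, 0.5, 1.0, 1.0, 0.25, 0.75])
beta = fit_poisson_glm(X, nclaims, expo)
print(np.exp(beta))  # baseline claim frequency and diesel relativity
```

Because this toy model is saturated in the two fuel groups, the fitted frequencies equal total claims divided by total exposure within each group (0.4 and 1.5 here), which shows how the offset turns claim counts into claim frequencies per unit of exposure.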

GAM as a starting point

[Figure: fitted smooth GAM effects f̂1(ageph), f̂2(power) and f̂3(bm), the interaction effect f̂4 over ageph and power, and the spatial effect f̂5.]

Bin smooth GAM effects - spatial

• Bin or cluster f̂5(long_i, lat_i) for i ∈ {1, ..., 1 146}.

• We use (also see the classInt package in R):

    • equal intervals;
    • quantile binning;
    • complete linkage (see Kaufman & Rousseeuw, 1990);
    • Fisher's natural breaks (see Fisher, 1958 and Slocum et al., 2005).

• Tune the number of clusters towards the lowest AIC for the GAM with a binned spatial effect.
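
The first two binning rules can be sketched in a few lines of NumPy. This is an illustration only, not the classInt implementation; the helper names are assumptions, and complete linkage and Fisher's natural breaks are not covered.

```python
import numpy as np

def equal_interval_breaks(values, k):
    """Break points of k bins of equal width between the minimum and maximum."""
    values = np.asarray(values, dtype=float)
    return np.linspace(values.min(), values.max(), k + 1)

def quantile_breaks(values, k):
    """Break points of k bins holding roughly the same number of observations."""
    values = np.asarray(values, dtype=float)
    return np.quantile(values, np.linspace(0.0, 1.0, k + 1))

def assign_bins(values, breaks):
    """Bin index 0..k-1; bins are left-closed, the last bin is closed on both sides."""
    values = np.asarray(values, dtype=float)
    return np.digitize(values, breaks[1:-1], right=False)

# Range of the spatial effect reported in the talk: [-0.48, 0.34], five bins.
print(equal_interval_breaks([-0.48, 0.34], 5))

# Hypothetical effect values 0..9 split into five quantile bins.
toy = np.arange(10)
print(assign_bins(toy, quantile_breaks(toy, 5)))
```

For the range [−0.48, 0.34] and five bins, the equal-interval break points come out as −0.48, −0.316, −0.152, 0.012, 0.176 and 0.34, which agrees with the "Equal" intervals reported in the talk up to rounding.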

Bin smooth GAM effects - spatial

[Figure: binned spatial effect f̂5, five bins per method. Bin intervals:]

Equal:     [−0.48, −0.32)   [−0.32, −0.15)    [−0.15, 0.012)     [0.012, 0.18)     [0.18, 0.34]
Quantile:  [−0.48, −0.18)   [−0.18, −0.092)   [−0.092, −0.019)   [−0.019, 0.067)   [0.067, 0.34]
Complete:  [−0.48, −0.32)   [−0.32, −0.079)   [−0.079, 0.047)    [0.047, 0.23)     [0.23, 0.34]
Fisher:    [−0.48, −0.27)   [−0.27, −0.14)    [−0.14, −0.036)    [−0.036, 0.11)    [0.11, 0.34]

Bin smooth GAM effects - continuous

• Bin f̂1(ageph), f̂2(power), f̂3(bm), f̂4(ageph, power).

• Use an evolutionary tree (evtree, Grubinger et al., 2014), e.g. on:

    Covariate: ageph    Response: f̂1(ageph)    Weight: w
    18                  0.495                   16
    19                  0.459                   116
    20                  0.424                   393

• Evaluate the tree with MSE + α · complexity penalty, where

$$\mathrm{MSE} = \frac{\sum_{i=\min(\text{ageph})}^{\max(\text{ageph})} w_{\text{ageph}_i} \left( \hat{f}_1(\text{ageph}_i) - \hat{f}_1^{\,b}(\text{ageph}_i) \right)^2}{\sum_{i=\min(\text{ageph})}^{\max(\text{ageph})} w_{\text{ageph}_i}}$$

and α is a tuning parameter.
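
A minimal sketch of this weighted MSE follows. It reads the example rows of the talk as (ageph, f̂1, w) = (18, 0.495, 16), (19, 0.459, 116), (20, 0.424, 393); the single-bin split is a hypothetical choice for illustration, not a result from the talk.

```python
import numpy as np

def weighted_mse(f_smooth, f_binned, weights):
    """Weighted mean squared error between a smooth effect and its binned version."""
    f_smooth = np.asarray(f_smooth, dtype=float)
    f_binned = np.asarray(f_binned, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * (f_smooth - f_binned) ** 2) / np.sum(weights)

f1_hat = np.array([0.495, 0.459, 0.424])   # smooth effect at ages 18, 19, 20
w = np.array([16, 116, 393])               # weights at those ages

# Hypothetical binning: all three ages in one bin, fitted at the weighted mean.
f1_bin = np.full(3, np.average(f1_hat, weights=w))
print(weighted_mse(f1_hat, f1_bin, w))
```

The weights make errors at sparsely populated ages (such as 18) count less than errors at ages with many observations, so the tree is steered towards splits that matter for the bulk of the portfolio.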

Bin smooth GAM effects - continuous

[Figure: smooth interaction effect f̂4 over ageph and power, next to its binned version with GLM coefficients −0.07, −0.021, 0, 0.035 and 0.064; spatial effect f̂5 with GLM coefficients −0.329, −0.204, −0.155, 0 and 0.199.]

Sparse modeling of risk factors in insurance analytics

Devriendt, Antonio, et al., 2017 (in progress)

Lasso

• Less is more (Hastie, Tibshirani & Wainwright, 2015): a sparse model is easier to estimate and interpret than a dense model.

• Regularize (with budget constraint t, or regularization parameter λ):

$$\min_{\beta_0, \beta} \{ -\log L \} \quad \text{subject to} \quad \|\beta\|_1 \le t,$$

or equivalently

$$\min_{\beta_0, \beta} \left\{ -\log L + \lambda \cdot \sum_{j=1}^{p} |\beta_j| \right\}.$$

This shrinks coefficients and even sets some to zero.
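
The shrink-and-select behaviour can be seen in the soft-thresholding operator, the proximal operator of the L1 penalty. The sketch below is a standard building block of lasso solvers, shown as an illustration only; it is not presented in the talk as the fitting algorithm, and the input values are made up.

```python
import numpy as np

def soft_threshold(beta, lam):
    """Coordinate-wise proximal operator of lam * ||beta||_1."""
    beta = np.asarray(beta, dtype=float)
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

# Hypothetical unpenalized coefficients and threshold lam = 0.5.
shrunk = soft_threshold([-1.5, -0.2, 0.0, 0.3, 2.0], 0.5)
print(shrunk)
```

Coefficients larger than the threshold in absolute value are pulled towards zero by exactly that amount, while the smaller ones are set to zero, which is the selection effect of the lasso.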

Lasso and friends

• Adjust lasso regularization to the type of risk factor:

    • determine the type (nominal / numeric ∼ ordinal / spatial);
    • allocate a logical penalty.

• Thus, for J risk factors, each with regularization term P_j(·), we want to optimize:

$$-\log L(\beta_1, \ldots, \beta_J) + \lambda \cdot \sum_{j=1}^{J} P_j(\beta_j).$$

Matching regularization to type of risk factor

• Ordinal risk factors: fused lasso

$$\sum_{i} w_i \, |\beta_{i+1} - \beta_i|.$$

• Nominal risk factors: generalized fused lasso

$$\sum_{i > k} w_{i,k} \, |\beta_i - \beta_k|.$$

• Spatial risk factor: graph guided fused lasso

$$\sum_{(i,k) \in G} w_{i,k} \, |\beta_i - \beta_k|.$$
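
All three penalties sum weighted absolute differences over a set of level pairs, and they differ only in which pairs are included. The sketch below makes that explicit: a chain of neighbours for ordinal factors, all pairs for nominal factors, and a user-supplied graph for spatial factors. The function names, the unit weights and the toy coefficients are assumptions for illustration.

```python
from itertools import combinations

def graph_fused_penalty(beta, edges, weights=None):
    """Sum over edges (i, k) of w_ik * |beta_i - beta_k|."""
    if weights is None:
        weights = [1.0] * len(edges)  # unit weights unless specified
    return sum(w * abs(beta[i] - beta[k]) for (i, k), w in zip(edges, weights))

def chain_edges(n):
    """Ordinal factor: only neighbouring levels (i + 1, i) are compared."""
    return [(i + 1, i) for i in range(n - 1)]

def complete_edges(n):
    """Nominal factor: every pair of levels (i, k) with i > k is compared."""
    return [(i, k) for k, i in combinations(range(n), 2)]

beta = [0.0, 0.0, 0.5, 0.5]            # hypothetical coefficients of four levels
neighbours = [(1, 0), (3, 2)]          # hypothetical adjacency graph G

print(graph_fused_penalty(beta, chain_edges(4)))     # fused lasso
print(graph_fused_penalty(beta, complete_edges(4)))  # generalized fused lasso
print(graph_fused_penalty(beta, neighbours))         # graph guided fused lasso
```

In this toy example the graph guided penalty is zero because both connected pairs already share a coefficient, whereas the nominal penalty still charges every pair of levels with different coefficients.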

Unified GLM framework with multiple types of penalties

• Gertheiss & Tutz (2010) and Oelker & Gertheiss (2017):

    • GLMs with various penalties;
    • R package available: gvcm.cat (not maintained).

• Uses local quadratic approximations of the penalties and PIRLS (penalized iteratively reweighted least squares):

    • non-exact selection or fusion;
    • computationally intensive.

Unified GLM framework with multiple types of penalties

• Our contribution:

    • implements an efficient algorithm (with proximal operators);
    • scalable to big data (splits into smaller sub-problems);
    • flexibility of regularization:
        • the penalty takes the type of risk factor into account;
        • works for all popular penalties;
    • unifies the penalty-specific (machine learning) literature with the statistical (or: actuarial) literature.
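
For background, the proximal operator of a penalty P with step size s > 0 is usually defined as follows (see Parikh & Boyd, 2013), and a generic proximal gradient iteration alternates a gradient step on − log L with this operator. This is the textbook form, given here as an assumption about the general approach; the slides do not spell out the exact algorithm or step size rule.

```latex
\operatorname{prox}_{sP}(v) = \arg\min_{x} \left\{ \tfrac{1}{2} \| x - v \|_2^2 + s \, P(x) \right\},
\qquad
\beta^{(k+1)} = \operatorname{prox}_{s \lambda P}\!\left( \beta^{(k)} + s \, \nabla \log L\big(\beta^{(k)}\big) \right).
```

When the penalty is a sum of terms P_j that each involve only the coefficients of one risk factor, the proximal step separates into one smaller problem per risk factor, which is consistent with the "splits into smaller sub-problems" remark.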

MTPL claim frequency with multiple types of penalties

• Fit a Poisson GLM with different penalties.

• Settings:

    • incorporate adaptive and standardization weights for better consistency and predictive performance;
    • tune λ with BIC (λ̂_BIC = 784).

• Re-estimate the final sparse GLM with standard GLM routines (from 387 to 30 parameters).
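
The re-estimation step relies on collapsing levels whose penalized coefficients were fused. A minimal sketch for one ordinal risk factor is given below; the function name, the tolerance and the coefficient values are assumptions for illustration, and the refit itself (a standard GLM on the merged levels) is not shown.

```python
import numpy as np

def fused_groups(beta, tol=1e-8):
    """Give consecutive levels with (numerically) equal coefficients one group label."""
    beta = np.asarray(beta, dtype=float)
    new_group = np.abs(np.diff(beta)) > tol   # True where a new group starts
    return np.concatenate(([0], np.cumsum(new_group)))

beta_hat = [0.0, 0.0, 0.12, 0.12, 0.12, 0.30]  # hypothetical penalized estimates
groups = fused_groups(beta_hat)
print(groups)            # group label per original level
print(groups.max() + 1)  # number of levels left for the refit
```

Six original levels collapse to three groups in this toy case; applied across all risk factors, the same idea is what reduces the parameter count before the unpenalized refit.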

MTPL claim frequency with multiple types of penalties

[Figure: "period" panel showing the period parameters against lambda (log scale, 1e−01 to 1e+05); legend: BIC, AIC, GCV, BIC.reest, AIC.reest, GCV.reest.]

MTPL claim frequency with multiple types of penalties

[Figure: estimated coefficients β̂ for age, power, carage and bm; legend: GAM, pGLM, pGLM refit.]

MTPL claim frequency with multiple types of penalties

[Figure: estimated coefficients β̂ for the binary variables (fleet2, fourbyfour1, fuel2, monovolume1, sex2, sport2, use2), the coverage and period variables (coverage2, coverage3, period2, period3, period4) and the car model variable; legend: GAM, pGLM, pGLM refit.]

Wrap-up

• From multi-step to less is more.

• To do:

    • further improve algorithm efficiency;
    • implement penalties for spatial information and interaction effects;
    • R package and paper in progress.

More information

For more information, please visit:

LRisk website, www.lrisk.be;

my homepage, www.econ.kuleuven.be/katrien.antonio.

Thanks to Ageas Continental Europe and Argenta.

References

Henckaerts, R., Antonio, K., Clijsters, M. and Verbelen, R. (2017). A data driven strategy for the construction of insurance tariff classes. Submitted.

Wood, S. (2006). Generalized additive models: an introduction with R. Chapman and Hall/CRC Press.

Gertheiss, J. and Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4(4), 2150-2180.

Oelker, M. and Gertheiss, J. (2017). A uniform framework for the combination of penalties in generalized structured models. Advances in Data Analysis and Classification, 11(1), 97-120.

Grubinger, T., Zeileis, A. and Pfeiffer, K.-P. (2014). evtree: Evolutionary learning of globally optimal classification and regression trees in R. Journal of Statistical Software, 61(1), 1-29.

Bivand, R. (2015). classInt: Choose Univariate Class Intervals. R package version 0.1-23.

Parikh, N. and Boyd, S. (2013). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 123-231.

Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical learning with sparsity: the Lasso and generalizations. Chapman and Hall/CRC Press.

Extra plots

[Figure: two panels of coefficient paths against log(λ); two panels of the coordinates of β against λ, one for an ordinal penalty example and one for a nominal penalty example (var 1 to var 10).]
