Actuariat de l’Assurance Non-Vie (Non-Life Insurance Actuarial Science) # 9

Source: Hypotheses.org, 2016-11-28.

A. Charpentier (UQAM & Université de Rennes 1)

ENSAE ParisTech, October 2015 - January 2016.

http://freakonometrics.hypotheses.org

@freakonometrics 1


A Grab Bag on Pricing

• collective model vs. individual model

• the high-dimensional case

• variable selection

• model selection


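Individual Model or Collective Model? The Tweedie Distribution

Consider a Tweedie distribution with variance power $p \in (1, 2)$, mean $\mu$ and scale parameter $\phi$. It is a compound Poisson model, $S = Y_1 + \cdots + Y_N$, with

• $N \sim \mathcal{P}(\lambda)$ with $\lambda = \dfrac{\phi\,\mu^{2-p}}{2-p}$

• $Y_i \sim \mathcal{G}(\alpha, \beta)$ with $\alpha = -\dfrac{p-2}{p-1}$ and $\beta = \dfrac{\phi\,\mu^{1-p}}{p-1}$

Conversely, consider a compound Poisson model with $N \sim \mathcal{P}(\lambda)$ and $Y_i \sim \mathcal{G}(\alpha, \beta)$. Then

• the variance power is $p = \dfrac{\alpha+2}{\alpha+1}$

• the mean is $\mu = \dfrac{\lambda\alpha}{\beta}$

• the scale parameter is $\phi = \dfrac{[\lambda\alpha]^{\frac{\alpha+2}{\alpha+1}-1}\,\beta^{2-\frac{\alpha+2}{\alpha+1}}}{\alpha+1}$

In these formulas $\beta$ is the rate of the gamma distribution (so that $E[Y_i] = \alpha/\beta$), and $\phi$ enters the variance as $\text{Var}(S) = \mu^p/\phi$.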
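The correspondence between the two parameterizations can be checked numerically. The following is a minimal base R sketch, not the author's code: the helper names `tweedie_to_cp` and `cp_to_tweedie` are hypothetical, the numerical values are arbitrary, and the gamma distribution is assumed to be parameterized by shape $\alpha$ and rate $\beta$.

```r
# Hypothetical helpers (not from any package): map between the Tweedie
# parameters (p, mu, phi) and the compound Poisson-gamma parameters
# (lambda, alpha, beta), with beta the *rate* of the gamma distribution.
tweedie_to_cp <- function(p, mu, phi) {
  stopifnot(p > 1, p < 2, mu > 0, phi > 0)
  list(lambda = phi * mu^(2 - p) / (2 - p),
       alpha  = (2 - p) / (p - 1),
       beta   = phi * mu^(1 - p) / (p - 1))
}

cp_to_tweedie <- function(lambda, alpha, beta) {
  p <- (alpha + 2) / (alpha + 1)
  list(p   = p,
       mu  = lambda * alpha / beta,
       phi = (lambda * alpha)^(p - 1) * beta^(2 - p) / (alpha + 1))
}

# Round trip on arbitrary illustrative values
cp   <- tweedie_to_cp(p = 1.5, mu = 100, phi = 0.2)
back <- cp_to_tweedie(cp$lambda, cp$alpha, cp$beta)

# Moments of S = Y_1 + ... + Y_N implied by the compound Poisson model
mean_S <- cp$lambda * cp$alpha / cp$beta
var_S  <- cp$lambda * cp$alpha * (cp$alpha + 1) / cp$beta^2

# Monte Carlo check on simulated data (seed chosen arbitrarily)
set.seed(1)
N <- rpois(1e5, cp$lambda)
S <- numeric(length(N))                      # S = 0 when there is no claim
pos <- N > 0
S[pos] <- rgamma(sum(pos), shape = N[pos] * cp$alpha, rate = cp$beta)
c(theoretical_mean = mean_S, simulated_mean = mean(S),
  theoretical_var = var_S, simulated_var = var(S))
```

With these values the round trip should return $p = 1.5$, $\mu = 100$ and $\phi = 0.2$, and the variance implied by the compound Poisson model should agree with $\mu^p/\phi$.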


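Individual Model or Collective Model? Tweedie Regression

In the context of regression,

$N_i \sim \mathcal{P}(\lambda_i)$ with $\lambda_i = \exp[X_i^T\beta_\lambda]$

$Y_{j,i} \sim \mathcal{G}(\mu_i, \phi)$ with $\mu_i = \exp[X_i^T\beta_\mu]$ (gamma distribution with mean $\mu_i$ and shape $\phi$)

Then $S_i = Y_{1,i} + \cdots + Y_{N_i,i}$ has a Tweedie distribution:

• the variance power is $p = \dfrac{\phi+2}{\phi+1}$

• the mean is $\lambda_i\mu_i$

• the scale parameter is $\dfrac{\lambda_i^{\frac{1}{\phi+1}}}{\mu_i^{\frac{\phi}{\phi+1}}}\left(\dfrac{\phi}{1+\phi}\right)$, obtained from the compound Poisson formula with $\alpha = \phi$ and $\beta = \phi/\mu_i$; it depends on $i$

There are $1 + 2\dim(X)$ degrees of freedom.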
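To see why the implied Tweedie scale parameter changes from one policy to the next in the frequency-severity model, here is a small base R sketch. The covariate values and regression coefficients are made up for illustration, and the gamma distribution is assumed to have shape $\phi$ and rate $\phi/\mu_i$ (so that its mean is $\mu_i$).

```r
# Illustrative (made-up) covariate and coefficients; none come from the course data
x        <- c(0, 1, 2)                 # one covariate, three policies
lambda_i <- exp(-2 + 0.3 * x)          # Poisson frequency, log link
mu_i     <- exp( 7 - 0.1 * x)          # mean individual cost, log link
phi      <- 2                          # gamma shape parameter (alpha = phi)

p_power <- (phi + 2) / (phi + 1)       # variance power of the implied Tweedie
mean_i  <- lambda_i * mu_i             # mean of S_i

# Scale implied by the compound Poisson formula with alpha = phi, beta = phi / mu_i
scale_i <- (lambda_i * phi)^(p_power - 1) * (phi / mu_i)^(2 - p_power) / (phi + 1)

# Same quantity recovered from the first two moments: Var(S_i) = mean_i^p / scale_i
var_i <- lambda_i * mu_i^2 * (phi + 1) / phi
cbind(x, mean_i, scale_i, from_moments = mean_i^p_power / var_i)
```

The last two columns should coincide, and they differ across rows: a single Tweedie regression with a common scale cannot reproduce this model exactly unless the two linear predictors are tied together.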


Individual Model or Collective Model? Tweedie Regression

Remark: note that the scale parameter should not depend on $i$.

A Tweedie regression has

• variance power $p \in (1, 2)$

• mean $\mu_i = \exp[X_i^T\beta_{\text{Tweedie}}]$

• scale parameter $\phi$

There are $2 + \dim(X)$ degrees of freedom.


Two-Part Model: Frequency and Individual Cost

Consider the following datasets for third-party liability (RC, responsabilité civile). For claim frequency:

    freq = merge(contrat, nombre_RC)

for individual claim costs:

    # keep RC claims with a strictly positive cost
    sinistre_RC = sinistre[(sinistre$garantie == "1RC") & (sinistre$cout > 0), ]
    sinistre_RC = merge(sinistre_RC, contrat)

and for costs aggregated by policy:

    agg_RC = aggregate(sinistre_RC$cout, by = list(sinistre_RC$nocontrat), FUN = 'sum')
    names(agg_RC) = c('nocontrat', 'cout_RC')
    global_RC = merge(contrat, agg_RC, all.x = TRUE)
    # policies without any claim have a zero aggregate cost
    global_RC$cout_RC[is.na(global_RC$cout_RC)] = 0


Two-Part Model: Frequency and Individual Cost

    library(splines)
    # Poisson regression for claim counts, with log(exposure) as offset
    reg_f = glm(nb_RC ~ zone + bs(ageconducteur) + carburant, offset = log(exposition),
                data = freq, family = poisson)
    # Gamma regression (log link) for individual claim costs
    reg_c = glm(cout ~ zone + bs(ageconducteur) + carburant, data = sinistre_RC,
                family = Gamma(link = "log"))

Single Model: Cost per Policy

    library(tweedie)
    library(statmod)   # provides the tweedie() family used in glm()
    # Tweedie regression on the aggregate cost; link.power = 0 is the log link
    reg_a = glm(cout_RC ~ zone + bs(ageconducteur) + carburant, offset = log(exposition),
                data = global_RC, family = tweedie(var.power = 1.5, link.power = 0))


Comparison of Premiums

    freq2 = freq
    freq2$exposition = 1    # unit exposure for every policy
    P_f = predict(reg_f, newdata = freq2, type = "response")
    P_c = predict(reg_c, newdata = freq2, type = "response")
    prime1 = P_f * P_c      # expected frequency x expected individual cost

    k = 1.5
    # same guarantee (RC) as prime1, so that the two premiums are comparable
    reg_a = glm(cout_RC ~ zone + bs(ageconducteur) + carburant, offset = log(exposition),
                data = global_RC, family = tweedie(var.power = k, link.power = 0))
    prime2 = predict(reg_a, newdata = freq2, type = "response")

    # arrows() draws on an existing plot, so open one first
    plot(1:100, prime1[1:100], ylim = range(prime1[1:100], prime2[1:100]))
    arrows(1:100, prime1[1:100], 1:100, prime2[1:100], length = .1)


Impact of the Tweedie Power on Pure Premiums

Comparison of pure premiums for policyholders no. 1, no. 2 and no. 3 (DO).

[Figures not reproduced.]


‘Optimizing’ the Tweedie Parameter

    # deviance of the Tweedie regression as a function of the variance power k
    dev = function(k){
      reg = glm(cout_RC ~ zone + bs(ageconducteur) + carburant, data = global_RC,
                family = tweedie(var.power = k, link.power = 0),
                offset = log(exposition))
      reg$deviance
    }
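The idea of scanning the deviance over a grid of powers can be illustrated without the course data. The sketch below is a simplified stand-in written in base R under stated assumptions: the data are simulated, the helper `tweedie_deviance` is hypothetical (it implements the usual Tweedie unit deviance for $1 < p < 2$), and the model is intercept-only with a log link, in which case the fitted mean is the sample mean for every $k$. With covariates, the model would be refitted for each $k$, as `dev(k)` does.

```r
# Tweedie deviance for 1 < p < 2 (sum of unit deviances); hypothetical helper
tweedie_deviance <- function(y, mu, p) {
  stopifnot(p > 1, p < 2, all(y >= 0), all(mu > 0))
  2 * sum(y^(2 - p) / ((1 - p) * (2 - p)) -
          y * mu^(1 - p) / (1 - p) +
          mu^(2 - p) / (2 - p))
}

# Simulated aggregate costs (illustrative only): compound Poisson-gamma
set.seed(42)
n <- 2000
N <- rpois(n, 0.3)
y <- numeric(n)                                  # zero cost when there is no claim
y[N > 0] <- rgamma(sum(N > 0), shape = 2 * N[N > 0], rate = 0.002)

# Intercept-only model with log link: the fitted mean is mean(y) for any k
mu_hat <- rep(mean(y), n)
k_grid <- seq(1.1, 1.9, by = 0.1)
dev_k  <- sapply(k_grid, function(k) tweedie_deviance(y, mu_hat, k))
cbind(k = k_grid, deviance = dev_k)
```

The quotation marks around ‘Optimizing’ seem warranted: deviances computed with different variance powers are not on a common scale, so picking the $k$ with the smallest deviance is only a heuristic. A profile likelihood, such as the one computed by `tweedie.profile` in the `tweedie` package, is a common alternative.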


Pricing and Big Data

Classical problems with massive data:

• many explanatory variables: $k$ is large and $X^TX$ may not be invertible

• large volumes of data, e.g. telematics data

• non-quantitative data, e.g. text, location, etc.


The Fascination with Unbiased Estimators

In mathematical statistics, unbiased estimators are popular because they have several appealing properties. But could we not consider biased estimators that are potentially better?

Consider an i.i.d. sample $\{y_1, \cdots, y_n\}$ with distribution $\mathcal{N}(\mu, \sigma^2)$. Define $\hat\theta = \alpha\bar{Y}$. What is the optimal $\alpha^\star$ to get the best estimator of $\mu$?

• bias: $\text{bias}(\hat\theta) = E(\hat\theta) - \mu = (\alpha - 1)\mu$

• variance: $\text{Var}(\hat\theta) = \dfrac{\alpha^2\sigma^2}{n}$

• mse: $\text{mse}(\hat\theta) = (\alpha-1)^2\mu^2 + \dfrac{\alpha^2\sigma^2}{n}$

The optimal value is $\alpha^\star = \dfrac{\mu^2}{\mu^2 + \dfrac{\sigma^2}{n}} < 1$.
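The closed form for $\alpha^\star$ can be compared with a direct numerical minimization of the mean squared error. This is a minimal base R sketch; the values of $\mu$, $\sigma$ and $n$ are arbitrary assumptions chosen for illustration.

```r
# Illustrative values (assumptions, not from the course)
mu <- 2; sigma <- 3; n <- 10

# Theoretical mean squared error of alpha * Ybar as an estimator of mu
mse <- function(alpha) (alpha - 1)^2 * mu^2 + alpha^2 * sigma^2 / n

alpha_star <- mu^2 / (mu^2 + sigma^2 / n)                 # closed form
alpha_num  <- optimize(mse, interval = c(0, 2))$minimum   # numerical minimizer

c(closed_form = alpha_star, numerical = alpha_num,
  mse_unbiased = mse(1), mse_optimal = mse(alpha_star))
```

The biased estimator has a smaller mean squared error than the unbiased one ($\alpha = 1$). Note that $\alpha^\star$ depends on the unknown $\mu$ and $\sigma^2$, so it is a benchmark rather than a usable estimator.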


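Linear Model

Consider a linear model $y_i = x_i^T\beta + \varepsilon_i$ for all $i = 1, \cdots, n$.

Assume that the $\varepsilon_i$ are i.i.d. with $E(\varepsilon) = 0$ (and finite variance). Write

$$\underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}}_{y,\ n\times 1} = \underbrace{\begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,k} \end{pmatrix}}_{X,\ n\times(k+1)} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{\beta,\ (k+1)\times 1} + \underbrace{\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{\varepsilon,\ n\times 1}.$$

Assuming $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, the maximum likelihood estimator of $\beta$ is

$$\hat\beta = \text{argmin}\{\|y - X\beta\|_{\ell_2}\} = (X^TX)^{-1}X^Ty$$

... under the assumption that $X^TX$ is a full-rank matrix.

What if $X^TX$ cannot be inverted? Then $\hat\beta = [X^TX]^{-1}X^Ty$ does not exist, but $\hat\beta_\lambda = [X^TX + \lambda I]^{-1}X^Ty$ always exists if $\lambda > 0$.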
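The reason $X^TX + \lambda I$ is always invertible is that $X^TX$ is positive semi-definite, so every eigenvalue of $X^TX + \lambda I$ is at least $\lambda > 0$. A small base R sketch on simulated data (the design, with one column built as an exact linear combination of two others, is an assumption made for illustration):

```r
set.seed(123)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                      # exact linear combination: X'X is singular
X  <- cbind(1, x1, x2, x3)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

qr(X)$rank                         # 3 < 4: X (hence X'X) is not of full rank

XtX        <- crossprod(X)         # t(X) %*% X
lambda     <- 0.1
# The ridge system is well defined even though X'X cannot be inverted
beta_ridge <- solve(XtX + lambda * diag(ncol(X)), crossprod(X, y))
drop(beta_ridge)

# Smallest eigenvalue of the regularized matrix is at least lambda
min(eigen(XtX + lambda * diag(ncol(X)), symmetric = TRUE)$values)
```

As in the formula above, the penalty is applied to every coefficient here, intercept included; in practice the intercept is often left unpenalized.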


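Ridge Regression

The estimator $\hat\beta = [X^TX + \lambda I]^{-1}X^Ty$ is the Ridge estimate, obtained as the solution of

$$\hat\beta = \underset{\beta}{\text{argmin}}\left\{\sum_{i=1}^n [y_i - \beta_0 - x_i^T\beta]^2 + \lambda \underbrace{\|\beta\|_{\ell_2}^2}_{\mathbf{1}^T\beta^2}\right\}$$

for some tuning parameter $\lambda$. One can also write

$$\hat\beta = \underset{\beta;\ \|\beta\|_{\ell_2}\leq s}{\text{argmin}}\{\|Y - X\beta\|_{\ell_2}\}$$

Remark: note that we solve $\hat\beta = \text{argmin}_\beta\{\text{objective}(\beta)\}$ where

$$\text{objective}(\beta) = \underbrace{\mathcal{L}(\beta)}_{\text{training loss}} + \underbrace{\mathcal{R}(\beta)}_{\text{regularization}}$$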
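The decomposition into training loss plus regularization can be checked on a toy example: minimizing the penalized objective numerically should give the same coefficients as the closed-form Ridge estimate. This base R sketch uses simulated data and omits the intercept for simplicity; both are assumptions made for illustration.

```r
set.seed(1)
n <- 100; k <- 3
X <- matrix(rnorm(n * k), n, k)
y <- drop(X %*% c(1, -2, 0.5) + rnorm(n))
lambda <- 5

# objective = training loss + regularization (no intercept, for simplicity)
objective <- function(beta) sum((y - X %*% beta)^2) + lambda * sum(beta^2)

beta_closed <- drop(solve(crossprod(X) + lambda * diag(k), crossprod(X, y)))
beta_optim  <- optim(rep(0, k), objective, method = "BFGS")$par
beta_ols    <- drop(solve(crossprod(X), crossprod(X, y)))

rbind(closed_form = beta_closed, numerical = beta_optim, ols = beta_ols)
```

The first two rows should agree up to the optimizer's tolerance, and the Ridge coefficients have a smaller Euclidean norm than the least squares ones.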


Going Further on Sparsity Issues

In several applications, $k$ can be (very) large, but a lot of features are just noise: $\beta_j = 0$ for many $j$'s. Let $s$ denote the number of relevant features, with $s \ll k$, cf. Hastie, Tibshirani & Wainwright (2015),

$$s = \text{card}\{\mathcal{S}\} \text{ where } \mathcal{S} = \{j;\ \beta_j \neq 0\}$$

The model is now $y = X_{\mathcal{S}}\beta_{\mathcal{S}} + \varepsilon$, where $X_{\mathcal{S}}^TX_{\mathcal{S}}$ is a full-rank matrix.

https://www.crcpress.com/Statistical-Learning-with-Sparsity-The-Lasso-and-Generalizations/Hastie-Tibshirani-Wainwright/9781498712163



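Define $\|a\|_{\ell_0} = \sum_i \mathbf{1}(|a_i| > 0)$. Here $\dim(\beta) = s$.

We wish we could solve

$$\hat\beta = \underset{\beta;\ \|\beta\|_{\ell_0}\leq s}{\text{argmin}}\{\|Y - X\beta\|_{\ell_2}\}$$

Problem: it is usually not possible to enumerate all possible constraints, since there are $\binom{k}{s}$ ways to choose the $s$ nonzero coefficients (with $k$ (very) large).

Idea: solve the dual problem

$$\hat\beta = \underset{\beta;\ \|Y - X\beta\|_{\ell_2}\leq h}{\text{argmin}}\{\|\beta\|_{\ell_0}\}$$

where we might convexify the $\ell_0$ norm, $\|\cdot\|_{\ell_0}$.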


Regularization: $\ell_0$, $\ell_1$ and $\ell_2$

[Figure not reproduced.]



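On $[-1,+1]^k$, the convex hull of $\|\beta\|_{\ell_0}$ is $\|\beta\|_{\ell_1}$.

On $[-a,+a]^k$, the convex hull of $\|\beta\|_{\ell_0}$ is $a^{-1}\|\beta\|_{\ell_1}$.

Hence,

$$\hat\beta = \underset{\beta;\ \|\beta\|_{\ell_1}\leq s}{\text{argmin}}\{\|Y - X\beta\|_{\ell_2}\}$$

is equivalent (Kuhn-Tucker theorem) to the Lagrangian optimization problem

$$\hat\beta = \text{argmin}\{\|Y - X\beta\|_{\ell_2} + \lambda\|\beta\|_{\ell_1}\}$$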


LASSO: Least Absolute Shrinkage and Selection Operator

$$\hat\beta \in \text{argmin}\{\|Y - X\beta\|_{\ell_2} + \lambda\|\beta\|_{\ell_1}\}$$

is a convex problem (several algorithms are available⋆), but not a strictly convex one (the minimum need not be unique). Nevertheless, the predictions $\hat y = x^T\hat\beta$ are unique.

⋆ MM (majorize-minimization), coordinate descent; see Hunter (2003).

http://sites.stat.psu.edu/~dhunter/papers/mmtutorial.pdf
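Coordinate descent is one of the algorithms mentioned above. The following base R sketch is an illustration, not the implementation used by any package: it assumes the objective is scaled as $\frac{1}{2}\|y - X\beta\|_{\ell_2}^2 + \lambda\|\beta\|_{\ell_1}$ (squared loss), that there is no intercept, and the names `soft_threshold` and `lasso_cd` are hypothetical. Each coordinate update has a closed form given by soft-thresholding.

```r
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Coordinate descent for (1/2) * ||y - X b||^2 + lambda * ||b||_1 (no intercept)
lasso_cd <- function(X, y, lambda, n_iter = 500, tol = 1e-10) {
  k      <- ncol(X)
  beta   <- rep(0, k)
  col_ss <- colSums(X^2)
  r      <- drop(y - X %*% beta)             # current residuals
  for (it in seq_len(n_iter)) {
    beta_old <- beta
    for (j in seq_len(k)) {
      r <- r + X[, j] * beta[j]              # remove coordinate j from the fit
      beta[j] <- soft_threshold(sum(X[, j] * r), lambda) / col_ss[j]
      r <- r - X[, j] * beta[j]              # put the updated coordinate back
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

# Simulated data: only features 1 and 4 are relevant
set.seed(2)
n <- 200
X <- scale(matrix(rnorm(n * 5), n, 5))
y <- drop(X %*% c(3, 0, 0, -2, 0) + rnorm(n))
y <- y - mean(y)                             # centered response, no intercept needed

lambda_max <- max(abs(crossprod(X, y)))      # above this value, all coefficients are zero
round(lasso_cd(X, y, lambda = 0.2 * lambda_max), 3)
```

With $\lambda = 0$ the iterations should converge to the least squares solution, and for $\lambda$ above `lambda_max` every coefficient is set exactly to zero, which is the selection property of the $\ell_1$ penalty.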


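Optimal LASSO Penalty

Use cross validation, e.g. K-fold,

$$\hat\beta_{(-k)}(\lambda) = \text{argmin}\left\{\sum_{i\notin \mathcal{I}_k} [y_i - x_i^T\beta]^2 + \lambda\|\beta\|_{\ell_1}\right\}$$

then compute the sum of the squared errors,

$$Q_k(\lambda) = \sum_{i\in \mathcal{I}_k} [y_i - x_i^T\hat\beta_{(-k)}(\lambda)]^2$$

and finally solve

$$\lambda^\star = \text{argmin}\left\{\overline{Q}(\lambda) = \frac{1}{K}\sum_k Q_k(\lambda)\right\}$$

Note that this might overfit, so Hastie, Tibshirani & Friedman (2009) suggest the largest $\lambda$ such that

$$\overline{Q}(\lambda) \leq \overline{Q}(\lambda^\star) + \text{se}[\lambda^\star] \quad\text{with}\quad \text{se}[\lambda]^2 = \frac{1}{K^2}\sum_{k=1}^K [Q_k(\lambda) - \overline{Q}(\lambda)]^2$$

http://statweb.stanford.edu/~tibs/ElemStatLearn/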
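The K-fold procedure and the one-standard-error rule can be sketched in a few lines of base R. To keep the example self-contained, a Ridge fit (which has a closed form) is used as a stand-in for the LASSO; this substitution, the simulated data and the grid of $\lambda$ values are all assumptions made for illustration. The standard error follows the formula given above.

```r
set.seed(3)
n <- 120; k <- 8; K <- 5
X <- matrix(rnorm(n * k), n, k)
y <- drop(X %*% c(2, -1, rep(0, k - 2)) + rnorm(n))

# Ridge fit used as a stand-in for the penalized estimator (closed form, base R only)
fit_ridge <- function(X, y, lambda)
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))

lambdas <- 10^seq(-2, 3, length.out = 30)
fold    <- sample(rep(seq_len(K), length.out = n))   # fold index of each observation

# Q[kk, l] = sum of squared errors on fold kk for lambdas[l]
Q <- sapply(lambdas, function(l) sapply(seq_len(K), function(kk) {
  test <- fold == kk
  b    <- fit_ridge(X[!test, , drop = FALSE], y[!test], l)
  sum((y[test] - X[test, , drop = FALSE] %*% b)^2)
}))

Q_bar <- colMeans(Q)                                  # average over the K folds
se    <- sqrt(colSums(sweep(Q, 2, Q_bar)^2) / K^2)    # se[lambda], formula above
i_min <- which.min(Q_bar)

lambda_min <- lambdas[i_min]
lambda_1se <- max(lambdas[Q_bar <= Q_bar[i_min] + se[i_min]])
c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

By construction `lambda_1se` is at least as large as `lambda_min`, so the selected model is at least as heavily penalized as the one minimizing the cross-validated error.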


    freq = merge(contrat, nombre_RC)
    freq = merge(freq, nombre_DO)
    freq[,10] = as.factor(freq[,10])
    # design matrix: three numeric covariates plus two indicators (diesel, zone in A/B/C)
    mx = cbind(freq[,c(4,5,6)], freq[,9] == "D", freq[,3] %in% c("A","B","C"))
    colnames(mx) = c(names(freq)[c(4,5,6)], "diesel", "zone")
    # standardize each column (zero mean, unit standard deviation)
    for(i in 1:ncol(mx)) mx[,i] = (mx[,i] - mean(mx[,i])) / sd(mx[,i])
    names(mx)
    # [1] puissance agevehicule ageconducteur diesel zone
    library(glmnet)
    # LASSO path for the Poisson regression of RC claim counts, with log(exposure) as offset
    fit = glmnet(x = as.matrix(mx), y = freq[,11], offset = log(freq[,2]), family = "poisson")
    plot(fit, xvar = "lambda", label = TRUE)

[Figure: LASSO, RC frequency. Coefficients against Log Lambda, curves labelled 1, 3, 4 and 5; the top axis gives the number of nonzero coefficients.]


LASSO, RC Frequency

    plot(fit, label = TRUE)
    # cross-validation over the lambda path
    cvfit = cv.glmnet(x = as.matrix(mx), y = freq[,11], offset = log(freq[,2]), family = "poisson")
    plot(cvfit)
    cvfit$lambda.min
    # [1] 0.0002845703
    log(cvfit$lambda.min)
    # [1] -8.16453

• Cross validation curve + error bars

[Figures: coefficients against the L1 norm, curves labelled 1, 3, 4 and 5; cross-validated Poisson deviance against log(Lambda), with error bars and the number of nonzero coefficients on the top axis.]


    freq = merge(contrat, nombre_RC)
    freq = merge(freq, nombre_DO)
    freq[,10] = as.factor(freq[,10])
    # design matrix: three numeric covariates plus two indicators (diesel, zone in A/B/C)
    mx = cbind(freq[,c(4,5,6)], freq[,9] == "D", freq[,3] %in% c("A","B","C"))
    colnames(mx) = c(names(freq)[c(4,5,6)], "diesel", "zone")
    # standardize each column (zero mean, unit standard deviation)
    for(i in 1:ncol(mx)) mx[,i] = (mx[,i] - mean(mx[,i])) / sd(mx[,i])
    names(mx)
    # [1] puissance agevehicule ageconducteur diesel zone
    library(glmnet)
    # LASSO path for the Poisson regression of DO claim counts, with log(exposure) as offset
    fit = glmnet(x = as.matrix(mx), y = freq[,12], offset = log(freq[,2]), family = "poisson")
    plot(fit, xvar = "lambda", label = TRUE)

[Figure: LASSO, DO frequency. Coefficients against Log Lambda, curves labelled 1, 2, 4 and 5; the top axis gives the number of nonzero coefficients.]


LASSO, DO Frequency

    plot(fit, label = TRUE)
    # cross-validation over the lambda path
    cvfit = cv.glmnet(x = as.matrix(mx), y = freq[,12], offset = log(freq[,2]), family = "poisson")
    plot(cvfit)
    cvfit$lambda.min
    # [1] 0.0004744917
    log(cvfit$lambda.min)
    # [1] -7.653266

• Cross validation curve + error bars

[Figures: coefficients against the L1 norm, curves labelled 1, 2, 4 and 5; cross-validated Poisson deviance against log(Lambda), with error bars and the number of nonzero coefficients on the top axis.]


Model Selection and Gini/Lorenz (on incomes)

Consider an ordered sample $\{y_1, \cdots, y_n\}$. The Lorenz curve is

$$\{F_i, L_i\} \quad\text{with}\quad F_i = \frac{i}{n} \quad\text{and}\quad L_i = \frac{\sum_{j=1}^i y_j}{\sum_{j=1}^n y_j}$$

The theoretical curve, given a distribution $F$, is

$$u \mapsto L(u) = \frac{\int_{-\infty}^{F^{-1}(u)} t\,dF(t)}{\int_{-\infty}^{+\infty} t\,dF(t)}$$

see Gastwirth (1972).

http://econpapers.repec.org/RePEc:tpr:restat:v:54:y:1972:i:3:p:306-16



    library(ineq)
    set.seed(1)
    (x <- sort(rlnorm(5, 0, 1)))
    # [1] 0.4336018 0.5344838 1.2015872 1.3902836 4.9297132
    Lc.sim <- Lc(x)     # empirical Lorenz curve
    plot(Lc.sim)
    points((1:4)/5, (cumsum(x)/sum(x))[1:4], pch = 19, col = "blue")
    # theoretical Lorenz curve of the log-normal distribution (dashed line)
    lines(Lc.lognorm, parameter = 1, lty = 2)

[Figure: Lorenz curve, L(p) against p, with the four empirical points in blue and the theoretical log-normal curve as a dashed line.]
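The empirical Lorenz curve can also be computed directly from its definition, without the `ineq` package. The base R sketch below reuses the five sorted values printed above; the closed form used for the theoretical curve, $L(u) = \Phi(\Phi^{-1}(u) - \sigma)$ for a log-normal distribution with log-scale standard deviation $\sigma$, is a standard result that is not stated in the course notes.

```r
# Sorted sample printed above (log-normal draws)
x <- c(0.4336018, 0.5344838, 1.2015872, 1.3902836, 4.9297132)
n <- length(x)

F_i <- (0:n) / n                       # cumulative share of the population
L_i <- c(0, cumsum(x)) / sum(x)        # cumulative share of the total
cbind(F_i, L_i)

# Theoretical Lorenz curve of a log-normal distribution with sigma = 1
u <- seq(0, 1, by = 0.2)
cbind(u, L_lognormal = pnorm(qnorm(u) - 1))
```

The first four interior points of `L_i` are the blue points drawn by the `points()` call above.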



The Gini index is the ratio of the areas $\dfrac{A}{A+B}$. Thus,

$$G = \frac{2}{n(n-1)\bar{x}}\sum_{i=1}^n i\cdot x_{i:n} - \frac{n+1}{n-1} = \frac{1}{E(Y)}\int_0^\infty F(y)(1-F(y))\,dy$$

    Gini(x)
    # [1] 0.4640003

[Figure: Lorenz curve, L(p) against p, with area A between the diagonal and the curve and area B below the curve.]
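A base R sketch of the two ways of computing the index, on the five values printed earlier. One detail worth noting: the sample formula with $n(n-1)$ in the denominator includes a finite-sample correction, whereas the value 0.4640003 printed by `Gini(x)` matches the uncorrected version (with $n^2$ instead), which is also what the area ratio of the piecewise-linear Lorenz curve gives. The `ineq` package appears to expose this choice through a `corr` argument of `Gini()`; that reading of the package is an assumption, not something stated in the course notes.

```r
x <- sort(c(0.4336018, 0.5344838, 1.2015872, 1.3902836, 4.9297132))
n <- length(x)

# Sample formula with the n / (n - 1) finite-sample correction
gini_corrected   <- 2 * sum(seq_len(n) * x) / (n * (n - 1) * mean(x)) - (n + 1) / (n - 1)
# Uncorrected version
gini_uncorrected <- gini_corrected * (n - 1) / n

# Area-based version: A / (A + B) = 1 - 2 * (area under the Lorenz curve)
L_i  <- c(0, cumsum(x)) / sum(x)
area <- sum((L_i[-1] + L_i[-(n + 1)]) / 2) / n     # trapezoidal rule
c(corrected = gini_corrected, uncorrected = gini_uncorrected, from_area = 1 - 2 * area)
```

With only five observations the correction matters (about 0.58 against 0.46); it becomes negligible for large samples.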


Comparing Models

Consider an ordered sample $\{y_1, \cdots, y_n\}$ of incomes, with $y_1 \leq y_2 \leq \cdots \leq y_n$. The Lorenz curve is

$$\{F_i, L_i\} \quad\text{with}\quad F_i = \frac{i}{n} \quad\text{and}\quad L_i = \frac{\sum_{j=1}^i y_j}{\sum_{j=1}^n y_j}$$

We have observed losses $y_i$ and premiums $\pi(x_i)$. Consider a sample ordered by the model, see Frees, Meyers & Cummins (2014), with $\pi(x_1) \geq \pi(x_2) \geq \cdots \geq \pi(x_n)$, then plot

$$\{F_i, L_i\} \quad\text{with}\quad F_i = \frac{i}{n} \quad\text{and}\quad L_i = \frac{\sum_{j=1}^i y_j}{\sum_{j=1}^n y_j}$$

[Figures: Income (%) against Proportion (%), from poorest to richest; Losses (%) against Proportion (%), from more risky to less risky.]

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2438129
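The model-ordered curve can be sketched in base R on simulated data. Everything below is an assumption made for illustration: the portfolio is simulated, the "informative" premium is taken proportional to the true latent risk, and the helper name `ordered_lorenz` is hypothetical. Policies are sorted from the highest to the lowest premium, and the cumulative share of observed losses is accumulated in that order.

```r
set.seed(4)
n <- 1000
risk <- rgamma(n, shape = 2, rate = 2)   # latent expected claim count (simulated)
loss <- rpois(n, risk) * 1000            # observed losses y_i (fixed cost per claim)
premium_model  <- 1000 * risk            # premium from an informative model
premium_random <- runif(n)               # premium unrelated to the risk

# Share of total losses carried by the policies rated most risky by the premium
ordered_lorenz <- function(loss, premium) {
  o <- order(premium, decreasing = TRUE)
  data.frame(prop  = seq_along(o) / length(o),
             share = cumsum(loss[o]) / sum(loss))
}

curve_model  <- ordered_lorenz(loss, premium_model)
curve_random <- ordered_lorenz(loss, premium_random)

# Share of losses captured by the 20% of policies with the highest premium
top <- n %/% 5
c(model = curve_model$share[top], random = curve_random$share[top])
```

A premium that ranks risks well concentrates a large share of the losses among the policies it rates as most risky, so its curve lies well above the diagonal; an uninformative premium stays close to it.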


Model Choice and Comparison in Pricing

See Frees et al. (2010) or Tevet (2013).

https://www.casact.org/education/spring/2010/handouts/C17-Frees2.pdf

http://www.casact.org/newsletter/index.cfm?fa=viewart&id=6540
