Toward a unified approach to fitting loss models

Jacques Rioux and Stuart Klugman, for presentation at the IAC, Feb. 9, 2004




Overview

- What problem is being addressed?
- The general idea
- The specific ideas
  - Models to consider
  - Recording the data
  - Representing the data
  - Testing a model
  - Selecting a model


The problem

- Too many models
  - Two books – 26 distributions!
  - Can mix or splice to get even more
- Data can be confusing
  - Deductibles, limits
- Too many tests and plots
  - Chi-square, K-S, A-D, p-p, q-q, D


The general idea

- Limited number of distributions
- Standard way to present data
- Retain flexibility on testing and selection


Distributions

- Should be
  - Familiar
  - Few
  - Flexible


A few familiar distributions

- Exponential
  - Only one parameter
- Gamma
  - Two parameters, a mode if the shape parameter exceeds 1
- Lognormal
  - Two parameters, a mode
- Pareto
  - Two parameters, a heavy right tail


Flexible

Add flexibility by allowing mixtures. That is,

$$f(x) = a_1 f_1(x) + \cdots + a_k f_k(x)$$

where $a_1 + \cdots + a_k = 1$ and all $a_j > 0$.

Some restrictions:

- Only the exponential can be used more than once.
- Cannot use both the gamma and lognormal.
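As a rough illustration of the mixture definition and the two restrictions, the sketch below checks a proposed set of components and builds the weighted density. The function names, the family labels, and the parameter values are all hypothetical, chosen only for this example.

```python
import math

def exponential_pdf(theta):
    """Density of an exponential distribution with mean theta."""
    return lambda x: math.exp(-x / theta) / theta

def check_mixture(families, weights):
    """Raise ValueError if the proposed mixture breaks the stated rules."""
    if len(families) != len(weights):
        raise ValueError("need one weight per component")
    if any(a <= 0 for a in weights):
        raise ValueError("all weights a_j must be positive")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    for name in set(families):
        if name != "exponential" and families.count(name) > 1:
            raise ValueError(f"only the exponential may repeat, not {name}")
    if "gamma" in families and "lognormal" in families:
        raise ValueError("cannot use both the gamma and the lognormal")

def mixture_pdf(pdfs, weights):
    """Return f(x) = a_1 f_1(x) + ... + a_k f_k(x)."""
    return lambda x: sum(a * f(x) for a, f in zip(weights, pdfs))

# Hypothetical mixture of two exponentials (means 500 and 5000).
families = ["exponential", "exponential"]
weights = [0.7, 0.3]
check_mixture(families, weights)
f = mixture_pdf([exponential_pdf(500.0), exponential_pdf(5000.0)], weights)
print(f(1000.0))
```

Two exponentials are used here because that is the one family the restrictions allow to appear more than once.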


Why mixtures?

Allows different shape at beginning and end (e.g. mode from lognormal, tail from Pareto).

By using several exponentials can have most any tail weight (see Keatinge).


Estimating parameters

- Use only maximum likelihood
  - Asymptotically optimal
  - Can be applied in all settings, regardless of the nature of the data
  - Likelihood value can be used to compare different models
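To make "regardless of the nature of the data" concrete, here is a minimal sketch of a likelihood for losses that are left truncated at a deductible and possibly right censored at a limit, using a single exponential. The record layout `(d, x, censored)` and the data values are assumptions made for this example. An uncensored loss contributes f(x)/S(d) and a censored one contributes S(x)/S(d); for the exponential this has a closed-form maximizer, while other families would generally need a numerical optimizer.

```python
import math

def exponential_loglik(theta, data):
    """Log-likelihood of an exponential with mean theta.

    Each record is (d, x, censored): d is the deductible (truncation point)
    and x is the loss, or the censoring point when censored is True.
    """
    total = 0.0
    for d, x, censored in data:
        if censored:
            total += -(x - d) / theta                     # log[S(x) / S(d)]
        else:
            total += -math.log(theta) - (x - d) / theta   # log[f(x) / S(d)]
    return total

def exponential_mle(data):
    """Closed form: total amount above the deductibles / number uncensored."""
    exposure = sum(x - d for d, x, _ in data)
    uncensored = sum(1 for _, _, censored in data if not censored)
    return exposure / uncensored

# Hypothetical records, not the data from the paper.
data = [(100, 350, False), (100, 1100, True), (250, 900, False),
        (500, 2200, False), (500, 5500, True)]
theta_hat = exponential_mle(data)
print(theta_hat, exponential_loglik(theta_hat, data))
```

The maximized log-likelihood printed at the end is the quantity that would be carried forward to compare this model with others.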


Representing the data

- Why do we care?
  - Graphical tests require a graph of the empirical density or distribution function.
  - Hypothesis tests require the functions themselves.


What is the issue?

- None if
  - All observations are discrete or grouped
  - No truncation or censoring
- But if there is truncation or censoring
  - For discrete data, the Kaplan-Meier product-limit estimator provides the empirical distribution function (and is the nonparametric MLE as well).
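A minimal sketch of the product-limit calculation for left-truncated, right-censored losses follows. The record layout `(d, x, censored)` and the data are hypothetical, and the estimate is conditional on the loss exceeding the smallest deductible. At each distinct uncensored loss y, the risk set counts records with d < y <= x.

```python
def kaplan_meier(data):
    """Product-limit estimate of the cdf, as (y, F(y)) pairs.

    Each record is (d, x, censored): d is the truncation point and x is the
    loss, or the censoring point when censored is True. Assumes d < x.
    """
    event_points = sorted({x for _, x, censored in data if not censored})
    survival = 1.0
    steps = []
    for y in event_points:
        # Risk set: already observable (d < y) and not yet removed (x >= y).
        at_risk = sum(1 for d, x, _ in data if d < y <= x)
        events = sum(1 for _, x, censored in data if x == y and not censored)
        survival *= 1.0 - events / at_risk
        steps.append((y, 1.0 - survival))
    return steps

# Hypothetical records, not the data from the paper.
data = [(100, 300, False), (100, 700, False), (100, 1100, True),
        (250, 700, False), (250, 2000, False), (500, 900, False)]
for y, f in kaplan_meier(data):
    print(y, round(f, 4))
```

With no truncation and no censoring the risk set shrinks by the number of events at each step, and the result reduces to the usual empirical cdf.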


Issue – grouped data

- For grouped data,
  - If completely grouped, the histogram represents the pdf, the ogive the cdf.
  - If some observations are grouped and some are not, or there are multiple deductibles or limits, our suggestion is to replace the observations in each interval with that many equally spaced points.
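The replacement of grouped observations by equally spaced points might look like the sketch below. The slide does not say exactly where the points sit, so placing them at the midpoints of equal subintervals is an assumption of this example, as is the function name.

```python
def spread_points(lower, upper, count):
    """Replace `count` grouped observations in (lower, upper] by points.

    Assumption: the interval is cut into `count` equal subintervals and one
    point is placed at the midpoint of each.
    """
    width = (upper - lower) / count
    return [lower + (j + 0.5) * width for j in range(count)]

# Four observations known only to lie between 1000 and 2000.
print(spread_points(1000.0, 2000.0, 4))
```

Once every grouped interval has been expanded this way, the data set consists of individual points only and can be passed to the same estimator as the ungrouped observations.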


Review

Given a data set, we have the following:

- A way to represent the data.
- A limited set of models to consider.
- Parameter estimates for each model.

The remaining tasks are:

- Decide which models are acceptable.
- Decide which model to use.


Example

The paper has two examples; we will look only at the second one.

Data are individual payments, but the policies that produced them had different deductibles (100, 250, 500) and different maximum payments (1,000, 3,000, 5,000).

There are 100 observations.


Empirical cdf

[Figure: Kaplan-Meier estimate of the empirical cdf. Vertical axis: F-emp(x), 0 to 1. Horizontal axis: loss, 0 to 6000.]


Distribution function plot

Plot the empirical and model cdfs together. Note, because in this example the smallest deductible is 100, the empirical cdf begins there.

To be comparable, the model cdf is calculated as

$$F_d(x) = \frac{F(x) - F(d)}{1 - F(d)}$$
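The conditioning on the smallest deductible can be wrapped around any model cdf. The sketch below is one way to do that; the helper name and the exponential used to exercise it (mean 1000) are assumptions for illustration only.

```python
import math

def truncated_cdf(cdf, d):
    """Return F_d(x) = [F(x) - F(d)] / [1 - F(d)] for x > d, and 0 otherwise."""
    f_at_d = cdf(d)

    def conditional(x):
        if x <= d:
            return 0.0
        return (cdf(x) - f_at_d) / (1.0 - f_at_d)

    return conditional

# Hypothetical model: exponential with mean 1000, smallest deductible 100.
exp_cdf = lambda x: 1.0 - math.exp(-x / 1000.0)
cdf_100 = truncated_cdf(exp_cdf, 100.0)
print(cdf_100(100.0), cdf_100(600.0))
```

For the exponential the result is just the same exponential shifted by the deductible, which gives a quick sanity check on the helper.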


Example model

All plots and tests that follow are for a mixture of a lognormal and exponential distribution. The parameters are

- lognormal: $\mu = 7.109459$, $\sigma = 0.254236$
- exponential: $\theta = 1839.174$
- $a_1 = 0.238301$
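For readers who want to evaluate the fitted model, here is a small sketch of its cdf. It reads the two lognormal values as mu and sigma and the exponential value as the mean theta, and it assumes that the weight 0.238301 belongs to the lognormal component; the slide does not state that pairing explicitly.

```python
import math

MU, SIGMA = 7.109459, 0.254236   # lognormal parameters (read as mu, sigma)
THETA = 1839.174                 # exponential mean
A1 = 0.238301                    # assumed to be the lognormal weight

def lognormal_cdf(x):
    """Lognormal cdf via the standard normal cdf written with erf."""
    z = (math.log(x) - MU) / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exponential_cdf(x):
    return 1.0 - math.exp(-x / THETA)

def model_cdf(x):
    """Two-component mixture cdf for x > 0."""
    return A1 * lognormal_cdf(x) + (1.0 - A1) * exponential_cdf(x)

for loss in (500, 1000, 2000, 5000):
    print(loss, round(model_cdf(loss), 4))
```

These values are for the ground-up loss distribution; they are not yet conditioned on a deductible.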


Distribution function plot

[Figure: distribution function plot. Vertical axis: F(x), 0 to 1. Horizontal axis: loss, 0 to 6000. Series: F-emp, F-model.]


Confidence bands

It is possible to create 95% confidence bands. That is, we are 95% confident that the true distribution is completely within these bands.

Formulas adapted from Klein and Moeschberger with a modification for multiple truncation points (their formula allows only multiple censoring points).


CDF plot with bounds

[Figure: CDF plot with 95% bounds. Vertical axis: F(x), 0 to 1. Horizontal axis: loss, 0 to 6000. Series: F-emp, F-model, lower, upper.]


Other CDF pictures

Any function of the cdf, such as the limited expected value, could be plotted.

The only one shown here is the difference plot – magnify the previous plot by plotting the difference of the two distribution functions.


CDF difference plot

[Figure: CDF difference plot. Vertical axis: -0.3 to 0.6. Horizontal axis: loss, 0 to 6000. Series: Difference, lower, upper.]


Histogram plot

Plot a histogram of the data against the density function of the model.

For data that were not grouped, can use the empirical cdf to get cell probabilities.
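One way to get cell probabilities from an empirical cdf is to difference it at the cell boundaries and divide by the cell width so the bars are on the density scale. The sketch below assumes the empirical cdf is available as a list of (point, value) steps; the step values and cell boundaries are hypothetical.

```python
def step_cdf(steps):
    """Turn sorted (y, F(y)) pairs into a right-continuous step function."""
    def ecdf(x):
        value = 0.0
        for y, f in steps:
            if y <= x:
                value = f
        return value
    return ecdf

def histogram_heights(ecdf, boundaries):
    """Bar heights: cell probability divided by cell width."""
    heights = []
    for left, right in zip(boundaries, boundaries[1:]):
        prob = ecdf(right) - ecdf(left)
        heights.append(prob / (right - left))
    return heights

# Hypothetical empirical cdf and cell boundaries.
ecdf = step_cdf([(300, 0.2), (700, 0.52), (900, 0.68), (2000, 1.0)])
boundaries = [100, 500, 1000, 2500]
heights = histogram_heights(ecdf, boundaries)
print(heights)
```

Because the heights are probabilities per unit of loss, they can be drawn on the same axes as the model density.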


Histogram plot

[Figure: histogram plot. Vertical axis: 0 to 0.0007. Horizontal axis: loss, 0 to 6000. Series: hist, model.]


Hypothesis tests

- Null: the model fits
- Alternative: it doesn't
- Three tests
  - Kolmogorov-Smirnov
  - Anderson-Darling
  - Chi-square


Kolmogorov-Smirnov

Test statistic is maximum difference between the empirical and model cdfs. Each difference is multiplied by a scaling factor related to the sample size at that point.

Critical values are way off when the parameters are estimated from the data.
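For orientation, the sketch below computes the textbook Kolmogorov-Smirnov distance for complete individual data, with no truncation or censoring. It deliberately leaves out the sample-size scaling factor mentioned above, whose exact form is not given here, so it is not the statistic reported in this presentation. The sample and the model are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """Largest gap between the empirical cdf and the model cdf.

    The empirical cdf jumps at each observation, so both sides of every
    jump are compared with the model value.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        worst = max(worst, i / n - f, f - (i - 1) / n)
    return worst

# Hypothetical losses tested against an exponential with mean 1000.
losses = [120.0, 340.0, 610.0, 950.0, 1800.0, 3200.0]
model = lambda x: 1.0 - math.exp(-x / 1000.0)
print(ks_statistic(losses, model))
```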


Anderson-Darling

Test statistic looks complex:

$$A^2 = \int_d^u \frac{[F_e(x) - F_m(x)]^2}{F_m(x)[1 - F_m(x)]} f_m(x)\,dx$$

where e is empirical and m is model.

- The paper shows how to turn this into a sum.
- More emphasis on fit in tails than for K-S test.


Chi-square test

- You have seen this one before.
- It is the only one with an adjustment for estimating parameters.
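The adjustment for estimated parameters shows up in the degrees of freedom: cells minus one minus the number of estimated parameters. A minimal sketch, with hypothetical cell counts, follows. It stops at the statistic and degrees of freedom because the standard library has no chi-square distribution function for the p-value.

```python
def chi_square(observed, expected, n_estimated):
    """Pearson statistic and its degrees of freedom.

    observed, expected: counts per cell (expected from the fitted model).
    n_estimated: number of parameters estimated from the data.
    """
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - n_estimated
    return statistic, dof

# Hypothetical counts in four cells, one estimated parameter.
stat, dof = chi_square([30, 40, 20, 10], [25.0, 45.0, 20.0, 10.0], 1)
print(stat, dof)
```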


Results

- K-S: 0.5829
- A-D: 0.2570
- Chi-square p-value of 0.5608
- The model is clearly acceptable.
- Simulation study needed to get p-values for these tests. Simulation indicates that the p-values are over 0.9.


Comparing models

- Good picture
- Better test numbers
- Likelihood criterion such as the Schwarz Bayesian criterion (SBC). The SBC is the loglikelihood minus (r/2)ln(n), where r is the number of parameters and n is the sample size.


Several models

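| Model | Loglikelihood | A-D | K-S | Chi-square p-value | SBC |
|---|---|---|---|---|---|
| Exponential | -628.23 | 1.2245 | 0.9739 | 0.1054 | -630.53 |
| Lognormal | -626.26 | 0.6682 | 0.9375 | 0.2126 | -630.87 |
| Gamma | -627.35 | 0.8369 | 1.0355 | 0.2319 | -631.96 |
| Lognormal/exp | -623.77 | 0.2579 | 0.5829 | 0.5608 | -632.98 |
| Gamma/exp | -623.64 | 0.2804 | 0.5773 | 0.5260 | -632.85 |
| Lognormal/exp/exp | -623.39 | 0.1484 | 0.4494 | 0.3472 | -637.21 |
| Gamma/exp/exp | -623.26 | 0.1353 | 0.4652 | 0.3348 | -637.08 |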
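The SBC column can be recomputed from the loglikelihoods with the formula loglikelihood minus (r/2)ln(n). The sketch below uses n = 100 from the example. The parameter counts are not listed in the table; they are inferred here by counting each component's parameters plus one free weight per extra component, and should be read as an assumption.

```python
import math

N_OBS = 100  # sample size stated for the example

# name: (loglikelihood from the table, assumed number of free parameters)
MODELS = {
    "exponential": (-628.23, 1),
    "lognormal": (-626.26, 2),
    "gamma": (-627.35, 2),
    "lognormal/exp": (-623.77, 4),
    "gamma/exp": (-623.64, 4),
    "lognormal/exp/exp": (-623.39, 6),
    "gamma/exp/exp": (-623.26, 6),
}

def sbc(loglik, n_params, n_obs):
    """Schwarz Bayesian criterion: loglikelihood minus (r/2) ln(n)."""
    return loglik - (n_params / 2.0) * math.log(n_obs)

scores = {name: sbc(ll, r, N_OBS) for name, (ll, r) in MODELS.items()}
for name, score in scores.items():
    print(f"{name:18s} {score:8.2f}")

best_by_sbc = max(scores, key=scores.get)
best_by_loglik = max(MODELS, key=lambda name: MODELS[name][0])
print(best_by_sbc, best_by_loglik)
```

With these assumed counts the recomputed values agree with the SBC column to two decimals, which suggests the counts are the ones used for the table.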


Which is the winner?

- Referee A – loglikelihood rules – pick gamma/exp/exp mixture
  - This is a world of one big model and the best is the best, simplicity is never an issue.
- Referee B – SBC rules – pick exponential
  - Parsimony is most important, pay a penalty for extra parameters.
- Me – lognormal/exp. Great pictures, better numbers than exponential, but simpler than three component mixture.


Can this be automated?

- We are working on software.
- Test version can be downloaded at www.cbpa.drake.edu/mixfit.
- MLEs are good. Pictures and test statistics are not quite right. May crash.
- Here is a quick demo.