assessing the use of indeterminates for scorecard model ......scorecard model development james...
TRANSCRIPT
Assessing the use of indeterminates for scorecard model development
James Tebboth & Manoel GadiSantander Analytics
Credit Scoring and Credit Control XIUniversity of Edinburgh27 August 2009
Risk Division
Santander Analytics
Just use goods and bads – don’t use indeterminates!
Risk Division
Santander Analytics
3
Assessing the use of indeterminates
• 1. Organisational context• 2. Use of indeterminates in scorecards• 3. Empirical assessment• 4. Findings and application
4
Assessing the use of indeterminates
• 1. Organisational context• 2. Use of indeterminates in scorecards• 3. Empirical assessment• 4. Findings and application
5
323,911
202,244
4,908
1,247
92,684
2,945
Banco Santander
Total profit: € 8.9bn
Customer loans: € 620bn
AMERICAS
UNITED KINGDOM
CONTINENTAL EUROPE
75% Retail banking
21% Wholesale banking
4% Asset management / insurance
6
Santander Analytics
� In 2007, the retail bank covered eleven countries, but only three had the capability to develop models
� Santander Analytics was created in 2007 as a group function for the management and development of credit risk scorecards
� The three existing units were combined to cover the entire retail bank of Santander
To contribute to achieving the best lending decisions for Santander through the development and monitoring of decision models, and ensuring the most
effective use of analytical resources available worldwide
7
Scorecard development and management
� One of our objectives was to:
- to ensure a consistent approach
- to establish best practice
- to be dynamic: a focus for learning, capturing and building on the experience throughout the group
develop and implement a corporate methodology and process to develop and manage decision models
8
Scorecard development and management
Gap analysis� Three corporate centres discussed and compared approaches
� Found subtle but significant differences in the three approaches:- reject inference
- use of indeterminates
- model formulation
- involvement of internal stakeholders
- statistics used to assess model performance
- reports used for monitoring models
- methods used to manage and revise models
- …
9
Assessing the use of indeterminates
• 1. Organisational context• 2. Use of indeterminates in scorecards• 3. Empirical assessment• 4. Findings and application
10
What are indeterminates?
Non-default
Default 3+ cycles
2 cyclesIndet
GoodIncreasing arrears status at performance point
Bad
Good
Bad
Indet
Good
Bad
REALITY CLASSIFICATION FOR MODEL BUILD
3+ cycles
1, 2 cycles
3+ cycles
� Model built on goods and bads only – indeterminates, if present, excluded from model build
� Considering indeterminates here as “intermediates”
OR OR
11
Industry context
Historical background� Account for mistakes and inconsistencies in manual calculations of arrears
� Allow for judgemental treatment in collections
� With limited computing power, focus model build on small samples of clear-cut goods and bads
Diverse industry views…� Anderson, 2007: Logic for indeterminate range is that (i) seemingly bad
behaviour may be the result of technical arrears, or company strategies; and (ii) good and bad are more clear cut, which should hopefully aid identification of truly problematic accounts
� Scallan, 2008: Avoid indeterminates if at all possible: (i) extra discrimination is spurious; (ii) makes models less sensitive to borderline cases; (iii) statistical estimation more complex; (iv) complicates strategy setting
� Hand, 2003: Proper way to handle indeterminates, intermediate between other classes, is as a three class problem
R Anderson, 2007: The credit scoring toolkit: Theor y and practice for retail credit risk management an d decision automationG Scallan, 2008: Building Better ScorecardsD J Hand, 2003: Good practice in retail credit scor ecard assessment
12
What are we trying to predict?
� Identify accounts that default … at some point
� Want recent development samples, so use proxy definition, such as three payments in arrears after 12 months
� Choose proxy definition to maximise correlation with default
� So purpose of scorecard is to identify “bads”, and want to distinguish bads from not-bads
13
Assessing the use of indeterminates
• 1. Organisational context• 2. Use of indeterminates in scorecards• 3. Empirical assessment• 4. Findings and application
14
Evaluation framework
� For each test case, two models built: one using a G/B definition; another using a G/I/B definition
� Models built and validated in consistent way
� Results assessed using Gini and K-S statistics
� In order to compare results, we’ll measure and assess performance using a consistent G/B definition
15
Evaluation data
Three data sets used
� Use existing data sets, from previous model developments
� Brazil, Spain, UK
� Auto-loans, customer score, unsecured personal loan
� Four comparisons tested, representing four G/B or G/I/B definitions used in practice over the three data sets
� Volume of observations: 15,700 32,500 169,100
� Number of independent variables: 124 216 257
� Impossible to cover all variants, but aim for diverse samples toincrease robustness of results
16
Performance evaluation
Jan-04
Feb-04
Mar-04
Apr-04
May-04
Jun-04
Jul-04
Aug-04
Sep-04
Oct-04
Nov-04
Dec-04
Jan-05
Feb-05
Mar-05
Apr-05
May-05
Jun-05
Jul-05
Aug-05
Sep-05
Oct-05
Nov-05
Dec-05
Jan-06
Feb-06
Mar-06
Apr-06
May-06
Jun-06
Jul-06
Aug-06
Sep-06
Oct-06
Nov-06
Dec-06
Jan-07
Feb-07
Mar-07
Apr-07
May-07
Jun-07
Jul-07
Aug-07
Sep-07
Oct-07
Nov-07
Dec-07
Jan-08
Feb-08
Mar-08
Apr-08
May-08
1 Jan-04 X X X X X X X X X X X X X X X X X X X
2 Feb-04 X X X X X X X X X X X X X X X X X X X
3 Mar-04 X X X X X X X X X X X X X X X X X X X
4 Apr-04 X X X X X X X X X X X X X X X X X X X
5 May-04 X X X X X X X X X X X X X X X X X X X
6 Jun-04 X X X X X X X X X X X X X X X X X X X
7 Jul-04 X X X X X X X X X X X X X X X X X X X
8 Aug-04 X X X X X X X X X X X X X X X X X X X
9 Sep-04 X X X X X X X X X X X X X X X X X X X
10 Oct-04 X X X X X X X X X X X X X X X X X X X
11 Nov-04 X X X X X X X X X X X X X X X X X X X
12 Dec-04 X X X X X X X X X X X X X X X X X X X
13 Jan-0514 Feb-0515 Mar-0516 Apr-0517 May-0518 Jun-0519 Jul-0520 Aug-0521 Sep-0522 Oct-0523 Nov-0524 Dec-0525 Jan-0626 Feb-0627 Mar-0628 Apr-0629 May-0630 Jun-0631 Jul-06 X X X X X X X X X X X X X X X X X X X
32 Aug-06 X X X X X X X X X X X X X X X X X X X
33 Sep-06 X X X X X X X X X X X X X X X X X X X
34 Oct-06 X X X X X X X X X X X X X X X X X X X
35 Nov-06 X X X X X X X X X X X X X X X X X X X
|- 5m O
OT
-||- distance =
19 months -|
|- 12m develop. -|
Out-of-time sample: mimic when model would have been in use
Development / holdout sample to build model
Observation (scoring) point
Performance point
17
Assessing the use of indeterminates
• 1. Organisational context• 2. Use of indeterminates in scorecards• 3. Empirical assessment• 4. Findings and application
18
Results
OOT SampleDev Sample
G/BG/BTest 4
G/BG/BTest 3
G/BG/BTest 2
G/BG/BTest 1
Table 1
OOT Sample
-0.4%
-5.0%
-1.4%
-2.7%
Table 2
� Table 1: better performing model, out of G/B and G/I/B models
� Table 2: relative drop in Gini, G/I/B model relative to G/B model- all results assessed using G/B definition
19
Conclusions
Results clearly show that indeterminates do not add power to models
Recommend building models on G/B definition, ie, good = not bad
� Results apply to indeterminates as intermediates
� In certain situations may still be relevant to exclude observations where performance genuinely indeterminate
- eg where genuine customer performance unavailable or compromised
- better thought of as exclusions
20
Convincing the stakeholders
A significant amount of discussion with stakeholders followed!� Evaluation framework had been agreed prior to work
� Results and conclusions circulated
� Many questions from stakeholders from all three centres followed- how the models were built
- why the performance statistics varied as they did
- had the model taken account of this or that particular detail in the data
� While no test can ever be perfect or comprehensive, the test framework used and the robustness of the results obtained meant that the conclusions were adopted
� Conclusions also follow principle of not introducing complexity unless warranted
- and also avoids complexity in other areas, eg, reject inference
21
Implementation
� End result was an agreed approach to build models on a G/B definition
� Incorporated into corporate methodology
� Methodology implemented through model building code, to help give consistency and efficiency
22
Further work
� Address all issues raised by gap analysis
� … but not all through empirical testing!
� Incorporate scorecard areas and techniques from recently acquired companies
- Banco Real
- Alliance & Leicester
- Sovereign
� Primary objective remains helping the bank to make the best decisions through the best models
� Recognise that standard techniques are not suitable for all situations
� So plenty of opportunities to continue learning
Risk Division
Santander Analytics
Thank you for your attention
Any questions?