Alexander Suprun, CIBC, August 2018 · torsas.ca/attachments/file/20190621/suprun-lgdmodel.pdf
TRANSCRIPT
Alexander Suprun, CIBC
August 2018
1
Introductory Notes
The discussion and examples in this presentation apply to LGD (loss given default) models.
However, the same methodology can be applied to:
any account-level target in the range [0,1], and
any model that consists of several pools, each containing accounts with similar values of the target.
An obvious example of such a model is a pool-wise exposure at default (EAD) model, where EAD is defined as the ratio of default balance to limit.
The AUROC and RCAP code is given on the final slides.
2
Revised AUROC
AUROC: Area Under the Receiver Operating Characteristic curve, applied to a binary version of LGD (see below).
Model AUROC calculation
For pool-wise sorted LGD(p), where p = 1 to $p_n$, calculate the cumulative percent of "bad" and "good" accounts:
$$cNB_p = \frac{\sum_{i=1}^{p} NB_i}{\sum_{i=1}^{p_n} NB_i} \quad \text{and} \quad cNG_p = \frac{\sum_{i=1}^{p} NG_i}{\sum_{i=1}^{p_n} NG_i}$$
where $NB_i$ and $NG_i$ are the bad and good counts in pool $i$.
Graph $cNB_p$ vs $cNG_p$ and calculate the area under the curve.
Perfect (account-level) AUROC calculation
Sort the account-level LGD values and arrange them in a large number of pools, pool=ROUND(aLGD/aLGDmax/0.01), where aLGD is an account's LGD value. This produces around 100 pools.
Use the above algorithm to calculate the Perfect AUROC.
Revised AUROC = Model AUROC / Perfect AUROC
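The pool-wise calculation above can be sketched in a few lines. This is an illustrative Python sketch rather than the author's SAS code; the function names `pool_auroc` and `revised_auroc`, the trapezoidal rule, the highest-LGD-first ordering, and the sample counts are all assumptions made for the example.

```python
def pool_auroc(pools):
    """Trapezoidal area under cNB (y-axis) plotted against cNG (x-axis).

    pools: list of (n_bad, n_good) tuples, one per pool, ordered from the
    highest-LGD pool to the lowest (an assumed ordering).
    """
    total_bad = sum(nb for nb, _ in pools)
    total_good = sum(ng for _, ng in pools)
    area = 0.0
    c_bad = c_good = 0.0  # running cNB_p and cNG_p
    for nb, ng in pools:
        next_bad = c_bad + nb / total_bad
        next_good = c_good + ng / total_good
        # Trapezoid between the previous and the current curve point.
        area += 0.5 * (c_bad + next_bad) * (next_good - c_good)
        c_bad, c_good = next_bad, next_good
    return area


def revised_auroc(model_pools, perfect_pools):
    """Revised AUROC = Model AUROC / Perfect AUROC."""
    return pool_auroc(model_pools) / pool_auroc(perfect_pools)


# Hypothetical illustration data, not taken from the presentation.
model_pools = [(80, 20), (50, 50), (20, 80)]
print(round(pool_auroc(model_pools), 4))
```

If the pools are supplied in the opposite order the area comes out as one minus this value, which is why the ordering has to be fixed before the ratio is taken.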
3
RCAP
Revised Cumulative Accuracy Profile (Revised CAP), applied directly to LGD.
Model CAP calculation
For pool-wise sorted LGD(p), where p = 1 to $p_n$, calculate the cumulative percentages
$$cLGD_p = \frac{\sum_{i=1}^{p} LGD_i}{\sum_{i=1}^{p_n} LGD_i} \quad \text{and} \quad cP_p = \sum_{i=1}^{p} i / p_n$$
Graph $cLGD_p$ vs $cP_p$ and calculate the area under the curve.
Perfect CAP calculation
Sort the account-level LGD values and apply the above method, treating each account as a separate pool. This value is the highest possible CAP for the given data.
RCAP = Model CAP / Perfect CAP
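A minimal Python sketch of the ratio follows. It mirrors the way the SAS macro at the end of the deck accumulates the area (accounts sorted from highest to lowest pool mean, trapezoids between the curve and the diagonal), but the function names `cap_area` and `rcap` and the sample data are hypothetical.

```python
def cap_area(values):
    """Area between the CAP curve and the diagonal.

    values: one LGD value per account, already in evaluation order
    (highest first).
    """
    n = len(values)
    total = sum(values)
    area = 0.0
    cum = prev_depth = prev_share = 0.0
    for i, value in enumerate(values, start=1):
        cum += value
        depth, share = i / n, cum / total
        area += 0.5 * ((share - depth) + (prev_share - prev_depth)) * (depth - prev_depth)
        prev_depth, prev_share = depth, share
    return area


def rcap(observed, pool_of):
    """Return (model CAP, perfect CAP, RCAP) for account LGDs and pool labels."""
    sums, counts = {}, {}
    for y, p in zip(observed, pool_of):
        sums[p] = sums.get(p, 0.0) + y
        counts[p] = counts.get(p, 0) + 1
    pool_mean = {p: sums[p] / counts[p] for p in sums}
    # Model curve: each account carries its pool's mean LGD.
    model_cap = cap_area(sorted((pool_mean[p] for p in pool_of), reverse=True))
    # Perfect curve: each account is its own pool.
    perfect_cap = cap_area(sorted(observed, reverse=True))
    return model_cap, perfect_cap, model_cap / perfect_cap


# Hypothetical illustration data, not taken from the presentation.
observed = [0.9, 0.7, 0.6, 0.2, 0.3, 0.1]
pool_of = ["A", "A", "B", "B", "C", "C"]
print(rcap(observed, pool_of))
```

Because both areas are measured against the same diagonal, the ratio is at most 1 and reaches 1 only when the pools rank accounts as well as the account-level values do.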
4
AUROC for Binary Version of LGD
Binarization:
Let LGD=0.75; then create two records:
BAD=1 with freq=75 and BAD=0 with freq=25
Perfect or Account Level AUROC:
Create pools by grouping accounts that have the same value of pool=ROUND(aLGD/aLGDmax/0.01), where aLGD is the account-level LGD.
Calculate AUROC based on these ~100 pools.
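The two steps on this slide might look as follows in Python. This is a sketch under stated assumptions: the names `binarize` and `account_level_pools` are hypothetical, the scale of 100 comes from the LGD=0.75 example, and half-up rounding via `math.floor(x + 0.5)` is used to imitate ROUND for non-negative values because Python's built-in `round` rounds halves to even.

```python
import math
from collections import defaultdict


def binarize(lgd, scale=100):
    """Split one LGD value into (BAD=1 freq, BAD=0 freq); 0.75 gives (75, 25)."""
    return lgd * scale, (1 - lgd) * scale


def account_level_pools(account_lgds):
    """Group accounts by ROUND(aLGD/aLGDmax/0.01) and total their frequencies.

    Returns a list of (bad_freq, good_freq) per pool, highest pool first.
    Assumes non-negative LGDs with a positive maximum.
    """
    lgd_max = max(account_lgds)
    pools = defaultdict(lambda: [0.0, 0.0])
    for lgd in account_lgds:
        key = math.floor(lgd / lgd_max / 0.01 + 0.5)  # half-up rounding
        bad, good = binarize(lgd)
        pools[key][0] += bad
        pools[key][1] += good
    return [tuple(pools[k]) for k in sorted(pools, reverse=True)]


# Hypothetical illustration data, not taken from the presentation.
print(binarize(0.75))
print(account_level_pools([0.0, 0.25, 0.5, 1.0]))
```

The resulting list of frequency pairs has the same shape as the bad and good counts of the model pools, so the same area calculation can be applied to both.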
5
Model AUROC
Calculate AUROC for an LGD model with 8 pools.
Results:
r_AUROC=mod_AUROC/acc_AUROC
DWLGD stands for dollar-weighted LGD
6
DWLGD r_AUROC mod_AUROC acc_AUROC
0.4843 0.6925 0.6496 0.9380
Account Level & Model ROC Graph
r_AUROC=0.6925 acc_AUROC=0.9380 mod_AUROC=0.6496
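As a quick check of the ratio, using the values reported for the 8-pool model:

```latex
\[
\text{r\_AUROC} = \frac{\text{mod\_AUROC}}{\text{acc\_AUROC}}
               = \frac{0.6496}{0.9380} \approx 0.6925
\]
```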
7
AUROC for Simple Binary LGD
Binarization:
If LGD > t_LGD then BAD=1 else BAD=0, where t_LGD is some threshold LGD
Model LGD AUROC
AUROC for the model LGD pools was calculated for t_LGD=0.05 to 1.00 in steps of 0.05
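The threshold rule and the grid of thresholds can be illustrated with a short Python sketch. The function name `pool_counts` and the sample accounts are hypothetical; only the rule LGD > t_LGD and the 0.05 to 1.00 grid come from the slide.

```python
def pool_counts(lgds, pool_of, t_lgd):
    """Count BAD=1 (LGD > t_lgd) and BAD=0 accounts in each pool."""
    counts = {}
    for lgd, pool in zip(lgds, pool_of):
        bad, good = counts.get(pool, (0, 0))
        if lgd > t_lgd:
            bad += 1
        else:
            good += 1
        counts[pool] = (bad, good)
    return counts


# Threshold grid: 0.05, 0.10, ..., 1.00
thresholds = [round(0.05 * k, 2) for k in range(1, 21)]

# Hypothetical illustration data, not taken from the presentation.
lgds = [0.02, 0.40, 0.55, 0.90]
pool_of = [1, 1, 2, 2]
for t in thresholds[:3]:
    print(t, pool_counts(lgds, pool_of, t))
```

Each threshold yields its own set of per-pool bad and good counts, from which one AUROC value per threshold can be computed.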
8
AUROC for Simple Binary LGD (cont.)
Results
All threshold values produce roughly the same AUROC
t_LGD=0.4843 is the dollar-weighted LGD value for the model
9
t_LGD r_AUROC mod_AUROC acc_AUROC
0.4843 0.7172 0.6727 0.9380
0.0500 0.7188 0.6743 0.9380
0.1000 0.7202 0.6756 0.9380
0.1500 0.7206 0.6759 0.9380
0.2000 0.7208 0.6761 0.9380
0.2500 0.7199 0.6753 0.9380
0.3000 0.7192 0.6747 0.9380
0.3500 0.7189 0.6743 0.9380
0.4000 0.7180 0.6735 0.9380
0.4500 0.7175 0.6731 0.9380
0.5000 0.7168 0.6724 0.9380
0.5500 0.7160 0.6716 0.9380
0.6000 0.7152 0.6708 0.9380
0.6500 0.7140 0.6698 0.9380
0.7000 0.7131 0.6689 0.9380
0.7500 0.7117 0.6676 0.9380
0.8000 0.7100 0.6660 0.9380
0.8500 0.7076 0.6638 0.9380
0.9000 0.7038 0.6602 0.9380
0.9500 0.6984 0.6552 0.9380
1.0000 0.6691 0.6276 0.9380
Revised Cumulative Accuracy Profile
Model LGD Pools: rCAP=model_CAP/perfect_CAP
rCAP=0.2637 perfect_CAP=0.4201 model_CAP=0.1108
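The reported figures are consistent with the definition of the ratio:

```latex
\[
\text{rCAP} = \frac{\text{model\_CAP}}{\text{perfect\_CAP}}
            = \frac{0.1108}{0.4201} \approx 0.2637
\]
```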
10
Conclusions
The suggested revised AUROC is a better representation of the model AUROC value
AUROC for the simple binary LGD model is practically independent of the LGD threshold value
AUROC for the binary LGD model is somewhat lower than that for the simple binary model
11
Q & A
12
Area Under Receiver Operating Characteristic (ROC) Macro
/************************************************************/;
/* HIGHER POOL NUMBERS SHOULD CORRESPOND TO HIGHER PDs */;
/* FOR BAD=1 MEANING DEFAULT, BAD=0 NONDEFAULT */;
/************************************************************/;
%macro AUC_ST (dat= /* Input Dataset */
,res= /* Output AUC Time Series */
,pool= /* Pool Variable*/
,bad= /* Bad Flag */
,YearMonth= /* Date or YearMonth */
,wt= /* Weight Variable */
,AUCT_ONLY=0 /* If AUCT_ONLY=1 Then &auct ONLY is Calculated */
/* Otherwise AUC Time Series and &auct are Calculated */
,print=1 /* Print=1 means output into LOG */ );
%global auct set;
/* Clear Output Datasets */;
%if %sysfunc(exist(WORK._somerst))=1 %then %do; proc SQL; drop table _somerst; quit; %end;
%if %sysfunc(exist(WORK._somers)) =1 %then %do; proc SQL; drop table _somers; quit; %end;
/* Total */;
proc freq data=&dat noprint;
  output out=_somerst SMDRC SMDCR;
  table &pool*&bad/noprint measures;
  %if "&wt" NE "" %then %do; weight &wt; %end;
run;
/* AUC Total*/
proc SQL noprint;
  select MAX((1+_SMDRC_)/2,1-(1+_SMDRC_)/2) as auct, E_SMDRC/2 as set
    into :auct trimmed, :set trimmed from _somerst;
quit;
%if &print = 1 %then %do; %put AUC_ST: auct=&auct set=&set; %end;
%if &AUCT_ONLY = 0 %then %do;
/* Monthly */;
proc sort data=&dat out=_a(keep=&pool &Bad &YearMonth &wt); by &YearMonth; run;
proc freq data=_a noprint;
  output out=_somers SMDRC SMDCR;
  table &pool*&bad/noprint measures;
  %if "&wt" NE "" %then %do; weight &wt; %end;
  by &YearMonth;
run;
data &res;
  retain AUC SE; set _Somers;
  AUC=(1+_SMDRC_)/2; SE=E_SMDRC/2; format AUC 6.4 SE 8.6;
  label AUC="AUC based on Somers' D R|C"; label SE="StdErr of AUC based on Somers'D R|C StdErr";
run;
%end;
%mend AUC_ST;
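The macro converts Somers' D R|C into AUC through AUC=(1+D)/2. The Python sketch below illustrates why that conversion works on the usual pair-counting reading of the statistic, where only bad/good pairs are compared and tied pools count as half. The function name `somers_d_and_auc` and the sample data are hypothetical, and the brute-force loop is meant for illustration only.

```python
def somers_d_and_auc(pool, bad):
    """Return (Somers' D, AUC) from pool numbers and a 0/1 bad flag.

    Every bad account is compared with every good account; a pair is
    concordant when the bad account sits in the higher pool.
    """
    bad_pools = [p for p, b in zip(pool, bad) if b == 1]
    good_pools = [p for p, b in zip(pool, bad) if b == 0]
    conc = disc = ties = 0
    for pb in bad_pools:
        for pg in good_pools:
            if pb > pg:
                conc += 1
            elif pb < pg:
                disc += 1
            else:
                ties += 1
    pairs = len(bad_pools) * len(good_pools)
    d = (conc - disc) / pairs
    auc = (conc + 0.5 * ties) / pairs
    return d, auc


# Hypothetical illustration data, not taken from the presentation.
pool = [1, 1, 2, 2, 3, 3]
bad = [0, 0, 0, 1, 1, 1]
d, auc = somers_d_and_auc(pool, bad)
print(d, auc, (1 + d) / 2)
```

Since conc + disc + ties equals the number of pairs, (1 + D)/2 reduces to (conc + 0.5·ties)/pairs, which is the AUC. The macro additionally takes the larger of AUC and 1-AUC for the total, so a reversed pool ordering does not produce a value below 0.5.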
13
Revised Cumulative Accuracy Profile (RCAP) Macro
%macro rcap(data=,y=,score=,tbin=0.05,gbin=0.001,graph=Y,table=Y);
%* data =input data set name (account level data with pool assigned to each account) ***;
%* y =variable for observed target value for an account ***;
%* score =pool variable ***;
%* tbin =bin size for table, 0.05 means 20 bins in total (5% accounts per bin) ***;
%* gbin =bin size for graph, 0.001 means 1000 points in total ***;
%* output=_rcap&data (model cap, perfect_cap and RCAP=model_cap/perfect_cap) ***;
%* output=_g2&data CAP curve for a graph ***;
%* output=_g2p&data CAP curve based on model pools ***;
%let out2=%scan(&data, 2);
%if %length(&out2)=0 %then %let data=%scan( &data, 1 );
%else %let data=&out2;
proc sort data=&LGDoct..&data out=_temp&data; by &score; run;
data _null_; set _temp&data end=eof; if _n_=1 then do; tot_n=0; totsum_y=0; end; tot_n+1; totsum_y+&y;
if eof then do; call symput('tot_n',put(tot_n,12.)); call symput('totsum_y',put(totsum_y, 24.12)); end; run;
proc summary data=_temp&data nway missing; var &y; class &score; output out=_grp&data(drop=_type_) mean(&y)=_grp_&y; run;
data _null_; set _grp&data end=eof; if eof then do; call symput('Parameters',put(_n_,12.)); end; run;
proc sort data=_grp&data out=ForScore; by _grp_&y; run;
data score_grp&data; set ForScore; score=_n_; run;
proc sort data=score_grp&data; by descending score; run;
data _grp2&data (keep=depth prop_y rename=(prop_y=grp_prop_y)); set score_grp&data end=eof; retain cum1 0 depth prop_y;
if _n_ = 1 then do; depth=0; prop_y=0; output _grp2&data; end;
depth+(_freq_/&tot_n); cum1+(_grp_&y*_freq_); prop_y=cum1/&totsum_y; output _grp2&data;
run;
proc sort data=score_grp&data; by &score; run;
data _temp&data; merge _temp&data score_grp&data; by &score; run;
proc sort data=_temp&data; by descending score; run;
data _t&data (keep=depth score bin_score binavg_y bintot binsum_y prop_y) _g&data(keep=obs depth prop_y); set _temp&data end=eof;
retain cum1 0 cumtot 0 max_ks 0 obs binsum_y 0 bintot tsize gsize bin_scoresum cap prev_depth prev_propy;
if _n_=1 then do;
prev_depth=0; prev_propy=0; depth=0; prop_y=0; tsize=ceil(&tbin*&tot_n); gsize=ceil(&gbin*&tot_n); cap=0; obs=0; output _g&data;
end;
depth=_n_/&tot_n; binsum_y+_grp_&y; bintot+1; bin_scoresum+score; cum1+_grp_&y; cumtot+1; prop_y=cum1/&totsum_y;
if mod(_n_,tsize)=0 or depth=1 then do;
binavg_y=binsum_y/bintot; bin_score=bin_scoresum/bintot; output _t&data; binsum_y=0; bintot=0; bin_scoresum=0;
end;
if mod(_n_,gsize)=0 or depth=1 then do;
obs+1; cap+0.5*(prop_y-depth+prev_propy-prev_depth)*(depth-prev_depth); output _g&data; prev_depth=depth; prev_propy=prop_y;
end;
if eof then do; call symput('cap',put(cap,8.6)); end;
run;
14
Revised Cumulative Accuracy Profile (RCAP) Macro (Continued)
%if %upcase(&table)=Y %then %do;
proc print data=_t&data; var depth score bin_score binavg_y bintot binsum_y prop_y; format depth percent6. score bin_score binavg_y percent8.2;
run;
%end;
proc sort data=&LGDOct..&data out=_temp&data; by descending &y; run;
/* Perfect Cap */;
data _p&data(keep=obs depth perfect_y); set _temp&data end=eof; retain cum1 0 cumtot 0 gsize pcap obs prev_depth prev_perfect_y;
if _n_ = 1 then do;
prev_depth=0; prev_perfect_y=0; depth=0; perfect_y=0; gsize=ceil(&gbin*&tot_n); pcap=0; cum1=0; cumtot=0; obs=0; output _p&data;
end;
depth=_n_/&tot_n; cum1+&y; cumtot+1; perfect_y=cum1/&totsum_y;
if mod(_n_, gsize)=0 or depth=1 then do;
obs+1; pcap=pcap+0.5*(perfect_y-depth+prev_perfect_y-prev_depth)*(depth-prev_depth); output _p&data; prev_depth=depth; prev_perfect_y=perfect_y;
end;
if eof then do;
rcap=&cap/pcap; call symput('pcap',put(pcap,8.6)); call symput('rcap',put(rcap,8.6));
end;
run;
data _g2&data; merge _g&data _p&data _grp2&data; by depth; run;
%if %upcase(&graph)=Y %then %do;
title "RCAP Chart for &data"; title2 "Model CAP=[&cap], Account-Level CAP=[&pcap], Pool-Level RCAP=[&rcap]";
title3 "# of Unique Scores=[&parameters]"; Legend1 value=(color=black height=1 "Segment-Level" "Account-Level" "Random" "Pools");
proc gplot data=_g2&data;
symbol1 v=none c=black i=join; symbol2 v=none c=red i=join line=20; symbol3 v=none c=black i=join line=20; symbol4 v=dot c=black i=none ;
plot (prop_y perfect_y depth grp_prop_y)*depth / overlay legend=legend1;
run; quit; title; title2;
%end;
data _rcap&data; model_cap=&cap; perfect_cap=&pcap; rcap=&rcap; num_uni_scores=&parameters; run;
%mend rcap;
15