Oracle Financial Services AML Event Scoring WOE Logistic Regression

Reference Guide

Release 8.0.7.0.0

April 2019

F12027-02

Copyright © 2019 Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.

For information on third party licenses, click here.

Document Control

Version Number    Revision Date         Changes Done
1.0               Created April 2019    Created the document.

Created by: Brijesh

Reviewed by: Anil/Nagesh

Approved by: Viru/Shandar

Table of Contents

1 Preface
    1.1 About this Guide
    1.2 Audience
    1.3 Related Documents
    1.4 Acronyms used in this guide

2 WOE based Logistic Regression
    2.1 Binning
    2.2 Checking Multicollinearity
    2.3 Interpreting WOE

1 Preface

The Oracle Financial Services (OFS) Anti Money Laundering Event Scoring (AMLES) application scores alerts generated by Anti Money Laundering (AML). This guide covers the formulas and methods behind Weight of Evidence (WOE) based Logistic Regression as used in OFS AMLES.

1.1 About this Guide

This document explains the statistical methods and formulas used in WOE based Logistic Regression. It is a reference guide that supplements the existing user documents for OFS AMLES.

1.2 Audience

This document is for advanced users of OFS AMLES who are interested in understanding and analyzing the details behind WOE based Logistic Regression.

1.3 Related Documents

This section lists additional documents related to the OFS AMLES Application Pack. You can access the Oracle documentation online library for AMLES from the Oracle Help Center (OHC).

Additionally, see the following documents on the OHC for OFS AAI related information:

Oracle Financial Services Advanced Analytical Applications Infrastructure (OFS AAAI) Application Pack 8.0.7.0.0 Installation and Configuration Guide

Oracle Financial Services Analytical Applications Infrastructure User Guide

Oracle Financial Services Analytical Applications Infrastructure Administration Guide

Oracle Financial Services Analytical Applications (OFSAA) Licensing Information

Oracle Financial Services Analytical Applications (OFSAA) Generic Documents

To find additional information about how Oracle Financial Services solves real business problems, see our Web site at www.oracle.com/financialservices.

1.4 Acronyms used in this guide

Table 1 - Acronyms

Acronym      Description
OFS          Oracle Financial Services
OFS AMLES    Oracle Financial Services Anti Money Laundering Event Scoring
OHC          Oracle Help Center
OFS AAAI     Oracle Financial Services Advanced Analytical Applications Infrastructure
OFSAA        Oracle Financial Services Analytical Applications
AML          Anti Money Laundering
WOE          Weight of Evidence

2 WOE based Logistic Regression

This guide explains the hyper parameters associated with WOE based Logistic Regression. The following are the two hyper parameters associated with WOE classifier:

1. binning

2. collinearity (defaults to bad, and you cannot change this hyper parameter).

Add the WOE classifier through the orecv API in the following format. The example also shows the accepted values:

cls <- ORECVwoelr( binning = c("interval(25)","quantile(25)","auto(25)") );

The subsections in this topic discuss binning and collinearity in detail.

2.1 Binning

This parameter controls how the numeric (continuous or discrete) variables in the raw dataset are binned and converted to factor variables. The binning parameter accepts the following methods:

interval - Equal-width binning based on the specified number of bins (default is 20).

quantile - Equal-frequency binning based on the specified number of bins (default is 20), or fewer bins if there are not enough data points.

auto -

a. If the specified number of bins (default is 20) is greater than the number of unique levels of the input feature, the number of bins is set to the number of unique levels. Otherwise, the specified number of bins is used.

b. It attempts to split the data using the quantile method. If there are not enough data points to split the data into the number of bins determined in step a, it uses the interval method.
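The three binning methods can be sketched in Python. This is only an illustration under stated assumptions, not the ORECVwoelr implementation: the function names are hypothetical, bins are treated as closed on the right, and "not enough data points" is interpreted as the quantile cut points collapsing to fewer distinct values than requested.

```python
from bisect import bisect_left


def interval_edges(values, n_bins=20):
    """Equal-width cut points: n_bins - 1 edges between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]


def quantile_edges(values, n_bins=20):
    """Equal-frequency cut points; may return fewer edges than requested."""
    ordered = sorted(values)
    n = len(ordered)
    n_bins = min(n_bins, n)  # cannot have more bins than data points
    # Upper bound of the k-th bin is the value at rank k * n / n_bins.
    cuts = {ordered[(k * n) // n_bins - 1] for k in range(1, n_bins)}
    # Assumption: a cut equal to the maximum is dropped, since it would
    # leave the last bin empty.
    return sorted(c for c in cuts if c < ordered[-1])


def auto_edges(values, n_bins=20):
    """Step a: cap the bin count. Step b: try quantile, else interval."""
    n_bins = min(n_bins, len(set(values)))
    edges = quantile_edges(values, n_bins)
    if len(edges) < n_bins - 1:  # assumed meaning of "not enough data points"
        edges = interval_edges(values, n_bins)
    return edges


def assign_bin(value, edges):
    """Index of the right-closed bin that contains value."""
    return bisect_left(edges, value)


data = list(range(1, 101))
edges = auto_edges(data, n_bins=4)
counts = [0] * (len(edges) + 1)
for v in data:
    counts[assign_bin(v, edges)] += 1
print(edges, counts)
```

For the integers 1 to 100 with four bins, the quantile branch applies and each bin receives 25 values. For a feature such as [1, 5, 5, 5], the quantile cut points collapse, so the sketch falls back to a single equal-width cut.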

2.2 Checking Multicollinearity

The following steps explain how multicollinearity and insignificant variables are handled while fitting WOE based logistic regression:

Repeat the following steps in a loop until End_Condition = True:

1. Fit a Logistic Regression model with all remaining variables and obtain the predicted probability p (that is, the score).

2. Fit a Weighted Linear Regression (use p*(1-p) as the weight) and compute the VIF, Colin, and DW statistics.

VIF+Colin+DW (remove only bad multicollinearity)

Table 2 - Collinearity Diagnostics

No.  VIF   Eigen Value  Cond Idx  Prop intercept  Prop v1  Prop v2  Prop v3  Prop v4  Prop v5  Prop v6
1    758   0.86         1         0.000           0.000    0.0001   0.00     0.1107   0.0001   0.0000
2    399   0.082        9         0.000           0.000    0.16     0.8      0.1      0.0001   0.0000
3    3     0.045        12        0.000           0.000    0.0001   0.00     0.000    0.34     0.4754
4    33    0.01         25        0.000           0.000    0.0001   0.1      0.000    0.0001   0.01
5    1999  0.000        230       0.000           0.1      0.0001   0.1      0.1      0.0001   0.01
6    135   0.000        1048      0.0001          0.1      0.8305   0.00     0.1      0.0001   0.5046
7    -     0.000        43275     0.9999          0.9998   0.0001   0.00     0.5893   0.6505   0.000

Identify (green) high VIF variables (VIF ≥ 10).

Identify (yellow) potential degrading dependency situations (Condition Index ≥ 30).

Identify (red) high variance proportions (prop ≥ 0.5).

Look at each row: the marked variables in a row have degrading multicollinearity. Remove one variable from the first degrading multicollinearity group by likelihood ratio test (that is, TYPE3 test) and go to step 1.

In the example shown in the previous table, there are two degrading multicollinearity groups: v2 & v6 (row 6), and intercept & v1 & v4 & v5 (row 7).

3. Remove insignificant variables.

After running steps 1 and 2 to address multicollinearity, look at the Analysis of Maximum Likelihood Estimates. Remove one insignificant variable identified by the TYPE3 (Likelihood Ratio) test and go to step 1.

Set End_Condition=True when all variables are significant at the preset significance level (that is, α=0.05).

4. End Loop.
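The row-marking rule in step 2 can be sketched in Python using the condition indexes and variance proportions of Table 2. The function name and the requirement that at least two proportions be marked in a row are assumptions made for this illustration. The VIF check and the likelihood ratio test are not reproduced here.

```python
# Columns of the variance-proportion part of Table 2.
COLUMNS = ["intercept", "v1", "v2", "v3", "v4", "v5", "v6"]

# Each entry: (row number, condition index, variance proportions).
TABLE_2 = [
    (1, 1,     [0.000, 0.000, 0.0001, 0.00, 0.1107, 0.0001, 0.0000]),
    (2, 9,     [0.000, 0.000, 0.16, 0.8, 0.1, 0.0001, 0.0000]),
    (3, 12,    [0.000, 0.000, 0.0001, 0.00, 0.000, 0.34, 0.4754]),
    (4, 25,    [0.000, 0.000, 0.0001, 0.1, 0.000, 0.0001, 0.01]),
    (5, 230,   [0.000, 0.1, 0.0001, 0.1, 0.1, 0.0001, 0.01]),
    (6, 1048,  [0.0001, 0.1, 0.8305, 0.00, 0.1, 0.0001, 0.5046]),
    (7, 43275, [0.9999, 0.9998, 0.0001, 0.00, 0.5893, 0.6505, 0.000]),
]


def degrading_groups(rows, cond_idx_min=30, prop_min=0.5):
    """Return (row number, variables) for rows that show degrading
    multicollinearity: condition index >= cond_idx_min and, by assumption,
    at least two variance proportions >= prop_min."""
    groups = []
    for row_no, cond_idx, props in rows:
        if cond_idx < cond_idx_min:
            continue
        marked = [name for name, p in zip(COLUMNS, props) if p >= prop_min]
        if len(marked) >= 2:
            groups.append((row_no, marked))
    return groups


groups = degrading_groups(TABLE_2)
for row_no, names in groups:
    print(row_no, " & ".join(names))
```

Row 5 has a condition index above 30 but no variance proportion of 0.5 or more, so it is not reported. Row 2 has a high proportion for v3 but a condition index of only 9, so it is skipped as well.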

2.3 Interpreting WOE

The Naive Bayes (NB) model can be written as:

\[
\ln \frac{P(Y=1 \mid X_1, X_2, \ldots, X_p)}{P(Y=0 \mid X_1, X_2, \ldots, X_p)} = \ln \frac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{p} \ln \frac{f(X_i \mid Y=1)}{f(X_i \mid Y=0)}
\]

Where,

\(\ln \frac{P(Y=1 \mid X_1, X_2, \ldots, X_p)}{P(Y=0 \mid X_1, X_2, \ldots, X_p)}\) is the conditional log odds, or the logit from the logistic regression model.

\(\ln \frac{P(Y=1)}{P(Y=0)}\) is the overall log-odds.

\(\sum_{i=1}^{p} \ln \frac{f(X_i \mid Y=1)}{f(X_i \mid Y=0)}\) is the sum, over all predictors Xi, of the log ratios of the conditional probability density functions f(Xi|Y=y). Each term of the sum is WOE(Xi).

\[
\mathrm{WOE}(X_i) = \ln \frac{f(X_i \mid Y=1)}{f(X_i \mid Y=0)} = \ln \frac{P(Y=1 \mid X_i)}{P(Y=0 \mid X_i)} - \ln \frac{P(Y=1)}{P(Y=0)}
\]

1. WOE should be interpreted as a log-odds ratio, for each Xi, after controlling for all other predictors.

WOE>0 - odds of event (bad) at the j-th level of Xi exceed the overall odds by a factor of exp(WOE).

WOE=0 - odds of event (bad) at the j-th level of Xi equal the overall odds.

WOE<0 - odds of event (bad) at the j-th level of Xi are below the overall odds. They equal the overall odds multiplied by exp(WOE), which is less than 1.

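2. Information Value (IV) measures the strength of the association between the predictor Xi and the outcome Y.

\[
IV = \int_{-\infty}^{\infty} \left( f(X_i \mid Y=1) - f(X_i \mid Y=0) \right) \ln \frac{f(X_i \mid Y=1)}{f(X_i \mid Y=0)} \, dX_i
\]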

The preceding formula provides a universal measure of strength for both categorical and continuous predictors.

3. Starting from Bayes' rule and deriving a generalized additive model using the logit transform gives the following:

\[
\ln \frac{P(Y=1 \mid X_1, X_2, \ldots, X_p)}{P(Y=0 \mid X_1, X_2, \ldots, X_p)} = \ln \frac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{p} \beta_i \ln \frac{f(X_i \mid Y=1)}{f(X_i \mid Y=0)}
\]

The preceding result is the additive logistic regression model (WOE being the non-parametric function on predictors). Adding the scalar βi to the NB model partly relaxes the NB assumption that all predictors in the model are independent.

WOE = ln(event distribution/non-event distribution)

Where,

event distribution = total event in bin/total event

non-event distribution = total non-event in bin/total non-event

IV = (event distribution - non-event distribution)*WOE

Total IV for a variable = Sum(IV) over all bins of a variable:

\[
IV_{\mathrm{total}} = \sum_{i=1}^{n} IV_i
\]

where n is the number of bins of the variable.
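The bin-level formulas can be checked with a small Python sketch. The function name and the bin counts are made up for illustration, and every count is assumed to be greater than zero so that the standard formula applies.

```python
import math


def woe_iv_by_bin(bins):
    """Return ([(WOE, IV) per bin], total IV).

    bins: list of (events, non_events) counts for one variable,
    with every count > 0.
    """
    total_events = sum(e for e, _ in bins)
    total_non_events = sum(n for _, n in bins)
    rows = []
    for events, non_events in bins:
        event_dist = events / total_events
        non_event_dist = non_events / total_non_events
        woe = math.log(event_dist / non_event_dist)
        iv = (event_dist - non_event_dist) * woe
        rows.append((woe, iv))
    total_iv = sum(iv for _, iv in rows)
    return rows, total_iv


# Hypothetical counts: 100 events and 900 non-events over three bins.
bins = [(10, 400), (30, 300), (60, 200)]
rows, total_iv = woe_iv_by_bin(bins)

# Interpretation check for the last bin:
# odds in the bin = overall odds * exp(WOE).
events, non_events = bins[2]
overall_odds = 100 / 900
assert math.isclose(events / non_events, overall_odds * math.exp(rows[2][0]))

for (woe, iv), counts in zip(rows, bins):
    print(counts, round(woe, 4), round(iv, 4))
print("total IV:", round(total_iv, 4))
```

Each per-bin IV is non-negative because the difference of the two distributions and the WOE always share the same sign.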

The following rules handle the special cases of ln(0) or division by 0:

1. Find the average number of good observations per bad observation in the population.

x = trunc(total_good_obs/total_bad_obs)

2. For var(i) bin(j),

If (total_obs < x) and (total_bad = 0), then
    Woe_var(i)_bin(j) = 0
    Iv_var(i)_bin(j) = 0
Else if (total_obs ≥ x) and (total_bad = 0), then
    Total_bad = 1
    Woe_var(i)_bin(j) = standard formula
    Iv_var(i)_bin(j) = standard formula
Else if (total_good = 0), then
    Woe_var(i)_bin(j) = max(woe_var(i))
    Iv_var(i)_bin(j) = max(iv_var(i))
Else
    Woe_var(i)_bin(j) = standard formula
    Iv_var(i)_bin(j) = standard formula
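The special-case rules can be sketched in Python as follows. The function names are hypothetical, and several readings of the rules are assumptions: Total_bad = 1 is applied to the bin count only (the population totals stay unchanged), the maximum is taken over the other bins of the same variable, every bin holds at least one observation, and the population contains at least one bad observation.

```python
import math


def standard_woe_iv(bad, good, total_bad, total_good):
    """Standard WOE and IV for one bin."""
    event_dist = bad / total_bad
    non_event_dist = good / total_good
    woe = math.log(event_dist / non_event_dist)
    return woe, (event_dist - non_event_dist) * woe


def woe_iv_with_special_cases(bins):
    """bins: list of (bad, good) counts for the bins of one variable."""
    total_bad = sum(b for b, _ in bins)
    total_good = sum(g for _, g in bins)
    x = total_good // total_bad  # trunc() for non-negative counts
    results = [None] * len(bins)
    for j, (bad, good) in enumerate(bins):
        total_obs = bad + good
        if bad == 0 and total_obs < x:
            results[j] = (0.0, 0.0)
        elif bad == 0:
            # Treat the bin as if it held one bad observation.
            results[j] = standard_woe_iv(1, good, total_bad, total_good)
        elif good > 0:
            results[j] = standard_woe_iv(bad, good, total_bad, total_good)
        # Bins with good == 0 stay None until the maxima are known.
    computed = [r for r in results if r is not None]
    max_woe = max(w for w, _ in computed)
    max_iv = max(iv for _, iv in computed)
    return [r if r is not None else (max_woe, max_iv) for r in results]


# Hypothetical bins: 20 bad and 1000 good observations, so x = 50.
example_bins = [(0, 30), (0, 170), (5, 0), (15, 100), (0, 700)]
results = woe_iv_with_special_cases(example_bins)
for counts, (woe, iv) in zip(example_bins, results):
    print(counts, round(woe, 4), round(iv, 4))
```

In this example the first bin is too small to expect a bad observation and is zeroed, the second and fifth bins are evaluated with one assumed bad observation, and the third bin (no good observations) takes the largest WOE and IV found among the other bins.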

Send Us Your Comments

Oracle welcomes your comments and suggestions on the quality and usefulness of this publication. Your input is an important part of the information used for revision.

Did you find any errors?

Is the information clearly presented?

Do you need more information? If so, where?

Are the examples correct? Do you need more examples?

What features did you like most about this manual?

If you find any errors or have any other suggestions for improvement, indicate the title and part number of the documentation along with the chapter, section, and page number (if available), and contact Oracle Support.

Before sending us your comments, check that you have the latest version of the document, in which your concerns may already have been addressed. The revised and recently released documents are available on the My Oracle Support site.