techathon idea paper


Reversing Customer Attrition with BigData Engine and Predictive Model Analytics


Reversing Customer Attrition with Big data Engine and Predictive Model

for Banking and Financial Domain

Dillip Kumar Majhi

BAO CoC Bigdata Architect

Email : [email protected]

Date : 12- Nov- 2013


Table of Contents

1.0 Abstract

2.0. Introduction

3.0 Customer Churn

3.1. The customer lifetime value concept

3.2 Customer Attrition

3.3. Potential Cause

3.4 Factors for Analysis

4.0 BigData and Churn Analysis

4.1 Why Churn Analysis on BigData

5.0 IBM’s Bigdata

6.0 Solution Approach for Churn and NBA

6.1 Solution Architecture

7.0 Predictive Model for Churn

7.1 The Tree-Based Method

7.2 The artificial neural network method

7.3 Fault Measure

7.4 Logistic Regression

7.4.1 Logistic Regression Model Build

7.4.2 Sample Data Input Format

7.4.3 Sample Data Output

7.5 GLM( Generalized Linear Model) – NBA

7.5.1 GLM Model Build – NBA

7.5.2 Sample Data Output

8.0 Result

9.0 Reference

10.0 Abbreviations


1.0 Abstract

Customer value analysis is critical to a good marketing and customer relationship management strategy. An important component of this strategy is the customer retention rate. The retention rate has a strong impact on customer lifetime value, and understanding the true cost of a possible customer churn helps the company manage its customer relationships. The goal of this study is to apply logistic regression techniques to predict customer churn and to analyze churning and non-churning customers.

Conventional statistical methods have so far been very successful in predicting customer churn. However, in today's social networking age, where customers' views and ratings play a vital role, data volumes are very high and much of the data is unstructured. Hence it is time to analyze churn and retention data with the help of Big Data and predictive models.

This paper presents a study of customer churn and NBA (Next Best Action) in the banking and finance industry.

BigInsights, the IBM Big Data technology stack, and a logistic regression model are used together to address this problem. The results of the study and future directions are then discussed.

2.0. Introduction

The subject of customer retention, loyalty, and churn is receiving attention in many industries. This is important in the customer lifetime value context.

Knowing the lifetime value at stake gives a company a sense of how much is really being lost to customer churn and of the scale of effort appropriate for a retention campaign. A mass marketing approach cannot succeed in today's diverse consumer business. Customer value analysis, together with customer churn prediction, helps marketing programs target more specific groups of customers.

Companies experience what is labeled “churn”: losing customers only to replace them with new customers.

Companies can often experience churn rates of 50-90% or more, yet do little or nothing about it. Such companies maintain volume and revenues by paying the costs necessary to constantly acquire new customers.

Some do not even know whether these new customers are in fact returning customers. This will inevitably be a more costly proposition than retaining the same number of existing customers.

Retaining customers maintains volume and revenues at lower cost, increases customer lifetime value, and should create loyalty from limited experience and habit. Not spending to recruit new customers saves money.

Some companies believe they can avoid the planning, effort, and upfront expense of such prospective modeling by waiting until the customer defects, and then studying the defectors. Such a process, based on reviewing events associated with defection and surveying defectors is better than doing nothing, but is plagued by glaring and potentially fatal design flaws.

First, getting information from defectors is likely to be more difficult than getting information from customers. Defectors have already acted on their negative orientation(s) toward your company.

Why should they be willing to provide you with free or low-cost information about their experiences with and evaluations of you and your offerings?

Second, information provided by defectors may be biased, unreliable, or invalid. Their ill will is likely to produce a biased description of their experiences with, and orientation to, your company, products, and customer service. People generalize and stereotype. If disaffected enough to defect, many defectors are likely to project their current evaluation or orientation onto many or all other and earlier aspects of their relationship with your company.

Third, the type of initial contact with customers matters, accounting for three additional years of average customer lifetime (more than two times longer than others). When an “inquiry” customer first buys, she is likely to remain a customer 3.98 times longer than a “neighborhood” customer, whereas a “telemarketing” recruit is likely to remain a customer 1.22 times longer than a “neighborhood” customer. This indicates a substantial advantage to signing and keeping customers who initiate inquiry compared to all others.

Unfortunately the company had no such initiative in place. An “inquiry” customer has an expected duration 2.35 times longer than a “telemarketing” customer. This relationship remains after adjusting for the other factors tested.

3.0 Customer Churn

The focus of customer churn analysis is to determine which customers are at risk of leaving and, if possible, whether those customers are worth retaining. Churn analysis is highly dependent on the definition of customer churn. The business sector and the nature of the customer relationship affect how churning customers are detected. For example, in the credit card business customers can easily start using another credit card, so the only indicator for the previous card company is declining transactions. The company must also assess the value of a potential customer loss. Customer lifetime value analysis helps to meet these challenges.

3.1. The customer lifetime value concept

Customer lifetime value is usually defined as the total net income (expenditure capacity) from the customer over his lifetime. This type of customer analysis goes by several names: customer value, customer lifetime value (LTV), customer equity, and customer profitability. The underlying idea of the LTV concept is simple, and measuring the lifetime value is easy after the customer relationship is over. The challenge is to define and measure the customer lifetime value during, or even before, the active stage of the customer relationship.

For example, Hoekstra et al. define a conceptual LTV model as follows:

LTV is the total value of direct contributions and indirect contributions to overhead and profit of an individual customer during the entire customer life cycle, that is from start of the relationship until its projected ending.

Most LTV models stem from a basic equation, although there are many other LTV models with various application areas. The components of the basic LTV model are:

• Customer net present value and transactions over time (revenue and cost).

• Retention rate or length of service (LoS).

• Discount factor.
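The components listed above can be combined into one common textbook form of a basic LTV equation. The sketch below is only an illustration under stated assumptions, not the formula used by Hoekstra et al.: it assumes a constant yearly retention rate, a constant discount rate, and yearly revenue and cost figures. The function name and all input values are hypothetical.

```python
def lifetime_value(revenues, costs, retention_rate, discount_rate):
    """Discounted, retention-weighted sum of yearly net contributions.

    Assumed form (not taken from the paper):
        LTV = sum_t (revenue_t - cost_t) * retention_rate**t / (1 + discount_rate)**t
    with t = 0 for the first year.
    """
    if len(revenues) != len(costs):
        raise ValueError("revenues and costs must cover the same years")
    total = 0.0
    for t, (revenue, cost) in enumerate(zip(revenues, costs)):
        survival = retention_rate ** t          # chance the customer is still active in year t
        discount = (1.0 + discount_rate) ** t   # present-value discount factor for year t
        total += (revenue - cost) * survival / discount
    return total


# Hypothetical customer: three years of revenue and servicing cost.
ltv = lifetime_value([1000.0, 1000.0, 1000.0], [400.0, 400.0, 400.0],
                     retention_rate=0.8, discount_rate=0.1)
print(round(ltv, 2))
```

A lower retention rate shrinks every later term, which is why the retention component dominates the lifetime value of long-lived customers.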

3.2 Customer Attrition

Customer attrition can be categorized into two types: unavoidable and avoidable.


Unavoidable customer attrition includes economic downturns, shifts in consumer preferences, or poor business management on the part of customers.

Avoidable customer attrition is related to the relationship between customer and service provider. The banking context for identifying attrition is given below.

1. Identification of preferred customers, those who have longer expected durations with, and therefore greater lifetime value for, the company.

2. Modeling risk of defection to identify potential defectors in advance and take customer specific remedial actions that will increase retention.

3. Modeling potential to, strategies for, and relative value of recovering lost customers and keeping them once recovered.

Annual % attrition rate =

\[ \frac{\text{Number of customers one year ago} - \text{Number of customers from the same group today}}{\text{Number of customers in portfolio one year ago}} \times 100 \]
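As a small worked example of the attrition rate defined above, the sketch below computes the rate as a percentage. The function name and the customer counts are hypothetical and serve only to illustrate the arithmetic.

```python
def annual_attrition_rate(customers_year_ago, same_group_today):
    """Percentage of last year's customers who are no longer in the portfolio."""
    if customers_year_ago <= 0:
        raise ValueError("the portfolio one year ago must not be empty")
    if not 0 <= same_group_today <= customers_year_ago:
        raise ValueError("the surviving group cannot exceed the original group")
    lost = customers_year_ago - same_group_today
    return 100.0 * lost / customers_year_ago


# Hypothetical portfolio: 2000 customers a year ago, 1700 of them still active.
rate = annual_attrition_rate(2000, 1700)
print(f"{rate:.1f}%")
```

Only customers from the same group are counted in the numerator, so customers acquired during the year do not mask the losses.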

3.3. Potential Cause

Below are the potential reasons for customer churn/attrition in the banking/financial industry:

• Customer service issues and price

• Over-promising and under-delivering

• Missed or mismatched delivery-time SLAs

• Poor customer service interaction

• Switched to a new service provider

• Customer changed mind/opened in error

• No longer processing, and the account became dormant

• Rates/fees are high

• Emotions – these play a greater role than quality or price in the decision to defect

3.4 Factors for Analysis

The key factor in customer attrition analysis is the conversation between the customer and the service provider. This source of unstructured data, i.e. customers' conversations with the service provider, is a strong leading indicator of future customer behavior.

The following pointers are important when doing attrition analysis:

• Is the percentage of attrition trending up or down?

• Who is leaving you? (Include name, address, and SIC and MCC codes.)

• What types of customers are leaving, and from what neighborhoods or regions?

• What was the time between signing and leaving? How long had they been with you?

• Who trained the customers who left?


• What terminals and applications were lost customers using?

• Who was the processor?

• Who is responsible for customer service and help desk for the customers that left?

• How long had each customer who left been in business?

• What was each customer's dollar and transaction volume over the past 12 months, and was there a three- to five-year sales volume trend?

• What was each customer's profit during specified periods, and was the trend in profit up or down?

• What are business life expectancy norms for a given customer's industry?

• What are the norms within geographic areas covered in the portfolio, and how are the norms trending?

• How long does it take for new customers in particular industries and geographic regions, and even at certain volumes, to become profitable?

4.0 BigData and Churn Analysis

In 2009, the McKinsey Global Institute estimated that U.S. banks and capital markets firms collectively had more than 1 exabyte (one quintillion bytes) of stored data. In subsequent years that figure has surely grown considerably, as banks continue to amass massive amounts of data on customers. All this data gives banks a prime opportunity to glean information that improves customer experience and retention. So far only about 30% of the data is used to provide information; the other 70% goes unnoticed, and there has been no tool or product that can process this kind of data and provide valuable business information. Hence there is demand for new technology with MPP (massively parallel processing) capability that can address the problems below. NoSQL databases, text mining, machine learning, Elastic Search, and taxonomy search are additional capabilities that plug into Big Data platforms.

Big Data tries to solve these emerging issues, framed as the 4 Vs (Volume, Velocity, Variety, Veracity):

1) Volume: high volumes of data (over a petabyte)

2) Velocity: often time-sensitive data streaming into enterprise systems

3) Variety: extends beyond structured data (RDBMS, DWH, etc.) to unstructured data such as text, audio, video, click streams, log files, email, PDF, and many more

4) Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

4.1 Why Churn Analysis on BigData

The huge amount of unstructured data available on social media, such as customer views and opinions, can be analyzed to identify exactly what customers are looking for. Customer sentiment plays an important role in unstructured data and is the key to identifying customer satisfaction.

Mining big data and leveraging analytics play a vital role in customer retention and in enhancing customer service. Big data is best utilized by harnessing it to improve service and customer experience and to detect customer attrition behavior. Hence, when we run predictive churn analysis over data of the nature described above [4.0], it is imperative to use the Big Data ecosystem to hold and process the data and to support predictive analysis and visualization.

5.0 IBM’s Bigdata

IBM has come up with a suite of Big Data products that cater to various business sectors such as energy, telco, finance, media, retail, and many more. Along with these, IBM has many accelerators and use-case-based solutions addressing various customer pain areas. This section gives a very high-level overview of those Big Data products before turning to the specific use of a few of them in this solution.

• Infosphere BigInsights :


This is the core Big Data product for storing and processing high volumes of data, powered by Apache Hadoop, an open source distributed computing platform. It has various components such as HDFS, MapReduce, JAQL, BigIndex, HBase, Hive, and Pig. There are also components for extracting data from external sources and loading it into the BigInsights file system. For more details,

visit : https://www.ibm.com/developerworks/community/blogs/ibm-big-data/entry/ibm_big_data_where_do_i_start?lang=en

• Infosphere Streams:

IBM® InfoSphere® Streams is an advanced computing platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second. It is a real-time information system that analyzes data in motion. For more details,

visit : http://www-03.ibm.com/software/products/us/en/infosphere-streams/

• BigSheets :

This is an extension of the mashup paradigm that explores and visualizes data in specific user-defined contexts and shows the results as spreadsheets and graphics. Since the spreadsheet is a familiar, user-friendly calculation tool, BigSheets is widely accepted and very handy for users to work with.

It is also a component of InfoSphere BigInsights. For more details,

visit: http://www-01.ibm.com/software/ebusiness/jstart/bigsheets/

• SPSS Modeling Tool :

This is IBM's statistics and modeling tool, in which you can build analytical models (regression, predictive models, linear mixed models, etc.) to bring out the business meaning of data. It can be applied to any industry, such as health care, telco, banking and finance, media, etc.

6.0 Solution Approach for Churn and NBA

Below is the selection of IBM Big Data technologies used to conduct this test. I have taken a retail banking dataset whose input format is given in section [ ].

1) Infosphere BigInsights

2) Sqoop

3) HDFS

4) Mapreduce

5) Hbase

6) Pig

7) R

8) Pentaho

Below is the process flow for this solution:

• Data is fed into the BigInsights Hadoop file system by UDF/Sqoop from the various input systems shown in the architecture section [6.1]. GNIP, Social Mention, and real-time data can be fed into this file system as JSON/XML with the help of any connector. Here, however, I am loading data from CSV files into HDFS through Sqoop.

• Data is cleansed by Pig, and invalid records are dropped at this stage.

• Lookup and major dimensional data are stored in HBase to validate incoming data.

• The final good data set is maintained in HDFS, from where it is passed on for various purposes.

• The SPSS statistical modeler reads this HDFS file system to process the data set for analytic purposes.

• The LR model we are using is built with this data set. Refer to section [7.4].

• The model is also validated over a subset of this data. Refer to section [7.4.1].

• After validation, the model data, which is a close approximation, is fed to the second model for NBA. Refer to section [7.5].

• The output data from the first and second models is stored in an output file in the HDFS file system.


• MapReduce processes this data based on user requirements, for slicing and dicing and any reporting purpose.

• BigSheets invokes MapReduce to present this final data in various report and spreadsheet formats.

• Business users use BigSheets to view, create, and analyze these reports and derive value from them.
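In this solution the cleansing and validation steps of the flow above run in Pig and HBase. The sketch below only illustrates the logic of those two steps in plain Python on a small CSV extract. The column names are borrowed from the input format in section 7.4.2; the validation rules, the in-memory lookup set, and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Stand-in for a dimension lookup (held in HBase in the solution); assumed values.
VALID_GENDERS = {"F", "M"}


def is_valid(record):
    """Hypothetical validation rules; the paper does not list the real ones."""
    try:
        cust_id = int(record["Cust_id"])
        age = int(record["Cust_Age"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric fields make the record invalid
    return cust_id >= 5001 and age > 0 and record.get("Gender") in VALID_GENDERS


def cleanse(csv_text):
    """Split the rows of a CSV extract into (good, dropped) lists."""
    good, dropped = [], []
    for record in csv.DictReader(io.StringIO(csv_text)):
        (good if is_valid(record) else dropped).append(record)
    return good, dropped


# Hypothetical extract: one unknown gender code and one malformed customer id.
sample = """Cust_id,Gender,Cust_Age
5003,F,41
5005,X,37
abc,M,29
5009,M,52
"""
good, dropped = cleanse(sample)
print(len(good), "kept;", len(dropped), "dropped")
```

Keeping the dropped records in a separate list, rather than discarding them silently, makes it possible to report how much input was rejected at this stage.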

6.1 Solution Architecture

Overview of the solution architecture, which combines the BigInsights Hadoop ecosystem with the R predictive model.

(Figure 1)

Architecture of Churn - NBA Solution

Layers shown in the figure: Input Sources (XML/CSV file, unstructured data, DWH); Acquisition (collection of data); Marshalling (processing/storing data in the BigInsights Hadoop system: HDFS file system, Pig transformation, MapReduce, HBase); Analysis (predictive modelling: SPSS); Action (UI presentation layer). Legend: user action, data flow.


7.0 Predictive Model for Churn

The following data mining methods can be used to construct a model that estimates customer attrition. ‘Fault Measure’ is the annotation technology that supplies one of the independent variables for the models below.

• Logistic Regression (Cox)

• The tree-based classification method

• More recently, the artificial neural network method

I give a high-level overview of the tree-based classification and artificial neural network methods here before covering in detail Logistic Regression, which is used as the predictive model for this solution.

7.1 The Tree-Based Method

The tree-based method employs the technique of recursive partitioning of data with respect to the variable of interest. In the case of predicting customer attrition, it identifies subgroups of customers who are relatively homogeneous with respect to the risk of service termination. Unlike the regression-based method, recursive partitioning identifies subject subgroups based on Boolean combinations of variables. A branching, algorithm-like "tree" is created, with the "trunk" (the entire sample) or major branches split into two or more smaller branches based on the value of the single variable that minimizes a measure of within-group heterogeneity. The tree terminates in two or more "nodes," each of which defines a subgroup of relatively similar subjects with respect to the outcome of interest.


The iterative process the model uses to split the data into two partitions, Churn and Non-Churn, is as follows:

(Figure -2)

Flowchart of the tree-building loop. Boxes: Start; Each node being examined; Check leaf: data of one class? (Y/N); Determine the best split; Split observations at the examined node to create child nodes; Iterate next leaf node; End.


(Figure -3)

The above tree splits the dataset on the values of one variable, which helps us distinguish between the different classes of interest. The same process is repeated for each variable until there are no more attributes on which further classification can be done.

The classification tree builds the rules for the response variable.
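The recursive partitioning described in this section can be sketched in a few lines. This is a minimal illustration, not the model used in the study: it assumes Gini impurity as the measure of within-group heterogeneity (the paper does not name the measure), binary splits on numeric variables, and a small hypothetical data set of (number of complaints, transaction amount decrease) pairs with invented churn labels.

```python
def gini(labels):
    """Gini impurity of a collection of 0/1 churn labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split, or None."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put observations on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best


def build_tree(rows, labels):
    """Recursively partition until a node holds one class or cannot be split."""
    split = None if len(set(labels)) == 1 else best_split(rows, labels)
    if split is None:
        return {"leaf": True, "churn_rate": sum(labels) / len(labels)}
    _, j, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > threshold]
    return {"leaf": False, "feature": j, "threshold": threshold,
            "left": build_tree(*zip(*left)), "right": build_tree(*zip(*right))}


# Hypothetical observations: (number of complaints, transaction amount decrease).
rows = [(1, 224), (3, 3050), (6, 8900), (4, 1115), (0, 50)]
labels = [0, 1, 1, 1, 0]  # 1 = churn, 0 = non-churn (invented for illustration)
tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"])
```

Each terminal node reports the share of churners it contains, which plays the role of the predicted response for customers falling into that subgroup.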

Model Outputs:

• The decision tree

• Predicted response, that is, whether the customer will churn or not

• Misclassification rate

• Estimated importance of each variable used to build the tree

The significant parameters from the fitted model are as follows:

1. Sentiment Score

2. Number of Complaints

3. Credit Limit Utilized

4. Transaction Number Decrease

5. Transaction Size Change

6. Customer Age

7. Gender

(Figure 3 callout: model output view in tree fashion.)


7.2 The artificial neural network method

This method automatically identifies patterns in data by means of a computer procedure. The artificial neural network method distinguishes two types of self-learning processes: supervised and unsupervised learning. Unsupervised learning is used to identify patterns in data, such as clustering customers based on certain criterion variables. In supervised learning, the goal is to predict one or more target variables from one or more input variables. Supervised learning is usually some form of nonlinear regression or discriminant analysis. The artificial neural network method has been increasingly used in business modeling as a powerful tool to examine large data sets with many variables. The neural network model calculates a weight for each covariate. The calculated weights express the effect of each covariate on the response variable. Covariates with weights close to 0 are insignificant, whereas covariates with larger weights are the most significant. The model can be used to predict the probabilities of churn and non-churn. The significant parameters obtained from the fitted neural network model are as follows:

1. Customer Age

2. Customer Bank Age

3. Gender

4. Change in Occupation

5. Income

6. Change in Income

7. Transaction Number Decrease

8. Transaction Size Decrease

9. Credit Limit Utilized

10. Change in Marital Status

11. Number of Complaints

12. Sentiment Score

7.3 Fault Measure

Organizations use natural language to communicate with employees, customers, partners, and the public, as well as to organize information internally for future reference. Over 80% of useful information is stored in the form of text. Ignoring this dominant portion of the data is not affordable. Emails, web pages, memos, call center transcripts, survey responses, claims notes, legal cases, patent descriptions, research articles, and incident reports all hold valuable pieces of knowledge that enable analysts to discover patterns, trends, and inconsistencies. This knowledge helps predict the outcome of future situations and supports better business decisions. Obtaining knowledge from unstructured text represents a major technological challenge.

The solution offered by ‘Big Data’ technology for automated knowledge discovery in large volumes of text is based on a combination of sentiment analysis and text mining.

Sentiment analysis means extracting information from the annotations of eyewitnesses and parties involved. The ‘Sentiment Score’, computed by assigning positive and negative weights to each annotation, forms a part of the fault measure.
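The weighting idea behind the Sentiment Score can be illustrated with a small lexicon-based sketch. This is an assumption about how such a score might be computed, not the scoring used by the Fault Measure: the word list, the weights, and the sample complaint texts are all hypothetical.

```python
import re

# Hypothetical lexicon: word -> weight. Real weights would come from the text-mining step.
LEXICON = {"good": 1, "helpful": 1, "thanks": 1,
           "poor": -1, "late": -1, "unhappy": -2, "cancel": -2}


def sentiment_score(annotation):
    """Sum the weights of the known words in one annotation."""
    words = re.findall(r"[a-z']+", annotation.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# Hypothetical customer annotations.
complaints = ["Card delivery was late and support was poor.",
              "Thanks, the agent was helpful."]
scores = [sentiment_score(text) for text in complaints]
print(scores)
```

Words that are not in the lexicon contribute nothing, so the score depends entirely on how well the lexicon covers the vocabulary customers actually use.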


‘Fault Measurement Life Cycle’

(Figure -4)

The ‘Fault Measure’ plays an important role in predicting the probability of churn in the first predictive model. In the first predictive model, ‘Fault Measure’ is treated as an independent variable, so it has an effect on the predicted probabilities. It also helps predict the settlement amount in the second predictive model, where ‘Fault Measure’ and the predicted probabilities serve as independent variables.

The following graph shows how using Big Data can maximize the accuracy of the predictive models.


(Figure -5)

7.4 Logistic Regression

This is the most popular method for building models that predict customer departure. It regresses the outcome variable of interest (such as attrition) on a number of variables that may co-vary with (predict) the outcome. It defines the probability of an outcome by the magnitude of the score obtained by adding or subtracting the coefficients assigned to the predicting variables.

Model Description:

The customer data is very large and may include unwanted variables, i.e. variables that are not responsible for customer churn. To find the most significant variables, we fitted a stepwise regression model. The survival model, a Cox proportional hazards model, is fitted for churn prediction.

This is a survival model that incorporates a regression component, because a regression model can be used to examine the effect of predictors on event time. It relates the time that passes before some event occurs to one or more covariates that may be associated with the response.

The Sentiment Score, which captures customer sentiment about services, products, etc., adds value in predicting survival probabilities and other factors, since most customers report their complaints and dissatisfaction before leaving, via mails and phone calls, which are a source of unstructured data.

The model used for churn analytics is given below.

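\[ \operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots \]

\[ \pi = P(Y = 1 \mid X_1 = x_1, X_2 = x_2, \dots) = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots}} \]

Here $X_1, X_2, X_3, \dots$ are independent variables such as transaction amount and transaction decrease, with observed values $x_1, x_2, x_3, \dots$. The detailed variables are defined in Table 1 below.

$\pi$ is the probability of the outcome of interest ($Y = 1$, churn)

$\alpha$ is the Y intercept

The $\beta$s are the regression coefficients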
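To make the relationship between the logit and the probability concrete, the sketch below evaluates the model for one customer. The coefficient values and the two inputs (number of complaints and a sentiment flag) are hypothetical and are not fitted values from this study.

```python
import math


def churn_probability(alpha, betas, xs):
    """pi = e^z / (1 + e^z), where z = alpha + sum(beta_i * x_i)."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)


def logit(p):
    """Log-odds of a probability; the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients: intercept, then one beta per independent variable.
alpha = -2.0
betas = [0.6, 1.5]  # number of complaints, sentiment flag
p = churn_probability(alpha, betas, [3, 1])
print(round(p, 4))
```

Applying `logit` to the returned probability recovers the linear score, which is why each coefficient can be read as a change in the log-odds of churn per unit change in its variable.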


Predictive Model for Churn Probability

Predictive Model | Logistic Regression Model
Data Type | Churn data for customer complaints and transactions
Family | Binomial
Link Function | Logit
Response Variable | Churn status
Independent Variable | 1) Customer Age 2) Customer Bank Age 3) Gender 4) Income 5) Change in Income 6) Change in Marital Status 7) Transaction Number Decrease 8) Transaction Size Change 9) Credit Limit Utilization 10) Number of Complaints
Additional Variable | Sentiment Score from ‘Fault Measure’
Predictive Output | Probability of success or failure

( Table 1)

7.4.1 Logistic Regression Model Build

( TextArea -1 )

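# Fit the logistic regression (binomial family, logit link) on the CASA churn flag.
fit_CASA <- glm(CASA_EVENT ~ AGE_CATG + RES_ST + INCOME_CATG + OCCUP_CATG + ACT_COND +
                  CASA_AGE_CATG + GENDER_CATG + MAR_ST_CHNG + CASA_TRAN_NO_DEC_CATG +
                  CASA_AMT_DEC_CATG + CASA_SENT_CATG,
                data = Model_Input_Data, family = binomial("logit"))
summary(fit_CASA)

# Coefficients of the full model.
coefficient_CASA <- data.frame(coefficient_CASA = fit_CASA$coefficients)

# Backward stepwise selection and the coefficients of the reduced model.
step_CASA <- step(fit_CASA, direction = "backward")
coefficient_step_CASA <- data.frame(step_CASA$coefficients)

# Keep the variable names alongside the coefficients.
variables_CASA <- rownames(coefficient_CASA)
coeff_CASA <- data.frame(variables_CASA)
coeff_CASA$coefficient <- NA
coefficient_CASA$variables <- variables_CASA
variables_CASA <- rownames(coefficient_step_CASA)
coefficient_step_CASA$variables <- variables_CASA

# pp.fitted_CASA is not defined in the original listing. Assumption: it holds the
# predicted churn probabilities of the fitted model for every input row.
pp.fitted_CASA <- predict(fit_CASA, newdata = Model_Input_Data, type = "response")

# Classify each customer as churn (1) or no churn (0) at a 0.5 probability threshold.
threshold1 <- 0.5
esti.res1 <- ifelse(pp.fitted_CASA >= threshold1, 1, 0)

# Actual event, estimated event and estimated probability side by side.
PP_Estimates_CASA <- data.frame(CASA_EVENT = Model_Input_Data$CASA_EVENT,
                                EST_EVENT_CASA = esti.res1,
                                EST_PROB_CASA = pp.fitted_CASA)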
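The fit-and-classify workflow in the listing above can be tried end to end on simulated data. The sketch below is only an illustration: the data frame `sim`, its columns and the coefficients used to generate it are hypothetical stand-ins, not the paper's `Model_Input_Data`.

```r
set.seed(42)
n <- 500

# Simulated predictors (hypothetical stand-ins for the model inputs).
sim <- data.frame(
  n_complaints  = rpois(n, lambda = 1.5),
  tran_amt_decr = rnorm(n, mean = 0, sd = 1)
)

# Simulated churn flag drawn from a known logistic relationship.
p_true <- plogis(-1.5 + 0.6 * sim$n_complaints + 0.9 * sim$tran_amt_decr)
sim$churn <- rbinom(n, size = 1, prob = p_true)

# Logistic regression: binomial family with logit link.
fit <- glm(churn ~ n_complaints + tran_amt_decr,
           data = sim, family = binomial("logit"))

# Predicted probabilities and 0/1 classification at a 0.5 threshold.
prob <- predict(fit, type = "response")
pred <- ifelse(prob >= 0.5, 1, 0)

# Confusion matrix; fixed factor levels keep it 2 x 2 even if one class is never predicted.
confusion <- table(actual    = factor(sim$churn, levels = 0:1),
                   predicted = factor(pred, levels = 0:1))
accuracy <- mean(pred == sim$churn)
```

Comparing `confusion` at several thresholds is one way to judge whether 0.5 is a sensible cut-off for a given cost of missed churners versus false alarms.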


7.4.2 Sample Data Input Format

| Attribute Name | Description |
| --- | --- |
| Cust_Name | Name of customer |
| Cust_id | Customer ID, starting from 5001 |
| Gender | Gender of customer (F = Female, M = Male) |
| Cust_Age | Customer's current age |
| Maritial_Status | Customer's current marital status |
| Product1 | Checking Account |
| Product2 | Credit Card |
| Last_Trans_Amt | Last transaction amount in $ |
| Curr_Trans_Amt | Current transaction amount in $ |
| Tran_Amt_decr | Decrease in transaction amount compared with the last transaction amount (Last_Trans_Amt - Curr_Trans_Amt) |
| Credit_Limit | Credit limit for the customer in $ |
| Credit_Limit_Utilized | Number of transaction instances in which the credit limit was utilized |
| Avg_Mnth_Bal_Amt | Monthly average balance in the customer's account in $ |
| Avg_Mnth_Purch_Amt | Monthly average purchase amount in $ |
| No_Complaints_CC | Number of complaints for the credit card |
| No_Complaints_CA | Number of complaints for the checking account |
| Total_No_Complaints | Total number of complaints submitted by the customer |
| Sent_Score_CA | Sentiment score of the CA complaint |
| Sent_Score_CC | Sentiment score of the CC complaint |
| Total_Sent_Score | Total sentiment score = (Sent_Score_CA + Sent_Score_CC) |

(Table 2)
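The derived attributes in the table above can be computed from the raw columns as in the following R sketch. The two customer rows are made-up values, and treating Total_No_Complaints as the sum of the credit-card and checking-account complaint counts is an assumption; the table itself only defines Tran_Amt_decr and Total_Sent_Score by formula.

```r
# Two hypothetical customers (values invented for illustration).
cust <- data.frame(
  Cust_id          = c(5001, 5002),
  Last_Trans_Amt   = c(1200, 500),
  Curr_Trans_Amt   = c(900, 650),
  No_Complaints_CC = c(1, 0),
  No_Complaints_CA = c(2, 0),
  Sent_Score_CA    = c(1, 0),
  Sent_Score_CC    = c(0, 0)
)

# Decrease in transaction amount, as defined in the table.
cust$Tran_Amt_decr <- cust$Last_Trans_Amt - cust$Curr_Trans_Amt

# Assumption: total complaints is the sum over both products.
cust$Total_No_Complaints <- cust$No_Complaints_CC + cust$No_Complaints_CA

# Total sentiment score, as defined in the table.
cust$Total_Sent_Score <- cust$Sent_Score_CA + cust$Sent_Score_CC
```

A negative Tran_Amt_decr, as for the second row, means the transaction amount increased rather than decreased.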


7.4.3 Sample Data Output

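| Cust ID | Customer Name | No of Complaints | Trans Amt Decrease | Sentiment Score | Churn Probability | RAG |
| --- | --- | --- | --- | --- | --- | --- |
| 5003 | Alice Levine | 1 | $224 | 0 | 34% | |
| 5005 | Jill Ayears | 3 | $3050 | 1 | 95.56% | |
| 5006 | Daisy leavy | 6 | $8900 | 1 | 99.79% | |
| 5009 | Sergio Cole | 4 | $1115 | 0 | 67.23% | |

(Table 3)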

7.5 GLM (Generalized Linear Model) – NBA

With the logistic regression (LR) model for churn probability defined, a GLM is used to identify the best offering, according to business rules, for customers who have churned or are close to churning. The business can make various offerings (service, product, consulting, etc.) matched to different classes of customer need. The details of the GLM are given below. Alternatively, logistic regression can be used for offering selection, based on the key factors needed for it.

Predictive Model for Churn Probability

| Predictive Model | Generalized Linear Model |
| --- | --- |
| Data Type | Churn data for customer complaints and transactions |
| Family | Gamma |
| Link Function | Identity |
| Response Variable | 'Business Offering' |
| Independent Variables | 1) Customer Age 2) Income 3) Change in Income 4) Change in Marital Status 5) Number of Valid Complaints 6) Product Offering Catalogue 7) CB Score 8) Product Propensity (intermediate calculation) |
| Additional Variable | 'Fault Measure' and churn probability |
| Predictive Output | Product/service offering |

(Table 4)
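To show what a GLM with the Gamma family and identity link from the table above looks like in R, here is a minimal sketch on simulated data. It is not the paper's NBA model: the variables `cb_score` and `offer_score` are hypothetical, and the response is assumed to be a strictly positive numeric score, since the Gamma family requires a positive continuous response.

```r
set.seed(1)
n <- 1000

# Hypothetical predictor (stand-in for an input such as the CB score).
cb_score <- runif(n, min = 0, max = 10)

# Assumed true mean of the response: linear in the predictor (identity link).
mu <- 2 + 0.5 * cb_score

# Positive, right-skewed response drawn from a Gamma distribution with mean mu.
shape <- 5
offer_score <- rgamma(n, shape = shape, rate = shape / mu)

# Gamma GLM with identity link: E[offer_score] = b0 + b1 * cb_score.
fit_gamma <- glm(offer_score ~ cb_score, family = Gamma(link = "identity"))
coef(fit_gamma)
```

With the identity link the coefficients are read directly on the scale of the response, but the fitted mean must stay positive for every observation, which is worth checking when real predictors are used.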


7.5.1 GLM Model Build – NBA

( TextArea -2 )

7.5.2 Sample Data Output

| Cust ID | Customer Name | CB Score | Product Propensity | Sentiment Score | Product Offering |
| --- | --- | --- | --- | --- | --- |
| 5003 | Alice Levine | 1 | LOW | 0 | Do not offer |
| 5005 | Jill Ayears | 3 | HIGH | 1 | Gold Credit Card |
| 5006 | Daisy leavy | 6 | HIGH | 1 | Gold Credit Card |
| 5009 | Sergio Cole | 4 | HIGH | 0 | Standard Credit Card |

(Table 5)

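# Label churn likelihood and attrition rate as HIGH (> 0.5) or LOW (<= 0.5).
# offer, attrition and PROPENSITY are not defined in this excerpt; offer and
# attrition are assumed to be row-index vectors into churn.
churn$CHURN_LIKELIHOOD[offer[which(churn$LIKELIHOOD[offer] > 0.5)]] <- "HIGH"
churn$CHURN_LIKELIHOOD[offer[which(churn$LIKELIHOOD[offer] <= 0.5)]] <- "LOW"

churn$ATTRITION_RATE <- NA
churn$ATTRITION_RATE[attrition[which(churn$LIKELIHOOD[attrition] > 0.5)]] <- "HIGH"
churn$ATTRITION_RATE[attrition[which(churn$LIKELIHOOD[attrition] <= 0.5)]] <- "LOW"

churn$PROPENSITY <- PROPENSITY
churn$CLTV <- NA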
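The HIGH/LOW labelling above can be tried on a small, self-contained data frame. In this sketch the LIKELIHOOD values reuse the four churn probabilities from Table 3 as an assumption, and a vectorised `ifelse()` replaces the index-based assignment; it illustrates the thresholding only, not the full NBA logic.

```r
# Toy data frame; LIKELIHOOD reuses the Table 3 churn probabilities (assumption).
churn <- data.frame(
  Cust_id    = c(5003, 5005, 5006, 5009),
  LIKELIHOOD = c(0.34, 0.9556, 0.9979, 0.6723)
)

# Same 0.5 threshold as in the listing: above it is HIGH, otherwise LOW.
churn$CHURN_LIKELIHOOD <- ifelse(churn$LIKELIHOOD > 0.5, "HIGH", "LOW")

# Customers flagged HIGH are the candidates for a retention offering.
high_risk_ids <- churn$Cust_id[churn$CHURN_LIKELIHOOD == "HIGH"]
```

Unlike the index-based form, `ifelse()` labels every row in one pass, so any row with a missing LIKELIHOOD simply stays NA.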


8.0 Result

Figure 6 below shows the dashboard of the churn PoC/PoV developed to prototype the concept.

Customers who have already churned are flagged in red, together with their 'Likelihood of churn'.

Figure 7 shows the churn probability together with the factors contributing to churn.

Figure 8 shows the NBA offering based on the analysis.

(Figure -6)

Figure 7 annotation: shows the churn probability with the progression of the time frame; 99.91% indicates that the customer is churning.


( Figure -7)

(Figure -8)


Figure 8 annotation: offering based on NBA modeling.


10.0 Abbreviations

NBA - Next Best Action

LR - Logistic Regression

GLM - Generalized Linear Model

ANNM - Artificial Neural Network Method

LTV - Lifetime Value

HDFS - Hadoop Distributed File System