
Predictive Cross-Sell and Up-Sell Analytics in Mutual Fund Industry

Business Challenge

Cross-sell and up-sell strategies are a common practice across industries for introducing customers to better or complementary products. In the mutual fund industry, cross-selling and up-selling essentially mean expanding an investor's portfolio through additional new or complementary products and services. In an industry where distributors interact with financial advisors regularly, cross-sell and up-sell strategies are effective ways of strengthening the distribution channel through information trickle-down effects. Machine-driven, algorithmic decision-making platforms that recommend the next best product to an investor, built on information such as transaction behavior, demographics, and scheme features, can bolster (and in the long run replace) the existing human-driven decision-making processes.

With a six-fold growth in the assets of the mutual fund industry in recent years, Asset Management Companies (AMCs) in India face increasing competitive pressure on their share of wallet. The challenge lies not only in acquiring new investors but also in retaining vintage (existing) investors. Traditional distribution channels such as IFAs (Independent Financial Advisors) play a crucial role in maintaining existing investor relationships, as they interact with investors on a regular basis. Forward guidance from predictive and association-learning analytics, recommending the next best schemes/plans to channel agents, can increase the retention of existing investors. In the current system, severe information asymmetry means that IFAs and other channels often fail to identify the precise scheme that could be sold to an existing investor.

Product Introduction

Recommendation analytics uses an organization's data, applying sophisticated algorithmic techniques to mine patterns and insights from raw data for cross-sell and up-sell strategies. Recommendations developed on big data analytics platforms are investor-specific. In the mutual fund industry, recommendation analytics can be applied to financial transaction data, demographic factors, and scheme-level features to analyze areas such as growth in net assets under management and customer acquisition and retention, using efficient analytics platforms for prediction and optimization problems.

Karvy's in-house analytics team recently deployed a cross-sell and up-sell analytics platform in which mutual funds are paired scientifically. For a particular investor, the platform predicts the scheme nearest to one already held, using algorithms that estimate the likelihood of a fund being bought together with another fund. The predictive recommendation model was based on factors such as the investor's financial transaction behavior, demographic profile, and scheme-level features.

The purpose of this white paper is to illustrate a few select capabilities of Karvy's analytics domain in helping AMCs understand their investors' behavior. RecommendationGUIDE™ is designed specifically for the asset management side. It is engineered with advanced machine learning techniques to give mutual fund distributors insights into their investors' purchase behavior patterns and to help retain those investments. The solution process includes multi-dimensional client clustering, identification of investor-specific scheme purchase behavior, and periodic consolidated analysis reports for forward guidance.

Solution Design: Predicting the Next Best Product

The process began with a descriptive analysis of investor transactions for the period 2013–2015 across different investor segments. A cluster analysis was then carried out, and a cluster chart (Figure 1) was created to analyze the clustering patterns graphically. Cluster analysis mathematically identifies the closest observations or groups of observations. In this example, it identifies the closest investors on the basis of transaction behavior, demographic factors, and scheme- and plan-level features. The three clusters represent three groups of investors defined by their behavioral factors. A cluster chart is a common data visualization technique that represents similarities between the objects of study by empirically measuring the distances between them.

Figure 1: Cluster chart visualizes the mathematical relationship between different funds

The next step was to analyze the observations in each cluster separately. Within each cluster, the association between observations is measured through inverse lift using the Apriori algorithm. Inverse lift is a numerical representation of how likely an investor is to purchase Scheme N when Scheme A is already held. For example, in Table 1 inverse lift measures how likely investors who hold Scheme A are to purchase each candidate Scheme N. A higher inverse lift indicates a stronger pairing. An investor holding Scheme A would therefore be expected to prefer a Scheme N with an inverse lift of 2.004 over one with an inverse lift of 2.001.

Scheme A    Scheme N (potential pairing)    Inverse lift
1881        1686                            2.004
1881        3499                            2.003
1881        3397                            2.002
1881        1657                            2.001

Table 1: Inverse lift values for fund combinations and pairings
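The following minimal Python sketch makes the ranking rule concrete. It covers the final recommendation step: given the schemes an investor already holds and a table of pairing scores, it returns the highest-scoring schemes the investor does not yet hold. Its only data are the four pairings from Table 1. The function name `recommend`, the `top_n` cutoff, and the choice to keep a candidate's best score when several held schemes point to it are all illustrative assumptions. They do not describe how RecommendationGUIDE™ works internally.

```python
from collections import defaultdict

# Pairing scores keyed by (held scheme, candidate scheme); values copied from Table 1.
# In practice these would come from association mining within the investor's cluster.
PAIR_SCORES = {
    ("1881", "1686"): 2.004,
    ("1881", "3499"): 2.003,
    ("1881", "3397"): 2.002,
    ("1881", "1657"): 2.001,
}


def recommend(holdings, pair_scores, top_n=3):
    """Return up to top_n candidate schemes, highest score first.

    Hypothetical helper: a candidate reachable from several held schemes keeps
    its best score, and schemes the investor already holds are never suggested.
    """
    best = defaultdict(float)
    for (held, candidate), score in pair_scores.items():
        if held in holdings and candidate not in holdings:
            best[candidate] = max(best[candidate], score)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [scheme for scheme, _ in ranked[:top_n]]


print(recommend({"1881"}, PAIR_SCORES))  # ['1686', '3499', '3397']
```

If the investor already holds 1686, that scheme drops out of the ranking and 1657 moves into the top three.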

Methodology Framework

Pre-Processing:

In association rule learning, lift measures how much more often two items (or events) occur together than would be expected if they were independent. When the data are highly sparse, inverse lift is used for the same purpose. It carries essentially the same meaning as lift.
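The paper does not give a formula for inverse lift. The sketch below therefore computes the standard lift measure that inverse lift is said to mirror, lift(X → Y) = sup(X ∪ Y) / (sup(X) · sup(Y)). It is a minimal illustration under three stated assumptions: each transaction is modeled as the set of schemes held by one investor, the holdings are invented, and the function returns 0.0 when either side never occurs.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x, y, transactions):
    """Standard lift of X -> Y: sup(X ∪ Y) / (sup(X) * sup(Y)).

    1.0 means X and Y co-occur exactly as often as independence predicts;
    values above 1.0 mean they co-occur more often than that.
    """
    sup_x, sup_y = support(x, transactions), support(y, transactions)
    if sup_x == 0 or sup_y == 0:
        return 0.0  # assumption of this sketch: no co-occurrence evidence at all
    return support(set(x) | set(y), transactions) / (sup_x * sup_y)


# Hypothetical holdings: each set lists the schemes held by one investor.
holdings = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"D"}]
print(lift({"A"}, {"B"}, holdings))  # 0.5 / (0.75 * 0.5), i.e. about 1.33
```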

The heterogeneous investor data were collected from the database and pre-processed to remove irregularities. The raw data generated in the databases are cross-sectional time series, which cannot be used directly to build predictive models. The raw measurements have the following issues that make them unusable for data modeling:
- Disturbances due to noise
- Missing observations in the data points
- Different scales of the data
- Extreme outliers
- Imbalanced distribution of data
- Sparsity in the data

These irregularities in the data were removed through preprocessing.
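Three of the listed issues (missing observations, different scales, and extreme outliers) can be handled column by column. The sketch below shows one possible approach for a single numeric feature, with `None` marking a missing value. It applies median imputation, clips values to Tukey's 1.5 × IQR fences, and then standardizes to z-scores. These techniques, the function name `preprocess_column`, and the sample values are all assumptions, because the paper does not say which methods were used. Noise, imbalance, and sparsity would need separate treatment.

```python
import statistics


def preprocess_column(values):
    """Clean one numeric feature column in which None marks a missing observation.

    1. Impute missing observations with the column median.
    2. Clip extreme outliers to Tukey's fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    3. Standardize to zero mean and unit variance so that features measured
       on different scales become comparable.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]

    q1, _, q3 = statistics.quantiles(filled, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    clipped = [min(max(v, low), high) for v in filled]

    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)
    if std == 0:
        return [0.0] * len(clipped)  # a constant column carries no information
    return [(v - mean) / std for v in clipped]


# Hypothetical transaction amounts with one missing value and one extreme outlier.
print(preprocess_column([10, 12, None, 11, 13, 1000]))
```

In this example the value 1000 is clipped to the upper fence (15.0) before standardization, so it no longer dominates the scale of the column.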

Clustering Analysis:

Unsupervised learning techniques, such as association rule learning and clustering algorithms, make no assumptions about a target variable. Instead, they let the data mining algorithm find associations and clusters in the data independently of any predefined objective. To find natural groupings, clustering analysis is carried out on the entire data set. Investors in a cluster are more similar to each other than to members of a different cluster. In our investor data, three major clusters were identified on the basis of behavioral factors.

Association Rule Mining:

Association rule algorithms were applied to find frequently co-occurring items, in order to identify cross-sell and up-sell products for various investors. In our data, once clustering is complete, association mining is carried out within each cluster. Applying the Apriori association mining algorithm identifies the closest schemes. An example of the actual output, the first nine rows generated for a few select schemes, is illustrated below:

Algorithm Overview

A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering is an unsupervised learning technique. Various approaches are available for cluster analysis:

Partitioning Algorithms: Given a database of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n. In other words, it classifies the data into k groups. Examples: k-means, k-medoids, k-prototypes.

Hierarchical Methods: These methods create a hierarchical decomposition of the given set of data objects. They can be classified by how the decomposition is formed, and there are two approaches: agglomerative (bottom-up) and divisive (top-down). Examples: DIANA, AGNES.

Density-based Method: This method is based on the notion of density. The basic idea is to keep growing a cluster as long as the density in the neighborhood exceeds some threshold. That is, for each data point within a cluster, the neighborhood of a given radius must contain at least a minimum number of points. Examples: DBSCAN, OPTICS.

Grid-based Method: In this method, the object space is quantized into a finite number of cells that form a grid structure, and clustering operations are performed on that grid. The major advantage of this method is its fast processing time. Examples: STING, WaveCluster.

Constraint-based Method: In this method, clustering is performed by incorporating user- or application-oriented constraints. A constraint expresses the user's expectations or the desired properties of the clustering results.
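To illustrate the partitioning approach, the sketch below implements Lloyd's k-means in plain Python for numeric feature vectors. Random or k-means++ seeding is more common; this sketch seeds with a farthest-first traversal instead, as an assumption that keeps the example deterministic. The toy points are hypothetical, and the paper does not state which clustering algorithm produced its three investor clusters.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on numeric tuples. Returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature investor profiles forming three obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
labels, centroids = kmeans(points, 3)
print(labels)
```

On this toy data the three well-separated groups end up in three different clusters.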

Many data mining applications require partitioning data into homogeneous clusters from which interesting groups may be discovered. The proposed clustering model efficiently partitions a large heterogeneous data set into homogeneous groups or clusters that remain easy to interpret. The data sets involved are in the range of a hundred thousand records, described by some 17 numeric and categorical attributes.


Categorical variables take their values from a fixed set of categories (levels). Agent type, city, and state are all categorical variables, each with its own list of levels. Because they are non-numeric, many methods designed for numerical data cannot be applied to them directly.
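One common way to compare records that mix numeric and categorical attributes is the dissimilarity used by k-prototypes, one of the partitioning examples listed earlier. It adds the squared Euclidean distance over the numeric attributes to a weight γ for every categorical attribute whose values differ. The sketch below assumes records stored as Python dictionaries whose numeric fields are already standardized. The attribute names, the sample investors, and γ = 0.5 are hypothetical.

```python
def mixed_dissimilarity(a, b, numeric_keys, categorical_keys, gamma=1.0):
    """k-prototypes style dissimilarity between two records stored as dicts.

    Numeric attributes add squared differences (assumed already standardized);
    each categorical attribute that differs adds gamma.
    """
    numeric_part = sum((a[key] - b[key]) ** 2 for key in numeric_keys)
    mismatches = sum(1 for key in categorical_keys if a[key] != b[key])
    return numeric_part + gamma * mismatches


# Hypothetical investors; attribute names and values are illustrative only.
investor_1 = {"amount_z": 0.5, "agent_type": "IFA", "city": "Hyderabad", "state": "Telangana"}
investor_2 = {"amount_z": -0.5, "agent_type": "IFA", "city": "Mumbai", "state": "Maharashtra"}
print(mixed_dissimilarity(investor_1, investor_2,
                          numeric_keys=["amount_z"],
                          categorical_keys=["agent_type", "city", "state"],
                          gamma=0.5))  # 1.0 (numeric) + 0.5 * 2 (city, state) = 2.0
```

The weight γ controls how much a categorical mismatch counts relative to a one-standard-deviation numeric difference, so in practice it would need to be tuned.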

Recommender Design:

The main objective of the recommender model is to draw the most frequent patterns out of the data. Data mining is the process of discovering interesting patterns in large amounts of data. Our transactional database contains various influential variables, such as transaction amount, scheme-level data, and segment/plan, which exhibit strongly recurring patterns that need to be identified. In the proposed model, the complex problem of discovering patterns in a huge database of heterogeneous data is solved using the association rule algorithm Apriori. Apriori also has excellent scale-up properties: the model scales up linearly with the number of transactions.
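A compact, level-wise version of Apriori is sketched below in plain Python. It assumes that each transaction is a set of items (scheme codes, or tags such as a segment or plan) and that the minimum support is given as a fraction. The function name and the toy transactions are hypothetical. For brevity the sketch rescans every transaction at each level, so it illustrates the join and prune steps rather than an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def frequent_among(candidates):
        # One pass over the data per level: support = share of baskets containing c.
        supports = {c: sum(1 for b in baskets if c <= b) / n for c in candidates}
        return {c: s for c, s in supports.items() if s >= min_support}

    current = frequent_among({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 1
    while current:
        # Join step: merge pairs of frequent k-itemsets into (k + 1)-itemsets.
        candidates = {x | y for x in current for y in current if len(x | y) == k + 1}
        # Prune step (Apriori property): every k-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions over three items.
data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, sup in apriori(data, min_support=0.6).items():
    print(sorted(itemset), sup)
```

Here the itemset {A, B, C} appears in only 2 of the 5 transactions. It therefore falls below the 0.6 threshold, while all of its pairs survive.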

The recommender model extracts patterns of items that occur together by generating association rules. It calculates rules that express probabilistic relationships between items in frequent itemsets. For example, a rule derived from a frequent itemset containing scheme A, segment B, and scheme C might state that if scheme A and segment B are included in a transaction, then scheme C is likely to be included as well. Two terms are essential to understanding the algorithm:

1. Support

2. Confidence

Let X and Y be two itemsets, and suppose that when X occurs, Y occurs with a certain probability. For the rule X → Y:

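1. Support: The rule holds in T (the transaction data set) with support sup if sup% of the transactions contain X ∪ Y. Here Pr(X ∪ Y) denotes the probability that a transaction contains every item of X ∪ Y, and |T| is the total number of transactions.

$$\mathrm{sup} = \Pr(X \cup Y) = \frac{\mathrm{count}(X \cup Y)}{|T|}$$

2. Confidence: The rule holds in T with confidence conf if conf% of the transactions that contain X also contain Y.

$$\mathrm{conf} = \Pr(Y \mid X) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$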

The model eliminates any rule whose support or confidence falls below the specified minimum thresholds.
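The two definitions and the elimination step can be combined in a short sketch. It takes one itemset, here the {scheme A, segment B, scheme C} example from the text, and splits it into every possible rule X → Y. For each rule it computes sup and conf exactly as defined above and drops rules that miss either threshold. The transactions and the thresholds `min_sup = 0.3` and `min_conf = 0.6` are hypothetical, because the paper does not report the values it used.

```python
from itertools import combinations


def count(itemset, transactions):
    """count(S): number of transactions that contain every item of S."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, transactions, min_sup, min_conf):
    """Enumerate rules X -> Y with X ∪ Y = itemset; keep those meeting both thresholds.

    sup  = count(X ∪ Y) / |T|
    conf = count(X ∪ Y) / count(X)
    Returns a list of (X, Y, sup, conf) tuples.
    """
    itemset = frozenset(itemset)
    both = count(itemset, transactions)
    sup = both / len(transactions)
    if both == 0 or sup < min_sup:
        return []  # all rules split from one itemset share the same support
    rules = []
    for size in range(1, len(itemset)):
        for x in map(frozenset, combinations(sorted(itemset), size)):
            conf = both / count(x, transactions)
            if conf >= min_conf:
                rules.append((x, itemset - x, sup, conf))
    return rules


# Hypothetical transactions built around the example in the text.
A, B, C = "scheme A", "segment B", "scheme C"
transactions = [{A, B, C}, {A, B, C}, {A, B}, {A, C}, {B, C}, {A}]
for x, y, sup, conf in rules_from_itemset({A, B, C}, transactions, min_sup=0.3, min_conf=0.6):
    print(sorted(x), "->", sorted(y), f"sup={sup:.2f}", f"conf={conf:.2f}")
```

With these numbers, the rule "scheme A and segment B → scheme C" is kept with confidence 2/3. Single-item antecedents such as "scheme A → segment B, scheme C" (confidence 0.4) fall below the confidence threshold.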

Delivery Framework

Karvy can act as a service provider and a reliable manpower provider, offering staff augmentation as well as managed services for key clients. As a high-end database management provider for Asset Management Companies, Karvy handles a major share of these clients' transaction data. By leveraging its resources in analytics and data management services, Karvy can provide its clients with staffing, technical infrastructure, and in-house end-to-end services across the delivery.

Authors:

Sarang Venukala is a senior data scientist with Karvy Analytics. He combines his experience in machine learning techniques, macroeconomic modeling, econometrics, and financial analytics to design and develop predictive and prescriptive analytics solutions, predominantly for the BFSI domain. He holds a Master of Science in Econometrics.

Technical Team:

Sarang Venukala, Sr. Data Scientist, Karvy Analytics

Lakshmi Rekha K, Data Scientist, Karvy Analytics

Chandra Sekhar R, Data Scientist, Karvy Analytics

Nanda Kishore B, Sr. Business Analyst, Karvy ComputerShare

Delivery Model

- Project Initiation: AMC's consent for data access
- Internal Database: maintained by Karvy for the AMCs
- Deployment Support: Karvy's infrastructure supports in-house deployment of models
- Technical support from Karvy's big data analytics team
- Quarterly and monthly reports on the existing investors
- Maintenance: model validation and continuous improvement of predictive models

Karvy's in-house teams support the delivery of the model platform, the technical infrastructure, database management, and maintenance.

REGISTERED ADDRESS
"Karvy House", 46 Avenue 4, Street No. 1, Banjara Hills, Hyderabad 500 034, India.
Tel No: (+91-40) 23312454, 23320751
Fax No: (+91-40) 23311968
E-mail: [email protected]
Website: www.karvyanalytics.com

US OFFICE
115 Broadway, Suite 1506, New York, NY 10006
Tel: 212 267 4334
Fax: 212 267 4335
