Modeling with SAP Infinite Insight
Since SAP’s acquisition of KXEN, announced in September 2013, many SAP Predictive Analysis users have
been eager to preview the features of KXEN Infinite Insight that will eventually be incorporated into SAP
Predictive Analysis. In this blog, we’ll step through the same analysis of building a Customer Lifetime
Value model that was discussed in my previous Customer Lifetime Value case study blog.
Configuring the Analysis
Infinite Insight has a much sparser interface, with relatively few options available. Unlike SAP Predictive
Analysis, there is no algorithm selection, apart from describing the type of model that the user wants to
create (Classification/Regression, Clustering, Time Series, or Association Rules). These line up with the
algorithms that are most predominant in the current Predictive Analysis tool. In this case, we will select
“Create a Classification/Regression Model”.
We now browse to and select the dataset that the model will be built on.
Infinite Insight automatically detects the data types found in the file and adds its own key.
The algorithm setup screen allows the user to set the target (dependent) and explanatory
(independent) variables. By default, Infinite Insight will attempt to use all variables as explanatory
factors in the model, but the user can designate specific variables to exclude. Variables that have no
relevance to the target (like keys) and variables that are dependent on or determined by the target
variable should be excluded from the model.
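As a rough illustration of this exclusion step outside the tool, the sketch below removes a key column and a target-derived column from the list of candidate predictors. The column names (`customer_id`, `clv_tier`, and so on) are hypothetical and are not taken from the dataset used in this post; in Infinite Insight itself, the same choice is made through the setup screen rather than in code.

```python
def select_explanatory(columns, target, excluded):
    """Return candidate explanatory variables: every column except the target and the exclusions."""
    drop = set(excluded) | {target}
    return [c for c in columns if c not in drop]


# Hypothetical column names, for illustration only.
columns = ["customer_id", "age", "income", "purchased_product_x", "clv_tier", "clv"]

explanatory = select_explanatory(
    columns,
    target="clv",
    excluded=[
        "customer_id",  # a key: carries no information about the target
        "clv_tier",     # derived from the target, so it would leak the answer
    ],
)
print(explanatory)
```

Leaving a target-derived column such as the hypothetical `clv_tier` in the model would make it look far more accurate than it really is, which is why these variables are excluded up front.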
Infinite Insight displays a summary of the analysis to be completed. Click “Run” to execute the analysis.
Evaluating the Predictive Model
Unlike other tools, in which the user has to select the proper algorithm based on whether it produces a
binary/categorical prediction (classification) or a numeric prediction (regression) and then further select
a methodology and variables to include, Infinite Insight automatically detects the model type and
algorithm used for best fit based on the target variable selected by the user. Another feature of Infinite
Insight is that it automatically decides whether input should be continuous or categorical and
determines the most appropriate binning for variables. This reduces a lot of the data preparation and
model testing that is usually done manually when building a predictive model. Infinite Insight also
automatically creates Training and Validation datasets for model evaluation.
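To give a sense of the manual work this replaces, here is a minimal sketch of binning a continuous variable and splitting records into training and validation sets. Infinite Insight's actual binning and partitioning strategy is not described in this post, so the equal-frequency binning, the 75/25 random split, and the age values below are all assumptions made for illustration.

```python
import random


def equal_frequency_bins(values, n_bins):
    """Return cut points that split the sorted values into n_bins groups of roughly equal size."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def assign_bin(value, cut_points):
    """Return the zero-based index of the bin that value falls into."""
    return sum(value >= cut for cut in cut_points)


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold out a fraction of them for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up ages, for illustration only.
ages = [22, 25, 31, 38, 44, 47, 53, 60]

cuts = equal_frequency_bins(ages, n_bins=4)
binned = [assign_bin(age, cuts) for age in ages]
train, validation = train_validation_split(ages)
```

In a hand-built model, each of these choices (number of bins, cut points, split ratio) would need to be tested and tuned by the modeler.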
Once the analysis has been run, Infinite Insight returns a set of standard reports, with information on
the directional impact of each variable level, as shown in the charts below.
Infinite Insight has automatically binned the Age variable, and this chart shows that older ages have a
positive impact on the target (customer value), while younger ages have a negative impact. This helps
modelers and decision makers better understand their data and the model predictions.
Another valuable chart, pictured below, shows the predictive contribution of each variable in the model.
Age is the most influential factor, followed by Income and whether the customer purchased
ProductX.
Finally, perhaps the most important chart is pictured below: the actual vs. predicted customer lifetime
value. The green line is a “perfect” model that exactly predicts customer lifetime value, while the blue
line is the model prediction, which in this case is quite close to the actual values. The gray shaded area
represents variance in the data and marks the expected range for most observations.
Automatically generating model description and evaluation charts like the above examples is a huge
benefit for modelers and business analysts. These charts would otherwise have to be constructed
manually.
Exporting the Scoring Function
My very favorite feature of Infinite Insight is the model scoring source code export. By clicking on
“Generate Source Code”, I can generate code to create a scoring function in many different languages.
As shown in the screen shot below, Infinite Insight can generate VB, C, C++, SQL, HTML JavaScript, or
Java code that takes in the input predictors and returns the model result.
Generating DB2 SQL code for this CLV model yields a SQL function that can be called to calculate the
Customer Lifetime Value for any existing or new record. This allows the scoring calculation to be housed
within the database and return with minimal response time.
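The exported code itself is not reproduced in this post. As a loose sketch of the general idea, the Python function below takes predictor values and returns a score built from a base value plus a contribution for each predictor's bin. The additive structure, the function name `score_clv`, the bin boundaries, and every number in it are invented for illustration; they are not output from Infinite Insight or from the CLV model discussed here.

```python
# All values below are hypothetical, not taken from the actual model.
BASE_VALUE = 500.0

# (exclusive upper bound of the age bin, contribution to the score)
AGE_CONTRIBUTIONS = [
    (30, -120.0),
    (45, -20.0),
    (60, 80.0),
    (float("inf"), 150.0),
]

PRODUCT_X_CONTRIBUTION = {True: 60.0, False: -15.0}


def score_clv(age, purchased_product_x):
    """Return a hypothetical customer lifetime value for one record."""
    # Find the first age bin whose upper bound exceeds the given age.
    for upper_bound, contribution in AGE_CONTRIBUTIONS:
        if age < upper_bound:
            age_part = contribution
            break
    return BASE_VALUE + age_part + PRODUCT_X_CONTRIBUTION[purchased_product_x]


print(score_clv(52, True))
print(score_clv(25, False))
```

Because a function like this is only lookups and arithmetic, it can be translated into SQL or any of the other export languages and run wherever the data lives.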
With the variety of options available in Infinite Insight, a user can export the scoring function in a format
that could easily integrate with most databases and applications, minimizing the development effort to
deploy models and facilitating real-time scoring.
The long-term vision SAP has shared is to bring the unique KXEN functionality discussed in this article
into the core SAP Predictive Analysis product. This will speed development of simple models and allow
users with less statistical knowledge to create models. Combining these features with the flexibility of
SAP Predictive Analysis’s custom R components and the power of SAP HANA will truly make SAP
Predictive Analysis a powerful tool for all levels of predictive users.
Hillary Bliss, Analytics Practice Lead
Decision First Technologies
twitter @HillaryBlissDFT
Hillary Bliss is the Analytics Practice Lead at Decision First Technologies, and specializes in data
warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients
and vendors to integrate business analysis and predictive modeling solutions into the organizational
data warehouse and business intelligence environments based on their specific operational and
strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.