SAP InfiniteInsight® 6.5 SP5 Technical Specifications

Specifications Document Version: 2.0 – 2014-04-25

CUSTOMER

© 2014 SAP AG or an SAP affiliate company. All rights reserved.

Table of Contents

1 Predictive Analysis / Data Mining Process
1.1 Predictive Model Training
1.2 Predictive Model Apply
1.3 Analytical Data Sets and Data Connectivity

2 Introduction

3 General Architecture
3.1 Authenticated Server
3.2 Stand Alone-Workstation
3.3 Remote Access
3.4 CORBA Client/Server

4 Technology
4.1 Supported Platforms
4.2 Expected Behavior on Multi-Processors Architecture
4.2.1 SAP InfiniteInsight® Threading Policy

5 Sizing Modeling Servers
5.1 Training Phase
5.1.1 RAM Sizing
5.1.2 Data Transfer
5.1.3 Temp Disk Space
5.1.4 Disk Space in a Year
5.2 Apply Phase
5.2.1 RAM Sizing
5.2.2 Data Transfer
5.2.3 Temp Disk Space
5.2.4 Disk Space in a Year
5.3 Sizing Tool

6 Network Requirements
6.1 RDBMS Connectivity
6.2 Client / Server Connectivity

7 Data Access Management
7.1 Access Rights for Files / RDBMS
7.1.1 Rights Definition
7.1.2 Data Access Processes
7.2 Unicode for RDBMS

8 Other Software Requirements
8.1 Standalone Application Mode
8.2 Client/Server Mode

9 InfiniteInsight® Modeler - Data Encoding Technical Specifications
9.1 Features

10 InfiniteInsight® Modeler - Regression/Classification Technical Specifications
10.1 Features
10.2 Notes

11 InfiniteInsight® Modeler - Segmentation/Clustering Technical Specifications
11.1 Features
11.2 Notes

12 InfiniteInsight® Modeler - Time Series Technical Specifications
12.1 Features
12.2 Notes

13 InfiniteInsight® Modeler - Association Rules Technical Specifications
13.1 Features
13.2 Notes

14 InfiniteInsight® Explorer - Event Logging Technical Specifications
14.1 Features

15 InfiniteInsight® Explorer - Sequence Coding Technical Specifications
15.1 Features

16 InfiniteInsight® Explorer - Text Coding Technical Specifications
16.1 Features

17 InfiniteInsight® Explorer - Semantic Layer Technical Specifications
17.1 Features

18 InfiniteInsight® Social Technical Specifications
18.1 Features

19 Scorer Technical Specifications
19.1 Features
19.1.1 Without Date Variables
19.1.2 With Date Variables
19.2 Notes

20 InfiniteInsight® Access
20.1 ODBC
20.1.1 Platform: A Definition
20.1.2 Reproducibility Issue
20.1.3 List of Platforms Reproduced and Tested

21 Flat Files
21.1 Supported Data Formats
21.2 Note about Date and Datetime Variables

22 SAS Files
22.1 Supported Data Formats

23 Annex
23.1 Open Source Software Used in InfiniteInsight®
23.2 List of Available Binaries

1 Predictive Analysis / Data Mining Process

To help the IT team understand the constraints of running predictive analytics software, this chapter gives a short overview of the data mining process.

Here is a definition of predictive analytics from Jeff Liebl, from Multichannel Merchant.

"Predictive analytics is a process, based on statistical and data mining techniques, that models current and historical customer performance data and traits to make 'predictions' about future outcomes and customer behaviors. These predictions can be expressed as numerical values, or scores, that correspond to the likelihood of a particular occurrence or behavior taking place in the future. In corporate America, predictive scores are typically used to determine the risk or opportunity associated with a specific customer or transaction. These evaluations assess the relationships between many variables to estimate risk or response."

The numerical values or scores are generated by mathematical equations resulting from predictive models. A predictive model has nothing to do with a data model, a term that describes the structure of data as seen in the table schemas of a relational database.

A simple definition of a predictive model is a mathematical model that describes the relationship between some input data (also called variables or attributes: for example, demographic information known about customers) and some output data (the target variable: for example, whether or not a given customer bought a product following a marketing campaign). There are many application domains for predictive analytics and, if you are reading this document, your corporation has decided to use it to optimize some of its business processes.

This is an extract from Wikipedia (section "Predictive Analytics (http://en.wikipedia.org/wiki/Predictive_analytics)") on a very common application of predictive analytics:

"Direct marketing: Product marketing is constantly faced with the challenge of coping with the increasing number of competing products, different consumer preferences and the variety of methods (channels) available to interact with each consumer. Efficient marketing is a process of understanding the amount of variability and tailoring the marketing strategy for greater profitability. Predictive analytics can help identify consumers with a higher likelihood of responding to a particular marketing offer. Models can be built using data from consumers’ past purchasing history and past response rates for each channel. Additional information about the consumers demographic, geographic and other characteristics can be used to make more accurate predictions. Targeting only these consumers can lead to substantial increase in response rate which can lead to a significant reduction in cost per acquisition. Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of products and marketing channels that should be used to target a given consumer."

IN THIS CHAPTER

Predictive Model Training
Predictive Model Apply
Analytical Data Sets and Data Connectivity

1.1 Predictive Model Training

The first resource-intensive phase of the data mining process is model training.

How does it work in practice? Predictive models are generally built from training samples: lines of data for which the expected value is known. This expected value may be known because it was collected in the past: for example, the training data set contains people who were active customers at the beginning of the year, and the collected target records whether or not each of these customers bought a particular product in the first six months of the year. The expected value may also be known because a specific experiment has been run: for example, a first mailing has been sent to a sample of the active customers, and data has been collected to flag which of them responded favorably to this mailing within three months.

How does the training phase impact IT resources? Predictive models need access to training "Analytical Data Sets" (ADS) to be built. With InfiniteInsight®, predictive models are built on a modeling server or workstation on which InfiniteInsight® has been installed. Training Analytical Data Sets very commonly contain a sample of the customer population, typically in the range of 50,000 to 500,000 lines, although some clients use training analytical data sets of more than 30 million lines. Each customer is represented by a line containing attribute values. Training Analytical Data Sets very commonly contain from 100 to 2,000 attributes to describe each customer, although some clients reach the 20,000-attribute range.

For the training phase, InfiniteInsight® must be installed on a computer with access to the data sources, with reasonable bandwidth to support the exchange of 4 gigabytes (500,000 lines times 2,000 columns times 4 bytes). This bandwidth estimate is provided only as a starting data point and should be revised for each installation.
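
The estimate above is simple enough to recompute for your own data volumes. The following is a minimal sketch, not part of InfiniteInsight®: the function name `transfer_bytes` is hypothetical, and the 4 bytes per value is the assumption used in the text.

```python
def transfer_bytes(lines: int, columns: int, bytes_per_value: int = 4) -> int:
    """Rough size of an analytical data set moved between the data source
    and the modeling server (assumes a fixed size per value)."""
    return lines * columns * bytes_per_value


# Figures quoted in the text: 500,000 lines, 2,000 columns, 4 bytes per value.
training_bytes = transfer_bytes(500_000, 2_000)
print(f"{training_bytes / 1e9:.1f} GB")  # prints "4.0 GB"
```

Replacing the arguments with the line and attribute counts of a given installation gives a first figure to revise against actual measurements.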

InfiniteInsight® V5.0 introduced the notion of a cache in order to minimize the number of data transfers between the data source and the modeling server or workstation.

1.2 Predictive Model Apply

The goal of a predictive model is to be applied on data that has not been used for training, providing an estimation of the expected (and yet unknown) target value (hence the name predictive).

To follow the example developed above: once the model has been trained on the data collected through a first mailing wave sent to 50,000 sampled customers, it can be applied to the 20 million customers in your customer database, using the same attributes to describe these customers, in order to ‘score’ them and compute their probability of responding favorably to this mailing campaign. InfiniteInsight® provides tools for marketing services that take into account the probability of a positive answer, the cost of contacts, and the expected revenues to optimize the actual list of people to be contacted for this mailing.
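
To illustrate how probability, contact cost, and expected revenue can be combined, here is a minimal sketch. It is not the InfiniteInsight® tool itself: the selection rule (contact a customer when the expected revenue exceeds the cost of contact) and all names and figures are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredCustomer:
    customer_id: str
    p_response: float  # probability of a positive answer, as given by the model


def select_for_mailing(customers, cost_per_contact, revenue_per_response):
    """Keep customers whose expected revenue exceeds the cost of contacting them."""
    selected = []
    for c in customers:
        expected_profit = c.p_response * revenue_per_response - cost_per_contact
        if expected_profit > 0:
            selected.append((c.customer_id, expected_profit))
    # Most profitable contacts first.
    return sorted(selected, key=lambda item: item[1], reverse=True)


scored = [ScoredCustomer("A", 0.10), ScoredCustomer("B", 0.01), ScoredCustomer("C", 0.04)]
mailing_list = select_for_mailing(scored, cost_per_contact=1.0, revenue_per_response=40.0)
print([customer_id for customer_id, _ in mailing_list])  # prints "['A', 'C']"
```

Customer B is dropped because a 1% chance of a 40-unit revenue does not cover the 1-unit contact cost.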

How does it work in practice? Predictive models, once trained, can be seen as ‘scoring equations’, transforming input attribute values into a score, a probability, an estimated value, or a segment. To be applied, the scoring equation needs to be fed with an Apply Analytical Data Set. In InfiniteInsight®, there are three options that can be used to apply predictive models:

The option most frequently used by our customers is the in-database apply, which is invoked automatically when the input data is extracted from, and the scores are generated within, a relational database. In this case, InfiniteInsight® uses InfiniteInsight® Scorer to automatically generate the scoring equation in SQL (or as a UDF) and executes this equation in the database. This is the preferred option when the input data is in a relational database, and when there are millions of scores to generate.

Another possibility is to export the scoring equation through a specific module called InfiniteInsight® Scorer. The exported scoring code can then be integrated in the scoring environment (examples of export formats are SAS, PMML, C, C++, Java, SQL, UDF, JavaScript, and more). This is the preferred option when the input data is in a specific format such as SAS, or when the scoring environment is real-time and integrates a Java, C++, or PMML scoring equation.

Finally, the input data can be transferred to the modeling server or workstation, where InfiniteInsight® evaluates the scoring equation and returns the scores, which can then be transferred back to the environment that consumes them. This option uses the batch apply service of InfiniteInsight®.
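
The idea behind the in-database apply can be sketched as follows. This is an illustration only and does not reproduce the code generated by InfiniteInsight® Scorer: it assumes a plain linear scoring equation, uses SQLite as a stand-in database, and the function, table, and column names are hypothetical.

```python
import sqlite3


def scoring_sql(intercept, weights, table, key_column):
    """Render a linear scoring equation as a SQL SELECT statement.
    Identifiers are assumed to be trusted in this sketch."""
    terms = [f"{weight!r} * {column}" for column, weight in weights.items()]
    expression = " + ".join([repr(intercept)] + terms)
    return f"SELECT {key_column}, {expression} AS score FROM {table}"


sql = scoring_sql(
    intercept=0.5,
    weights={"age": 0.02, "nb_purchases": 0.3},
    table="customers",
    key_column="customer_id",
)
print(sql)

# Only the SQL string travels to the database; the scores are computed there.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, age REAL, nb_purchases REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [("A", 30, 2), ("B", 50, 0)])
scores = dict(conn.execute(sql).fetchall())
```

The same string could equally be handed to an integration team, which corresponds to the export option described above.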

How does this apply phase impact IT resources? The three options lead to three different sets of operational constraints.

The in-database apply option requires licensing for InfiniteInsight® Scorer. The transfer between the data source and the modeling server is limited to the character string containing the scoring code; no data transfer occurs with this option. The entire bulk processing, including the computation of the scores, is done within the relational database.

The export code option requires licensing for InfiniteInsight® Scorer. The management of the generated code is then left to the integration team. The processing is done wherever the scoring code is executed (after compilation, in some cases).

The batch apply service option does not require any extra licensing but may require the transfer of large data sets between the data source and the modeling server or workstation. Taking again the example of 20 million lines and 2,000 attributes, this would generate a transfer of ~150 gigabytes. This large figure may be lowered by the InfiniteInsight® feature that automatically selects input variables in the classification and regression module: a model built from 2,000 attributes may end up using only 50 of them without losing any predictive performance. The apply data set is then only required to contain the 50 attributes used in the scoring equation, which brings the transfer back to the range of 4 gigabytes.
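
The effect of variable selection on the batch apply transfer can be checked with a short calculation. This is a sketch under the same assumption as the training estimate (4 bytes per value); the function name is hypothetical. Note that 20 million lines times 2,000 attributes times 4 bytes is 160 billion bytes, which is about 149 GiB when counted in binary units, consistent with the ~150 figure quoted above.

```python
BYTES_PER_VALUE = 4  # assumption: same per-value size as in the training estimate


def apply_transfer_bytes(lines: int, attributes: int) -> int:
    """Rough size of an apply data set sent to the modeling server."""
    return lines * attributes * BYTES_PER_VALUE


full = apply_transfer_bytes(20_000_000, 2_000)  # every attribute is transferred
reduced = apply_transfer_bytes(20_000_000, 50)  # only the attributes the model uses

print(f"full:    {full / 2**30:.0f} GiB")  # prints "full:    149 GiB"
print(f"reduced: {reduced / 1e9:.0f} GB")   # prints "reduced: 4 GB"
print(f"ratio:   {full // reduced}x")       # prints "ratio:   40x"
```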

All connectivity with relational database systems is performed through ODBC. This requires that ODBC drivers be properly installed on the modeling server or workstation.

It must be noted that KMX generates scoring equations dealing only with the attributes actually used. When used in conjunction with InfiniteInsight® Explorer - Semantic Layer (ADM), the in-database option will generate apply data sets containing only the required attributes, thus saving some processing power in the relational database management system.

1.3 Analytical Data Sets and Data Connectivity

The data mining process is presented in the section below:

Phase (1) represents the creation of the Training Analytical Data Set, which is then used in phase (2) to build (or train) the predictive model. The Apply Analytical Data Set is then created in phase (3) to be fed to the built model. Phase (4) applies the model to the Apply Analytical Data Set to generate the scores, probabilities, or estimated values.
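
The four phases can be sketched end to end on a toy example. The "model" below is only the observed response rate per attribute value, chosen because it fits in a few lines; it is not the algorithm used by InfiniteInsight®, and all function and field names are hypothetical.

```python
from collections import defaultdict


def build_ads(raw_rows, attributes, target=None):
    """Phases (1) and (3): keep the chosen attributes (and the target, if known)."""
    columns = list(attributes) + ([target] if target else [])
    return [{name: row[name] for name in columns} for row in raw_rows]


def train(training_ads, attribute, target):
    """Phase (2): learn the observed response rate for each attribute value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [responders, total]
    for row in training_ads:
        counts[row[attribute]][0] += row[target]
        counts[row[attribute]][1] += 1
    return {value: hits / total for value, (hits, total) in counts.items()}


def apply_model(model, apply_ads, attribute, default=0.0):
    """Phase (4): score lines whose target is not yet known."""
    return [model.get(row[attribute], default) for row in apply_ads]


history = [
    {"id": 1, "region": "north", "responded": 1},
    {"id": 2, "region": "north", "responded": 0},
    {"id": 3, "region": "south", "responded": 0},
]
prospects = [{"id": 10, "region": "north"}, {"id": 11, "region": "south"}]

training_ads = build_ads(history, ["region"], target="responded")  # phase (1)
model = train(training_ads, "region", "responded")                 # phase (2)
apply_ads = build_ads(prospects, ["region"])                       # phase (3)
print(apply_model(model, apply_ads, "region"))  # phase (4), prints "[0.5, 0.0]"
```

The apply data set uses the same attributes as the training data set but carries no target column, which is the point of the apply phase.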

InfiniteInsight® natively reads data from text files and from most relational databases through ODBC drivers. Through licensing of a specific module, InfiniteInsight® Access, InfiniteInsight® can use analytical data sets from SAS, SPSS, Excel files, and more.

The Data Extraction process is covered by SAP InfiniteInsight® through specific modules such as InfiniteInsight® Explorer - Event Logging, InfiniteInsight® Explorer - Sequence Coding, InfiniteInsight® Explorer - Text Coding, InfiniteInsight® Social, and a specific management system for relational databases called InfiniteInsight® Explorer - Semantic Layer.

InfiniteInsight® Explorer - Event Logging/InfiniteInsight® Explorer - Sequence Coding/InfiniteInsight® Explorer - Text Coding/InfiniteInsight® Social features are subject to specific licenses. They are provided as services in InfiniteInsight®, thus running in the modeling server or workstation. They create internal variables within SAP InfiniteInsight® predictive models in order to improve their predictive power:

InfiniteInsight® Explorer - Event Logging is used to create thousands of possible aggregates through multiple time periods, pivoted on categories, using relative reference dates. For example, it can be used to create the average amount bought by each customer, each month for six months starting from the date on which this customer bought a specific product, for each product brand. InfiniteInsight® Explorer - Event Logging can thus be used to explore which aggregates are useful to improve the performance of predictive models.

InfiniteInsight® Explorer - Sequence Coding is used to create variables that collect transitions between events. For example, it is often used to compute transitions between pages on a web site for each web session, allowing predictive models to use this information to predict shopping behavior during the web session.

InfiniteInsight® Explorer - Text Coding is used to automatically transform textual fields, such as email bodies or free-form text fields in marketing surveys, into ‘root words’ that can also be used by predictive models. For example, an insurance claim letter or email can be decomposed into words to detect whether some words are predictive of fraud.

InfiniteInsight® Social is used to extract information from graphs such as those found in social network sites, in calling patterns for telecommunications, or in email exchanges between individuals. InfiniteInsight® Social can generate variables collecting information on the direct first circle of customers, or even detect communities and compute variable profiles on these circles or communities. For example, the ratio of churners in the community may be used by a predictive model to predict the probability of churn for a given customer of a telecommunications operator.
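
Three of the derived variables described above can be sketched in a few lines each. These are simplified illustrations, not the InfiniteInsight® implementations: the function names and data layouts are hypothetical, months are counted as calendar months from the reference date, and text coding is left out.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_average_amounts(purchases, reference_date, months=6):
    """Event-logging style aggregate: average amount per (brand, month offset),
    counting calendar months from the customer's reference date."""
    sums = defaultdict(lambda: [0.0, 0])  # (brand, offset) -> [total, count]
    for when, brand, amount in purchases:
        offset = (when.year - reference_date.year) * 12 + when.month - reference_date.month
        if when >= reference_date and 0 <= offset < months:
            sums[(brand, offset)][0] += amount
            sums[(brand, offset)][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def page_transitions(session_pages):
    """Sequence-coding style variables: count transitions between consecutive pages."""
    return Counter(zip(session_pages, session_pages[1:]))


def churner_ratio(customer, links, churners):
    """Social style variable: share of churners in a customer's first circle."""
    circle = links.get(customer, set())
    if not circle:
        return 0.0
    return len(circle & churners) / len(circle)


purchases = [
    (date(2014, 1, 15), "acme", 10.0),
    (date(2014, 1, 20), "acme", 30.0),
    (date(2014, 3, 2), "acme", 5.0),
    (date(2013, 12, 1), "acme", 99.0),  # before the reference date, ignored
]
aggregates = monthly_average_amounts(purchases, reference_date=date(2014, 1, 10))
transitions = page_transitions(["home", "search", "product", "search", "product"])
ratio = churner_ratio("alice", {"alice": {"bob", "carol", "dave", "erin"}}, {"bob", "zed"})
print(aggregates, transitions[("search", "product")], ratio)
```

Each function keeps an in-memory structure per customer, session, or graph node, which gives a sense of why these modules are memory-hungry on large data sets.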

InfiniteInsight® Explorer - Event Logging, InfiniteInsight® Explorer - Sequence Coding, and InfiniteInsight® Social build internal representations in order to compute the values of their generated variables. They require large RAM configurations. This is particularly true for InfiniteInsight® Social, which has been used to represent graphs of links between tens of millions of telecommunications customers.

InfiniteInsight® Explorer - Event Logging, InfiniteInsight® Explorer - Sequence Coding, and InfiniteInsight® Social are better suited to 64-bit architectures.

The Analytical Data Management (ADM) module follows a different design. It is based on SQL generator technology, which is optimized for each supported relational database (such as Oracle, Teradata, SQL Server, DB2, MySQL, and more). It provides all functions required for data manipulation in order to create analytical data sets, plus a patent-pending technique to manage the evolution of these analytical data sets through time. ADM is provided to customers who have licensed either InfiniteInsight® Explorer - Event Logging or InfiniteInsight® Explorer - Sequence Coding.

Since SAP InfiniteInsight® V5.1, in order to avoid misuse of complex queries by data mining processes or users, we have included the usage of the ‘explain’ features of the major databases. These provide complexity estimates for the generated queries, which allow our customers to implement policies based on this complexity.

Other features that we have implemented to minimize the load on the relational database include optimizing the queries for each specific SQL dialect (using OLAP extensions, subqueries, or correlated queries when building aggregates), using temporary tables when needed to minimize computation overlap when a given variable is used in multiple expressions, and generating specific queries when the user wants to see the data manipulation results on the first 100 lines or compute the descriptive statistics on the first 2,000 lines, for example.
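
As an illustration of an ‘explain’-based policy and of a query limited to the first lines, here is a minimal sketch using SQLite, which is only a stand-in for the databases named above. The policy (flag any query whose plan contains a full table scan) and the function names are assumptions; the wording of SQLite's plan output is meant for inspection and may vary between versions.

```python
import sqlite3


def plan_details(conn, query):
    """Return the human-readable steps SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]


def needs_full_scan(conn, query):
    """Hypothetical policy input: does any step scan a table without an index?"""
    return any(step.startswith("SCAN") for step in plan_details(conn, query))


def preview(conn, query, lines=100):
    """Fetch only the first lines of a result, as for an on-screen preview."""
    return conn.execute(f"SELECT * FROM ({query}) LIMIT {int(lines)}").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases (customer, amount) VALUES (?, ?)",
    [(f"c{i % 10}", float(i)) for i in range(500)],
)

by_key = "SELECT amount FROM purchases WHERE id = 42"          # uses the primary key
by_amount = "SELECT id FROM purchases WHERE amount > 100"      # no index on amount

print(needs_full_scan(conn, by_key), needs_full_scan(conn, by_amount))
print(len(preview(conn, "SELECT * FROM purchases")))
```

A production policy would rely on the cost figures exposed by each database's own explain facility rather than on string matching.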

2 Introduction

This document presents SAP InfiniteInsight® information intended for IT teams. It helps prepare the installation and configuration of SAP InfiniteInsight® within the information systems of our prospects.

InfiniteInsight® software provides services usually described in analyst reports as Predictive Analytics / Data Mining. The first section describes the data mining process with a focus on IT constraints and resources, such as data access and modeling server usage.

The second section introduces the different architectures that can be chosen to install InfiniteInsight®, since InfiniteInsight® can be installed in a client-server architecture or on workstations.

The third section describes the supported operating systems and details some of the technological elements around InfiniteInsight®.

The fourth section describes some sizing considerations and the rationale to be used to size modeling servers or workstations.

3 General Architecture

SAP InfiniteInsight® can be deployed as a stand-alone process (which may be remotely accessed), or as a true client-server system.

SAP InfiniteInsight® provides a user interface written in Java (version greater than 1.6 in the latest InfiniteInsight® versions) called the InfiniteInsight® modeling assistant. The same user interface is used in all deployment architectures and is not subject to a specific license.

IN THIS CHAPTER

Authenticated Server
Stand Alone-Workstation
Remote Access
CORBA Client/Server

3.1 Authenticated Server

Starting with version 4.0, the preferred InfiniteInsight® client-server architecture is based on a master process that manages authentication (checking the login and password) and impersonation (creating one process per connected client session), and that controls the process created for each client session.

Communication between the server and the data uses ODBC, the native file system, SAP InfiniteInsight® advanced Access (for SAS, SPSS, Matlab files, and more), or customized access.

Communications between the clients (running the graphical user interface written in Java, called the InfiniteInsight® modeling assistant) and the SAP InfiniteInsight® server are based on CORBA-IIOP, which uses SSL encryption. Clients locate the server through a CORBA Yellow Pages mechanism called ‘NameServer’.

In this solution, the SAP InfiniteInsight® Modeling Assistant only performs graphical operations: heavy processing is performed on the server. SAP InfiniteInsight® Modeling Assistant user interfaces must be installed on these clients (and updated when a new SAP InfiniteInsight® version is installed). SAP InfiniteInsight® provides a Java Web Start infrastructure to allow IT organizations to disseminate clients when a new SAP InfiniteInsight® version is available.

A single license file (containing the list of modules purchased by the customer) for the Modeling server is required.

Besides the SAP InfiniteInsight® Modeling Assistant, SAP InfiniteInsight® provides a graphical user interface called SAP InfiniteInsight® Admin Console to provide external session management.

The main features of the SAP InfiniteInsight® Authenticated Server are:

- Authentication of users: users are required to enter a valid user/password to start a modeling session.
- One SAP InfiniteInsight® instance per session: once a user is authenticated, a SAP InfiniteInsight® instance process is started for the modeling session.
- Communication between client and server is encrypted to enforce security (passwords and data are not sent as plain text through the network).

The advantages of this solution are multiple:

- The network configuration is light: only two ports (one for the ‘NameServer’ and one for the Authenticated Server) must be opened, since all communications from clients are directed to the Authenticated Server.
- Each modeling session can use the maximum process memory size, without sharing it with other client processes.
- A user may close the SAP InfiniteInsight® Modeling Assistant while a long-running session is still in progress (a long model training, for example) and reconnect to this session later.
- Any problem occurring in one user session does not impact the other users.
- The memory resources allocated to a session are released once the session is terminated (when the client GUI is closed, for example).
- Operating system rights can be used to check access to the different resources (modeling data, ...).
- One license is required on the modeling server for any number of clients.
- User activity monitoring and logging is possible and activated by default.

The constraints for this solution are related to the installation process:

- Configure the authentication system within SAP InfiniteInsight®.
- Set some specific rights for the account under which the SAP InfiniteInsight® Authenticated Server process will run.

On UNIX operating systems, authentication is managed through PAM (Pluggable Authentication Module), which allows authentication to be delegated to the operating system or to any other system, such as LDAP-based systems.

3.2 Stand Alone-Workstation

As a stand-alone process, SAP InfiniteInsight® is a 2-Tier architecture.

Communication between the server and the data uses ODBC, the native file system, SAP InfiniteInsight® advanced Access (for SAS, SPSS, Matlab files and more), or a customized driver.

The main advantage is the simplicity of this architecture and the absence of competition for shared resources (such as CPU on a server). Of course, it requires enough CPU and memory resources on the workstation. This architecture can be used in conjunction with remote file access protocols (such as Samba, Windows Services for UNIX, and so on) to access remote files on a server (for example, remote SAS files on a data server).

Each workstation requires its own license file, since license files are node-locked: they are specific to each machine on which InfiniteInsight® is installed.

The communication between the Graphical User Interface and SAP InfiniteInsight® engines is carried out through Java Native Interface (JNI) in the same process.

The advantages of this solution are:

- A five-minute installation process.

The constraints for this solution are:

- InfiniteInsight® uses as much CPU power as it can when a training session starts, making the workstation difficult to use for other applications (this is less of a problem on multi-core workstations).
- One license is required for each workstation, which can be a major issue when managing a large number of workstations.
- SAP InfiniteInsight® upgrades must be done on each workstation.
- Activity logging is spread across the workstations when there are several of them.

3.3 Remote Access

A common alternative is to deploy the SAP InfiniteInsight® stand-alone version on a server, and allow remote access to the server through a standard "remote access" product such as Windows Remote Desktop, PC Anywhere, Citrix, XWin-32, Exceed, or VNC as shown in the following figure.

The advantages of this solution are:

- The server CPU and memory are shared, used resources are freed after each user session, and authentication can still be performed on the server.
- SAP InfiniteInsight® upgrades only have to be installed on the server.
- The login accounts on the server can be used to filter access to data sources.

The constraints for this solution are:

- A third-party 'remote access' tool that complies with the security policy of the enterprise must be installed.
- The compatibility of this third-party tool with applications developed in Java 1.6 must be checked, to make sure that communications are not done at a very low graphical level (bitmap transfers, for example).

3.4 CORBA Client/Server

The original SAP InfiniteInsight® client-server is a 3-Tier architecture.

Communication between the server and the data uses ODBC, the native file system, SAP InfiniteInsight® advanced Access (for SAS, SPSS, Matlab files and more), or a customized driver.

Communications between the clients (running the Java graphical user interface called the InfiniteInsight® modeling assistant) and the server are based on CORBA-IIOP. Clients locate the servers through a CORBA Yellow Pages mechanism called ‘NameServer’.

The following figure illustrates SAP InfiniteInsight® 3-Tier architecture.

This architecture is described only for the sake of completeness: it was the first one provided in InfiniteInsight®, but it is now considered deprecated.

In this solution, the InfiniteInsight® modeling assistant only performs graphical operations: heavy processing is performed on the server. InfiniteInsight® modeling assistant user interfaces must be installed on these clients (and updated when a new SAP InfiniteInsight® version is installed).

The main disadvantage is that communication between the client and the server uses the CORBA protocol (IIOP), which must be allowed through firewalls on two server ports (which could be against internal security policies). Another disadvantage of this architecture is that the same UNIX process is used by all users over time, so process memory size limitations are reached more easily.

This is why some SAP InfiniteInsight® customers are now implementing another architecture based on several SAP InfiniteInsight® CORBA processes on a single physical server as shown in the following figure.

The main advantage of this architecture is to allow bigger modeling activity per user (without reaching process memory limitations). The disadvantage is that the mapping between clients and SAP InfiniteInsight® CORBA processes must be done at installation time.

4 Technology

SAP InfiniteInsight® is written in C++. SAP InfiniteInsight® is provided as an API under several formats (C++ loadable library, CORBA server on all supported platforms, COM library and DCOM server for Windows).

In this chapter:

- Supported Platforms
- Expected Behavior on Multi-Processors Architecture

4.1 Supported Platforms

SAP InfiniteInsight® is written in C++ and is provided as an API under several formats (C++ loadable library, CORBA server on all supported platforms, COM library and DCOM server for Windows).

The supported platforms to date are:

| Platform | OS | Processors | Arch. | Access / JNI / PAM / SSL |
|---|---|---|---|---|
| Windows | Windows (2000, 2003, XP, NT 4, Vista, 7) | Intel 32-bits (IA32) | 32-bits | x x x |
| Windows | Windows (Server 2003, Vista, 7, Server 2008) | Intel 64-bits (X64) | 64-bits | x x x |
| Sun Sparc | Solaris 2.8 and later version | Sparc v9 | 64-bits | x x x x |
| Sun Intel | Solaris 10 and later version | Intel 64-bits (X64) | 64-bits | x x x x |
| IBM | AIX 5.3.0 | PowerPC | 64-bits | x x x x |
| IA-UX | 11.23 | Intel® Itanium 2 | 64-bits | x x x |
| RedHat ES5 (ELSMP) | Linux-kernel 2.6 ELSMP | Intel 64-bits (X64) | 64-bits | x x x x |

The marks at the end of each row indicate which of the following features the platform supports:

- Access: InfiniteInsight® Access provides access to various external data formats such as SAS® files, SPSS®, Matlab®, Microsoft Excel files and more.
- JNI (Java Native Interface): SAP InfiniteInsight® provides Java wrappers on top of both the C++ library (through JNI) and the CORBA server to ease integration with J2EE environments. SAP InfiniteInsight® supports the following Java Virtual Machines: SUN JVM 1.6 for client/server installations, and SUN JVM 1.6 and above for standalone installations.
- PAM (Pluggable Authentication Module): the SAP InfiniteInsight® Authenticated Server supports user authentication through the PAM service. Depending on the PAM configuration for SAP InfiniteInsight®, PAM itself can then perform authentication using various modes (UNIX passwords, Kerberos, ...).
- SSL (Secure Socket Layer): the communication between the Java client and the server can be encrypted using SSL instead of regular TCP networking.

Notes:

- See the Annex to find out which binary corresponds to each platform.
- For all Intel 32-bit processors (IA32), we recommend large L1 / L2 cache sizes.
- For Sparc processors (Sparc v8 or v9), we recommend processors with an adequate Floating Point Unit. For example, we do not recommend UltraSparc T1, as it shares one FPU among 8 processing units.
- The SAP InfiniteInsight® Connector is not available on the Solaris 10 X64 platform, since DataDirect does not yet support this platform.

SAP InfiniteInsight® is Unicode enabled, which means that data and meta-data (such as the name of columns) can be provided under native or Unicode character sets. The API is however fully Unicode based.

4.2 Expected Behavior on Multi-Processors Architecture

The behavior of an application on a multi-core architecture depends on its internal threading policy.

On a Symmetric Multi-Processing architecture (SMP, the most common architecture for multi-processor machines), the operating system runs all the threads started by an application concurrently.

A thread runs on only one core at any given time: a single thread cannot be executed on more than one core simultaneously. However, over the lifetime of the thread, the operating system can move it from one core to another.

4.2.1 SAP InfiniteInsight® Threading Policy

In SAP InfiniteInsight®, a predictive model is generally run in a single thread. There is one dedicated thread for each ‘model building’ process or ‘model applying’ process.

This means that:

- a single model uses only one core for learn or apply;
- when several models run concurrently (learn or apply), several cores (one for each model) can be used.

Note that in some cases, the limiting factor might be the Input/Output speed for reading data. For example, if the data are located on a remote server (file or DB access), connected through a slow network, the CPU might not be used at 100%.
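The threading policy described in this section can be illustrated with a small Python sketch. It is not SAP InfiniteInsight® code: `train_model`, `run_concurrently` and `max_cores_used` are hypothetical names, and the "training" is a placeholder loop. Note also that Python threads do not execute CPU-bound code in parallel, so the sketch only mirrors the structure of the policy (one dedicated thread per model), not its performance.

```python
import threading


def train_model(name, results):
    # Stand-in for a real training job: a small CPU-bound loop.
    total = sum(i * i for i in range(100_000))
    results[name] = (threading.current_thread().name, total)


def run_concurrently(model_names):
    """Start one dedicated thread per model and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=train_model, args=(name, results), name=f"model-{name}")
        for name in model_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def max_cores_used(concurrent_models, available_cores):
    """Each model occupies at most one core, capped by the cores available."""
    return min(concurrent_models, available_cores)


results = run_concurrently(["churn", "upsell", "fraud"])
for model, (thread_name, _) in sorted(results.items()):
    print(model, "ran in thread", thread_name)
```

With this policy, a single model on an 8-core server keeps one core busy (`max_cores_used(1, 8)` is 1), while twelve concurrent models can keep all eight busy (`max_cores_used(12, 8)` is 8).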

5 Sizing Modeling Servers

Before installing InfiniteInsight® on computers used for predictive analytics, it is important to understand how to size these machines according to usage patterns.

SAP InfiniteInsight® provides a sizing tool in Excel for this purpose, but it is useful to understand the rules underlying this tool in order to adapt the results to each particular case. This tool can be downloaded from the SAP Help Portal (http://help.sap.com/ii/).

Important Notice: Resource sizing is a difficult exercise. This tool is provided as is, in order to present some data-based elements. Results should be analyzed and interpreted with care.

Sizing exercises end up providing estimates for five elements:

- RAM size (used by all models in the modeling processes)
- The temporary disk space (used when caching on disk is activated)
- The disk space used to store predictive models
- Transfer size between data sources and the modeling computer
- Number of cores

In order to get a proper evaluation of the sizing, some inputs must be provided by the team in charge of modeling, such as:

Type of models to be built (such as classification, regression, clustering, segmentation, time series forecasting, association rules): since SAP InfiniteInsight® models are self-contained, they contain all descriptive statistics for each variable and for each internal data set (the ‘Training’ analytical data set is decomposed internally into partitions called ‘Estimation’, ‘Validation’ and an optional ‘Test’ data set). On the one hand, the modeling team does not have to keep any metadata repository about its analytical objects; on the other hand, SAP InfiniteInsight® models take more disk space than the scoring equation alone would. Some models may be heavy in both disk space and memory consumption, such as clustering, which keeps all cross-statistics for all input variables for all clusters.

If the modeling team policy is to build models with the optional Test data set, most models will require 1/3 more RAM and disk space, since the statistics on the test data set are contained in the model (both in RAM and on file). In most situations, this flag should be set to 0 (the default when running model training in SAP InfiniteInsight®).

The flag called ‘Interactive’ must be set to 1 when SAP InfiniteInsight® is used from any user interface. This requires more RAM because, after a model training, the model is translated into a parameter tree used to transport the information shown in reports or visualization panels. In most situations this flag should be set to 1.

Number of concurrent models: in simple organizations, the number of concurrent models is linked to the number of concurrent users, but some modeling teams may decide that each user may run several modeling sessions in parallel. The number of concurrent models should count not only the ones managed through interactive sessions (when predictive analytics is used to discover relationships between data elements), but also through the batch sessions, which are usually run through scheduled tasks, using KxShell scripts of InfiniteInsight® Factory.

The maximum usage of the modeling computer will take into account the number of concurrent models, either through interactive session or through batch sessions. Interactive sessions are more memory consuming than batch sessions since all information computed by the models may be used through the user interface to provide some insights to the user of the client (Java Remote Assistant).

Size of the analytical data set (for training and apply): the training data sets are always transferred between data sources and modeling computers. The transfer size is approximated through the following equation: #lines x #columns x 4 bytes.
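The transfer-size approximation can be written as a one-line function. The sketch below is only an illustration of the equation quoted in this section; the function name is hypothetical, and the example figures (500,000 lines and 1,000 columns) are the "common scenario" mentioned in the Network Requirements chapter.

```python
BYTES_PER_VALUE = 4  # constant taken from the approximation quoted in the document


def transfer_size_bytes(lines, columns):
    """Approximate transfer size: #lines x #columns x 4 bytes."""
    return lines * columns * BYTES_PER_VALUE


size = transfer_size_bytes(500_000, 1_000)
print(size)          # 2000000000
print(size / 10**9)  # 2.0 (decimal gigabytes)
```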

Some other inputs must be provided by the information technology team, the most important are:

Usage of the cache: using the cache requires more resources on the computing server but frees a lot of bandwidth for data transfer between the data sources and the modeling computers. Customers may decide to deactivate the cache. Note that the cache feature is activated by default on 64-bit architectures and not activated by default on 32-bit architectures. The cache impacts the RAM consumed during training, since data sets are stored in memory to speed up multiple sweeps over them.

When activated, the cache stores data into an L1 cache (usually in memory) up to a user-specified value (set by default to 500 MB). The tool considers that L1 is assigned to memory.

When a data set is larger than the L1 cache, the remaining data is stored into a L2 cache (usually in temporary disk space) up to a user specified value (set by default to 1024 MB). The tool considers that L2 is assigned to disk.

The cache is only activated when dealing with ODBC sources. Future versions of the sizing tool may take into account the following flags.

32- or 64-bit architecture preferred for modeling computers: due to RAM requirements for building large models, we recommend 64-bit architecture, especially for modeling servers. 32-bit architectures should be kept for workstations. The effect on RAM sizing should be minor.

The fact that the preferred operating system flavor is Windows or UNIX: Internal customer policies will dictate if the information technology team prefers using Windows or UNIX operating systems. This said, when dealing with a modeling server for multiple clients, UNIX operating systems tend to scale better when spreading tasks and processes on multiple cores.

SAP InfiniteInsight® provides an Excel™ tool to help this sizing process. The next section describes the computations underlying this tool:

In this chapter:

- Training Phase
- Apply Phase
- Sizing Tool

5.1 Training Phase

5.1.1 RAM Sizing

Of course, the memory depends upon the model type (classification, regression, clustering, segmentation, time series or association rules). Most model sizes are impacted first by the size of the analytical data sets, but also by other aspects such as the average number of categories for discrete variables. The memory consumption of association rules depends heavily on the user-defined parameters, which are the support and confidence of the rules. The SAP InfiniteInsight® sizing tool makes some hypotheses based on the databases available for non-regression tests.

The second important element is the activation of cache in RAM. Since version V5, SAP InfiniteInsight® provides a cache mechanism in order to minimize the data transfer between the data sources and the computing server. Besides minimizing the data transfer, data caching is also interesting when dealing with complex queries run by relational databases in order to compute on the fly the analytical data sets: when the data is cached, the complex query is run only once. This caching mechanism is useful for the training phase. Data cache sizes may be tuned from the configuration file at installation time. The default configuration depends on the modeling computer architectures:

For 32-bit architecture no data is cached in memory, and data can be cached in local temporary files when under 500 Mega-Bytes.

For 64-bit architecture, data is cached in memory up to 500 Mega-Bytes and extra data is cached in temporary files when under 2 Giga-Bytes.

These limits are provided in order not to pollute smaller configurations with data in memory or on temporary files: They can be increased for larger modeling computer configurations. When the analytical data sets used for training are larger than these configured limits, InfiniteInsight® does not cache the data and reverts to multiple sweeps over the data sets, thus increasing the network traffic, and making the training process slower. The cache is only valid when data is read from relational database sources: This is an incentive to have data stored under relational data sources with respect to text files or SAS files.
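The cache behavior described above can be sketched as a small placement function. This is an illustration rather than the product's actual logic: the names are hypothetical, the default limits are the ones quoted in this section for 32-bit and 64-bit architectures, and the sketch assumes that the temporary-file limit is the capacity of the second level and that the cache is deactivated when the data set exceeds the sum of both limits.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class CacheConfig:
    l1_limit: int  # bytes that may be cached in memory
    l2_limit: int  # bytes that may be cached in temporary files


# Defaults as quoted in this section (assumed to be per-level capacities).
DEFAULT_32_BIT = CacheConfig(l1_limit=0, l2_limit=500 * MB)
DEFAULT_64_BIT = CacheConfig(l1_limit=500 * MB, l2_limit=2 * GB)


def place_in_cache(dataset_bytes, config):
    """Return (bytes_in_memory, bytes_on_disk, cached)."""
    if dataset_bytes > config.l1_limit + config.l2_limit:
        # Too large: no caching, training reverts to multiple sweeps on the source.
        return 0, 0, False
    in_memory = min(dataset_bytes, config.l1_limit)
    on_disk = dataset_bytes - in_memory
    return in_memory, on_disk, True


print(place_in_cache(300 * MB, DEFAULT_64_BIT))  # fits entirely in memory
print(place_in_cache(1 * GB, DEFAULT_64_BIT))    # memory first, remainder on disk
print(place_in_cache(3 * GB, DEFAULT_64_BIT))    # exceeds both limits: not cached
```

Under these assumptions, a 1 GB data set on a 64-bit server keeps 500 MB in memory and spills the remaining 524 MB to temporary files, while the same data set on a 32-bit workstation with default limits is not cached at all.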

For training models, the memory consumption can be computed as the sum of:

- The model size, which depends mainly on the number of input variables (columns of the training data set), since most of the memory is taken to hold statistics of the input variables.

- The memory consumed by the L1 cache (if the data sets are small enough to fit in the cache; when data sets are larger than the limits set for L1 and L2, the cache is automatically deactivated).

- The memory taken by the parameter tree that allows the user interface to communicate with the models and provide reports to the user (which only applies to interactive sessions).
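The sum above can be expressed directly in code. The following sketch uses a hypothetical function name and purely illustrative sizes (they are not measurements of any real model); it only shows how the three terms combine and why an interactive session needs more RAM than a batch session.

```python
MB = 1024 * 1024


def training_ram_bytes(model_bytes, l1_cache_bytes, parameter_tree_bytes, interactive):
    """Sum of the three memory terms listed above for one model being trained."""
    total = model_bytes + l1_cache_bytes
    if interactive:
        # The parameter tree is only built for interactive sessions.
        total += parameter_tree_bytes
    return total


# Illustrative sizes only.
interactive_session = training_ram_bytes(200 * MB, 500 * MB, 100 * MB, interactive=True)
batch_session = training_ram_bytes(200 * MB, 500 * MB, 100 * MB, interactive=False)
print(interactive_session // MB, batch_session // MB)  # 800 700
```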

5.1.2 Data Transfer

When a model is trained, the data is taken from the data source and read in SAP InfiniteInsight® modeling server. Most of SAP InfiniteInsight® algorithms require several sweeps on the training data sets. This explains why the data transfer can be important when the cache is not used. The data transfer size is equivalent to the size of the training data set when the cache is activated, but is a multiple of this value when the cache is not activated, depending on the number of sweeps of the algorithm (which is only estimated for segmentation since the number of sweeps depends upon the data itself).
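The effect of the cache on network traffic can be sketched as follows. The function name is hypothetical and the number of sweeps used in the example is an arbitrary illustration, since the real number depends on the algorithm (and, for segmentation, on the data itself).

```python
def training_transfer_bytes(dataset_bytes, sweeps, cache_active):
    """Data read from the source during one training run."""
    if sweeps < 1:
        raise ValueError("an algorithm reads the training data at least once")
    # With the cache, the data crosses the network once; without it, once per sweep.
    return dataset_bytes if cache_active else dataset_bytes * sweeps


dataset = 2_000_000_000  # bytes, e.g. 500,000 lines x 1,000 columns x 4 bytes
print(training_transfer_bytes(dataset, sweeps=5, cache_active=True))   # 2000000000
print(training_transfer_bytes(dataset, sweeps=5, cache_active=False))  # 10000000000
```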

5.1.3 Temp Disk Space

When cache is activated and when the data set size is larger than the user specified limit for L1, the remaining part of the data set is stored into a L2 space (usually disk space).

5.1.4 Disk Space in a Year

When a large number of models need to be saved, customers must be aware of the disk space required to store all these models. The size of a model on disk is very close to its size in RAM.

5.2 Apply Phase

5.2.1 RAM Sizing

The RAM taken to apply a model is the same as after training.

5.2.2 Data Transfer

In connection with ODBC sources, when InfiniteInsight® Scorer is purchased, there is no data transfer, since the SQL queries representing the scoring equations are sent directly to the database, which plays the role of the scoring engine. When the source is not an ODBC source or when InfiniteInsight® Scorer has not been purchased, the data corresponding to the apply data set needs to be transferred to the SAP InfiniteInsight® server.

5.2.3 Temp Disk Space

There is no temporary storage on disk during apply.

5.2.4 Disk Space in a Year

When a large number of scores need to be saved, customers must be aware of the size it takes to store all these scores. As a convention, we have taken the size of the scores equivalent to a data set with 4 columns (the identifier of the model, the identifier of the date, the identifier of the customer to be scored and the score itself).
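A rough yearly estimate can be derived from this four-column convention. The sketch below is an assumption-laden illustration: the function name is hypothetical, the 4 bytes per value are borrowed from the transfer-size approximation used earlier in this chapter (the document does not state a per-value size for stored scores), and the customer, model and run counts are invented for the example.

```python
COLUMNS_PER_SCORE_ROW = 4  # model id, date id, customer id, score (document's convention)
BYTES_PER_VALUE = 4        # assumption: same per-value size as the transfer-size formula


def yearly_score_storage_bytes(customers, models, runs_per_year):
    """Disk space needed to keep every score produced in one year."""
    rows = customers * models * runs_per_year
    return rows * COLUMNS_PER_SCORE_ROW * BYTES_PER_VALUE


# Illustrative figures: 1,000,000 customers scored monthly by 10 models.
print(yearly_score_storage_bytes(1_000_000, 10, 12))  # 1920000000
```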

5.3 Sizing Tool

SAP InfiniteInsight® provides a Microsoft Excel™ tool to help this sizing process on the SAP Help portal (http://help.sap.com/ii/).

6 Network Requirements

In this chapter:

- RDBMS Connectivity
- Client / Server Connectivity

6.1 RDBMS Connectivity

The SAP InfiniteInsight® server has to be connected to data sources. Data sources may be files (in standard formats such as CSV, or in proprietary formats such as SAS™), but, most often, operational data sources are implemented through relational database management systems (RDBMS). SAP InfiniteInsight® supports ODBC connectivity with a list of supported database vendors (Oracle, Teradata, SQL Server, DB2, MySQL, Netezza, and more). As said earlier, in the preferred SAP InfiniteInsight® installation, data transfers between such databases and SAP InfiniteInsight® servers concern the analytical training data sets. Common scenarios involve training data sets of 500,000 lines and 1,000 columns, which are not large with respect to the bandwidth capacity of a LAN.

6.2 Client / Server Connectivity

Each of the three components involved in the application (the CORBA Name Server, the SAP InfiniteInsight® Server and the Client Application) can be located on a different machine (although in most installations, both the Name Server and the SAP InfiniteInsight® Server will be located on the same Server).

The communication protocol used under the CORBA framework is TCP. The first requirements are that:

- The SAP InfiniteInsight® Server must be able to access the CORBA Name Server.
- The Client application must be able to access both the CORBA Name Server and the SAP InfiniteInsight® Server.

From a network administration point of view, this means that:

- The CORBA Name Server and the SAP InfiniteInsight® Server are each assigned a specific TCP port. By default, the startup scripts fix these ports; they should be set according to the network strategy.
- Both of these ports must be accessible from the Client machine (for example, network firewalls should allow communication on these ports between the Client machine and the servers).

In the following, "CORBA Name Server port" refers to the TCP port used by the CORBA Name Server, and "SAP InfiniteInsight® Server port" to the TCP port used by the SAP InfiniteInsight® Server.

7 Data Access Management

InfiniteInsight® Access is a solution for accessing data in a wide variety of formats. It allows reading from and writing to SAS files, SPSS files, Minitab files, Excel files and several other file types.

ODBC

The section ODBC in the Data Access chapter lists the platforms that have been tested by our development team and that can be easily reproduced.

Platforms

In the SAP InfiniteInsight® context, a platform is the combination of the following elements:

- the client platform, where SAP InfiniteInsight® is running
- the DBMS (Database Management System) platform, where the DBMS providing the data is running
- the DBMS
- the ODBC driver, which is the software layer exposing the data from the DBMS in a standard way
- the ODBC driver manager, which is an intermediate software layer managing the ODBC drivers installed on a computer

In this chapter:

- Access Rights for Files / RDBMS
- Unicode for RDBMS

7.1 Access Rights for Files / RDBMS

Since all models are built by extracting information from data, SAP InfiniteInsight® integrates different data access instances to read several file formats (SAS, SPSS, Excel, Minitab, delimited text, and fixed-length text) as well as tables, views, or select statements from databases through ODBC connections (available on both Windows and UNIX platforms).

For efficiency, SAP InfiniteInsight® does not create a separate analytics data store. It reads the data from the existing data sources, and saves output and models back into the data source.

7.1.1 Rights Definition

Data source rights are defined either via the database management system or via the operating system (for flat files).

For each data source:

| The right... | Allows the user to... |
|---|---|
| Read Data | read the data stored on the data store |
| Write Data | save data on the data store |
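For flat files, where rights are defined by the operating system, the two rights in the table can be checked for the current account with Python's standard `os.access` call. This is only an illustrative sketch: the function name is hypothetical, and rights on database tables must be checked through the database management system instead.

```python
import os
import tempfile


def data_source_rights(path):
    """Report the Read Data / Write Data rights of the current account on a file."""
    return {
        "Read Data": os.access(path, os.R_OK),
        "Write Data": os.access(path, os.W_OK),
    }


# Demonstration on a temporary file created by the current account.
with tempfile.NamedTemporaryFile(suffix=".csv") as sample:
    rights = data_source_rights(sample.name)
print(rights)
```

In a client-server installation, such a check is only meaningful when run on the server host, under the account used by the server process, because all data access is performed on the server side.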

7.1.2 Data Access Processes

When working in a client-server mode, all the data is accessed by the server. The data source (ODBC, ...) must be correctly installed (drivers, ...) and configured on the server.

Note: No data access is done from the client environment; all data access is performed on the server side. This means that the server is only able to access the data present on the server host (and not on the client machine). It is however possible to set up the client and the server on the same physical host, so that all data available on the client side is also available for the server.

When using SAP InfiniteInsight® components via a graphical interface, the data access processes are performed in a seamless manner for the user. They only have to select the data source format to be used ("flat files" or ODBC-compatible data sources) and specify the location of the source.

Note: Data access is done through the notion of a cursor (or line iterator), and SAP InfiniteInsight® provides a technique to allow integrators to write their own driver to connect SAP InfiniteInsight® to their proprietary data storage mechanism. The C Data Access API is intended for developers who want to write access code for proprietary format databases.
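The notion of a cursor (or line iterator) can be illustrated with a short Python sketch. It is not the C Data Access API and does not reproduce its functions: the class and method names below are hypothetical, and the "storage mechanism" is simply delimited text held in memory. The point is that a source only has to expose its column names and a way to iterate over its lines, and that each new iteration corresponds to one sweep over the data.

```python
import csv
import io
from typing import Iterator, List


class DelimitedTextCursor:
    """Hypothetical line iterator over a delimited-text data source."""

    def __init__(self, text: str, delimiter: str = ",") -> None:
        self._text = text
        self._delimiter = delimiter

    def column_names(self) -> List[str]:
        # The first line is assumed to hold the column names.
        reader = csv.reader(io.StringIO(self._text), delimiter=self._delimiter)
        return next(reader)

    def rows(self) -> Iterator[List[str]]:
        # Every call starts a new sweep from the first data line.
        reader = csv.reader(io.StringIO(self._text), delimiter=self._delimiter)
        next(reader)  # skip the header line
        for row in reader:
            yield row


source = DelimitedTextCursor("id,age\n1,34\n2,51\n")
first_sweep = list(source.rows())
second_sweep = list(source.rows())
print(source.column_names(), first_sweep)
```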

7.2 Unicode for RDBMS

SAP InfiniteInsight® is Unicode-enabled: data and metadata (such as the name of columns) can be provided under native or Unicode character sets. UTF-8 and UTF-16 formats are supported for input data.

A "native" file format is also supported. On UNIX platforms the native file format is the one described in the LANG environment variable. On Windows it corresponds to whatever is announced by the system or the CODEPAGE environment variable.

SAP InfiniteInsight® uses its own conversion functions to handle UTF-8/UTF-16 formats. Native character sets are handled using UNIX iconv and Windows native functions.
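The difference between the Unicode formats and a native character set can be seen on a single column name. The sketch below uses Python's built-in codecs rather than the product's conversion functions, and `cp1252` is only an assumed example of a Windows code page. It shows that the same name has a different byte representation in each format, and that bytes written in a native code page are generally not valid UTF-8, which is why the character set of a data source has to be declared correctly.

```python
column_name = "Pr\u00e9nom"  # "Prénom", a column name with one accented letter

utf8_bytes = column_name.encode("utf-8")
utf16_bytes = column_name.encode("utf-16-le")
native_bytes = column_name.encode("cp1252")  # assumed example of a native code page


def decodes_as(data, encoding):
    """Return True if the bytes are valid text in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(len(utf8_bytes), len(utf16_bytes), len(native_bytes))  # 7 12 6
print(decodes_as(native_bytes, "utf-8"))                     # False
```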

Two Unicode options can be associated to a data source:

supportUnicodeOnData describes the kind of character conversion supported by the ODBC driver/ODBC Driver Manager/DBMS in the record data.

supportUnicodeOnMeta describes the kind of character conversion supported by the ODBC driver/ODBC Driver Manager/DBMS in the object names (tables, fields, indexes).

The default values for these options follow the standard SAP InfiniteInsight® behavior related to the MultiLanguageIsDefault option:

MultiLanguageIsDefault=no (default setting): This behavior is compatible with previous versions: input/output is done in native character sets (client code page). SAP InfiniteInsight® ODBC layout works in conjunction with mini drivers in order to read and emit characters in client code page.

MultiLanguageIsDefault=yes: All input/output is done in Unicode (UTF-16) characters. SAP InfiniteInsight® ODBC layout works in conjunction with mini drivers in order to read and emit characters in UTF-16 format.

The MultiLanguageIsDefault option is a global switch for the whole SAP InfiniteInsight® engine and can be overridden by the supportUnicodeOnMeta / supportUnicodeOnData options for each DSN.
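The precedence between the global switch and the per-DSN options can be sketched as follows. This is only an illustration of the precedence rule: the function name, the dictionary layout, and the values "native" and "UTF-16" are assumptions made for the example and do not reflect the actual configuration file syntax or the values accepted by the product.

```python
from typing import Dict, Optional


def effective_charset(multi_language_is_default: bool,
                      dsn_options: Dict[str, str],
                      option_name: str) -> str:
    """Resolve the character handling for one DSN (hypothetical model).

    A per-DSN option, when present, takes precedence over the global
    MultiLanguageIsDefault switch.
    """
    override: Optional[str] = dsn_options.get(option_name)
    if override is not None:
        return override
    # No per-DSN setting: fall back to the global behavior.
    return "UTF-16" if multi_language_is_default else "native"


# Global default (no) with no override: native character set.
plain = effective_charset(False, {}, "supportUnicodeOnData")
# Global default (no), but this DSN declares its own behavior for data.
overridden = effective_charset(False, {"supportUnicodeOnData": "UTF-16"},
                               "supportUnicodeOnData")
# Global switch set to yes, no override for metadata.
global_yes = effective_charset(True, {}, "supportUnicodeOnMeta")
```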

To apply these settings on a standalone installation of SAP InfiniteInsight®

Update the KJWizard.cfg file located in <INSTALLATION DIR>\EXE\Clients\KJWizardJNI\.

Example for version 6.5 SP4: C:\Program Files\SAP InfiniteInsight\II_V6.5.4\EXE\Clients\KJWizardJNI\

To apply these settings on a client/server installation of SAP InfiniteInsight®

Update the KxCORBA.cfg file located in <INSTALLATION DIR>\EXE\Servers\CORBA\.

Example for version 6.5 SP4: C:\Program Files\SAP InfiniteInsight\II_V6.5.4\EXE\Servers\CORBA\


8 Other Software Requirements

In this chapter:

Standalone Application Mode

Client/Server Mode

8.1 Standalone Application Mode

When deploying InfiniteInsight® as a standalone application with the Windows Client Installer – which means that InfiniteInsight® runs on the machine without having to access a server – an ODBC manager and drivers for a specific RDBMS (optional) must be installed if the data is stored in a database.

Note- InfiniteInsight® provides its own Java Runtime Environment (JRE 1.6) in the installation package.


8.2 Client/Server Mode

There are two ways to deploy InfiniteInsight®:

using a client installer (local installation), or using Java Web Start (web access to the application).

According to the installation mode, the following requirements must be met:

Installation Mode: Client Application

Requirements on the Server: ODBC manager and drivers for a specific RDBMS (optional): in case the data is stored in databases.

Requirements on the Client: None.

Note: SAP InfiniteInsight® provides its own Java Runtime Environment (JRE 1.6) in the installation package.

Installation Mode: Java Web Start Client

Requirements on the Server: Web Server: Windows IIS, Apache Server (http://www.apache.org), ... ODBC manager and drivers for a specific RDBMS (optional): in case the data is stored in databases.

Requirements on the Client: Java Runtime Environment (JRE) version 1.6 or above (available on the Java website (http://www.oracle.com/technetwork/java/javase/downloads/index.html)). Web browser: MS-Internet Explorer version 5.0 or above (available on the Microsoft website (http://download.microsoft.com)), ...

Note: Users have to install a compatible version of Java Runtime Environment (JRE 1.6 or above) on the client machine.

An installation through Java Web Start may seem more demanding; however, it provides additional time-saving features such as automatic updates. For more information on Java Web Start, refer to the SAP InfiniteInsight® Java Web Start Installation guide.

Note- There is no need to install Java Runtime Environment (JRE) on the server side.


9 InfiniteInsight® Modeler - Data Encoding Technical Specifications

InfiniteInsight® Modeler - Data Encoding (formerly known as K2C) is a data preparation transform for building a consistent (robust) coding scheme for any attribute belonging to a training data set containing a business question (specific target attribute to analyze). For example, each possible value (category) for a nominal attribute is either discarded as non consistent, or coded as a number for later use by subsequent transforms. Each ordinal attribute is provided as its natural order or encoded with respect to the target when available. Each continuous attribute is provided as a normalized number or encoded with respect to the target when available.

InfiniteInsight® Modeler - Data Encoding brings Intelligence to any OLAP system (IOLAP™) through the determination of an optimal banding and binning strategy to explain a measure of a cube structure.

9.1 Features

Scalability: InfiniteInsight® Modeler - Data Encoding is linear in number of lines and columns.

Data Passes: InfiniteInsight® Modeler - Data Encoding processes the Estimation and Validation sets in a single pass for each.

Inputs: Inputs can be ordinal, nominal or continuous.

Targets: Targets can be ordinal, nominal or continuous.

Results:

Segments for continuous values in order to build histograms and quantiles information,

Level of under-representation for nominal categories to be discarded and collapsed into a miscellaneous class called ‘KxOther’,

Groups of initial segments or categories for all specified targets to realize best compromise between fit and robustness,

Quality and robustness indicators for all input variables (each input variable can be considered as a univariate model with its own quality and robustness indicators).

Output: As such, InfiniteInsight® Modeler - Data Encoding does not generate any specific output but coding of variables as requested by the following components such as InfiniteInsight® Modeler - Regression/Classification and InfiniteInsight® Modeler - Segmentation/Clustering.

Parameters:

User Enable Compress: Boolean flag allowing the user (when set to false) to deactivate the target based optimal grouping performed by InfiniteInsight® Modeler - Data Encoding on this single attribute.

User Band Count: Number, present only for continuous attributes, allowing the user to change the number of bands (segments, bins) to collect statistics on this attribute from the default of twenty.

User Enable KxOther: Boolean flag, present only for nominal attributes allowing the user (when set to false) to deactivate the compression into KxOther for very infrequent categories. Of course, this will generally lead to non stable data representation and coding, as well as increased memory and processor consumption.

Nominal Groups, Ordinal Bands, and Continuous Bands: can be used by the user to force a data structure. This can be used to force a drilling hierarchy for example, segmenting age in user-defined segments to be used by SAP InfiniteInsight® modeling techniques.

User Modulus: Allows the user to enforce the bands of the continuous attributes to be the modulus of the given value. For example this allows the user to enforce the fact that bands are always a multiple of 1000 when dealing with monetary values.
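Two of the ideas above, the compression of infrequent categories into KxOther and the alignment of band boundaries on a modulus, can be sketched in a few lines of Python. This is an illustration only, not the product's algorithm: the function names, the `min_count` threshold, and the rounding rule are assumptions, since the document does not state how the component decides that a category is under-represented or how it places boundaries.

```python
from collections import Counter
from typing import List, Sequence


def collapse_rare_categories(values: Sequence[str], min_count: int,
                             other_label: str = "KxOther") -> List[str]:
    """Replace categories seen fewer than `min_count` times (assumed rule)."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def snap_band_edges(edges: Sequence[float], modulus: int) -> List[int]:
    """Move each band boundary to the nearest multiple of `modulus`.

    Boundaries that collapse onto the same multiple are merged.
    """
    return sorted({round(edge / modulus) * modulus for edge in edges})


categories = collapse_rare_categories(["a", "a", "b", "c", "a"], min_count=2)
edges = snap_band_edges([1240, 2610, 3980], modulus=1000)
```

With a modulus of 1000, the monetary boundaries 1240, 2610 and 3980 become 1000, 3000 and 4000, which matches the intent of the User Modulus parameter described above.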


10 InfiniteInsight® Modeler - Regression/Classification Technical Specifications

InfiniteInsight® Modeler - Regression/Classification (formerly known as K2R) trains models implementing a mapping between a set of descriptive attributes (model inputs) and target attributes (model outputs). It belongs to the regression family of algorithms, and can be used to solve binary classification and regression mining functions. InfiniteInsight® Modeler - Regression/Classification is not a “text book” regression algorithm such as linear least squares.

It uses a proprietary algorithm, an SAP InfiniteInsight® derivation of a principle described by V. Vapnik as "Structural Risk Minimization". The returned models are expressed as a polynomial expression of the input attributes.

InfiniteInsight® Modeler - Regression/Classification also allows the specification of a weight attribute for each training row in order to adapt the cost function to the user requirements. By default without a weight attribute, each training row is considered to be of equal value. The output model can be analyzed in terms of attribute contributions weighing the relative importance.

InfiniteInsight® Modeler - Regression/Classification can be used in any Attribute Importance function. InfiniteInsight® Modeler - Regression/Classification brings Intelligence to any OLAP system (IOLAP™) through the determination of the important dimensions that can be used to explain a measure of a cube structure.

10.1 Features

Scalability: The behavior of InfiniteInsight® Modeler - Regression/Classification is linear with the number of lines and linear with the number of input attributes of the training set. However, it is combinatorial with respect to the degree of the polynomial since higher polynomial degrees greatly increase the number of input attributes.

Data Passes: InfiniteInsight® Modeler - Regression/Classification requires two passes on the Estimation data set, and one pass on the Validation data set.

Inputs: Inputs can be ordinal, nominal or continuous.

Targets: Targets can be binary nominal or continuous.

Results:

Attribute importance in order to quantify the relative importance of each input in predicting the targets.

The quality and robustness indicators of the estimation of the targets as well as some common measures (such as classification rate for classification tasks or mean square error or Pearson coefficient for the regression case).

Output:

Estimation of continuous targets,

Decision and/or probabilities associated with binary classification.

InfiniteInsight® Modeler - Regression/Classification can also generate an outlier indicator when the target is known and statistically far from the estimation of this target.

InfiniteInsight® Modeler - Regression/Classification can also generate error bars for continuous estimates.


Parameters: Polynomial Order: User can specify a polynomial order greater than 1.
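The combinatorial effect of the polynomial order mentioned under Scalability can be quantified with a short calculation. The sketch below counts the monomials of total degree 1 to d in n variables, which is the standard count for a full polynomial expansion; whether the product expands inputs exactly this way is an assumption, and the function name is hypothetical.

```python
from math import comb


def polynomial_term_count(n_inputs: int, degree: int) -> int:
    """Monomials of total degree 1..degree in n_inputs variables.

    The full count including the constant term is C(n + d, d); the
    constant is subtracted because it is not an input attribute.
    """
    return comb(n_inputs + degree, degree) - 1


order_1 = polynomial_term_count(10, 1)
order_2 = polynomial_term_count(10, 2)
order_3 = polynomial_term_count(10, 3)
```

Under this assumption, ten inputs become 65 terms at order 2 and 285 terms at order 3, which shows why higher orders weigh heavily on training time.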

10.2 Notes

1 Multi-nominal targets are not yet directly supported. A user can create a disjunctive coding of a multi-nominal target with as many Boolean attributes as there are categories, train a model with all these attributes as targets, and combine the probability outputs to make a final classification.

2 Ordinal targets are accepted by the algorithm but the proper debriefing in the form of a confusion matrix (leading to classification rates) is not yet directly supported.
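The workaround in note 1 can be sketched as follows. The code only shows the two bookkeeping steps, building one Boolean target per category and picking the category with the highest probability; the probabilities themselves would come from the trained models and are hard-coded here as illustrative values. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def disjunctive_coding(labels: Sequence[str]) -> Dict[str, List[bool]]:
    """One Boolean target per category of a multi-nominal target."""
    categories = sorted(set(labels))
    return {c: [label == c for label in labels] for c in categories}


def combine_probabilities(prob_by_category: Dict[str, List[float]]) -> List[str]:
    """For each row, keep the category whose binary model scores highest."""
    categories = list(prob_by_category)
    n_rows = len(prob_by_category[categories[0]])
    return [max(categories, key=lambda c: prob_by_category[c][row])
            for row in range(n_rows)]


targets = disjunctive_coding(["red", "blue", "green", "red"])
# Illustrative probabilities, one list per binary model, two rows to score.
decisions = combine_probabilities({
    "red": [0.7, 0.2],
    "blue": [0.2, 0.5],
    "green": [0.1, 0.3],
})
```

Taking the maximum is only one way to combine the outputs; the note does not prescribe a particular combination rule.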


11 InfiniteInsight® Modeler - Segmentation/Clustering Technical Specifications

InfiniteInsight® Modeler - Segmentation/Clustering (formerly known as K2S) trains models implementing a mapping between a set of descriptive attributes (model inputs) and the ID (model output) of one of several clusters/segments computed by the system. It belongs to the family of clustering/segmentation algorithms for training descriptive models. The goal of these models is to gather similar data into groups. The question of similarity is discussed below.

The current version of InfiniteInsight® Modeler - Segmentation/Clustering first builds prototypes in order to minimize the intra-distances of cases within clusters and maximize the inter-distances between different clusters. This notion of distance can be based on the input distributions when no target is provided, but InfiniteInsight® Modeler - Segmentation/Clustering is more powerful when used for ‘supervised segmentation’. In this case a target is used to encode all inputs and provide a notion of distance which is meaningful for the application. Similar to InfiniteInsight® Modeler - Regression/Classification, the target is any attribute relevant to the user's business. For example, the purchase amount for a customer, the response to a marketing campaign, or the fact that an individual churned in the last two months.

InfiniteInsight® Modeler - Segmentation/Clustering uses a derivation of SRM to compute a short logical expression for each cluster through a feature called “SQL Expressions”. For example a cluster may be defined as "age <= 35 AND marital-status in ['Divorced']". This has several advantages:

Logical expressions are generally very easy and natural to interpret.

The segmentation process is easier to integrate in operational environments such as relational databases through SQL.
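To illustrate why logical cluster expressions are easy to operationalize, the sketch below applies the example expression from the text both as a Python predicate and as a generated SQL CASE clause. The rule list, the function names, and the column name `marital_status` (written with an underscore so that it is a valid SQL identifier) are assumptions for the example; the SQL actually produced by the product is not shown in this document.

```python
from typing import Callable, Dict, List, Optional, Tuple


def young_divorced(row: Dict[str, object]) -> bool:
    # Mirrors the example: age <= 35 AND marital-status in ['Divorced']
    return row["age"] <= 35 and row["marital-status"] in ["Divorced"]


RULES: List[Tuple[int, Callable[[Dict[str, object]], bool]]] = [
    (1, young_divorced),
]


def assign_cluster(row: Dict[str, object]) -> Optional[int]:
    """Return the ID of the first cluster whose expression matches."""
    for cluster_id, predicate in RULES:
        if predicate(row):
            return cluster_id
    return None


def to_sql_case(rules: List[Tuple[int, str]], alias: str = "cluster_id") -> str:
    """Render (cluster ID, SQL condition) pairs as a single CASE clause."""
    whens = " ".join(f"WHEN {cond} THEN {cid}" for cid, cond in rules)
    return f"CASE {whens} ELSE NULL END AS {alias}"


matched = assign_cluster({"age": 30, "marital-status": "Divorced"})
unmatched = assign_cluster({"age": 50, "marital-status": "Divorced"})
sql = to_sql_case([(1, "age <= 35 AND marital_status IN ('Divorced')")])
```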

11.1 Features

Scalability: The behavior of InfiniteInsight® Modeler - Segmentation/Clustering is linear with the number of lines, more than linear with the number of columns.

Data Passes: InfiniteInsight® Modeler - Segmentation/Clustering processes data with 4 sweeps on the Estimation data set and one sweep on the entire Training data set. When the ‘segmentation’ mode is selected, the number of passes is proportional to the longest statement found in a cluster expression (rarely above 7).

Inputs: Inputs can be ordinal, nominal or continuous.

Targets: Targets are optional and can be binary nominal or continuous.

Results:

When ‘segmentation’ mode is chosen, short logical expressions for each cluster.

Global statistics for each cluster.

All descriptive statistics between each cluster and each selected input.

Frequency: percentage of population gathered in the cluster.

When supervised, InfiniteInsight® Modeler - Segmentation/Clustering also provides:

% of 'label' in classification case (binary target): percentage of label in the cluster, where label is the least frequent category of the binary target.

Target Mean in regression case (continuous target): mean value of the target for data assigned to the cluster.

The KI and KR that can be associated with the cluster ID.


Outputs: Cluster or segment indices in different formats.

Parameters:

Number of Clusters: The user must specify the number of clusters.

Type of distance internally used: city-block, Euclidean, or absolute difference.

The encoding strategy may be tuned in some ways.
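Two of the distance types listed above have standard definitions, sketched here together with a nearest-prototype assignment. The "absolute difference" option is not defined in this document and is therefore left out. The function names and the prototype values are hypothetical, and the sketch assumes inputs that have already been encoded as numbers.

```python
import math
from typing import Callable, Dict, Sequence


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (also called L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences (L2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_prototype(point: Sequence[float],
                      prototypes: Dict[int, Sequence[float]],
                      distance: Callable = euclidean) -> int:
    """Return the ID of the prototype closest to `point`."""
    return min(prototypes, key=lambda cid: distance(point, prototypes[cid]))


prototypes = {1: [0.0, 0.0], 2: [10.0, 10.0]}
cluster = nearest_prototype([2.0, 1.0], prototypes)
```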

11.2 Notes

1 Multi-nominal targets are not yet directly supported. A user can create a disjunctive coding of a multi-nominal target with as many Boolean attributes as there are categories, and train a model with all these attributes as targets.

2 Ordinal targets are accepted by the algorithm but the proper debriefing in the form of a confusion matrix (leading to classification rates) is not yet directly supported.


12 InfiniteInsight® Modeler – Time Series Technical Specifications

InfiniteInsight® Modeler – Time Series (formerly known as KTS) lets you train predictive models from data representing time series. With InfiniteInsight® Modeler – Time Series models, you can:

Identify and understand the nature of time series through trends and cycles.

Forecast the evolution of a time series in the short and medium term, that is, to predict their future values.

InfiniteInsight® Modeler – Time Series breaks a time series into four components:

Trend: The trend represents the evolution of a time series over the period analyzed. The trend is represented either by a function of time or by signal differentiating, which is calculated in InfiniteInsight® Modeler – Time Series using the principle that a new value can be predicted based on only the previous known value. Calculating the trend allows InfiniteInsight® Modeler – Time Series to build a stationary representation of the signal (that is, the time series does not increase or decrease any more). This stationary representation is essential for the analysis of the three other components.

Cycles: The cyclicity describes the recurrence of a variation in the signal. It is important to distinguish calendar time from natural time. These two time representations are often out of phase. The former - which is referred to as seasonality - represents dates (day, month, year and so on), while the latter - which is referred to as periodicity - represents a continuous time (1, 2, 3 and so on).

Fluctuations: Fluctuations represent disturbances that affect a time series. In other words a time series does not only depend on external factors but also on its last states (memory phenomena). InfiniteInsight® Modeler – Time Series tries to explain parts of the fluctuations by modeling them on past values of the time series (ARMA or GARCH models).

Information Residue: The information residue is the information that is not part of the trend, cycles, or fluctuations. As such, predictive models generated by InfiniteInsight® Modeler – Time Series are characterized only by the first three components - trend, cycles and fluctuations.
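The signal differencing mentioned under Trend can be illustrated with a minimal sketch: each value is replaced by its difference from the previous known value, which removes a steady increase or decrease and leaves a stationary series. The naive forecast that follows simply extends the signal by the mean step; it is an assumption made for the example and not the modeling performed by the product, which also handles cycles and fluctuations.

```python
from statistics import mean
from typing import List, Sequence


def difference(signal: Sequence[float]) -> List[float]:
    """First-order differencing: each value minus the previous one."""
    return [curr - prev for prev, curr in zip(signal, signal[1:])]


def naive_forecast(signal: Sequence[float], horizon: int) -> List[float]:
    """Extend the signal by repeating the mean step of the differences."""
    step = mean(difference(signal))
    return [signal[-1] + step * h for h in range(1, horizon + 1)]


signal = [10, 12, 14, 16, 18]
steps = difference(signal)
forecasts = naive_forecast(signal, 3)
```

Here the differenced series is constant, so the trend is fully captured by the step and nothing is left for the other components.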

12.1 Features

Scalability: InfiniteInsight® Modeler – Time Series is usually used on small data sets.

Data Passes: InfiniteInsight® Modeler – Time Series internally computes a lot of models that are compared for best results. This leads to between 6 and 10 passes on Estimation and more than 12 passes on Validation, depending on the number of internal cycles found.

Inputs: InfiniteInsight® Modeler – Time Series needs an ordered data set with values associated with a column of type date, date-time or number. Extra input variables can be specified.

Targets: The signal to forecast must be continuous.

Results:

Signal components as trends, cycles, and seasons.

Internal contribution of lagged variables when such model is chosen.

Outputs:

Forecasts at the selected horizon.

Error bars around the forecasts.

Forecasts decomposition.

Parameters:

Number of Forecasts: User must specify how many forecasts they want to produce.

Explanatory Attributes: User can specify extra inputs used as explanatory attributes.


12.2 Notes

1 When using explanatory attributes, several rows of data must be provided with non-blank input values beyond the last row used for training. The number of extra rows corresponds to the number of forecasts requested.

2 The current version of InfiniteInsight® Modeler – Time Series does not accept missing values for the signal.

3 Dates to be associated with the forecasts can be either provided in the apply data set or will be generated approximately by InfiniteInsight® Modeler – Time Series.

4 There is no formal constraint about the fact that each line should be associated with dates (or times) with a constant period, but this is the intent of the underlying algorithms.
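Notes 1 and 3 can be illustrated with two small helpers. The first checks that enough rows with non-blank explanatory values exist beyond the training data; the second generates forecast dates by repeating the last observed interval. Both are hypothetical sketches: the document does not describe how the product validates the extra rows or how it approximates missing dates.

```python
from datetime import date
from typing import Dict, List, Sequence


def has_enough_future_rows(future_rows: Sequence[Dict[str, object]],
                           n_forecasts: int) -> bool:
    """True if at least n_forecasts rows have no blank explanatory value."""
    complete = [row for row in future_rows
                if all(v is not None and v != "" for v in row.values())]
    return len(complete) >= n_forecasts


def generate_forecast_dates(known_dates: Sequence[date],
                            n_forecasts: int) -> List[date]:
    """Extend the date column by repeating the last observed interval."""
    step = known_dates[-1] - known_dates[-2]
    return [known_dates[-1] + step * k for k in range(1, n_forecasts + 1)]


weekly = [date(2014, 1, 1), date(2014, 1, 8), date(2014, 1, 15)]
future_dates = generate_forecast_dates(weekly, 2)
enough = has_enough_future_rows([{"promo": 1}, {"promo": 0}], 2)
not_enough = has_enough_future_rows([{"promo": 1}, {"promo": None}], 2)
```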


13 InfiniteInsight® Modeler - Association Rules Technical Specifications

InfiniteInsight® Modeler - Association Rules (formerly known as KAR) generates association rules. Association rules provide clear and useful results, especially for market basket analysis. They bring to light the relations between products or services and immediately suggest appropriate actions. Association rules are used in exploring categorical data, also called items. Items belong to Transactions.

The strengths of InfiniteInsight® Modeler - Association Rules are:

to produce clear and understandable results,

to support unsupervised data mining (no target attribute),

to explore very large data sets thanks to its ability to first generate rules on parts of the data set before aggregating them (exploration by chunks),

to generate only the more relevant rules (also called primary rules).

Once rules are generated, they can be used in apply mode in order to generate items that could be included in the transactions.

The SAP InfiniteInsight® implementation uses a third generation algorithm (‘A Priori’ algorithm belongs to the first generation, ‘FP-Tree’ algorithm belongs to the second generation) which can be used to generate only meaningful rules where all other techniques return a lot of redundant rules. This allows for both scalability and minimizing the number of generated rules without loss of information.

13.1 Features

Scalability: We have developed an incremental version of our algorithm, which greatly reduces the memory consumption.

Data Passes: One pass on the transaction data set, two passes when the incremental version is activated.

Inputs: InfiniteInsight® Modeler - Association Rules needs an Events data set with transaction identifiers and item identifiers. The items will be used to generate the rules. InfiniteInsight® Modeler - Association Rules also needs a Training data set that contains the ticket identifiers that will be used to train the system and build the rules.

Results:

Rules expression with corresponding quality indicators for each rule.

Descriptive statistics on the items and transactions.

Outputs: Recommendations which can be seen as items associated with probabilities.

Parameters:

Confidence allows filtering rules by the probability associated with the consequent.

Support allows filtering rules by their frequency.

Size of rules.

Possibility to deactivate SAP InfiniteInsight® specific optimization techniques to compare with other environments.
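The Support and Confidence parameters correspond to the usual association-rule measures, sketched below on a toy set of transactions. The definitions used here (support as the fraction of transactions containing an itemset, confidence as the conditional frequency of the consequent given the antecedent) are the textbook ones; whether the product expresses support as a fraction or as a count is not stated in this document. The rule-generation algorithm itself is not reproduced.

```python
from typing import Iterable, Sequence, Set


def support(transactions: Sequence[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: Sequence[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Support of antecedent and consequent together, relative to the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

In this toy data, the rule "bread implies butter" holds in half of all baskets and in two thirds of the baskets that contain bread.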


13.2 Notes

When the incremental option is selected, the transactions must come grouped through the transaction index. The system will exit on error if all items of the same transaction are not provided contiguously.
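Before activating the incremental option, the contiguity requirement can be verified with a single scan of the transaction identifier column. The helper below is a hypothetical pre-check written for this purpose; it is not part of the product.

```python
from typing import Hashable, Iterable, Optional


def first_non_contiguous_transaction(
        transaction_ids: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the first ID that reappears after another ID, else None."""
    seen = set()
    previous = object()  # sentinel that equals no real identifier
    for tid in transaction_ids:
        if tid != previous:
            if tid in seen:
                # This transaction was already closed: rows are not grouped.
                return tid
            seen.add(tid)
            previous = tid
    return None


grouped = first_non_contiguous_transaction(["t1", "t1", "t2", "t3", "t3"])
scattered = first_non_contiguous_transaction(["t1", "t2", "t1"])
```

Sorting the Events data set on the transaction identifier is one simple way to satisfy the requirement when the check fails.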


14 InfiniteInsight® Explorer - Event Logging Technical Specifications

The purpose of InfiniteInsight® Explorer - Event Logging (formerly known as KEL) is to build a mineable representation of events history. For example, InfiniteInsight® Explorer - Event Logging can be used to represent RFA (Recency-Frequency-Amount) views of a customer based on purchase history.

It is not a data mining algorithm but a data preparation transform. As discussed above, the input data for the regression, classification and clustering transforms require a single data set with a fixed number of attributes. However, commonly a customer is associated with several events (purchase history for example) with variable number for every customer. This list of events across several rows must be translated into a single row with a fixed number of attributes. These types of operations are called pivoting in data mining because they translate information contained in the same attribute of different rows into different attributes on a single row (for a given customer identifier for example). InfiniteInsight® Explorer - Event Logging can be used to represent any time-date stamped history, such as the history of a customer, or the history of a log of defects associated with a machine in a network.

This component merges static information (single value per ID) and dynamic information (multiple values per ID). The user must have these two data sets before using the component. The data set containing static information is generally called the "reference" data set, and it is associated in the models with the classical data set names or roles such as Training, Estimation, Validation, Test or ApplyIn. The data set containing the log of events (sometimes called the "transactions" table) is associated with a label beginning with the string "Events". InfiniteInsight® Explorer - Event Logging is said to build coarse grain representations as it summarizes the events into different periods of interest.

14.1 Features

Scalability: The scalability of InfiniteInsight® Explorer - Event Logging is linked to the memory of the server because a temporary version of all aggregates is created.

Data Passes: One pass on the Events data set.

Inputs: InfiniteInsight® Explorer - Event Logging needs an Events data set with some continuous attributes to be aggregated, such as amounts. Each row could for example represent the fact that a customer has bought a product at this time for this amount. Possible types of aggregations are minimum, maximum, sum, average, and count. InfiniteInsight® Explorer - Event Logging also needs a Training data set that contains the identifiers that will be used to group the Events (in the previous example, it could be the customer identifier).

Outputs: Aggregates as specified by the user.

Parameters:

Profile of the period of aggregations.

Aggregation functions for each period and between periods.
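The pivoting performed by this component can be sketched as follows: a list of events of variable length per customer is turned into one row per reference identifier with a fixed set of aggregate columns per period. The function name, the column naming scheme (`<period>_<aggregate>`), and the period boundaries are assumptions for the example, not the product's output format.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Sequence, Tuple

Period = Tuple[date, date]  # inclusive start and end


def aggregate_events(reference_ids: Sequence[str],
                     events: Iterable[Tuple[str, date, float]],
                     periods: Dict[str, Period]) -> Dict[str, Dict[str, Optional[float]]]:
    """One fixed-width row per reference ID, built in one pass over events."""
    running = {cid: {name: {"count": 0, "sum": 0.0, "min": None, "max": None}
                     for name in periods}
               for cid in reference_ids}
    for cid, event_date, amount in events:
        if cid not in running:
            continue  # event for an ID absent from the reference data set
        for name, (start, end) in periods.items():
            if start <= event_date <= end:
                agg = running[cid][name]
                agg["count"] += 1
                agg["sum"] += amount
                agg["min"] = amount if agg["min"] is None else min(agg["min"], amount)
                agg["max"] = amount if agg["max"] is None else max(agg["max"], amount)
    rows = {}
    for cid, per_period in running.items():
        row = {}
        for name, agg in per_period.items():
            for key, value in agg.items():
                row[f"{name}_{key}"] = value
            row[f"{name}_avg"] = agg["sum"] / agg["count"] if agg["count"] else None
        rows[cid] = row
    return rows


periods = {"jan": (date(2014, 1, 1), date(2014, 1, 31)),
           "feb": (date(2014, 2, 1), date(2014, 2, 28))}
events = [("c1", date(2014, 1, 5), 20.0),
          ("c1", date(2014, 1, 20), 30.0),
          ("c1", date(2014, 2, 3), 5.0)]
rows = aggregate_events(["c1", "c2"], events, periods)
```

Customers without any event, such as `c2` here, still receive a complete row, which is what makes the result usable as a single training data set.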


15 InfiniteInsight® Explorer - Sequence Coding Technical Specifications

The purpose of InfiniteInsight® Explorer - Sequence Coding (formerly known as KSC) is to build a mineable representation of events history. For example, InfiniteInsight® Explorer - Sequence Coding can be used to represent web log sessions. InfiniteInsight® Explorer - Sequence Coding is able to represent each session as both the count of pages and the transitions between pages (or meta-information about the pages).

It is not a data mining algorithm but a data preparation transform. As discussed above, the input data for the regression, classification and clustering transforms require a single data set with a fixed number of attributes. However, commonly a customer is associated with several events (purchase history for example) with variable number for every customer. This list of events across several rows must be translated into a single row with a fixed number of attributes. These types of operations are called pivoting in data mining because they translate information contained in the same attribute of different rows into different attributes on a single row (for a given customer identifier for example). InfiniteInsight® Explorer - Sequence Coding can be used to represent any time-date stamped history, such as the history of a customer, or the history of a log of defects associated with a machine in a network.

This component merges static information (single value per ID) and dynamic information (multiple values per ID). The user must have these two data sets before using the component. The data set containing static information is generally called the "reference" data set, and it is associated in the models with the classical data set names or roles such as Training, Estimation, Validation, Test or ApplyIn. The data set containing the log of events (sometimes called the "transactions" table) is associated with a label beginning with the string "Events". InfiniteInsight® Explorer - Sequence Coding is said to build fine-grained representations as it summarizes the count of different events or even the transitions between different events for a given reference object.

15.1 Features

Scalability: The scalability of InfiniteInsight® Explorer - Sequence Coding is linked to the memory of the server because a temporary version of all aggregates is created.

Data Passes: Two passes on the Events data set.

Inputs: InfiniteInsight® Explorer - Sequence Coding needs an Events data set with some discrete attributes representing events to be counted or for which transitions will be counted. For example, each row could represent a click on a given Web page. InfiniteInsight® Explorer - Sequence Coding also needs a Training data set that contains the identifiers that will be used to group the events (in the previous example, it could be the web session identifier).

Outputs: Count on each selected type of events or transitions between these events.

Parameters:

List of selected events used to filter.

Specific types of counts or transition counts for events.

Flag indicating if each transaction should be encoded or only the entire sessions.
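The two kinds of counts produced by this component, events per session and transitions between consecutive events, can be sketched with standard Python counters. The sketch assumes that events arrive in chronological order within each session and uses a single scan; it does not reproduce the two passes or the event filtering of the product, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def encode_sessions(events: Iterable[Tuple[str, str]]):
    """Count pages and page-to-page transitions for each session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    transitions: Dict[str, Counter] = defaultdict(Counter)
    last_page: Dict[str, str] = {}
    for session_id, page in events:
        counts[session_id][page] += 1
        if session_id in last_page:
            # A transition links the previous page of this session to this one.
            transitions[session_id][(last_page[session_id], page)] += 1
        last_page[session_id] = page
    return counts, transitions


clicks = [("s1", "home"), ("s1", "cart"), ("s1", "home"),
          ("s2", "home"), ("s1", "cart")]
counts, transitions = encode_sessions(clicks)
```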


16 InfiniteInsight® Explorer - Text Coding Technical Specifications

InfiniteInsight® Explorer - Text Coding (formerly known as KTC) is a solution for Text Analytics. It automatically prepares and transforms unstructured text attributes into a structured representation to be used within the SAP InfiniteInsight® modeling components.

InfiniteInsight® Explorer - Text Coding automatically handles the transformation from unstructured data to structured data going through a process involving “stop word” removal, merging sequences of words declared as 'concepts', translating each word into its root through “stemming” rules, and merging synonyms. InfiniteInsight® Explorer - Text Coding allows text fields to be used “as is” in classification, regression, and clustering tasks. It comes packaged with rules for several languages such as French, German, English and Spanish, and can be easily extended to other languages.
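The four steps listed above (stop word removal, concept merging, stemming, synonym merging) can be sketched as a toy pipeline. Every word list and rule in the sketch is invented for the example; the language files shipped with the product are not reproduced, and the suffix-stripping rule is a deliberately crude stand-in for real stemming rules.

```python
from typing import List

# Illustrative resources only; real language files are far richer.
STOP_WORDS = {"the", "a", "is", "to"}
CONCEPTS = {("credit", "card"): "credit_card"}
SYNONYMS = {"stop": "cancel"}
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip one known suffix, keeping a root of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def encode_text(text: str) -> List[str]:
    # 1. Stop word removal
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    # 2. Merge word sequences declared as concepts
    merged, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in CONCEPTS:
            merged.append(CONCEPTS[pair])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    # 3. Stemming, then 4. synonym merging
    roots = [stem(w) for w in merged]
    return [SYNONYMS.get(r, r) for r in roots]


roots = encode_text("The customer wanted to stop billing the credit card")
```

The resulting list of roots is the kind of structured representation that can then be counted or vectorized for the modeling components.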

InfiniteInsight® Explorer - Text Coding improves the quality of predictive models by taking advantage of previously unused text attributes. For example, messages, emails sent to a support line, marketing survey results, or call center chats can be used to enhance the results of models for cross-sell or attrition.

16.1 Features

Scalability: The behavior of InfiniteInsight® Explorer - Text Coding is linear with the number of lines of the data set. The average size of the texts contained will influence the computing time.

Data Passes: If InfiniteInsight® Explorer - Text Coding has to recognize the language, a first pass over the data set is performed. In order to create a dictionary for every textual variable, one pass on the training data set is done. If the data set has more than one textual variable, the number of passes does not grow with the number of textual variables to process.

Inputs: The data set must contain at least one variable containing text, which must be declared with storage string and value textual.

Results:
- Language recognized for all the textual variables (one language for all)
- Dictionary containing all the selected roots for every textual variable
- Statistics on Stemming Rules usage
- Statistics on every root of every dictionary

Outputs:
- Language Recognition for each line: adds a variable that indicates the recognized language for every text.
- Vectorization: adds variables corresponding to the text representation in the dictionary. The type of the value depends on the Encoding parameter.
- Generate only Roots: only displays variables corresponding to the text representation in the dictionary. The type of the value depends on the Encoding parameter.
- Transactional Mode: creates a transactional file that contains, for every text, X lines corresponding to the roots of the text in their order of appearance.

Parameters:
- Repository containing the language files
- List of excluded languages
- Language recognition
- Possibility for the user to specify the language
- Language processing options
- Encoding parameters for vectorization
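The difference between the Vectorization and Transactional Mode outputs can be sketched in Python as follows. This is an illustration under assumptions, not the product's code: the function names, the min_count threshold, and the encoding values "count" and "boolean" are hypothetical stand-ins for the Encoding parameter, whose actual values are not listed in this document.

```python
from collections import Counter


def build_dictionary(root_lists, min_count=1):
    """One pass over the training texts: keep roots seen at least min_count times."""
    counts = Counter(root for roots in root_lists for root in roots)
    return sorted(root for root, n in counts.items() if n >= min_count)


def vectorize(roots, dictionary, encoding="count"):
    """One output variable per dictionary root; the value type depends on encoding."""
    counts = Counter(roots)
    if encoding == "boolean":
        return [int(counts[root] > 0) for root in dictionary]
    return [counts[root] for root in dictionary]


def transactional(text_id, roots, dictionary):
    """One output line per root occurrence, in order of appearance."""
    known = set(dictionary)
    return [(text_id, root) for root in roots if root in known]
```

Vectorization produces a fixed-width record per text, whereas the transactional form produces a variable number of lines per text.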


17 InfiniteInsight® Explorer - Semantic Layer Technical Specifications

SAP InfiniteInsight® provides a module to edit, save, and retrieve data manipulations as described in the document Data Manipulation: Use Case Scenarios. When data stores (directories or ODBC sources) are associated with a repository containing data manipulations, these connectors appear as regular files or tables and can be used directly (like other data) to train or apply models.

One of the useful features of InfiniteInsight® Explorer - Semantic Layer is the ability to declare arguments. Arguments are symbols with associated values that can be changed before executing the data manipulations. They can be used anywhere within InfiniteInsight® Explorer - Semantic Layer.

InfiniteInsight® does not offer a special engine to execute these data manipulations, since they can all be performed by the standard SQL engines embedded in all major relational databases. Instead, InfiniteInsight® Explorer - Semantic Layer can be seen as an object-oriented layer that generates data manipulation statements in SQL, which are in turn processed by the database server.
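As a rough illustration of the idea (arguments substituted into a generated SQL statement that the database then executes), consider the Python sketch below. The table names, column names, the reference_date argument, and the render_sql helper are all hypothetical; the actual SQL generated by the product is not shown in this document.

```python
from string import Template

# Hypothetical data manipulation: table and column names are illustrative only.
MANIPULATION = Template(
    "SELECT c.customer_id, COUNT(o.order_id) AS nb_orders\n"
    "FROM customers c LEFT JOIN orders o\n"
    "  ON o.customer_id = c.customer_id AND o.order_date < $reference_date\n"
    "GROUP BY c.customer_id"
)


def render_sql(template, arguments):
    """Substitute argument values into the statement before it is sent to the DBMS."""
    # Values are rendered as SQL string literals, doubling embedded single quotes.
    quoted = {name: "'" + str(value).replace("'", "''") + "'"
              for name, value in arguments.items()}
    return template.substitute(quoted)
```

Changing the value of reference_date and rendering again gives a new statement without editing the manipulation itself, which is the role arguments play in the description above.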

17.1 Features

Results:
- Time-stamped population (snapshots of the entities at a given time)
- Filtered Time-Stamped Population
- Cross Product Time-Stamped Population
- Compound Time-Stamped Population
- Temporal analytical data sets

Outputs:
- Data manipulation aggregates (based on conditions, expressions, string or date manipulations, etc.)
- Possibility to create targets on the fly

Parameters:
- Entity (the object of interest targeted by an analytical task)
- Analytical Record (logical view of all attributes corresponding to an entity)
- Possibility to define domains in the analytical record (group of attributes describing a homogeneous section of an entity)
- Time stamp variables prompt(s)
- Business performance indicators (a signal associating dates with one or several metrics)
- Repository containing the metadata


18 InfiniteInsight® Social Technical Specifications

The purpose of InfiniteInsight® Social is to build attributes that can augment customer profiles (or any other object of interest) derived from graph structures. These graph structures can be extracted from an events history, such as contacts between customers or employees or transactions linking customers and products. For example, in the telecommunications industry, InfiniteInsight® Social can be used to build different social networks based on Call Detail Records.

In this sense, InfiniteInsight® Social should be seen as a data preparation transform used to extract information from graph structures to generate a fixed number of derived attributes. InfiniteInsight® Social can be used to build a set of graphs and to extract properties for each customer by analyzing the customer's connectivity and profiling its neighborhood within the graph set. InfiniteInsight® Social also has an algorithmic aspect: the automated discovery of communities within graphs and the computation of derived attributes from statistics obtained on these communities (for example, the density of the community associated with each customer).

This module can merge static information (single value per Id) and dynamic information (multiple values per Id). The data set containing the interaction events is called the Link Data Set; it contains the information used to build the links (or edges) of the graph set. The optional data set containing static information is called the Node Decoration Data Set, as it provides information on the entities that will be the graph nodes. Finally, an additional Identifiers Conversion Data Set can be added to join the Link Data Set and the Node Decoration Data Set if their respective Ids are not the same. It is a two-column data set that translates Ids from the Link Data Set into Ids of the Node Decoration Data Set (for instance, phone line numbers into client Ids).

InfiniteInsight® Social can be called a Social Network Analysis component as it offers a way to get a view on social interactions hidden in raw data and to derive attributes from these structures.

18.1 Features

Scalability: The scalability of InfiniteInsight® Social is linked to the memory of the server as all graphs are stored in the main memory. This explains why it is recommended to use InfiniteInsight® Social only on 64-bit servers with a fair amount of RAM.

Data Passes: One pass on the Link Data Set.

Inputs: InfiniteInsight® Social needs a Link Data Set with at least two columns (source node and target node) to build the graph set. The Node Decoration Data Set is optional and allows InfiniteInsight® Social to aggregate properties (means, mode and profile) for a given node's neighborhood. The Identifiers Conversion Data Set is also optional and can be specified for convenience.

Outputs:
- Connection Analysis: information on the degree centrality (number of neighbors)
- Circle Analysis: aggregates properties of direct neighbors
- Centrality Analysis: information on a node's influence potential (computed through mathematical metrics)
- Neighbors Mode: lists the neighbors of a node

Parameters: The graphs loading parameters allow building multiple graphs from a single Link Data Set by using different filtering mechanisms such as creating a graph per period of time or per type of interaction.
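The roles of the three data sets can be illustrated with a small Python sketch. It is only a toy under stated assumptions: links are treated as undirected, self-links are ignored, and the function names are hypothetical. It does not reflect how the product stores or analyzes graphs.

```python
from collections import defaultdict
from statistics import mean


def build_graph(links, id_conversion=None):
    """One pass over the Link Data Set: (source, target) rows -> adjacency sets."""
    convert = (id_conversion or {}).get
    graph = defaultdict(set)
    for source, target in links:
        # Translate link Ids into node Ids when a conversion table is supplied.
        a, b = convert(source, source), convert(target, target)
        if a != b:  # assumption of this sketch: self-links are ignored
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph, node):
    """Connection analysis: number of direct neighbors."""
    return len(graph.get(node, ()))


def neighborhood_mean(graph, node, decoration):
    """Circle analysis: mean of a node decoration value over the direct neighbors."""
    values = [decoration[n] for n in graph.get(node, ()) if n in decoration]
    return mean(values) if values else None
```

Here links plays the part of the Link Data Set, id_conversion the Identifiers Conversion Data Set (for example phone line number to client Id), and decoration one attribute of the Node Decoration Data Set.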


19 Scorer Technical Specifications

InfiniteInsight® Scorer generates source code in different formats for InfiniteInsight® Modeler - Data Encoding/InfiniteInsight® Modeler - Regression/Classification and InfiniteInsight® Modeler - Data Encoding/InfiniteInsight® Modeler - Segmentation/Clustering models. The supported output codes are:

C, MYSQL, SQL, PMML2, AWK, HTML (JavaScript), SAS, JAVA, BASIC, TERADATA UDF, DB2 UDF, Oracle UDF, SQLServer UDF, Score Card in HTML

19.1 Features

Scores obtained by using the generated codes should be the same as those obtained with SAP InfiniteInsight®. However, slight differences may exist, mainly due to precision issues in computation. The following tables sum up known problems.

Caption

Color Meaning

++ Syntax is correct and results are the same as SAP InfiniteInsight® engines (1).

! Syntax is correct but results are different (2).

!! Code not implemented.

X Code is generated, but problems can occur on some systems due to system limitations.

Not tested

(1) Results may be slightly different due to precision issues, especially with models with a lot of variables.

(2) Database types without RTrim (right trim, which automatically removes whitespace at the end of a string) treat two categories whose names differ only by a trailing whitespace as different.
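The effect described in note (2) can be reproduced in a few lines of Python. This is only an illustration of the trailing-whitespace issue, with a hypothetical category_key helper; it says nothing about how any particular database compares strings.

```python
def category_key(value, rtrim=True):
    """Return the key a category is grouped under, with or without right-trimming."""
    return value.rstrip(" ") if rtrim else value


values = ["gold", "gold ", "silver"]
with_rtrim = {category_key(v) for v in values}                   # "gold " merges with "gold"
without_rtrim = {category_key(v, rtrim=False) for v in values}   # "gold " stays distinct
```

With right-trimming the three values collapse to two categories; without it they remain three, so a score computed per category can differ from the one produced by the engine.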


19.1.1 Without Date Variables

Key Code K2R order 1 K2R order 2 K2S K2S with SQL Expression

C ++ ++ ++ ++

JAVA ++ ++ ++ ++

PMML3.2 ++ !! ++ !!

AWK ! ! ! !

CPP ++ ++ ++ ++

SAS ++ ++ ++ ++

SQLServer ++ !! !! ++

SQLServerUDF ++ !! !! ++

HANA (1) ++ !! !! ++

ORACLE ++ !! !! ++

OracleUDF ++ !! !! ++

SQLDB2 ++ !! !! ++

DB2UDF ++ !! !! ++

DB2V9 ++ !! !! ++

SQLTeradata ++ !! !! ++

TERAUDF ++ !! !! ++

MYSQL ++ !! !! ++

MYSQLUDF ++ !! !! ++

SybaseIQ ++ !! !! ++

SybaseIQUDF ++ !! !! ++

SQLNetezza ++ !! !! ++

SQLVertica ++ !! !! ++

PostgreSQL ++ !! !! ++

Note (1) InfiniteInsight® Scorer manages HANA column and row storage.

Caution Only SQLServer key code handles trimmed data during its execution.


19.1.2 With Date Variables

Key Code K2R order 1 K2R order 2 K2S K2S with SQL Expression

C ++ ++ ++ ++

JAVA ++ ++ ++ ++

PMML3.2 !! !! !! !!

AWK !! !! !! !!

CPP ++ ++ ++ ++

SAS ++ ++ ++ ++

SQLServer ++ !! !! ++

SQLServerUDF ++ !! !! ++

HANA (1) ++ !! !! ++

ORACLE ++ !! !! ++

OracleUDF ++ !! !! ++

SQLDB2 ++ !! !! ++

DB2UDF ++ !! !! ++

DB2V9 ++ !! !! ++

SQLTeradata ++ !! !! ++

TERAUDF ++ !! !! ++

MYSQL ++ !! !! ++

MYSQLUDF ++ !! !! ++

SybaseIQ ++ !! !! ++

SybaseIQUDF ++ !! !! ++

SQLNetezza ++ !! !! ++

SQLVertica ++ !! !! ++

PostgreSQL ++ !! !! ++

Notes (1) InfiniteInsight® Scorer manages HANA column and row storage.

Caution Only SQLServer key code handles trimmed data during its execution.


19.2 Notes

The maximum number of parameters of a UDF does not depend on SAP InfiniteInsight®; it is determined by the DBMS. The following table gives the maximum number of arguments allowed for a UDF in each DBMS:

DBMS Number of parameters

SQLServer 2000 1024

Oracle 128

Teradata 128

DB2 90
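A quick way to use these limits is to check them before generating a UDF. The Python sketch below copies the values from the table above; the udf_fits helper is hypothetical, and it assumes that the UDF takes one argument per model input variable, which this document does not state explicitly.

```python
# Limits copied from the table above (maximum number of UDF parameters per DBMS).
UDF_MAX_PARAMETERS = {"SQLServer 2000": 1024, "Oracle": 128, "Teradata": 128, "DB2": 90}


def udf_fits(dbms, n_inputs):
    """True if a UDF with n_inputs arguments stays within the DBMS limit."""
    # Assumption of this sketch: one UDF argument per model input variable.
    return n_inputs <= UDF_MAX_PARAMETERS[dbms]
```

For example, under that assumption a model with 100 input variables would fit in an Oracle or Teradata UDF but not in a DB2 UDF.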


20 InfiniteInsight® Access

InfiniteInsight® Access is a solution for accessing data in a wide variety of formats. It allows reading from and writing to SAS files, SPSS files, Minitab files, Excel files and several other file types.

20.1 ODBC

This section lists the ODBC platforms that have been tested by our development team and that can be easily reproduced.

20.1.1 Platform: A Definition

In the SAP InfiniteInsight® context, a platform is the combination of:

- The client platform, where SAP InfiniteInsight® is running.
- The DBMS (Database Management System) platform, where the DBMS providing the data is running.
- The DBMS.
- The ODBC driver. This is the software layer exposing the data from the DBMS in a standard way.
- The ODBC driver manager. This is an intermediate software layer managing the ODBC drivers installed on a computer.

20.1.2 Reproducibility Issue

Because a platform is a complex combination of settings, exhaustively exploring all potential platforms is a huge task.

To improve the reproducibility and testing of SAP InfiniteInsight® on platforms, we are currently working on a tool that will let us:

- Test new platforms more easily.
- Integrate all test platforms into daily non-regression tests.


20.1.3 List of Platforms Reproduced and Tested

All the platforms listed in the table below have been tested at least once, but not necessarily re-tested for each version of SAP InfiniteInsight®. However, we can easily reproduce them for further testing if need be. Some features (Data Manipulation or InfiniteInsight® Scorer) may not be available or may not yet have been tested on some platforms.

DBMS InfiniteInsight Engine OS ODBC Driver ODBC Manager Data Manipulation Scorer

Access Windows 2000, Windows XP Microsoft 4.00 Microsoft

db2 9.5 Linux X86 64 bits v91fp7 unixODBC Not tested Not tested

db2 9.5 HP UX Itanium 11.23 64 bits IBM Data Server Runtime Client version 9.5 (v9.5_hpipf64_client.tar.gz) unixODBC Not tested Not tested

db2 9.5 Windows 2000, Windows XP, Windows Seven IBM 9.05.00.808 Microsoft

db2 9.5 Windows Server 2003 R2 64 bits IBM DB2 ODBC DRIVER 9.05.00.808 Microsoft Not tested Not tested

HANA 1.0 rev 67 Windows Seven 32 bits HDBODBC32 sp6 rev 67 minimum (HDBODBC 1.00.6768.55550) Microsoft

HANA 1.0 rev 67 Windows Seven 64 bits HDBODBC sp6 rev 67 minimum (HDBODBC 1.00.6768.55550) Microsoft

HANA 1.0 rev 67 Linux 64 bits HDBODBC sp6 rev 67 minimum (HDBODBC 1.00.60.379371) unixODBC

HANA 1.0 rev 70 Windows Seven 32 bits HDBODBC sp7 rev 0 (HDBODBC 1.00.7.0.58439) Microsoft

HANA 1.0 rev 70 Windows Seven 64 bits HDBODBC sp7 rev 0 (HDBODBC 1.00.7.0.58439) Microsoft

HANA 1.0 rev 70 Linux 64 bits HDBODBC sp7 rev 0 (HDBODBC 1.00.7.0.58439) unixODBC

HANA 1.0 rev 73 Windows Seven 32 bits HDBODBC sp7 rev 73 (HDBODBC 1.00.73.00) Microsoft

HANA 1.0 rev 73 Windows Seven 64 bits HDBODBC sp7 rev 73 (HDBODBC 1.00.73.00) Microsoft

HANA 1.0 rev 73 Linux 64 bits HDBODBC sp7 rev 73 (HDBODBC 1.00.73.00) unixODBC

HANA 1.0 rev 73 Solaris 11 sparc 64 bits HDBODBC sp7 rev 73 (HDBODBC 1.00.73.00) unixODBC

MySQL 5.03 Linux X86 64 bits ODBC connector 5.1.5 unixODBC

MySQL 5.03 Solaris 10 X64 MyODBC 3.51.26 Microsoft


MySQL 5.03 Windows 2000,

Windows XP MyODBC 2.50 Microsoft

MySQL 5.03 Windows 2000,

Windows XP MyODBC 3.51.25.00 Microsoft

MySQL 5.03 Windows Server 2003 R2 64 bits MyODBC 3.51.23.00 Microsoft

Netezza 5.08 (1) Windows 2000,XP 5.00.08 Microsoft

Netezza 5.08 (1) Windows Server 2003 R2 64 bits 5.00.08 Microsoft

Oracle 10.02.0010 Windows Server 2003 R2 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) Microsoft

Oracle 10.02.0010 Windows Seven 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) Microsoft

Oracle 10.02.0010 Linux 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.0010 Solaris Sparc 2.8 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.0010 AIX 5.3 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.0010 HP UX Itanium 11.23 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.0010 Solaris 10 X64 DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.00.10 Solaris 11 sparc 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 10.02.0010 Windows Server 2003 R2 64 bits Oracle 10.02.00.01 Microsoft

Oracle 10g Windows 2000, XP Oracle 10.02.00.03 Microsoft

Oracle 11g 11.1.0.6.0 Windows Seven 64 bits Oracle 11.1.0.6.0 Microsoft

Oracle 11g 11.1.0.6.0 Linux 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 11g 11.1.0.6.0 Solaris Sparc 2.8 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 11g 11.1.0.6.0 AIX 5.3 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 11g 11.1.0.6.0 HP UX Itanium 11.23 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 11g 11.1.0.6.0 Solaris 10 X64 DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 11g 11.1.0.6.0 Solaris 11 sparc 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) DataDirect

Oracle 9.2 Windows Server 2003 R2 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) Microsoft

Oracle 9.2 Windows Seven 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) Microsoft


Oracle 9.2 Windows 2000, XP DataDirect 6.1 SP2 (InfiniteInsight Data Access for Oracle) Microsoft

Oracle 9.2 Windows 2000, XP Oracle 9.02 Microsoft

PostgreSQL 8.4.2 Windows XP, Windows 2003, Windows Seven 32 & 64 bits PostgreSQL 9.00.03.10 Microsoft

PostgreSQL 8.4.2 Linux 64 bits postgresql90-odbc-09.00.020-1PGD.rhel5.x86_64 unixODBC

SQLServer 2000 Windows 2000, XP Microsoft SQLServer 2000.85 Microsoft

SQLServer 2000 Windows Server 2003 R2 64 bits SQLServer 2000.86.3959.00 Microsoft

SQLServer 2005 Windows 2000, XP Microsoft SQLServer 2000.85 Microsoft

SQLServer 2005 Windows 2000, XP Microsoft SQL Native client 2005.90 Microsoft

SQLServer 2005 Windows Server 2003 R2 64 bits SQLServer 2000.86.3959.00 Microsoft

SQLServer 2008 Windows XP, Windows 2003, Windows Seven 32 & 64 bits SQL Server Native Client 10.0 (2) Microsoft

SQLServer 2008 Linux 64 bits Microsoft ODBC Driver 11 for SQL Server unixODBC

Sybase IQ 15.4 Windows Server 2003 R2 64 bits Sybase IQ IQ 12.00.0.6567 Microsoft

Teradata 13.1 Windows 2000,XP Teradata TTUF 13.1 Microsoft

Teradata 13.1 (3) AIX 5.3 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) DataDirect

Teradata 13.1 (3) (4) HP UX Itanium 11.23 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) DataDirect

Teradata 13.1 (3) Linux 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) DataDirect

Teradata 13.1 (3) Solaris 2.8 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) DataDirect

Teradata 13.1 (3) Windows XP 32 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) Microsoft

Teradata 13.1 Windows XP, Windows Server 2003 R2, Windows Seven 64 bits DataDirect 6.1 SP2 (InfiniteInsight Data Access For Teradata) Microsoft

Teradata 13.1 Windows XP 32 bits Teradata TTUF 13.1 Microsoft

Teradata 13.1 AIX 5.3 64 bits Teradata TTUF 13.1 Teradata

Teradata 13.1 HP UX Itanium 11.23 64 bits Teradata TTUF 13.1 Teradata


Teradata 13.1 Linux 64 bits Teradata TTUF 13.1 Teradata

Teradata 13.1 Solaris 2.8 64 bits Teradata TTUF 13.1 Teradata

Teradata 13.1 Solaris 10 X86 64 bits Teradata TTUF 13.1 Teradata

Teradata 13.1 Solaris 11 sparc 64 bits Teradata TTUF 13.1 Teradata

Teradata 13.1 Windows 2000, Windows XP, Windows Server 2003 R2 64 bits Teradata TTUF 13.10 Microsoft

Teradata 14.0 Windows 2000, Windows XP, Windows Server 2003 R2 64 bits Teradata TTUF 14.0 Microsoft

Teradata 14.10 Windows 2000, Windows XP, Windows Server 2003 R2 64 bits Teradata TTUF 14.10 Teradata

Vertica 4.1.6 Windows XP,2003, Seven 32 & 64 bits Vertica ODBC Driver 4.1.09 Microsoft

Vertica 4.1.6 Linux 64 bits Vertica ODBC Driver 4.1.09 unixODBC

Notes (1) Due to Netezza limitations, once a table has been created it is not possible to add a new column when updating the table (via in-database apply or classical apply). For example, it is impossible to add a scoring column to the table afterwards.

(2) Connectivity on SQLServer 2008 needs the SQL Server Native Client 10.0 ODBC driver. Alternative SQL Server ODBC drivers are not supported by SAP InfiniteInsight®.

(3) All connections to Teradata using InfiniteInsight® Data Access for Teradata need the standard Teradata client packages: Tdicu, TeraGSS, cliv2. These packages can be found on the standard Teradata Tools and Utility Files CDs. TTUF82, TTUF12, TTUF13 and TTUF13.1 are compatible.

(4) At this time, Unicode configuration with HPUX is not possible.

(5) If you plan to use Fastwrite with TTUF14.10, the patch for fastload 14.10.00.03 is mandatory.


21 Flat Files

21.1 Supported Data Formats

The following standard formats are supported:

- TAB-delimited files.
- Comma Separated Values (CSV) files (English CSV). Because of localization issues, the CSV format is not always safe: on some platforms, the true delimiter of a CSV file may depend on the machine language. For that reason, we recommend using TAB-delimited files.

The TAB-delimited files should comply with the following specifications:

- The file is an ASCII file.
- Each line contains a record of values.
- Values in a line are separated by a TAB character (tabulation, ASCII code 9).
- The first line of the file contains the names of the variables.
- Variable names should not include the special character slash (/).
- Variable names should be unique (two variables should not have the same name).
- Numbers should use the English convention for the decimal point (the decimal point is ‘.’).
- Strings may be enclosed in quote characters (single or double quotes); this is not mandatory.
- All lines should contain the same number of TAB characters.
- Dates, if any, should be represented in the following format: “YYYY-MM-DD” (ISO 8601 format).

Note that SAP InfiniteInsight® provides a date coding feature that automatically extracts date information such as “day of week”, “number of years, or days, since this date”, and so on in order to improve the models. Example of valid date: 2000-03-24

Here is a simple example of how such a data file could look (columns are aligned here, they are in fact just separated by a TAB character):

age Workclass Education Educ-Level Gain class

39 State-gov Bachelors 13 2174.00 0

50 Self-emp-not-inc Bachelors 13 0.00 0

38 Private HS-grad 9 0.00 0

53 Private 11th 7 0.00 0

28 Private Bachelors 13 0.00 0

37 Private Masters 14 0.00 0

49 Private 9th 5 0.00 0

52 Self-emp-not-inc HS-grad 9 0.00 1

31 Private Masters 14 14084.00 1

42 Private Bachelors 13 5178.00 1
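Some of the TAB-delimited rules listed in this section can be checked mechanically before loading a file. The Python sketch below is a hypothetical pre-flight check, not a tool shipped with the product; the check_tab_file name and its date_columns parameter are assumptions, and it covers only the header, column count, and date format rules.

```python
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_tab_file(text, date_columns=()):
    """Return a list of problems found in TAB-delimited text; empty if none."""
    problems = []
    rows = [line.split("\t") for line in text.splitlines()]
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if len(set(header)) != len(header):
        problems.append("variable names are not unique")
    if any("/" in name for name in header):
        problems.append("variable name contains '/'")
    for line_no, row in enumerate(rows[1:], start=2):
        # Every line must have the same number of TAB-separated values as the header.
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} values, got {len(row)}")
            continue
        for name in date_columns:
            if not ISO_DATE.match(row[header.index(name)]):
                problems.append(f"line {line_no}: {name} is not YYYY-MM-DD")
    return problems
```

An empty list means that none of the checked rules is violated; it does not guarantee that the file satisfies the remaining rules (ASCII encoding, decimal point convention).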


21.2 Note about Date and Datetime Variables

Internally, SAP InfiniteInsight® converts all dates to datetime. This allows comparing and mixing values with different formats, either date or datetime.

Duration computations also follow this behavior. When performing Event Log Aggregation or Sequence Analysis, the periods defined in the settings (such as "3 periods of 2 weeks before the reference date") are converted to bounds of datetime ranges.

When a date is converted to datetime, the time is set by default to noon (12:00) instead of midnight (0:00). This avoids problems when converting back from datetime to date, since at midnight a one-second delta may shift the date by one day. For example, if you look at a table containing date values whose description is forced to datetime, you will see the dates with a time set to 12:00:00.

Tip - In the user interface, to indicate a datetime compatible with a date value, enter it with the time set to noon (12:00:00).
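The reason for choosing noon can be seen with Python's standard datetime module. This is only an illustration of the arithmetic; the date_to_datetime helper is hypothetical and is not the product's conversion routine.

```python
from datetime import date, datetime, time, timedelta


def date_to_datetime(d):
    """Promote a date to a datetime at noon, as described above."""
    return datetime.combine(d, time(12, 0, 0))


d = date(2000, 3, 24)
at_noon = date_to_datetime(d)
at_midnight = datetime.combine(d, time(0, 0, 0))

# A one-second error leaves the noon value on the same day ...
same_day = (at_noon - timedelta(seconds=1)).date() == d
# ... but moves the midnight value to the previous day.
slipped = (at_midnight - timedelta(seconds=1)).date() == d - timedelta(days=1)
```

Both same_day and slipped evaluate to True, which shows that a noon-based datetime survives a small rounding error when converted back to a date, while a midnight-based one does not.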


22 SAS Files

22.1 Supported Data Formats

The following data formats are supported by SAP InfiniteInsight®:

Format Extension Version

SAS for Windows and OS/2 SD2 SAS7BDAT 6/7/8/9

SAS for Unix SSD* SAS7BDAT 6/7/8/9

SAS CPORT STC 6/7/8/9

SAS Transport Files XPT TPT 6/7/8/9

Note - Compressed versions of these formats cannot be read by SAP InfiniteInsight®.


23 Annex


23.1 Open Source Software Used in InfiniteInsight®

InfiniteInsight® uses third-party software to power its processes and to improve the GUI. For more information, refer to the delivered document Third-party Software Delivered with InfiniteInsight®.

23.2 List of Available Binaries

The following table lists the binaries available for each platform.

Platform OS Binary Processors

Windows (32-bits) Windows (2000, 2003, XP, NT 4, Vista, 7) Microsoft and Visual Studio 8 Intel 32-bits (IA32)

Windows (64-bits) Windows (Server 2003, Vista, 7) Microsoft and Visual Studio 8 Intel 64-bits (X64)

Windows 2008 (64-bits) Windows 2008 Microsoft and Visual Studio 8 Intel 64-bits (X64)

Sun Sparc Solaris 2.8 and later version Sun CC 5.0 Sparc v9 (64-bits)

Sun Intel Solaris 10 and later version Sun CC 5.0 Intel 64-bits (X64)

IBM AIX 5.3.0 xlC PowerPC 64-bits

IA-UX 11.23 aCC Intel® Itanium 64-bit

RedHat ES5 (ELSMP) Linux-kernel 2.6 ELSMP g++ 4.1.2 Intel 64-bits (X64)

www.sap.com/contactsap

© 2014 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see (www.sap.com/corporate-en/legal/copyright/index.epx#trademark) for additional trademark information and notices.