analytics toolkits v13


Upload: ramkumar-ravichandran

Post on 14-Jun-2015


DESCRIPTION

Glossary of tools available for the field of Analytics. The research was done via Google, hence no claims on accuracy; please use this as directional insight into the tools available for Analytics. The research is also about a year old, and many wonderful new tools have entered the market since, so please do your own research to get updated info.

TRANSCRIPT


Analytics Tools

Chapter Synopsis

Wars are won by Armies and Strategies but fought with Weapons. --Anonymous

This chapter focuses on the tools available in the market to carry out different types of analytics. We begin with a quick look at the typical data flow in an organization, from the time a customer interacts with the business system and generates activity data, through the various stages of data preparation, to how it finally lands with a Business User as insights/recommendations. This is followed by a quick breakdown of the types of analytics done at the various life stages of the data, e.g., from front-end analytics to upsell solutions for customers. Then we give you a quick overview of the various factors that shape the decision on which analytics tool to deploy, followed by a brief summary of the top tools for each type of analytical need. Finally, we wrap up this chapter with a mention of other top tools available in the market, which you might want to explore for your needs.

Structure of the Chapter:

As mentioned above, the chapter is broken down into the following major sections:

Quick introduction into the typical data flow in an organization
Type of Analytics and the top toolkits under each
Factors to decide toolkits for each type of analytics
Brief overview of Top Tools
Detailed description of Top Tools
Other worthy mentions

Quick introduction into the typical data flow in an organization

Figure 1 illustrates at a high level the typical data flow in an organization. As shown in the figure, the first Presentation Tier is where a Customer interacts with the Business and generates data. The data could be of various types – Customer data, Transactional data, Web/Mobile Activity data, etc. To illustrate, let’s take an example: a Customer, John, walks into a bank to open a Checking account. He provides all the details required by the bank to open the account. When the executive enters his information into the system, the warehouse takes in the data (and creates a row) and assigns John an Identification Number. When John walks to the Teller and deposits money into his Account, the corresponding field in the warehouse is updated. Later, when John logs onto his online account to check balances, it generates web Activity and his row is updated. If John transfers money to someone, the transaction is checked against his balance, which is then updated.
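The account-opening example above can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 as a stand-in for the warehouse; the table and column names are illustrative assumptions, not any real bank's schema.

```python
import sqlite3

# In-memory database standing in for the bank's warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- identification number assigned on account opening
    name        TEXT,
    balance     REAL,
    web_logins  INTEGER DEFAULT 0)""")

# Account opening: a row is created and John is assigned an identification number.
cur = conn.execute("INSERT INTO customers (name, balance) VALUES ('John', 0.0)")
john_id = cur.lastrowid

# Teller deposit: the corresponding field in the warehouse is updated.
conn.execute("UPDATE customers SET balance = balance + 500 WHERE customer_id = ?", (john_id,))

# Online login: web activity is recorded against his row.
conn.execute("UPDATE customers SET web_logins = web_logins + 1 WHERE customer_id = ?", (john_id,))

balance, logins = conn.execute(
    "SELECT balance, web_logins FROM customers WHERE customer_id = ?", (john_id,)).fetchone()
print(balance, logins)  # 500.0 1
```

In a real system each interaction would pass through the Logic Tier rather than touching the warehouse directly, but the create-row/update-field pattern is the same.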

Now let’s move on to what happens in the backend: the data is stored in the warehouse, and whenever John interacts with the system, the front-end system interacts with the warehouse through an intermediate Logic Tier and serves John. The Logic Tier stores all the logic required to perform the business operations – commands, mathematical calculations, analytical decision-making structure, etc. It is responsible for moving the data between the front and the back end and ensuring that all of John’s requests are served correctly.


The Data Tier is the layer where the data operations happen. The Logic Tier works directly with the front-end tables, which store data for serving business queries, e.g.,

Customer John’s snapshot data – current account balance, risk profile, statement summary, etc.
Location profile data – nearest ATMs, branches, merchants offering discounts, etc.
Recommendations – use his credit card for a discount on a weekend movie
Upsell or Cross-sell – apply for a Mortgage

Other front-end tables record the transactions/activities, e.g., John used his ATM card for a $500 withdrawal, requested a statement printout, or logged on to the site/app and reached Customer Service.

To run the business effectively, businesses often need to have a complete view of their customers, for which they source third-party data, e.g., Credit Bureau reports, Nielsen ratings, macroeconomic data, etc.

Given that most of the data generated by the front end and/or received from third-party systems is unstructured/unorganized, it needs to be processed, cleaned and combined logically for eventual storage and use in analysis or in serving business requests. These operations, called ETL (Extraction, Transformation and Loading), are performed at regular intervals depending on business requirements.
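The extract-transform-load pattern described above can be sketched in a few lines. This is a minimal, vendor-neutral illustration; the field names and cleaning rules are assumptions made up for the example.

```python
# Minimal ETL sketch: extract raw records, transform (clean/validate), load into a target.
# Field names and cleaning rules are illustrative assumptions, not any specific vendor's schema.

raw_records = [  # "extract": a raw feed, e.g. from front-end logs or a 3rd-party file
    {"cust": " john ", "amount": "500", "date": "2015-06-14"},
    {"cust": "JANE",   "amount": "n/a", "date": "2015-06-14"},  # bad amount, to be rejected
]

def transform(rec):
    """Clean one record: trim/case-fold names, coerce amounts, drop unparseable rows."""
    try:
        amount = float(rec["amount"])
    except ValueError:
        return None  # reject records that fail validation
    return {"customer": rec["cust"].strip().title(), "amount": amount, "date": rec["date"]}

warehouse = []  # "load": the cleaned, structured target (an EDW table in practice)
for rec in raw_records:
    clean = transform(rec)
    if clean is not None:
        warehouse.append(clean)

print(warehouse)  # [{'customer': 'John', 'amount': 500.0, 'date': '2015-06-14'}]
```

Real ETL jobs add logging, reject-row quarantines and incremental scheduling on top of this same extract/transform/load skeleton.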

Post ETL, the structured data flows into various tables in the Enterprise Data Warehouse (EDW). The EDW might have specific tables for specific types of information, e.g.,

Customer table – with demographics, snapshot of activity, risk & marketing profile.
Transaction tables – containing transactional information like amount, number of transactions, type of product purchased, etc.

Business Users (Product Managers, Marketers, Sales Professionals, etc.) rely on some standard metrics for running their day-to-day operations. They need to see them daily or at regular intervals to understand what’s going on in their business and whether it needs more attention. Given the repetitive nature and standardization of these requirements, it makes sense to create a structure where this information is captured in the required format, constantly refreshed, and available on multiple channels (Email, Cloud or App) – this is called “Reporting”. Running the same queries again and again on the granular tables discussed above would be inefficient and slow, so Business Intelligence professionals typically pre-aggregate the data in a standard structure to serve the various reporting requests. These are called “OLAP (On-line Analytical Processing)” roll-ups or cubes. The reports are then built off of these cubes and so are efficient and quick.
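The roll-up idea above can be shown concretely: aggregate the granular table once, store the result, and let reports query the small pre-aggregated table. A minimal sketch using Python's built-in sqlite3 in place of the warehouse; the table and column names are assumptions.

```python
import sqlite3

# Granular transaction table rolled up into a small pre-aggregated "cube" that
# reports can query cheaply. Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (cust_id INT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, "checking", 500.0), (1, "checking", 200.0), (2, "mortgage", 1000.0),
])

# The roll-up: aggregate once along the "product" dimension and store the result.
conn.execute("""CREATE TABLE cube_product AS
    SELECT product, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM transactions GROUP BY product""")

rows = conn.execute(
    "SELECT product, txn_count, total_amount FROM cube_product ORDER BY product").fetchall()
print(rows)  # [('checking', 2, 700.0), ('mortgage', 1, 1000.0)]
```

A real OLAP cube pre-aggregates along several dimensions at once (product x region x month), but each cell is produced by exactly this kind of GROUP BY.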

Analysts typically are interested in finding out what happened, why, where and when, how good or bad it is, etc. and they do this by looking at various metrics and KPIs of the business. They might leverage the reports or cubes or might hit the database directly for getting answers to their questions. Their analysis might consist of charting, tabulations, simple/advanced math or statistical techniques. We will look at the various types of analytical techniques in detail in the following sections.

Type of Analytics and the top toolkits under each


Table 1 summarizes four broad types of Analytics, why they are done and the top tools used when carrying out that type of Analytics.

A. Data Collection, ETL & Storage:

Whenever a customer interacts with the business system, data is generated which has to be captured efficiently and accurately and stored in the system, from a customer service point of view, a business operations view, and for regulatory requirements. Given the ever-dynamic nature of businesses today, data collection, storage & retrieval technologies have proliferated, each with its own merits and limitations. Many are best for a specific set of needs but might not be that useful in other circumstances. Data storage has really matured from the early days, when data was simply stored as a dump of information, which then gave way to relational data structures (RDBMS), followed by parallel processing, and now back to an amalgamation of all these broad technologies. Given the varied requirements, from fast and accurate delivery of structured business reports to the efficiency of scale needed at the back end to handle swathes of unstructured data from social media/videos/surveys, no one tool can help run the business end to end.

Going into detail on these technologies would require a dedicated book by itself, but let’s attempt to summarize them at a very top level:

Front End Tables (OLTP), e.g. Oracle, DB2

Front End Tables, or OLTP (Online Transaction Processing) tables, are best for running client-facing businesses. Their biggest strengths are speed, accuracy and low failure rates.

Large scale Historical Storage, e.g., Teradata, SQL Server

These systems are the repository of all the data generated and store information from the front end, other internal sources (Clickstream, Survey systems, Testing Infrastructure) and third-party sources. Data from the various sources undergoes ETL (Extraction, Transformation & Loading) processing, is combined in logical sequences, and is fed to these systems. These tools are characterized by efficient processing and retrieval of huge data sizes (typically massive parallel processing). They also need to integrate easily with reporting/analytics platforms.

Unstructured Data, e.g., Hadoop

Over time, visionaries realized the need for systems that can capture non-traditional data (videos, comments) generated in quantities unforeseen in their time. They started developing technologies that capture such data without putting any restrictions on its structure, instead offering the flexibility to define the structure at the time of retrieval (reporting/analysis). This strength is also its Achilles heel: no structure means slow retrieval. But with Web 2.0 the time of such technologies has truly arrived, and the rapid development of reporting/analytical tools based on these platforms, or at least of connectivity with existing tools, points to a promising and mainstream future for big data.

B. Reporting:

Reporting tools are primarily visualization tools (tables, charts, maps, etc.) used by Business Users/Executives to make sense of the data and to monitor and understand the dynamics (using KPIs) in their portfolio on the fly. Analysts too leverage the reports for similar purposes; however, they are more interested in the data available in the reports to understand what drives the movements in KPIs. Analysts also leverage reporting tools to understand the enterprise-wide standard KPI-creation logic, which they can use for their analysis.

Reporting tools are usually judged on the “30-60 rule”: the broad story should be conveyed in the first 30 seconds of viewing, and the tool should provide the capability to do a one-level drill-down to get a directional sense of the story.

Reporting tools might need to deal with various kinds of data,


Instrumentation Data: record of activity on the live business site, captured via instrumentation
Call Log Data: dump of server calls from the live business site and what was delivered
Transactional Data
Active Customer Data
Customer Feedback Data (social discussions, survey data, etc.)

Some reporting tools also need to incorporate budgets, forecasts, and competitor & benchmark data for users to best understand where they stand.

Given the importance of Reporting in running a business, regulatory compliance and the like, many enterprises create dedicated “Reporting” products for specific needs/industries/domains, e.g., SAS CRMS, which is SAS’s Basel II compliance module.

Factors for deciding Tools for Reporting

Specific needs from a Reporting tool: a wide variety of visualizations; the ability to access reports from a wide variety of channels – emails, texts, alerts, website, Apps; speed of report refresh; and the ability to consolidate data from a wide variety of data sources (and now Big Data too).

Below is the list of factors that should be considered to zero-in on a tool, in the order of priority.

1. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC, Mails, Text, Alerts, Tweets, Social Shares, etc.

2. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems both for output delivery or connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)?

3. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump, and whether Hadoop connectivity for Big Data analysis is available?

4. Data/User Limitations (if any): Specific data/user limitations, query performance with increasing size or complexity, flexibility in data modeling, scalability issues?

5. Ease of Learning: Does it have a GUI, how coding-dependent is it, availability of trained resources, training materials & training cost?

6. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources

7. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team needed or Self-Serve, Support availability.

8. Editorial & Tagging Capabilities: Enabling users to check backend logic for debugging/single source of truth.

9. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic (slicing/dicing enabled) and visible across all channels

10. Types of Aggregations possible: OLAP Cubes, Simple/Advanced Math, Statistical techniques, etc.

A plethora of tools is available in the market for Reporting, hence the need for a structured decision-making process like the above, so that you end up with the tool satisfying most of your needs.

Table 2 gives a bird’s eye view of how each of the top tool sizes up against the criteria mentioned above.

ADOBE MARKETING CLOUD (OMNITURE SITECATALYST & AD HOC ANALYSIS)

Overview

Adobe Marketing Cloud (AMC), the erstwhile Omniture Web Reporting/Analytics suite, is the leader in Web Analytics (analysis of Clickstream data); its mobile reporting/analytics capabilities are being ramped up. AMC “instruments” actions on web pages, buttons, callouts in emails, etc., which it then tracks in its warehouse on the Cloud, and provides front ends on top (SiteCatalyst for Reporting and Ad Hoc Analysis 3.2 for slicing-and-dicing analytics).

AMC provides real-time data for a select subset of ~100+ metrics and is slowly ramping up capabilities to make all reporting real-time. Adobe provides multiple solutions for e-businesses to track the UX of website visitors, track online campaign effectiveness, Social Media Activity, SEO and SEM, and to report on Product performance.

Output Delivery System

SiteCatalyst & Ad Hoc Analysis (erstwhile Adobe Discover) are cloud solutions which can also be accessed on Mobile via Apps.

Integration with Other Tools

Limited data import (Excel, CSV, TXT) functionality. Reports are exported in Excel/PDF. AMC does provide a data dump via FTP, which can then be utilized for additional analysis.

Type of Data it can handle

It typically works with Clickstream data instrumented on websites, Apps or emails. There are recent efforts to expand into Mobile Web/Apps.

Data/User Limitations (if any)

Data/user limitations depend on the service contract. Speed performance remains pretty stable with increasing size/users, though FTP speed varies based on many factors.

Ease of Learning

Both SiteCatalyst and Ad Hoc Analysis are GUI-based and require <=1 month of training on Business Analytics & Reporting. There is a large pool of hands-on and/or trained professionals, and lots of training materials are also available.

Cost

Cloud License: CPM $0.01 to $1. Per month or annual? To check whether the Ad Hoc Analysis cost is inclusive.

Operational Efficiency

6-12 months initial implementation. A significant effort should go into planning, especially on what metrics to implement, where, and the naming conventions, since the cost of errors is significantly higher. Given the amount of effort required in implementation (Omniture expert + Dev + QA), if something goes wrong, changes typically take long and are costly. AMC requires dedicated trained professionals to manage the system.


Editorial & Tagging Capabilities

Editorial & tagging capabilities within SiteCatalyst/Ad Hoc Analysis are not sufficient. Most professionals maintain documentation outside of the system (MS-Office, etc.).

Visualization Options

SiteCatalyst and AMC provide standard visualization options – Tables, Charts, Click Maps, Funnels, etc.

Types of Aggregations possible

Profiling, not many advanced math functionalities.

Ideal for what type of users: Business Users (Product Managers, Marketers), Developers and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling).
Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since AMC is very costly. For start-ups/organizations on a budget, Google Analytics is a cost-effective option.

MICROSTRATEGY

Overview

Microstrategy is a leading reporting solution and has seen widespread acceptance among large Enterprise users. Microstrategy integrates with the warehouse and/or other secondary sources (typically after ETL), and has recently expanded its Big Data connectivity and Advanced Analytics capabilities.

Output Delivery System

Microstrategy offers both on-premise and Cloud delivery solutions, which can also be accessed on mobile via Apps.

Integration with Other Tools

Microstrategy has among the widest range of integrations possible, from warehouses to Hadoop to ODBC to XML export/import. Microstrategy cubes reside on the warehouse and so can be leveraged by other systems directly from there too.

Type of Data it can handle

Works with structured data. Hadoop plug-in available.

Data/User Limitations (if any)

Depends on the service contract under per-user pricing. If requirements are significant, customers buy a dedicated on-premise Microstrategy installation.

Ease of Learning

Reports/drilldown capabilities are GUI-based. However, coding in the Microstrategy scripting language/SQL is required for report creation. There is a large pool of hands-on and/or trained professionals, and lots of training materials are also available.

Cost

Report User Pricing: $500-1K per report receiver. Per month or annual? Dedicated Server Pricing: >=$25K. Per month or annual?

Operational Efficiency

6-12 months initial implementation, since Microstrategy experts (Programmers, Architects) are required to set up the reporting framework. A dedicated team is required to manage the Microstrategy reporting framework.

Editorial & Tagging Capabilities

Editorial & tagging capabilities within Microstrategy are pretty intuitive. Users can click on the “Report Details Page” and figure out the underlying logic behind the reports & metrics. Microstrategy recommends both technical (SQL logic) and non-technical (plain English) commentary.

Visualization Options

Amongst the widest range of visualizations provided – tables, charts, maps, heatmaps, word clouds which can be dynamically linked to the back end data.

Types of Aggregations possible

Profiling, simple & advanced math and statistical capabilities.

Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis and Correlation Analysis. Even though Sizing & Estimation is possible, it’s not very easy to execute.
Ideal for organization at what stage of Analytics Maturity: Preferred as an Enterprise tool, since Microstrategy is costly. For start-ups/organizations with limited scale, other cost-effective reporting options are available, like warehouse packages, Tableau, or an Excel VBA reporting suite.

TABLEAU

Overview

Tableau is fast gaining ground among business and non-technical analytical users on account of its powerful simplicity. It takes data from the warehouse and/or other secondary sources (typically after ETL). Data import/export, analysis, presentation (tables/graphs), automated reporting and scenarios can all be done intuitively, quickly and seamlessly, and transitioned with ease. Tableau is incorporating some statistical capabilities, like simple predictive modeling, in recent versions.

Output Delivery System

Tableau reports need to be created on a PC but can be hosted on the Cloud using Tableau Server. Hosted reports retain the OLAP structure of the backend tables to facilitate on-the-fly slicing and dicing by report consumers. Tableau is now also available on the Cloud, and outputs can be accessed using Apps.

Integration with Other Tools

Tableau has among the widest range of integrations possible, from warehouses to Hadoop to ODBC to XML exports/imports.

Type of Data it can handle

Works with structured data. Hadoop plug-in available.

Data/User Limitations (if any)

Depends on Hardware Configuration.

Ease of Learning

GUI-based. Requires 1-2 weeks to be able to leverage most of Tableau’s features. There is a large pool of hands-on and/or trained professionals, and lots of training materials are also available.

Cost

Individual PC licenses cost between $1-2K, with annual maintenance of $400. Server licenses cost $1K per report receiver, with annual maintenance of $200.

Operational Efficiency

The desktop framework takes minutes to install/use. The first Tableau Server installation needs some coordination between in-house DBAs and the Tableau Support team; timelines depend on the complexity of the problem but rarely exceed a week. Once the initial set-up is completed, no major help is needed for ongoing needs/changes.

Editorial & Tagging Capabilities


Tableau provides many options for editorials – title, summary, and sheet descriptions for reports and dashboards. Given the nature of report creation, the types of aggregation can be checked visually, and the “Describe” option details the exact operation being performed for metrics.

Visualization Options

Amongst the widest range of visualizations provided – tables, charts, maps, heatmaps, word clouds which can be dynamically linked to the back end data.

Types of Aggregations possible

Profiling, simple & advanced math and some simple statistical capabilities.

Ideal for what type of users: Business Users (Product Managers, Marketers) and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Trend Analysis, Correlation Analysis and Sizing & Estimation. Tableau is the best tool for Sizing & Estimation and Scenario Analysis.
Ideal for organization at what stage of Analytics Maturity: Tableau is useful for all types of users. However, it suffers from a lack of advanced analytics capabilities.

FLURRY

Overview

Flurry is a leader in Mobile App reporting. Over 100,000 companies use Flurry Analytics in more than 300,000 applications for reporting, marketing attribution and operational analytics. Flurry, like Omniture, “instruments” actions on the front end and campaign outreach channels for native Apps by integrating an SDK into the App libraries. This data is then tracked in their warehouse on the cloud, and reporting happens on this data. Flurry also has other tools -

Output Delivery System

Flurry is a cloud solution.

Integration with Other Tools

Flurry offers capabilities to download the metrics to CSV, on which additional analysis can be performed.

Type of Data it can handle

Flurry works on Activity data from the Apps directly.

Data/User Limitations (if any)


Flurry doesn’t impose restrictions on data size. However, a Business version also exists, which extends capabilities to xyz.

Ease of Learning

Flurry is a GUI-based solution. It requires 1-2 weeks to be able to leverage most of Flurry’s features. There is a large pool of hands-on and/or trained professionals, and lots of training materials are also available.

Cost

Basic version is free. Check Business Version

Operational Efficiency

<=30 minutes for basic integration: a small piece of SDK needs to be added to the App libraries, and it starts tracking the standard metrics. Some custom events can also be defined in the App. Once the initial set-up is completed, no major help is needed for ongoing needs/changes.

Editorial & Tagging Capabilities

Metrics are standard and fixed on Flurry reports. However, some custom events can be defined and tracked, and their definitions can also be documented. Documentation on the reports is available within Flurry.

Visualization Options

Standard visualization options – tables, charts, funnels.

Types of Aggregations possible

Profiling, simple math.

Ideal for what type of users: Business Users (Product Managers, Marketers), Operational Analysts, Developers and Analysts.
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling).
Ideal for organization at what stage of Analytics Maturity: Flurry is of great help to start-ups, individual developers and small-scale organizations. Given that Flurry supports a smaller range of reporting/analytics, it’s not ideal for mature organizations or large-scale enterprises.

C. Business Analytics:

Business Analysts are one step further up the analytics food chain. They are entrusted with the responsibility of making sense of the data deluge: finding hidden patterns, explaining fluctuations (up or down), sizing opportunities and making high-level projections. They play a critical role in enterprise decision making. They leverage reports, or might query the data sources directly, to answer the various business questions.
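A small example of the kind of quick check a business analyst might run when explaining a KPI fluctuation: correlating the KPI against a candidate driver. This is a sketch using only the Python standard library; the metric series are made-up illustrative numbers, and the `pearson` helper is written out by hand for clarity.

```python
# Explaining a KPI movement by checking its correlation with a candidate driver.
# The two weekly series below are made-up illustrative numbers.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, computed without external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

weekly_signups = [120, 135, 150, 160, 180]        # KPI being explained
weekly_ad_spend = [1000, 1100, 1300, 1350, 1500]  # candidate driver

r = pearson(weekly_signups, weekly_ad_spend)
print(round(r, 3))  # close to 1.0: sign-ups move strongly with ad spend
```

A correlation this strong does not prove causation, of course; it narrows down which driver to investigate with a proper pre-post or A/B analysis.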


Factors for deciding Tools for Business Analytics

Below is the list of factors that should be considered to zero-in on a tool. We have listed them in the order of priority.

Primary

1. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump, and whether Hadoop connectivity for Big Data analysis is available?

2. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Correlation Analysis (pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios

3. Visualization Options: Tables, Charts, Maps, Heatmaps, etc., which can be dynamic (slicing/dicing enabled) and visible across all channels

4. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources

Secondary

1. Ease of Learning: Does it have a GUI, how coding-dependent is it, availability of trained resources, training materials & training cost?

2. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)?

3. Data/User Limitations (if any): Specific data/user limitations, query performance with increasing size or complexity, flexibility in data modeling, scalability issues?

4. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team needed or self-serve, support availability.

5. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC, Mails, Text, Alerts, Tweets, Social Shares, etc.


Now let’s look at each tool’s capabilities in detail,

MS-EXCEL

Overview

MS-Excel is a spreadsheet application packaged in MS-Office. It is the most widely used tool for Business Analytics and has gained more powerful additions for sophisticated analysis in recent years. It also has a programming language, VBA, which enhances its power for reporting/automation needs.

Type of Data it can handle

Excel requires a traditional table structure (rows and columns of data).

It also has plug-ins which can connect it to Hadoop/PIG at the back end.

Type of Analytics

MS-Excel is typically used for Aggregate Analytics (Descriptive, Profiling), Correlation and Trend Analysis, Sizing & Estimation, and simple Predictive Modeling & Time Series Forecasting. Recent versions have added advanced statistical and math functionalities.

Visualization options

Recent versions incorporate sophisticated and powerful graphing options, both static and dynamic (pivots).

Cost

The Excel PC version comes packaged within MS-Office. Office 365 cost TBD?

Ease of Learning

Excel’s popularity stems from a very intuitive and easy-to-learn GUI. Low learning curve (1-2 weeks) to be able to use it for less sophisticated business analysis/reporting; VBA coding requires a month of hands-on learning to realize its full potential. There is a large pool of hands-on and/or trained professionals, and lots of training materials are also available.

Integration with Other tools

Excel can be accessed on a PC, on the Cloud (Office 365) and through Apps on smartphones. Most major tools have Excel import/export options. Excel also has XML import/export capabilities.

Data/User Limitations (if any)

The latest versions can handle a maximum of 1 MM rows. However, recent extensions like Power Pivot can handle up to 10 MM rows.

Operational Efficiency

Excel gets installed automatically as part of the Office package (<=2 hrs max). Cloud (Office 365) TBD? Power Pivot and other extensions can be added as plug-ins online.

Output Delivery System

Excel outputs can be accessed on PC, on the Cloud (Office 365) and via smartphone Apps.

Ideal for what type of users: Non-technical users not requiring handling of large datasets, doing high-level analytics (simple analysis, reporting, simulations, scenarios or modeling).
Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), simple Correlation/Trend/Sizing & Estimation.
Ideal for organization at what stage of Analytics Maturity: Useful for all organizations as a simple, cost-effective tool for simpler analytical tasks.

HIVE

Overview

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. While initially developed by Facebook, Apache Hive is now used and developed by other companies such as Netflix. Apache Hive stores metadata in an RDBMS, significantly reducing the time to perform semantic checks during query execution. It has built-in User Defined Functions (UDFs) to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to handle use cases not supported by built-in functions.

Type of Data it can handle: Unstructured/structured data in Hadoop.

Type of Analytics: Hive can be used for Aggregate Analytics (Descriptive, Profiling). User Defined Functions (UDFs) can be created for advanced querying needs – Trend Analysis, Correlation Analysis, Sizing & Estimation.

Visualization options: TBD

Cost: Cloudera or Hortonworks pricing packages.

Ease of Learning: Medium learning curve (1-3 months) to be able to use it for business analysis/reporting. Given the increased interest in Big Data, the pool of hands-on and/or trained professionals is growing, and training materials/content for analysts are being ramped up. Cloudera is the leader in training professionals on Hive, Pig and Impala, with dedicated training modules for developers, DBAs & analytics professionals.

Integration with Other tools: TBD

Data/User Limitations (if any): TBD

Operational Efficiency: TBD

Output Delivery System: TBD

Ideal for what type of users: Technical users who are comfortable with SQL coding but would rather avoid advanced scripting. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple Correlation, Text Mining.


Ideal for organization at what stage of Analytics Maturity: Organizations that are ramping up their Big Data framework.
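Hive's bread-and-butter aggregate analytics are SQL-style GROUP BY queries. As a miniature, runnable sketch, Python's built-in sqlite3 stands in for a Hive warehouse (HiveQL is close to SQL for this class of query); the table and data are invented for illustration:

```python
import sqlite3

# In-memory SQLite stands in for a Hive warehouse; the GROUP BY shape
# is the same kind of aggregate/profiling query HiveQL runs over HDFS data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, country TEXT, duration_sec REAL)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("u1", "US", 30.0), ("u2", "US", 45.0), ("u3", "IN", 20.0)],
)

# Descriptive analysis: visit counts and average session length per country
rows = conn.execute(
    "SELECT country, COUNT(*) AS visits, AVG(duration_sec) AS avg_duration "
    "FROM page_views GROUP BY country ORDER BY visits DESC"
).fetchall()
print(rows)  # [('US', 2, 37.5), ('IN', 1, 20.0)]
```

In Hive the same statement would run as a HiveQL query over tables stored in HDFS, with UDFs filling in any functions the built-in set lacks.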

KSUITE

Overview: Ksuite is a suite of products developed by Kontagent, with three major tools – Ksuite Mobile, Ksuite Social and Ksuite DataMine. Ksuite Mobile is a mobile app activity reporting tool and Ksuite Social a social metrics reporting tool, both targeted at business users. Ksuite DataMine is an advanced, SQL-like querying platform targeted at analysts who need to go beyond charts/tables and understand what's happening behind the scenes. Like Omniture, Ksuite "instruments" front-end actions and campaign outreach channels for native apps by integrating an SDK into the app libraries. This data is then tracked in Kontagent's cloud warehouse, and reporting happens on that data. Ksuite is a real-time monitoring platform.

Type of Data it can handle: App activity data stored in its cloud.

Type of Analytics: Ksuite helps with Aggregate Analytics (Descriptive, Profiling).

Visualization options: Broad range of advanced visualization options – tables, charts, etc.

Cost: Depends on the data volume and number of apps tracked in Ksuite. Costs >$2,000 per month.

Ease of Learning: Low learning curve (1-2 weeks) to be able to use it for business analysis/reporting. Kontagent also provides mobile analysts and data scientists for consulting. Large pool of hands-on and/or trained professionals, and lots of training materials are available.

Integration with Other tools: Kontagent provides an FTP data pipe through which a raw data dump can be taken for additional in-house analysis.

Data/User Limitations (if any): Depends on the service contract, since pricing is data-size dependent.

Operational Efficiency: Kontagent installation takes minutes, since only the SDK has to be integrated with the app. Kontagent also provides mobile analysts/data scientists as consultants to assist with anything during or after installation.


Output Delivery System: Ksuite is a cloud solution. Ksuite Mobile can be accessed via an app.

Ideal for what type of users: Non-technical users/analysts; best suited for efficient reporting and high-level analytics. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling), Simple Correlation/Trend. Ideal for organization at what stage of Analytics Maturity: Useful for established app developers with scale, since Kontagent can be expensive. Flurry could be a cost-effective alternative for organizations on a budget or for individual developers.

C. Advanced Analytics:

Advanced Analytics can be quickly summarized as making sense of the data through in-depth analysis beyond normal business analytics. It could be advanced Text mining (parsing of unstructured data) or statistical (predictive or driver) analysis.

I. Front-end Analytics/Machine Learning:

Front-end analytics is performed on the raw front-end tables. The two broad types of data in the front-end tables are:

Instrumentation Data: a record of activity on the live business site, captured via instrumentation
Call Log Data: a dump of server calls from the live business site and what was delivered

Front-end analytics differs from business analytics in the scope of its deliverables. Traditionally, the biggest users of front-end analytics were operational users (e.g., IT Ops, Security) monitoring site stability, security breaches, etc. However, given the richness of data that sits this close to user activity, businesses have started performing machine learning on it to deliver more upstream solutions like transactional marketing (offering a credit card to an ATM user, or Netflix recommendations). Tools need to be able to do string operations, text mining and associativity analysis apart from the usual profiling and descriptive analysis.
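A minimal sketch of this kind of front-end log mining, assuming a made-up instrumentation log format (`user=… action=…`) and invented events:

```python
import re
from collections import Counter, defaultdict

# Invented instrumentation log lines: "timestamp user=<id> action=<name>"
log_lines = [
    "2014-01-01T10:00:00 user=u1 action=view_atm",
    "2014-01-01T10:01:00 user=u1 action=offer_click",
    "2014-01-01T10:02:00 user=u2 action=view_atm",
    "2014-01-01T10:03:00 user=u3 action=login",
]

pattern = re.compile(r"user=(?P<user>\w+)\s+action=(?P<action>\w+)")
actions_by_user = defaultdict(set)   # string operations/text mining: user -> actions
action_counts = Counter()            # profiling: volume per action
for line in log_lines:
    m = pattern.search(line)
    if m:
        actions_by_user[m["user"]].add(m["action"])
        action_counts[m["action"]] += 1

# Associativity analysis: of users who hit the ATM screen, what share clicked the offer?
atm_users = [u for u, acts in actions_by_user.items() if "view_atm" in acts]
offer_rate = sum("offer_click" in actions_by_user[u] for u in atm_users) / len(atm_users)
print(action_counts["view_atm"], offer_rate)  # 2 0.5
```

The same three steps – parse, profile, associate – are what the front-end tools below industrialize at scale.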

Factors for deciding Tools for Front End Analytics/Machine Learning

Below is the list of factors that should be considered to zero in on a tool. We have listed them in order of priority.

1. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Machine learning (Text Mining, String Operations, Associativity Analysis) & Operational Analytics (Alerts, Control Charts)?

2. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & if Hadoop Connectivity for Big Data analysis?

3. Data/User Limitations (if any): Specific data/user limitations, Query performance with increase size or complexity, flexibility in data modeling, scalability issues?

4. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained resources, Training materials & Training cost?


5. Output Delivery System: Channels in which the results can be accessed - Mobile App, Cloud, PC, Mails, Text, etc.

6. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems both for output delivery or connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)? Front end delivery systems?

7. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team needed or Self-Serve, Support availability.

8. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources

A plethora of tools is available in the market for front-end analytics, hence the need for a structured decision-making process like the one above, so that you end up with the tool satisfying most of your needs.

Table 4 gives a bird's-eye view of how each of the top tools sizes up against the criteria mentioned above.

Now let’s look at each tool’s capabilities in detail,

SPLUNK

Overview: Splunk is the leader in API data analytics (analysis of API log data), used for operational reporting & analytics. Splunk is a cloud solution where customers dump their data and use Splunk's text-processing technology for their analytical/reporting requirements.

Type of Analytics: Splunk is primarily an operational analytics tool, but it can also be leveraged for business analytics, machine learning & reporting – aggregate analysis (descriptive, profiling). The data can then be analyzed further in other tools. Recently some advanced math & statistical analytics capabilities have been added to its query language. (Check?)

Type of Data it can handle: Typically API log data, which records the service calls from the front end. The data can be structured/unstructured, as text or name-value pairs. Splunk recently launched Hunk, a Hadoop connectivity tool.

Data/User Limitations (if any): Query speed depends on the size of the data. The maximum size of data on Splunk Cloud is specified by the service contract.

Ease of Learning: Splunk coding typically involves regular expressions and Perl-style syntax, but it also has a GUI. It requires 1-3 months of hands-on learning to become familiar with all of Splunk's capabilities. Large pool of hands-on and/or trained professionals, and lots of training materials are available.
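Splunk itself uses its own search language, but the core steps – extracting name-value pairs from API log entries with regular expressions, rolling up counts, and checking an alert threshold – can be sketched in Python; the log lines and the threshold here are invented:

```python
import re
from collections import Counter

# Invented API log entries in name=value form, the style Splunk indexes
logs = [
    "method=GET path=/accounts status=200 ms=12",
    "method=POST path=/transfer status=500 ms=310",
    "method=GET path=/accounts status=200 ms=9",
]

kv = re.compile(r"(\w+)=(\S+)")
status_counts = Counter()
for entry in logs:
    fields = dict(kv.findall(entry))   # regex field extraction
    status_counts[fields["status"]] += 1

# Operational rollup plus a simple alert rule (threshold is invented)
error_rate = status_counts["500"] / sum(status_counts.values())
alert = error_rate > 0.25
print(dict(status_counts), round(error_rate, 2), alert)
```

In Splunk the rollup step would be a `stats`-style aggregation over indexed events rather than hand-written loops.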

Output Delivery System: Splunk is a cloud-based solution, but its reports can also be accessed via mobile apps.

Integration with Other tools: TBD

Operational Efficiency: <=1 month for the data FTP to be established. Once the data pipes are set up, reporting/analytics can be ramped up in another month. One DBA is sufficient for maintaining/monitoring/troubleshooting the system; a warehouse DBA can double up as the Splunk manager since the protocols are similar.

Cost: Pricing is based on data size (amount of data indexed daily). Perpetual license ($5K) + annual maintenance (20%) fees.

Ideal for what type of users: Operational analytics or front-end data profiling needs; some regular-expressions coding experience is needed to build reports/perform analysis. Ideal for what type of analytics: Aggregated Analysis (Descriptive analysis & Profiling). Ideal for organization at what stage of Analytics Maturity: Preferred as an enterprise tool, since Splunk is very costly. If scale is not a problem and in-house programmers are available, then the same analytics


can be performed using scripting languages like Perl/Python. Some text analytics tools like PolyAnalyst can also double up as operational analytics tools if an FTP feed can be easily established, and other front-end monitoring systems have their own built-in tools too.

MEGAPUTER (POLYANALYST, TEXTANALYST & X-SELLANALYST)

Overview: Megaputer took birth after the development of ground-breaking machine learning techniques at Moscow State University and Bauman Technical University in Moscow. Its flagship product PolyAnalyst (a suite of reporting + text mining solutions) has consistently received rave reviews from peers, users and industry, and is now deployed by 8+ US federal agencies, 200 universities, 20 Fortune 100 companies and so on. TextAnalyst and X-SellAnalyst are two niche products developed for specific user groups. The USP of these products is that they enable non-technical users to perform sophisticated analysis easily, quickly and at larger scale.

Type of Analytics: PolyAnalyst is a powerful text mining tool which can also be used for Aggregate (Descriptive, Profiling), Trend & Correlation Analysis, Advanced Text Mining, Predictive Modeling, Segmentation, Natural Language Processing and Machine Learning. Its strength is bringing together analysis of traditional, statistically analyzable data with non-traditional unstructured text data. TextAnalyst is a dedicated Natural Language Processing tool (based on a linguistic and neural network model), most beneficial for summarizing huge volumes of text data, clustering of text, etc. X-SellAnalyst is a cross-sell recommendation engine (sold as a COM component) that works in real time at the point of sale. It analyzes historical transactions, profitability, recency and other metrics.
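X-SellAnalyst's internals are not public; as a generic illustration of the co-purchase scoring idea behind this kind of cross-sell engine, here is a sketch over invented transaction baskets:

```python
from collections import Counter
from itertools import combinations

# Invented historical baskets: which products were bought together
baskets = [
    {"checking", "debit_card"},
    {"checking", "debit_card", "savings"},
    {"checking", "savings"},
    {"mortgage", "checking"},
]

# Count how often each pair of products co-occurs across transactions
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def recommend(product, top_n=2):
    """Rank cross-sell candidates by co-occurrence with `product`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [p for p, _ in scores.most_common(top_n)]

print(recommend("checking"))  # ['debit_card', 'savings']
```

A production engine would weight these counts by profitability and recency, as the description above notes, rather than ranking on raw co-occurrence alone.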

Type of Data it can handle: PolyAnalyst can connect to RDBMS warehouses through ODBC drivers and also works with unstructured text data; it integrates with Microsoft Data Transformation Services and similar software. TextAnalyst can connect to text repositories on PCs, on the web and in libraries, news agencies, etc. X-SellAnalyst works with any RDBMS warehouse (structured data).

Data/User Limitations (if any): PolyAnalyst: depends on the hardware configuration; it claims quick processing of gigabytes of data, with productivity further increased by 64-bit and cluster server architectures. TextAnalyst: TBD. X-SellAnalyst: fast response time (<1 sec for 100K products in a portfolio); scales well with large data; calculation time increases linearly with the number of products already purchased.

Ease of Learning: GUI-driven; no coding required. However, some training is necessary to understand all the features and functionality available in the tool and how best to leverage them.


Megaputer provides training to help customer teams start using the tools to their full potential, claiming <=2 weeks of training for complete hands-on independence. The availability and abundance of 3rd-party training materials is unknown.

Output Delivery System: PolyAnalyst: resides on a PC; automated email alerts/logs functionality; organization-wide sharing features provided. TextAnalyst: TBD. X-SellAnalyst: integrates with the web/transaction server to offer cross-sell recommendations on the fly.

Integration with Other tools: TBD

Operational Efficiency: TBD

Cost: TBD

Ideal for what type of users & analytics: PolyAnalyst: non-coding data analysts with sophisticated text mining needs. TextAnalyst: non-coding users looking for a quick, black-box language processing tool – journal editors, researchers, scientists, investment bankers, lawyers. X-SellAnalyst: retailers (online & offline) & call centers needing to increase the speed/RoI of cross-sales at large volume. Ideal for organization at what stage of Analytics Maturity: Depends on when the organization needs advanced text mining, and on budget. X-SellAnalyst is best suited to large-scale problems.

II. Statistical Analytics:

To be able to predict something correctly has always captured the fancy of humankind. Games of odds can be seen everywhere around us – games, elections, stock markets, etc. We are surrounded by decisions where the future is unknown and uncertain, and no one can get it right every time on every question. No one is expected to predict the future with 100% accuracy; all we want is someone with vision, a foresight. As science and mathematics advanced, scientists came up with formulae and equations that relate one thing to another in a fairly reliable way; the same principles and thoughts were formalized into the discipline of "Statistics", and Economics proved to be an ardent follower of these rules and laws. With the proven success of statistics in economics, why would business leaders stay behind? They started applying the same discipline to running businesses – predicting the odds of something happening, predicting the direction of markets, forecasting inventory and sales, etc. Thus was born the era of statistical business analytics. Over time, many tools were developed and used by academicians in schools and universities and by statisticians and analysts in the corporate world, but few could keep up with changes in technologies and techniques. Some have stayed, grown and matured with the market and its requirements; some have lagged behind and been lost to history with a golden mention. Some still find application in niche industries,


academia, government, research institutions and trading floors; some were acquired as part of vertical integration by larger players in other domains; and some have grown into billion-dollar entities. Matlab falls predominantly in the first group, SPSS in the second and SAS in the third. And finally, some challengers have taken birth whose meteoric rise is the stuff of legend and who are here to stay and become even more mainstream – R falls in this bucket. Let's first look at the factors that decide which tool to use when, followed by a broader description of each of them.

Factors for deciding Tools for Advanced Analytics

Primary

6. Type of data it can handle: Structured tables, Clickstream data, Unstructured Text dump & if Hadoop Connectivity for Big Data analysis?

7. Ease of Learning: Does it have GUI, How much is it Coding dependent, Availability of Trained resources, Training materials & Training cost?

8. Type of Analytics: Aggregate Analytics (Descriptive Analytics, Profiling), Text Mining, Correlation Analysis (pre-post, A/B), Trend Analysis, Sizing & Estimation, Scenarios, Predictive Analysis, Time Series Forecasting, Segmentation (Decision Trees and Clustering), Life Cycle analysis

9. Cost: License types and fees (Single User and Server), Implementation costs, Operational Costs, Scalability Costs, and cost of resources

Secondary

10. Integration with other tools: How easily/seamlessly can it connect to various other tools/systems, both for output delivery and for connecting to multiple data sources through ODBC or other data pipes (Hadoop connectivity)?

11. Visualization Options: Ease of understanding and communicating insights through Tables, Charts, Maps, Heatmaps, etc., with commenting, delivered across all channels

12. Data/User Limitations (if any): Specific data/user limitations, Query performance with increased size or complexity, flexibility in data modeling, scalability issues?

13. Operational Efficiency: How easy/quick/cheap to implement? Dedicated management team needed or Self-Serve? Support availability.


SAS

Overview: SAS has traditionally been a leader in the analytics industry. SAS creates solutions for a wide variety of analytics across many industries and domains, from banking to pharma. It has the capabilities to host an Enterprise Data Warehouse, Business & Advanced Analytics, Executive Reporting & Regulatory Compliance (e.g., BASEL II) and Analytical Solution Deployment (e.g., a credit-score-based decision framework).

Type of Data it can handle: SAS requires traditional table structures (rows and columns of data). SAS also has the ability to host an Enterprise Data Warehouse dedicated to serving analytical needs effectively and efficiently. The SAS DataFlux module extends its capabilities to handle unstructured text data. It also has plug-ins which can connect it to Hadoop/Pig at the back end.

Ease of Learning: SAS coding requires 1-6 months of training to be able to do business/advanced analytics & reporting. However, the GUI version of SAS (SAS JMP), which is good for quick analysis, requires <=1 month of hands-on exposure. Large pool of hands-on and/or trained professionals, and lots of training materials are available.

Type of Analytics: SAS works on a "modules" concept – a module is a dedicated solution set, e.g., the ETS module for time series forecasting. The SAS foundation sits on the BASE and STAT modules, which contain data preparation and some statistical modeling capabilities and can support many widely used statistical analyses – A/B testing, clustering, correlation and trend analysis. For additional features like decision trees, time series, text mining, etc., dedicated modules have to be bought separately. SAS E-Miner is the end-to-end tool with a GUI front end (functions as drag-&-drop nodes), sold at a premium.
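The correlation and trend analyses mentioned for BASE/STAT reduce to familiar least-squares formulas; a stdlib Python sketch with invented campaign numbers (in SAS this would be PROC CORR / PROC REG territory):

```python
import math

# Invented campaign data: weekly ad spend vs. weekly signups
spend   = [10, 20, 30, 40, 50]
signups = [12, 24, 33, 41, 55]

n  = len(spend)
mx = sum(spend) / n
my = sum(signups) / n
cov = sum((x - mx) * (y - my) for x, y in zip(spend, signups))
vx  = sum((x - mx) ** 2 for x in spend)
vy  = sum((y - my) ** 2 for y in signups)

r = cov / math.sqrt(vx * vy)   # Pearson correlation (correlation analysis)
slope = cov / vx               # least-squares slope (trend analysis)
intercept = my - slope * mx
print(round(r, 3), slope, round(intercept, 2))  # 0.996 1.03 2.1
```

A strong r plus the fitted line "signups ≈ 1.03 × spend + 2.1" is exactly the kind of output these statistical packages wrap in significance tests and diagnostics.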

Cost: BASE/STAT SAS PC licenses can cost $8-10K per license, with annual maintenance around $3K. BASE/STAT SAS server licenses can cost $20-30K; annual maintenance TBD. Significant scaling costs to include additional techniques. The end-to-end E-Miner suite costs a premium (TBD).

Integration with Other tools: SAS requires a PC (desktop or laptop) for querying/analysis. However, SAS outputs can be taken across many platforms through reporting/delivery modules and/or 3rd-party integrations.

Visualization Options: SAS offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the coding framework to include editorials.

Data/User Limitations (if any): SAS has no limitation per se; limitations depend only on hardware configuration or warehouse connections. Certain plug-ins and modules can handle huge quantities of data (TBs).

Operational Efficiency: <=2 hrs for a desktop install; server installation takes <=2 weeks of IT effort. Complex installations (advanced server configurations, certain modules, esp. E-Miner) need support from SAS.


Ideal for what type of users: Advanced users with high-end statistical needs who prefer GUI-driven work over complex coding. Typically suited for large enterprises or entities/teams with sufficient budget to match the scaling costs (even though the BASE/STAT modules can answer many needs, specific needs require additional modules). SAS is best suited for a large-scale, end-to-end analytical framework. Ideal for what type of analytics: Most analytical needs, from basic to advanced statistics. Ideal for organization at what stage of Analytics Maturity: SAS adoption is driven more by the available budget, since SAS has modules for most statistical needs.

R

Overview: R is quickly becoming a leader in the analytics industry. R was developed as an open-source alternative and was very popular in academia/research circles. Having proved its value there, it quickly gained ground in the corporate arena as a cost-effective, powerful tool.

Type of Data it can handle: R can take data from multiple sources through ODBC connectivity and various libraries. It also has plug-ins which can connect it to Hadoop at the back end.

Ease of Learning: R is a coding-intensive tool and hence requires 1-12 months of training to be able to do business/advanced analytics. Recently there have been attempts to bring in GUIs. Given its growing popularity, the pool of hands-on and/or trained professionals has been growing in recent years. Lots of training materials are also available.

Type of Analytics: R works on a "libraries" concept – packages of functions that carry out specific tasks, e.g., logistic models or decision trees. R has 3,000+ libraries of advanced statistical techniques covering the entire spectrum from aggregated analytics to text mining to predictive analysis. R's capabilities keep extending as new libraries are added and as in-memory limitations are overcome in some proprietary solutions. It was also one of the pioneers in bridging Big Data with advanced analytics needs.

Cost: Revolution R packages: PC license ~$1,000, server license >=$25K. R has "zero functionality scaling cost" – just use a new library to solve a specific problem instead of buying a new module for every new problem.

Integration with Other tools: R requires a PC (desktop or laptop) for querying/analysis. However, R outputs can be taken across many platforms through reporting/delivery integrations.


Visualization Options: R offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the coding framework to include editorials.

Data/User Limitations (if any): R works in memory, hence it suffers from RAM limitations. However, some proprietary versions like Revolution R overcome those limitations via large-scale parallel processing. TBD?

Operational Efficiency: <=2 hrs for desktop. Complex server installations need support from vendors.

Ideal for what type of users: Advanced users with high-end statistical needs who are willing/able to write complex code. Typically used by start-ups/small organizations with a constrained budget but enough time/resource flexibility to spend on training and implementing R. Ideal for what type of analytics: Most analytical needs, from basic to advanced statistics. Ideal for organization at what stage of Analytics Maturity: R adoption is driven more by budget and complexity of needs. The biggest adoption of R is in academia/research institutions with needs that can't be addressed by other commercially available solutions.

KNOWLEDGE SEEKER

Overview: Knowledge Seeker (KS) is famous among non-technical users primarily because it offers an intuitive, easy-to-learn/execute GUI for advanced statistical techniques. KS tools are used in a broad range of domains, from BASEL to fraud protection to loyalty programs.

Type of Data it can handle: KS requires traditional table structures (rows and columns of data). It is currently missing plug-ins to Hadoop/Pig.

Ease of Learning: The KS GUI requires <=1 month of training on KS/Strategy Builder. Large pool of hands-on and/or trained professionals, and lots of training materials are available.

Type of Analytics: Even though KS has a broad set of statistical capabilities, it is especially regarded for its Decision Trees and Strategy Builder functionality. It offers a decent, cost-effective end-to-end framework (analysis to scenarios) which is sufficient for most non-technical users. Its primary limitations are scale, automation and advanced user needs (macros, loops, advanced statistical techniques).
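Decision-tree tools like KS evaluate candidate splits by how much they reduce impurity. As a generic sketch of the idea (using the Gini criterion common in CART-style trees, not necessarily KS's own algorithm, and with invented customer data):

```python
from collections import Counter

# Invented labeled customers: (age_band, responded_to_offer)
customers = [("young", 1), ("young", 1), ("young", 0),
             ("old", 0), ("old", 0), ("old", 1)]

def gini(labels):
    """Gini impurity of a label list (0 = pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = gini([y for _, y in customers])                  # impurity before splitting
young  = gini([y for g, y in customers if g == "young"])  # left branch
old    = gini([y for g, y in customers if g == "old"])    # right branch

# Impurity reduction if the tree splits on age band (each branch holds half the rows)
gain = parent - (0.5 * young + 0.5 * old)
print(round(gain, 3))  # 0.056
```

The tree greedily picks the variable with the largest such gain at each node; a GUI tool like KS simply automates this search and lets the user steer it interactively.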

Cost: Individual PC license – TBD (Knowledge Seeker / Knowledge Studio / Strategy Builder)

Server license – TBD (Knowledge Seeker / Knowledge Studio / Strategy Builder)

Integration with Other tools: KS requires a PC (desktop or laptop) for querying/analysis. It offers an "In-Database Analytics" mode to perform data mining directly within databases (Teradata, SQL Server, Oracle and Netezza).

Visualization Options: KS offers many visualization options, with comments on what each output stands for. Further flexibility is provided within the framework to include editorials.

Data/User Limitations (if any): TBD?

Operational Efficiency: <=1 hr for desktop. Complex server installations need support from vendors.

Ideal for what type of users: GUI users with needs for advanced statistical techniques. Marketing professionals and product managers (in the financial services domain) typically favor it not only for statistical modeling but also for Strategy Builder, which offers excellent scenario analysis capabilities. Ideal for what type of analytics: Decision Trees, Scenario Building. Ideal for organization at what stage of Analytics Maturity: KS adoption is primarily driven by users' flexibility with technical coding. KS and Strategy Builder together may cost >$5K, so adoption is also dictated by budget.

Other Worthy Mentions

Given the broad spectrum of data consumption (from reporting to business analytics to various types of advanced analytics, with flavors of Big Data integration), the types of analyzable data (video, social comments, location, etc.), the platforms analyzed (web, mobile, tablets and now Google Glass), the functional focus (Sales, Dev, PM) and the industry (high tech, banking, pharma, etc.), no one list of tools can do justice to all the tools available in the market. Ours was a humble attempt to bring you a list of strong contenders which are instrumental in driving analytics in many areas.

In this section of the chapter, we list a few noteworthy tools that didn't appear in the list above but are leaders in their own right and/or are expected to become a force in the near future.

Google Analytics

Google Analytics is the default analytics choice for many small and medium enterprises (<=10 million hits per month and <=50 rows of data in reports), since it offers a broad suite of reporting/analytics solutions for free. It is quick and easy to set up and helpful in defining & monitoring KPIs. Data refresh happens every 24 hours. Reports are best suited for KPI tracking, advertising, multi-channel, social, mobile & video tracking. It can also be leveraged for aggregate analytics (descriptive analysis & profiling). However, the biggest limitation is enterprise scalability: even the Premium version can support a maximum of 1 billion hits per month. Google Analytics also allows app creation on the data but doesn't support data transfer via FTP yet. All said and done, Google Analytics is among the best-RoI tool investments for individual developers and SMEs.

RapidMiner

Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analysis on a large scale, i.e., for large amounts of structured data like database systems and unstructured data like text. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. Discovering and leveraging unused business intelligence from existing data enables better-informed decisions and allows for process optimization.

RapidMiner: The main product of Rapid-I, the data analysis solution RapidMiner is a world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine that can be integrated into other products.

RapidNet: A relation and network explorer – identifies interrelationships in the data, defines KPIs at nodes and overlays geo-relationships on maps.

RapidSentilyzer: Provides all relevant customer and market information in a single real-time system. It combines efficient crawling techniques with the power of data and text mining and automatically categorizes the latest news according to sentiments and opinions. The RapidSentilyzer BuzzBoard can easily be inspected and gives all necessary information in a central place. This is what competitive intelligence and customer intelligence should look like.

RapidDoc: An automated document classification engine offered over the web.

IBM Analytics

IBM carried forward its warehousing expertise into the new "Analytics Era" through the acquisition of industry stalwarts like Cognos, for reporting/business analytics, and SPSS, for advanced analytics capabilities. With them, IBM now has a comprehensive, unified portfolio of business analytics software (Cognos, SPSS, OpenPages and Algorithmics) with capabilities from data storage to processing to reporting to business and advanced analytics, and even analytics delivery management. Based on open standards, IBM business analytics products can be used independently, in combination with each other, or as part of broader solutions to key business challenges.

IBM SPSS products


IBM SPSS predictive analytics software facilitates statistical analysis, data and text mining, predictive modeling and decision optimization to anticipate change and take action to improve outcomes.

IBM Cognos products: IBM Cognos business intelligence and performance management software provides the integrated dashboards, scorecards, reporting, analysis, and planning and budgeting capabilities to gain and act on fact-based insights.

IBM OpenPages products: OpenPages GRC software allows organizations to manage enterprise operational risk and compliance initiatives using a single, integrated solution.

IBM Algorithmics products: Algorithmics software helps businesses gain transparency into financial risks in advance, providing information that is vital to organizations.

SAP Analytics

SAP is a world leader in enterprise software applications. It has now forayed into the advanced data insights world with the acquisition of BusinessObjects and the HANA product suite.

SAP BusinessObjects products: The SAP BusinessObjects suite contains solutions from BI platform management to OLAP capabilities to reporting solutions (customizable for various types of delivery – Lumira, Crystal Reports and ESRI integrations). Lumira helps deliver self-service reports on the cloud. Crystal Reports assists in integrating reports within business applications and processes. The ESRI integration is for geospatial reporting.

SAP Predictive Analytics & HANA: The SAP predictive analytics solution offers an intuitive framework for building complex analytical models. It can work with an existing data environment as well as with the SAP BusinessObjects BI platform to help mine and analyze data.

SAP HANA: HANA is the new in-memory platform offered by SAP to dramatically increase the speed of analytics/reporting solutions.

ORACLE Analytics

ORACLE extended its leadership in data storage solutions to business analytics with the acquisition of Hyperion Essbase and the launch of its Advanced Analytics solution kit.

Oracle Hyperion Enterprise Performance Management combines market-leading performance management applications with powerful analytics to align financial close, planning, reporting, analysis and modeling, and to unlock business potential. It helps customers leverage their ERP investments through seamless data and process integration with Oracle E-Business Suite, PeopleSoft, JD Edwards, Fusion, SAP and other ERP applications. Flexible deployment options include on-premise, cloud, or engineered systems designed for high performance and scalability.

Oracle Hyperion Enterprise Performance Management delivers a comprehensive, integrated suite of applications featuring common Web and Microsoft Office interfaces, reporting tools, mobile information delivery, and administration. Best-in-class, in-memory analytics software and hardware, optimized to work together, combine planning at the speed of business with powerful strategic and predictive modeling capabilities that improve analytic insight. It is best suited for strategy management; planning, budgeting and forecasting; financial close and reporting; and profitability and cost management.

Oracle Business Intelligence Enterprise Edition: Delivers a robust set of reporting, ad-hoc query and analysis, OLAP, dashboard and scorecard functionality with a rich end-user experience that includes visualization, collaboration and alerts.

It makes corporate data easier for business users to access and provides a common infrastructure for producing and delivering enterprise reports, scorecards, dashboards, ad-hoc analysis and OLAP analysis. Rich visualization, interactive dashboards, a vast range of animated charting options, OLAP-style interactions, innovative search and actionable collaboration capabilities increase user adoption, while a proven web-based service-oriented architecture that integrates with existing IT infrastructure reduces cost. It also offers Mobile BI, real-time decision management and big data solutions.

Analytic Applications: Oracle also offers pre-configured suites of analytics solutions tailored to various business roles, product lines and industries.

Market Share Research

Gartner publishes an annual market report on business intelligence (BI), corporate performance management (CPM) and analytics applications/performance management software. According to Gartner, Inc., worldwide revenue totaled $13.1 billion in 2012, a 6.8 percent increase from 2011 revenue of $12.3 billion. Tough macroeconomic conditions and confusion over emerging technology terms led to more muted market growth than in previous years.

Source: Gartner Research http://www.gartner.com/newsroom/id/2507915

Table 5: Top 5 BI, CPM and Analytic Applications/Performance Management Vendors, Worldwide, 2011-2012 (Millions of Dollars)

Company     2012 Revenue   2012 Market Share (%)   2011 Revenue
SAP              2,902.5                    22.1        2,884.0
Oracle           1,952.1                    14.9        1,913.5
IBM              1,625.6                    12.4        1,478.8
SAS              1,599.7                    12.2        1,542.9
Microsoft        1,189.3                     9.1        1,059.9
Others           3,861.9                    29.3        3,416.0
Total           13,131.1                   100.0       12,295.1

Note: SAP reports in Euros and faced a currency headwind that hampered its growth in USD terms.
Source: Gartner (June 2013)


While all of the top five BI software vendors retained their top-five status, IBM and SAS exchanged places, moving IBM into third position and SAS into fourth (see Table 5). IBM grew 9.9 percent in 2012, with revenue of $1.6 billion. Together, the top five vendors accounted for 70 percent of total BI software market revenue.

In first place, SAP once again had significantly higher revenue than any other vendor at $2.9 billion with 22.1 percent of the market, although this was up by just 0.6 percent from 2011. Second-place Oracle's revenue grew by 2.0 percent from 2011 to reach $1.9 billion. Fifth-place Microsoft enjoyed the highest growth of the top five vendors in 2012, with revenue rising by 12.2 percent compared with 2011, to reach $1.2 billion.
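The market shares and growth rates quoted in this section follow directly from the revenue figures in Table 5. As a quick sanity check, here is a short Python sketch that re-derives them; all numbers are taken from the table above, nothing else is assumed:

```python
# Re-derive market shares and year-over-year growth from the Gartner
# revenue figures in Table 5 (millions of USD).
revenue = {  # vendor: (2012 revenue, 2011 revenue)
    "SAP": (2902.5, 2884.0),
    "Oracle": (1952.1, 1913.5),
    "IBM": (1625.6, 1478.8),
    "SAS": (1599.7, 1542.9),
    "Microsoft": (1189.3, 1059.9),
    "Others": (3861.9, 3416.0),
}

total_2012 = sum(r2012 for r2012, _ in revenue.values())
total_2011 = sum(r2011 for _, r2011 in revenue.values())

for vendor, (r2012, r2011) in revenue.items():
    share = 100 * r2012 / total_2012           # 2012 market share, %
    growth = 100 * (r2012 - r2011) / r2011     # year-over-year growth, %
    print(f"{vendor:<10} share {share:5.1f}%   growth {growth:5.1f}%")

# Overall market growth, quoted as 6.8% in the text.
print(f"Market growth: {100 * (total_2012 - total_2011) / total_2011:.1f}%")
```

Running this reproduces the figures cited in the text: SAP at a 22.1 percent share with 0.6 percent growth, IBM at 9.9 percent growth, Microsoft at 12.2 percent growth, and 6.8 percent growth for the market as a whole.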

Chapter Summary

This chapter attempts to impart an intuitive sense of how data moves through an organization: from front-end systems to back-end analytical engines, and back to consumers as services such as personalized offers, information or better customer service. Decision makers consume data in various forms, whether as reports on the condition of the portfolio or as key insights and recommendations from analysts. A plethora of tools is available in the market to facilitate efficient and effective insight generation, so users are advised to weigh the factors suggested above when deciding which tool will best serve their needs. This chapter is only a small door into the larger universe of ever-evolving, function-specific tools, and readers are encouraged to perform their own research before committing to any of them.

Pending content: Flowchart of decision making
