The Big Picture on Big Data and Cognos


www.senturus.com/blog/big-picture-big-data-cognos/

August 1, 2016 Business Strategy & Perspectives

IBM has a long history of supporting major open source projects and the most widely adopted open standards. Its enterprise customers have benefited from the flexibility, choice, and innovation that come with the open source philosophy. Major initiatives include SOA (Service-Oriented Architecture), Linux, Eclipse, and now Hadoop. IBM's open source big data analytics offering is known as the IBM Open Platform with Apache Hadoop. The commercial side of this platform, announced in early 2015, is a suite of enterprise products branded as BigInsights.

To better understand IBM's big data offerings around Hadoop and its open data platform, it helps to put them in the context of the overall vision for the platform and the three phases of the IBM Big Data Analytics lifecycle:

1. Pull in all types of data from disparate sources

2. Put the data into a business context

3. Produce intelligent, data-driven business outcomes, for example operational efficiency, customer engagement, or risk management

IBM aims to cover a lot of business territory with its analytics platform. For the enterprise IT department, the technology enables data integration, governance, security, and regulatory compliance. For line-of-business managers, the analytics environment is the home of customer and operational intelligence. While analytics play an important role in increasing operational efficiency and eliminating business process bottlenecks, it is customer-centric analytics that have captured the imagination of business executives. Big data analytics offers many opportunities for improving customer relationships and increasing engagement across marketing channels.

A common big data use case is delivering relevant promotions to customers. We all share the experience of receiving credit card offers in the mail from the bank and tossing the envelope straight into the recycling bin without a second thought. Despite the dismal response rate, it was cost effective for the bank to send the same direct mail piece to everyone. With a big data platform, it is possible to develop customer profiles and create targeted offers for each segment. For example, customers who have a single account and a short history with the bank would be candidates for a different array of promotions than someone who has been a customer for decades. The cost of amassing enough data, and of having the processing power to crunch the numbers in a timely fashion, has dropped enough to make this kind of targeting profitable.
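The segmentation idea above can be sketched in a few lines of Python. This is a minimal, rule-based illustration under stated assumptions, not how BigInsights or any IBM product segments customers: the `Customer` fields, the tenure thresholds, the segment names, and the promotion catalog are all hypothetical. In practice the profiles would be built from data aggregated on the big data platform, and the hand-written rules might be replaced by a trained model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
NEW_CUSTOMER_MAX_YEARS = 2
LONG_TENURE_MIN_YEARS = 20

# Hypothetical promotion catalog keyed by segment name.
PROMOTIONS = {
    "new_single_account": ["starter credit card", "savings account bonus"],
    "long_tenure": ["premium rewards card", "wealth management review"],
    "established": ["balance transfer offer"],
}


@dataclass
class Customer:
    customer_id: str
    num_accounts: int
    tenure_years: float


def segment(c: Customer) -> str:
    """Assign a customer to a coarse segment using simple, hypothetical rules."""
    if c.num_accounts == 1 and c.tenure_years < NEW_CUSTOMER_MAX_YEARS:
        return "new_single_account"
    if c.tenure_years >= LONG_TENURE_MIN_YEARS:
        return "long_tenure"
    return "established"


def offers_for(customers: list[Customer]) -> dict[str, list[str]]:
    """Map each customer ID to the promotions for that customer's segment."""
    return {c.customer_id: PROMOTIONS[segment(c)] for c in customers}


if __name__ == "__main__":
    sample = [Customer("c1", 1, 0.5), Customer("c2", 4, 25), Customer("c3", 2, 8)]
    for cid, offers in offers_for(sample).items():
        print(cid, offers)
```

A new single-account customer and a decades-long customer end up with different offers, which is the contrast the bank example draws; the same-mailer-for-everyone approach corresponds to a single segment.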

With digital advertising and social media data, analysis must cover huge amounts of unstructured data. A couple of years ago this was experimental at best, but Hadoop now makes it possible to capture and process unprecedented amounts of data. Hadoop complements the enterprise data warehouse and is an integral part of the business intelligence ecosystem.

Open Data Platform (ODPi)

ODPi is a consortium of IBM and 18 other enterprise software vendors working together to maximize the adoption of technologies based on Apache Hadoop. The goal of ODPi is to accelerate software development by providing a standard Hadoop platform on which applications can run, whether they are commercial software, open source, or custom code developed in-house. This gives enterprise customers assurance that they are not locking themselves into a single vendor's Hadoop solution. It also allows one Hadoop implementation to be used with products from multiple vendors. For Hadoop to fulfill its role as an enterprise data source, it must accommodate a broad audience using many different applications.

To that end, ODPi provides a core platform of agreed-on and tested Apache Hadoop modules. This core is the ODPi standard on which vendors build their offerings. For example, Hortonworks, IBM Open Platform 4.0 with Apache Hadoop, EMC Pivotal HD 3.0, and Infosys IIP all adhere to the ODPi standard. Analytics software vendors and in-house development shops can concentrate on developing applications further up the stack, knowing that the Hadoop core adheres to a standard and that their applications will interoperate with any compliant Hadoop system. This accelerates development, promotes code reuse, and simplifies the technical architecture. Implementing a Hadoop distribution that adheres to the ODPi standard means not being locked into a proprietary technology.
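The interoperability argument amounts to programming against a shared interface. The Python sketch below assumes a hypothetical `CompliantCore` interface standing in for the agreed-on core; ODPi does not define a Python API, and `VendorADistribution`, `VendorBDistribution`, and `count_records` are invented for illustration. The application function depends only on the interface, so it runs unchanged on either implementation.

```python
from __future__ import annotations

from typing import Protocol


class CompliantCore(Protocol):
    """Hypothetical stand-in for the agreed-on core every distribution provides."""

    def read_lines(self, path: str) -> list[str]: ...


class VendorADistribution:
    """Hypothetical vendor implementation that stores files as lists of lines."""

    def __init__(self, files: dict[str, list[str]]) -> None:
        self._files = files

    def read_lines(self, path: str) -> list[str]:
        return list(self._files[path])


class VendorBDistribution:
    """Hypothetical second vendor with different internals: files stored as text."""

    def __init__(self, blobs: dict[str, str]) -> None:
        self._blobs = blobs

    def read_lines(self, path: str) -> list[str]:
        return self._blobs[path].splitlines()


def count_records(core: CompliantCore, path: str) -> int:
    """Application code: depends only on the shared interface, not on a vendor."""
    return sum(1 for line in core.read_lines(path) if line.strip())


if __name__ == "__main__":
    a = VendorADistribution({"/data/sales.csv": ["1,10", "2,20", ""]})
    b = VendorBDistribution({"/data/sales.csv": "1,10\n2,20\n"})
    print(count_records(a, "/data/sales.csv"), count_records(b, "/data/sales.csv"))
```

Because `CompliantCore` is a structural `Protocol`, neither vendor class has to inherit from it; a type checker only verifies that `read_lines` has the expected signature. Swapping distributions therefore requires no change to `count_records`.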

Only time will tell whether ODPi will have a lasting impact as a standard. The organization has been criticized as nothing more than a joint marketing effort by vendors pushing their own commercial flavors of Hadoop. Also notable are the big data vendors conspicuous by their absence: Cloudera, MapR, and Amazon (AWS EMR, Elastic MapReduce).

IBM BigInsights and Cognos

On top of Hadoop, IBM has developed a suite of big data and analytics tools under the BigInsights brand. There are tools for scaling and managing the platform (BigInsights Enterprise Management), a machine learning engine (BigInsights Data Scientist: decision trees, PageRank, clustering), and a data exploration and discovery tool (BigSheets). Of particular interest to Cognos customers is BigSQL, which runs SQL queries against Hadoop; in other words, BigSQL lets Cognos use Hadoop as a data source.

This matters because data stored in Hadoop only becomes useful when it is put into a business context, and Cognos Analytics (V11) is well suited for this role. It is a powerful tool for BI developers and business power users, enabling them to present Hadoop data in a visually appealing format for executives, managers, and line-of-business staff. Big data becomes much more valuable when non-technical users can interpret and understand it.

Cognos supports connecting to Hadoop through Hive, which translates SQL queries into MapReduce jobs to retrieve results from Hadoop. Some latency is unavoidable, because Hive cannot change the nature of MapReduce, which distributes processing work across Hadoop nodes: the query is split into discrete chunks of work, and the results are assembled as they are returned. SQL join conditions, which are commonplace in Cognos-generated SQL, add another layer of complexity for MapReduce. This further increases query processing time and can prevent some queries from running at all.
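To see why joins are expensive under MapReduce, here is a minimal in-process simulation of a textbook reduce-side join in Python. It is a sketch, not Hive's actual execution plan (Hive can also choose other strategies, such as map-side joins for small tables), and the customer and order data are hypothetical. It corresponds roughly to `SELECT c.id, c.name, o.amount FROM customers c JOIN orders o ON c.id = o.customer_id`: each chunk is mapped independently, every record is then shuffled by join key (the step that moves data between nodes on a real cluster), and the reducers assemble the joined rows.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical tables, pre-split into chunks as if stored on different nodes.
# customers: (customer_id, name); orders: (order_id, customer_id, amount)
CUSTOMER_CHUNKS = [
    [(1, "Acme"), (2, "Globex")],
    [(3, "Initech")],
]
ORDER_CHUNKS = [
    [(101, 1, 250.0), (102, 3, 75.0)],
    [(103, 1, 40.0), (104, 2, 500.0)],
]


def map_customers(chunk):
    """Map phase: emit (join key, tagged record) for each customer row."""
    for customer_id, name in chunk:
        yield customer_id, ("C", name)


def map_orders(chunk):
    """Map phase: emit (join key, tagged record) for each order row."""
    for _order_id, customer_id, amount in chunk:
        yield customer_id, ("O", amount)


def shuffle(pairs):
    """Shuffle phase: group all mapped values by join key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce phase: combine each customer name with that customer's orders."""
    names = [v for tag, v in values if tag == "C"]
    amounts = [v for tag, v in values if tag == "O"]
    for name in names:
        for amount in amounts:
            yield key, name, amount


def run_join():
    """Map every chunk, shuffle by key, then reduce each key group in key order."""
    mapped = chain(
        chain.from_iterable(map_customers(c) for c in CUSTOMER_CHUNKS),
        chain.from_iterable(map_orders(c) for c in ORDER_CHUNKS),
    )
    grouped = shuffle(mapped)
    rows = []
    for key in sorted(grouped):
        rows.extend(reduce_join(key, grouped[key]))
    return rows


if __name__ == "__main__":
    for row in run_join():
        print(row)
```

The shuffle is the costly step: both tables must be regrouped by key before any joined row can be produced, which is one reason join-heavy Cognos SQL runs slowly through Hive.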

IBM addresses these problems with BigSQL. It works on the same Hive metastore but produces faster and more reliable results. BigSQL is not just about performance; it is also about ensuring that the SQL query will run. It optimizes the SQL so that queries run faster, which avoids having to modify the Cognos Framework Manager model or hand-code SQL inside Cognos. An alternative to Hive and BigSQL is Impala, which makes similar performance claims.

Success with big data requires getting key pieces to work together. With BigInsights and BigSQL, IBM provides tools that facilitate Hadoop adoption, including interoperability with existing Cognos infrastructure and functionality.

Stay on top of business intelligence topics, read other Senturus blogs at: http://www.senturus.com/blog/.

Resources

Senturus webinar Running Cognos on Hadoop: http://www.senturus.com/resources/running-cognos-on-hadoop/

Video of Hive and BigSQL performance test results: https://developer.ibm.com/hadoop/blog/2015/10/23/hive-and-big-sql-performance-test-update/

IBM BigSQL technology sandbox demo cloud environment for Hadoop and BigSQL: https://my.imdemocloud.com/projects/3467

Thanks to David Currie for contributing this article. David is a long-time business analytics consultant. He blogs about business intelligence and big data at davidpcurrie.com.

Big Data / IBM Cognos
