

In-Memory Computing: Powering Enterprise High-Performance Computing

To succeed in today's digital era, organizations must embrace the next wave of hyperscale computing into mainstream business by considering in-memory computing technologies that not only bolster their large-scale data processing capabilities but also accelerate the transformation of raw information into applied knowledge.

Executive Summary

Traditional high-performance computing (HPC)/supercomputing, analytics and mainstream real-time/batch computing are quickly converging. Mainstream workloads are crossing over into the high-performance computing arena, demanding faster analytics/batching, resource-intensive computations and algorithms. To succeed in today's accelerating digital world, enterprises must collect and analyze mind-boggling amounts of data, in real time, at ever-faster speeds that most legacy enterprise HPC technologies and systems were not originally designed to accommodate.

In our view, organizations need to embark on what we call Enterprise HPC 2.0. This term refers to an ecosystem that leverages the latest commodity-hardware-based hyperscale grid technologies, such as in-memory computing (IMC), compute and data grids, streaming analytics and graph analytics, in conjunction with infrastructure advancements such as solid-state drive (SSD) technology, GPGPU acceleration and general-purpose InfiniBand interconnects. Together these enable IT organizations to fast-track enterprise computing to better serve the ever-growing data needs of the business.

Significant enthusiasm is building around the IMC paradigm for large-scale data analysis. Historically, in-memory grid technologies were primarily data-focused and used by organizations for distributed caching patterns to achieve low-latency reads of critical transactional data. However, IMC technology is progressively emerging as a key enabler for enterprises seeking to accelerate their real-time decision-making ability and agility by supporting Web-scale data processing, capabilities necessary for staying relevant and competitive in today's digital era.
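The distributed caching pattern mentioned above, often called cache-aside or "side cache," can be sketched in a few lines of Python. The store and keys here are hypothetical stand-ins for illustration only, not any particular IMC product's API:

```python
import time

# Hypothetical backing store standing in for a transactional database;
# reads are deliberately slow to mimic disk/network latency.
class SlowStore:
    def __init__(self, data):
        self._data = data

    def read(self, key):
        time.sleep(0.01)  # simulated I/O latency
        return self._data[key]

# Cache-aside ("side cache"): check the in-memory cache first and fall
# back to the store only on a miss, populating the cache on the way out.
class SideCache:
    def __init__(self, store):
        self._store = store
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._store.read(key)
        self._cache[key] = value
        return value

cache = SideCache(SlowStore({"cust-42": {"tier": "gold"}}))
cache.get("cust-42")   # first read misses and goes to the backing store
cache.get("cust-42")   # second read is served from memory
print(cache.hits, cache.misses)  # 1 1
```

A production data grid distributes this cache across many nodes and keeps it coherent; the pattern itself stays the same.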

IMC's impact is typically felt where organizations are creating new and more innovative ways of working. A dramatic reduction in memory hardware costs also favors the growth of IMC technologies. However, several factors continue

cognizant 20-20 insights | november 2015




to slow enterprise adoption, such as a fragmented technology and vendor landscape, a lack of commonly agreed-upon industry standards, a scarcity of skills and still-emerging industry best practices.

Given that the technology remains in its adolescence, the selection of the right IMC technology is critical to any strategic digital business transformation decision. Soaring enterprise workloads, and the use cases that rely on in-memory processing, are informing key decisions around IMC technology platform selection.

A blind jump into the IMC technology valley will not yield durable value. It requires clear and effective analysis and understanding of workloads and business priorities, with the goal of increasing scalable performance and competitive benefit for the business. This entails skilled experts performing a focused evaluation. Furthermore, the multitude of new and emerging products makes it extremely challenging to select the right product and approach.

However daunting this decision may seem, it is of utmost importance for organizations to use IMC technology to help address their ever-mounting high-performance and low-latency processing needs across the enterprise.

This white paper summarizes the features and benefits of using IMC for large-scale data-set aggregations across multiple popular IMC approaches. It presents results from an internal study in which we created an evaluation scenario to compare various IMC approaches and technology architectures. The results establish that a simple migration to IMC technology yields performance 13 times greater for a given batch workload previously implemented on a disk-based architecture. The paper not only highlights the importance of embracing the IMC agenda for enterprise workloads but also offers a formal methodology for choosing the IMC platform most appropriate to given business needs.

In-Memory Computing: A Market Check

Effective use of IMC technology, along with a clear strategy for adoption, can help enterprises reap multiple benefits. Figure 1 lists some of the key use cases across specific industries. This list is indicative; the possibilities are abundant and not limited to it.

There have been rapid innovations in the IMC space recently to enable faster computation and processing speeds. These include Hadoop

• Retail: Real-time in-store analytics; fast real-time loyalty offers.

• Telecom: Real-time ad placements; real-time sentiment analysis.

• Healthcare: Faster medical imaging processing; genome analysis.

• Insurance: Faster claim processing and modeling; faster actuarial science.

• Banking & Financial Services: Fraud detection; real-time trading decisions; faster reporting.

• Manufacturing: Inventory management; predictive analytics to avoid unplanned downtime.

Figure 1: In-Memory Computing (Enterprise HPC 2.0)



MapReduce, a batch-processing framework that has added support for an in-memory file system called Tachyon. In addition, IBM has added Apache Spark, an IMC system, to its z Systems to bring analytics to mainframes. Also, SQL Server 2016 Community Technology Preview 2 adds IMC power. This has led to a plethora of IMC technology-based products. However, these products can be classified into segments based on their inherent architecture and technological approach. Moreover, not every IMC system is applicable to every type of enterprise workload. It is therefore imperative to have a clear understanding of the pros and cons of each system type in order to effectively select and utilize IMC systems and reap the business benefits.

IMC technology has evolved from its earliest avatar (distributed caching) to today's integrated in-memory platforms that provide storage, compute and transactional services for large-scale data sets. These systems fall under the pure-play IMC category. The "alternate IMC" segment applies to products such as Apache Spark, which, in our view, does not represent all-encompassing in-memory technology in the strict sense, since it does not provide a platform for storing large-scale data. It does, however, provide a processing platform for large-scale in-memory computing, is said to deliver performance up to 100 times faster for certain applications,1 and is being endorsed by IBM2 and Amazon Web Services.3

Figure 2 illustrates the evolution of IMC technology, some of the popular products under each segment and the typical workloads for which they are best used.

Given the rapid pace of innovation, the IMC product landscape requires up-to-date skills and a thorough understanding of a specific IMC system's architectural underpinnings to validate its fit and effective use for a given enterprise workload. Furthermore, with multiple options available, enterprises can find it difficult to make the best choice of IMC technology to satisfy their high-performance computing needs.

To address these challenges, the Cognizant Hyperscale Computing (HPC) Lab has launched a structured methodology to help enterprises realize value from the next wave of hyperscale computing using Enterprise HPC 2.0, which leverages in-memory computing grids.

Figure 2: IMC Technology's Progression

• Distributed Caches: a cache that partitions its data among all cluster nodes. Typical use: a distributed key/value cache for low-latency access. Products: Memcached, Ehcache, Pivotal GemFire.

• In-Memory Data Grid (IMDG), pure-play IMC: a data fabric across a large cluster of servers for distributed in-memory storage and management of large data sets. Typical use: real-time big data initiatives, handling HPC payloads along the lines of MapReduce and MPP, with partial SQL support. Products: Pivotal GemFire XD, Oracle Coherence, GigaSpaces XAP, Hazelcast, Infinispan (JBoss).

• In-Memory Database (IMDB), pure-play IMC: an RDBMS that stores data in memory instead of on disk. Typical use: a high-speed in-memory alternative to an existing disk-based RDBMS, with full SQL support and no change to the application. Products: SAP HANA, Oracle Exalytics, Exadata, MS SQL Server 2014.

• In-Memory Data Fabric (IMDF), pure-play IMC: a next-generation platform that integrates IMDG with IMCG and provides additional features such as CEP and streaming. Typical use: a single integrated platform for real-time big data management and computing, handling new HPC payloads such as streaming and CEP. Product: Apache Ignite (GridGain).

• In-Memory Compute Grid (IMCG), alternate IMC: a platform for computing and transacting on large-scale data sets in parallel. Typical use: in-memory computation and processing of data stored on disk. Product: Apache Spark.
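How a data grid partitions its data among all cluster nodes can be illustrated with a minimal hash-partitioning sketch in Python. The node names and routing scheme below are illustrative assumptions; real grids such as those listed above add replication and rebalancing on membership changes, which this sketch omits:

```python
import hashlib

# Hypothetical three-node cluster.
NODES = ["node-a", "node-b", "node-c"]

def owner(key, nodes=NODES):
    # A stable hash (not Python's randomized built-in hash()) ensures
    # every client routes the same key to the same partition/node.
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Each node holds only its own partition of the keyspace.
grid = {node: {} for node in NODES}

def put(key, value):
    grid[owner(key)][key] = value

def get(key):
    return grid[owner(key)].get(key)

for i in range(1000):
    put(f"order-{i}", {"qty": i})

print(get("order-7"))
print({node: len(part) for node, part in grid.items()})
```

Because each key lives on exactly one node, both reads and writes touch a single machine, which is what keeps partitioned in-memory access low-latency even as the data set grows.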



Figure 3: IMC Technology Selection Process (IMC Assessment Methodology). Four steps spanning two stages: Establishment (Stage I) and Refinement (Stage II).

IMC Value Creation: Methodology

A clear process, as well as a framework, is required to establish the business goals and determine the best-fit IMC technology. This is vital to garner the utmost value from an IMC-led transformation. Figure 3 depicts our process for establishing and identifying the right IMC product for the business.

Step 1: Discovery

The business use cases and the workloads to be implemented via IMC technology play a crucial role in product selection. So first the workload is chosen and key goals for the implementation are defined.

For this white paper, we studied a retail customer analytics workload previously processed on a modern scalable batch model using Apache Pig, a Hadoop MapReduce-based technology with a disk-based architecture. The nature of this technology constrained the solution to an offline, batch-based system. To be better prepared for disruptive consumer behavior, where latency implies lost business, we wanted an alternative solution offering faster, near-real-time performance for the customer's customers. We devised an internal study to transform the batch workload using multiple IMC technologies and successfully applied the appropriate IMC technology to make it faster.

Next, we defined the key use cases that the workload requires, which became the input for the IMC system evaluation matrix. For quick development of the use case and benchmarking, we wanted the following core features to be readily and easily supported by the product, apart from the in-memory caching features normally available with such products:

• Bulk data loading.

• SQL support for easy and fast retrieval of data with conditions.

• SQL support for joining multiple data sets based on criteria.

• Support for creating new tables/data sets dynamically with data from other tables/data sets.

• Support for stored procedures/user-defined functions/MapReduce to handle very specific aggregations.

• In-memory distributed computation capabilities.
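To make this checklist concrete, several of the operations above (bulk loading, SQL joins, aggregation and dynamic data set creation) can be sketched against Python's built-in in-memory SQLite database. sqlite3 is only a neutral stand-in here, not one of the evaluated IMC products, and the table and column names are hypothetical:

```python
import sqlite3

# An in-memory database as a stand-in for an in-memory SQL engine.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE sales (cust_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (cust_id INTEGER, region TEXT)")

# Bulk data loading via executemany.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (1, 15.0), (2, 30.0)])
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "north"), (2, "south")])

# Join plus aggregation, materialized as a new table created
# dynamically (the "dynamic data set creation" criterion).
conn.execute("""
    CREATE TABLE region_totals AS
    SELECT c.region, SUM(s.amount) AS total
    FROM sales s JOIN customers c ON s.cust_id = c.cust_id
    GROUP BY c.region
""")

# Totals per region: north 25.0, south 30.0.
print(dict(conn.execute("SELECT region, total FROM region_totals")))
```

An IMC product would run the same kinds of statements, but partitioned and parallelized across a cluster rather than in a single process.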

Step 2: Analysis

Second, we needed to ascertain the segment of IMC technology that would best suit the workload and identify a potential list of IMC systems from that category that readily support the evaluation criteria for the specific use cases. This list is carefully chosen after deliberation with the enterprise's business and architecture stakeholders.

We then performed a deep-dive fit and architectural analysis on the selected list and determined the best matches based on the aforementioned evaluation criteria. From the output of this analysis, the final list of IMC systems that closely fit the requirements was determined. Proof-of-concept, proof-of-technology and benchmarking exercises were then performed on the final list to validate, establish and recommend the best-fit IMC system for the given workload.




And so, in our case, we selected an initial list of potential IMC products from the IMDG, IMDF and alternate IMC segments, as we needed MapReduce-like capabilities to handle the specific aggregations demanded by the chosen workload. Distributed caching systems lack these features, and an IMDB system like SAP HANA, which primarily supports SQL workloads, was not the right fit in this case.

Figure 4 lists the IMC systems selected. As an internal study, we chose products rated as top vendors and leaders in their segments by various leading analysts, from a good mix of commercial and open-source products.

Fitment Analysis

Next, we performed a comprehensive product comparison using a weighted scoring and ranking model across 20 different attributes and dimensions, based on the specific list of features most essential for quick development and benchmarking of the use case, as listed in Figure 5. This methodology helped us quickly shortlist one data grid system each from the commercial and open-source categories for our final evaluation. In-memory data grids offer many other useful features; IMC vendors have developed unique selling propositions for their products that need to be compared, analyzed and leveraged on a case-by-case basis.

The final considerations were based on the score ratings depicted in the two product comparison scoring charts. Figure 6 shows a comparison between the three commercial data grids and the three open-source data grids selected in the previous step, as depicted in Figure 4.

Analysis Results

For the final benchmark and evaluation, we chose Apache Spark as the first product, given its reputation as the leading IMC technology poised to replace the Hadoop MapReduce framework. From the scoring process, we selected Pivotal GemFire XD from the commercial category (the community version of GemFire is now available as Apache Geode); the third product, chosen from the open-source category, was Apache Ignite. Both of these products scored the highest as the

Figure 4: Establishing the Short List

Pure-Play IMC Technology
• Commercial: Pivotal GemFire XD, Oracle Coherence, GigaSpaces XAP
• Open Source: Apache Ignite, Infinispan (JBoss), Hazelcast

Alternate IMC Technology
• Apache Spark

Figure 5: Scoring the Requirements

• Features (weightage 60%): bulk data loading; SQL queries support; stored procedures support; dynamic data set creation; transaction support; UDF support; SQL joins; subqueries; JDBC driver; caching patterns (side cache, in-line cache); replication; guaranteed delivery; change data capture; cloud integration.

• System Environment Setup (weightage 25%): application server (Tomcat/Jetty) integration; availability of administration consoles; availability of monitoring/management consoles; HA and fault tolerance; deployment and configuration speed.

• Dev Environment Setup (weightage 15%): programming language support (.NET/Java); client SDKs/APIs support; Spring Data support.
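The weighted scoring model behind Figure 5 can be sketched as follows. The weights are Figure 5's; the per-product category ratings are hypothetical placeholders, not the study's actual scores:

```python
# Category weights from Figure 5.
WEIGHTS = {"features": 0.60, "system_env_setup": 0.25, "dev_env_setup": 0.15}

def weighted_score(ratings):
    """Combine 0-100 category ratings into a single weighted score."""
    assert set(ratings) == set(WEIGHTS)
    return sum(WEIGHTS[cat] * ratings[cat] for cat in WEIGHTS)

# Hypothetical candidates and ratings, for illustration only.
candidates = {
    "product_a": {"features": 90, "system_env_setup": 70, "dev_env_setup": 60},
    "product_b": {"features": 60, "system_env_setup": 90, "dev_env_setup": 90},
}

ranked = sorted(candidates, key=lambda p: weighted_score(candidates[p]),
                reverse=True)
print(ranked[0], weighted_score(candidates[ranked[0]]))  # product_a 80.5
```

Note how the 60% weight on features dominates: product_a wins despite weaker environment-setup ratings, which mirrors why feature gaps (e.g., missing SQL joins) eliminated products in the study.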



Commercial IMC product comparison (weighted scores across Features, System Environment Setup and Dev Environment Setup): GigaSpaces XAP, Oracle Coherence, Pivotal GemFire XD.

Open-source IMC product comparison (same dimensions): Apache Ignite, Hazelcast, JBoss Infinispan.

potential best-fit technology to meet our needs (i.e., the other compared products did not support straightforward SQL joins or subqueries).

We followed this with a detailed proof-of-concept (PoC) and proof-of-technology (PoT) exercise and compared various aspects of the architectures of the three selected IMC systems. We then considered their features, differences and relevance for supporting the large-scale data aggregation required by the use case, and validated this with a benchmarking process.

Performance Benchmarking

An identical computing cluster consisting of three nodes was provisioned using the Cognizant Hyperscale Application Platform, which allows for fast setup and deployment and provides monitoring facilities to gather the benchmark results. The system details of each node and the IMC software versions are shown in Figure 7.

The three systems were then configured with default cluster settings to determine the as-is performance of the IMC systems compared with traditional Hadoop MapReduce (MR) using Apache Pig on Apache Hadoop YARN 2.4.0. For all three systems, the only setting change we made was to increase the IMC system process's JVM memory parameters such that the total cluster heap size was 250 GB for the in-memory data cache.
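For the Spark case, such a heap increase would typically be expressed through executor memory settings in spark-defaults.conf or at submit time. The fragment below is an assumed illustration only: the roughly 83 GB per-executor value simply divides the 250 GB cluster heap across three nodes and is not stated in the study.

```properties
# Hypothetical spark-defaults.conf fragment (illustrative, not from the study).
# ~83 GB per executor across three nodes approximates a 250 GB cluster heap.
spark.executor.memory   83g
spark.driver.memory     8g
```

The other systems expose equivalent JVM heap settings (-Xmx and product-specific configuration), which is what the study's single tuning change adjusted.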

Figure 6: The Comparative Matrix

Figure 7: Node Details

IMC system versions: Apache Spark 1.3.1; Apache Ignite 1.2.0-incubating; Pivotal GemFire XD 1.4.1.

Node configuration: 2 TB disk space; 128 GB RAM; 32 CPU cores; 2.6 GHz CPU clock speed. Operating system: CentOS release 6.5.



Figure 8: Performance Comparison

Data loading times for 50 GB (time taken in minutes) and task execution times for 50 GB (time taken in hours), comparing Apache Pig, Pivotal GemFire XD and Apache Spark.

Benchmark Task

Our study compared a batch workload that performed a good mix of computations to create new data sets, with computed fields based on aggregations performed in previous steps. The original data was persisted in four different structured data sets with relational integrity between them based on certain attributes/fields. The study was done on 50 GB of data comprising 500 million records using the traditional MR mode, compared against the twin approaches: using the alternate-IMC Apache Spark, and using IMDG NewSQL products.

Benchmark Execution

We executed each task three times on each IMC system and report the average of the trials. Each system executed the benchmark tasks separately to ensure exclusive access to the cluster's resources. During the tests, we found that Apache Ignite, unlike the other systems, did not provide out-of-the-box support for bulk ingestion of data from CSV files and could not stably handle ingestion beyond 1 GB of data with its default cluster environment settings. This prevented us from testing that system for task executions.
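The three-trial averaging described above can be sketched with a small Python harness. The sample task here is a stand-in for submitting the real aggregation job to an IMC cluster:

```python
import statistics
import time

def benchmark(task, trials=3):
    """Run `task` several times and return the mean wall-clock time,
    mirroring the three-trial averaging used in the study."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

# Stand-in workload; a real run would submit the aggregation job to
# the IMC cluster with exclusive access to its resources.
def sample_task():
    sum(i * i for i in range(100_000))

avg = benchmark(sample_task)
print(f"average over 3 trials: {avg:.4f}s")
```

Averaging over repeated runs with exclusive access smooths out JIT warm-up and transient interference, which matters when the systems being compared differ by minutes rather than hours per task.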

Results

Figure 8 depicts the overall performance numbers of the three IMC systems under different task scenarios.

It is important to note that although performance tuning was not considered in our study, for optimal performance each system's configuration parameters must be tweaked based on data size, workload types, hardware capacities, resource utilization, etc. The metrics shown in Figure 8 would therefore change based on the tuning and optimization techniques used. However, we expect only the execution times to become faster; the relative performance ratings of these systems should remain equivalent when measured against each other.

Step 3: Recommendation

Third, after creating PoCs and performance-related benchmarks, we can derive, validate and recommend the best-fit IMC system for any given workload. By performing such detailed analysis of their architectural aspects, we can also consider where these technologies would potentially deliver the most durable benefit for enterprise workloads.

For the current workload, we established key findings for each IMC system, as shown in Figure 9 (next page). The results confirm that IMC technology accelerates computational performance that enterprises can harness after due diligence and consideration. IMC technology can considerably improve overall processing times, from data loading to execution. For the given use case and data load, processing times improved 13-fold simply by replacing the MapReduce-based batch system with an IMC technology. We found that Apache Spark was best suited for this particular scenario.

Workload Operations Mix
• Aggregations/computations: 50%
• Data set joins: 30%
• Data set filters: 10%
• Data set select/create: 10%

Data Set Metrics
• Input data size (4 data sets): 50 GB
• Input records count: 500 million
• Output data size (1 denormalized view): 150 GB
• Output records count: 300 million

Performance Metrics
• Pre-IMC execution time: 13 hrs 15 min
• Post-IMC execution time: 1 hr 6 sec
• Total performance improvement by Apache Spark: 13x
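The 13x figure follows directly from the execution times reported above, as a quick check shows:

```python
# Execution times from the study's workload metrics.
pre_imc_minutes = 13 * 60 + 15      # 13 hrs 15 min -> 795 minutes
post_imc_minutes = 60 + 6 / 60      # 1 hr 6 sec -> ~60.1 minutes

speedup = pre_imc_minutes / post_imc_minutes
print(f"{speedup:.1f}x")  # ~13.2x, reported as 13x
```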



Figure 9: Functional Findings

Pivotal GemFire XD
• Ideal for low-latency transactional and operational workloads.
• Easy to implement.
• Easy to administer and monitor.
• Extensive SQL support.
• Not ideal for analytical and predictive workloads in stand-alone mode.
• Lacks support for running iterative loops based on a large number of keys from a specific collection; processing times deteriorate due to this missing feature.

Apache Spark
• Ideal for iterative data analysis, caching intermediate data for real-time querying.
• Ideal for live stream analytics and predictive workloads, including machine learning.
• Easy to implement.
• Not ideal for transactional processing in stand-alone mode.
• Rudimentary management and monitoring consoles.
• Lacks support for in-memory data storage.

Apache Ignite (incubating)
• Ideal for big data analytics, fraud detection, risk analytics and customer intelligence.
• Single integrated platform with additional capabilities such as Compute Grid, Service Grid and CEP/streaming.
• At a nascent stage; requires maturing from incubation status.
• No out-of-the-box CSV streamer for bulk data ingestion; large data loading times suffer due to this missing feature.
• Not so easy to implement.

Step 4: Planning

Finally, with the knowledge and validation achieved in the previous steps, we can then successfully plan and create an effective IMC roadmap.

Key Recommendations

Our analysis establishes that IMC is the future of computing and a key enabling technology for enterprise HPC workloads that require analytical, predictive and cognitive capabilities.

As such, we recommend that:

• Although technology maturity is still uneven, decision-makers should recognize that IMC technologies and architectures are well positioned for adoption and use in mainstream business.

• Application development and other IT leaders must look to IMC technology to support a wide range of use cases, including batch, analytics, transaction processing and event processing, rather than limiting the technology to distributed caching applications.

• Organizations would benefit by shifting to IMC technology when they need to reengineer established applications to increase their performance and scalability for fast transactional data access (e.g., inventory management, financial reference data, real-time transactional data), to offload workloads from legacy systems performing heavyweight offline calculations (e.g., pattern analysis, trade reconciliation, number crunching), or for real-time stream processing (e.g., real-time analytics, continuous calculation, fraud detection, clickstream analytics).

• When opting for IMC systems from the open-source model, one fail-safe way to proceed is to conduct a PoC and a PoT to validate the system, and then adopt the commercial counterpart of the same system to ensure stable system support.

Even though our study was limited to three IMC systems, we recommend that enterprises consider a broader range of products for initial evaluation. Selection should be based on the criteria most critical to the business, such as available expertise, business drivers for IMC adoption, preference for an IMC appliance model, cloud support, post-implementation product support, and the mix of mega-vendors, smaller vendors and newer open-source options for open integration. All of these considerations are critical to the evaluation matrix. This should be accompanied by a deep-dive comparison scoring model, similar to the one we followed, across parameters such as the most significant use cases, the workload patterns of those use cases, short-term and long-term goals, and the ability to realize ROI in the next three to five years.

A PoC/PoT on shortlisted products would further reinforce the merits/demerits of any evaluated product. This would help the enterprise to make an informed decision to adopt a new IMC technology that creates impact for their business.




Looking Forward

Although in-memory technology has been around for many years, the latest advancements in scale-out architecture, increased automation and reduced memory costs have increased the technology's appeal to all enterprises. IMC innovation continues unabated across the whole spectrum of IT market segments, from hardware to application infrastructure to packaged business applications. New in-memory technologies can support new and complex workloads that organizations can confidently apply to achieve competitive advantage. While we do not advise wholesale replacement of all workloads and traditional approaches with IMC technology, our study suggests that organizations can reap high rewards from the technology if the platform is properly vetted, selected and deployed. So, if you ask us, "What technology can accelerate data processing tenfold and deliver real-time business insights and information with high performance and low latency?", our answer is Enterprise HPC 2.0 and in-memory computing technology.

Footnotes

1. Reynold Xin, Josh Rosen, Matei Zaharia, Michael Franklin, Scott Shenker and Ion Stoica, "Shark: SQL and Rich Analytics at Scale," June 2013.

2. http://www.firstpost.com/business/ibms-apache-spark-push-plans-put-spark-bluemix-open-tech-centre-2296260.html

3. http://searchaws.techtarget.com/news/4500248624/Amazon-Elastic-MapReduce-moves-forward-with-Apache-Spark

References

• “Taxonomy, Definitions and Vendor Landscape for In-Memory Computing Technologies,” Gartner report.

• “Hype Cycle for In-Memory Computing Technology, 2014,” Gartner report.

• Noel Yuhanna, “Market Overview: In-Memory Data Platforms,” Forrester report, December 26, 2014.


About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 100 development and delivery centers worldwide and approximately 218,000 employees as of June 30, 2015, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2015, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners. TL Codex 1546

About the Author

Archana Rao is a Senior Technology Architect within the Cognizant HyPerscale Computing Lab, a unit of the Cognizant Technology Labs business unit. She has 11-plus years of cross-industry IT experience developing and providing solutions, focusing on the architecture and design of enterprise high-performance computing (HPC) applications using various compute and data grid technologies such as Hadoop, Windows HPC, in-memory computing, search grids and NoSQL. Archana's focus is on business enablement and transformation through HPC technology and architecture; she has consulted with many clients implementing strategic technology transformation initiatives. She holds a B.E. in electrical engineering and electronics from the University of Madras, Chennai. Archana can be reached at [email protected] | Twitter: @ArchanaRA0.

Acknowledgment

Special thanks to Senthil Ramaswamy Sankarasubramanian, Director, Cognizant HyPerscale Computing Lab, a unit of Cognizant Technology Labs, for his invaluable feedback during the course of writing this paper.