iia: the current state of hadoop in the enterprise

Post on 17-Aug-2015


The Current State of Hadoop in the Enterprise


©2015 IIA and SAS Institute Inc. All Rights Reserved.


Executive Summary

Owning, acquiring, analyzing and managing data have suddenly moved from an operational task required by IT to a top corporate priority where information is viewed as a strategic asset. As business plans increasingly call for reliance on Big Data, users of all stripes are catching on and becoming more proficient at making use of data through analytical tools.

Many organizations today unequivocally view data as strategic to their business operations and growth, including those that have not yet figured out how best to extract value from data. Hadoop has emerged as a popular technology for consideration, but exactly where and how Hadoop can be leveraged within the modern enterprise data architecture remains an open question.

The attraction of the low-cost, high-availability storage and processing power of Hadoop has drawn many organizations to give this new technology consideration, either by way of limited scope evaluations and pilots or small deployments. And yet, Hadoop may not be a panacea for every Big Data initiative. Even its most enthusiastic champions highlight some challenges that could be slowing down Hadoop’s broader adoption.

Market sentiment about the possibilities of Hadoop is high and continues to grow. The total market size is in the eye of the beholder, but combining the views of top market observers seems to indicate that only 1,000-1,500 organizations globally are actually running Hadoop in production, and this includes the early adopters whose entire businesses are based on data. Generally, Hadoop adoption is modest to date, with most enterprises at knowledge collection, evaluation or piloting stages.

Key drivers for Hadoop adoption include low-cost data storage coupled with a distributed processing environment that’s ideal for experimentation with large, unstructured data sets that have not been accessed by organizations in the past.

A key finding from this study is that many more organizations view Hadoop as playing a strategic role in their future growth, but are still struggling to implement it because of complexity, skills gaps, analytics interfaces that are unfriendly to business users, and the choice between implementing a commercial version and managing Hadoop themselves.

This report begins with a summary of the most recent research on Hadoop adoption rates, market size and growth, and the most common use cases of Hadoop today. The second section presents key experiences drawn from qualitative interviews conducted among organizations in various stages of Hadoop deployment. The study concludes with a set of recommendations for organizations who are considering Hadoop.


Current State of the Hadoop Market

The tsunami of coverage and commentary from vendors, analysts, media and industry pundits on Hadoop in recent years has been nothing short of staggering. While Hadoop itself has been around for a decade, excitement around its potential has been growing in large measure due to the interest in Big Data, the catch-all term for the “data exhaust” pouring out of the ever-expanding universe of mobile devices, sensors and social services connected to the Web and talking to each other (commonly called the Internet of Things).

Hadoop Adoption - Mirage or Real?

As with any emergent technology, “adoption” carries a range of interpretations when it comes to Hadoop’s footprint in the enterprise. Late in 2014, Forrester Research declared bullishly that “Hadoop adoption and innovation is moving forward at a fast pace, playing a critical role in today’s data economy.”1 Around the same time, Gartner took a more conservative view noting that Hadoop had a growing number of pilots, but “no dramatic growth in substantial projects.”2 One industry observer states, “The actual installed base of Hadoop clusters remains a lot smaller than many might expect given the amount of innovation that is going on around the platform.”3

For organizations deciding whether “to Hadoop or not to Hadoop,” this climate of uncertainty and the modest adoption rates are even more confusing when compared against glowing reports of business momentum from commercial Hadoop vendors. The three big commercial Hadoop distributors (Hortonworks, Cloudera, and MapR Technologies) appear to be enjoying healthy growth and have secured the backing of industry giants and the public markets.4

In the context of these mixed signals, determining where Hadoop sits on the hockey stick of adoption can be a tricky exercise. There is no doubt that the interest in the possibilities presented by Big Data among organizations is real. How many organizations are actively realizing these possibilities through Hadoop is another question.

1 Gualtieri.
2 Adrian.
3 Morgan.
4 Intel made a whopping $740 million investment in Cloudera in March, 2014 and Hortonworks received a $50 million strategic investment from HP in November, 2014 followed by an IPO in December, 2014. MapR closed a $110 million round of financing in June, 2014 led by Google Capital. These are certainly bullish endorsements for the future of Hadoop despite sluggishness in uptake.


Hadoop Market Size and Growth Potential

Despite the split opinions among Hadoop watchers in the industry, most agree on these measures of market size and growth rates:

1. Number and size of implementations (proof-of-concept pilots as well as production level)

2. Commercial subscription revenues

Number and Size of Implementations

Aggregate data from multiple sources on Hadoop pilots and proof-of-concept experimentations (Hadoop in a sandbox, limited clusters/nodes) suggest that Hadoop is seeing some positive tailwinds, which should be reflected in growing adoption numbers in coming years.

But recent commentary from Gartner notes that “70 percent of companies who have invested in big data have mostly done so for pilots, with only 12 percent using big data in full production environments.”5

Ovum analyst Tony Baer recently estimated an installed base of 1,500 to 2,000 clusters globally by end of 2015. Baer notes that clusters with several thousand nodes remain the exception,6 with a large number of organizations typically starting out their Hadoop implementation at a far more modest scale. “Most enterprises start out with dozens of server nodes and certainly well under 100 nodes for proof of concept projects. Then, as they move into production, those Hadoop clusters grow to hundreds of nodes as the datasets expand.”7

Commercial Revenues

Subscription revenue generated by the three primary commercial Hadoop vendors (Cloudera, Hortonworks, and MapR Technologies) is yet another metric used to measure current health and future opportunity for Hadoop. 451 Research analyst Matt Aslett recently estimated about $374 million in Hadoop vendor subscription revenue in 2014, growing at a compound annual growth rate of 49 percent through 2018. This would imply revenues for support and software reaching $2.7 billion at the end of 2018, according to 451 Research’s growth model.8
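The compound-growth arithmetic behind the 451 Research estimate can be checked with a short sketch. The $374 million base and the 49 percent rate come from the text. The function name and the number of compounding periods are assumptions made for illustration, since the report does not say how the periods are counted. Compounding the base over four annual periods gives roughly $1.8 billion, while five periods gives roughly $2.7 billion, so the quoted figure appears to correspond to five compounding periods.

```python
def project_revenue(base_musd: float, cagr: float, periods: int) -> float:
    """Compound a base revenue figure (in $ millions) forward by `periods` annual steps."""
    return base_musd * (1.0 + cagr) ** periods


BASE_MUSD = 374.0  # subscription revenue estimate quoted in the text ($ millions)
CAGR = 0.49        # compound annual growth rate quoted in the text

# The period count is an assumption: the text does not state the compounding convention.
for periods in (4, 5):
    projected = project_revenue(BASE_MUSD, CAGR, periods)
    print(f"{periods} annual periods: ${projected / 1000:.2f} billion")
```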

5 Savvas.
6 The Internet market giants such as Google, Facebook, Yahoo, Twitter, Amazon and Netflix, whose businesses have data at their very core, are commonly cited as the earliest adopters of Hadoop in production scenarios going back several years. These cluster sizes are exceptional and not representative of mainstream adoption patterns.
7 Morgan.
8 Morgan.


Geographic Differences

While interest is growing across the globe, at present North America leads in Hadoop adoption. A 2013 Sandhill Group survey9 showed that EMEA, Asia/Pacific and India lag significantly behind North America in terms of Hadoop adoption.

In sum, while the base of Hadoop users is growing, only a small minority appear to be running Hadoop in production at a reasonable scale. Many more appear to have downloaded a free version without moving further. The caution and sluggishness in adopting Hadoop for production should be interpreted less as a sign of lack of interest and more as an indicator of media hype and vendor innovation having outpaced the readiness of organizations.

2015 will see that dynamic evolve with many declaring it to be the year that interest in Hadoop grows, bringing “Hadooponomics” a bit closer to reality for businesses eager to move from exploration and early pilots to production-level projects.

Drivers of Hadoop Adoption

The growing appetite to maximize insight and business value out of untapped data stores is the most significant strategic driver behind interest in Hadoop. More tactically, it is scalable, flexible, low-cost data storage that is cited as the most immediate benefit of Hadoop to the organization.

Low-cost data storage

According to Statistic Brain, the average cost per gigabyte of storage has dropped from $437,500 in 1980, to $11 in 2000, to just $0.05 (five cents) in 2013.10 While cheaper storage is here to stay, the growth in data has offset that advantage.

Two recent end-user testimonials stand out:

• TrueCar collects vast volumes of car price data to power their online car-buying business. The move to Hadoop slashed their monthly storage costs from $19/GB to $0.23/GB.11

• Dell SecureWorks, an internet security software company, processes up to 20 billion events per day in real time. It was able to slash its monthly storage costs from $17/GB to $0.21/GB.12
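To put the two testimonials on a common footing, the following sketch computes the reduction factor and percentage saved for each. It reads the quoted post-Hadoop figures as $0.23 and $0.21 per GB per month, which is an assumption about the intended units. The function name is hypothetical.

```python
from typing import Tuple


def storage_reduction(before_per_gb: float, after_per_gb: float) -> Tuple[float, float]:
    """Return (reduction factor, percent saved) for a per-GB monthly storage cost change."""
    factor = before_per_gb / after_per_gb
    percent_saved = (1.0 - after_per_gb / before_per_gb) * 100.0
    return factor, percent_saved


# Monthly $/GB figures as quoted in the testimonials (units assumed to be dollars).
testimonials = {
    "TrueCar": (19.0, 0.23),
    "Dell SecureWorks": (17.0, 0.21),
}

for company, (before, after) in testimonials.items():
    factor, saved = storage_reduction(before, after)
    print(f"{company}: about {factor:.0f}x cheaper, {saved:.1f}% saved per GB")
```

Under that reading, both organizations report a drop of roughly 80-fold in per-GB storage cost.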

9 Graham; Rangaswami.
10 Statistic Brain.
11 John Williams, head of Platform Operations at TrueCar was quoted as saying, “We’re looking at data that is just a mess, and in the past we would have our staff spending a long time just cleaning that up.” But with Hadoop, “you can imagine just keeping every piece of data forever, because that way you can always go back later and take a look and see what you come up with.” Source: Hortonworks.
12 Source: Cloudera.


Scalable Data Storage

The meteoric growth (volume) and speed (velocity) of unstructured data being generated from the social and mobile web has overwhelmed IT and business decision makers alike. “Your company’s biggest database isn’t your transaction, CRM, ERP or other internal database. Rather it’s the Web itself and the world of exogenous data now available from syndicated and open data sources.”13 By some estimates, 90 percent of all data in our digital universe today is unstructured or semi-structured.14 Many organizations in the midst of formulating or broadening their big data strategies and architectures are attracted to Hadoop as a cost-effective “holding container” for massive stores of unstructured data in an effort to “leave no data behind.”

Limits to Hadoop Adoption

Hadoop may not be a panacea for every big data initiative, and even its most enthusiastic champions highlight some challenges that are slowing down its broader adoption.

A survey of over 100 data scientists last year revealed that 76 percent of those who used Hadoop found that “it takes too much effort to program or has other limitations” and is “too slow for real-time analytics.”15

There is also a growing sentiment that Hadoop’s MapReduce engine, which is optimized for batch processing, isn’t designed to handle ad-hoc, interactive, real-time data discovery and analytics, a popular use scenario for big data analytics today.
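The batch orientation of the MapReduce model can be illustrated with a minimal, single-process sketch of its three phases. This is plain Python written for illustration, not the Hadoop API, and all function names are hypothetical. No result is available until the whole input has passed through the map, shuffle and reduce phases, which is the usual explanation for why the model suits large offline jobs better than interactive queries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key; this needs the complete map output."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over the full input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample_log = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(sample_log))
```

In a real cluster the map and reduce work is spread across many nodes, but the dependency between phases is the same.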

Finally, organizations point to a Hadoop skills gap as an inhibitor to adoption.16 In response, the industry is seeing acquisitions and alliances that address the pain point around Hadoop and analytics talent with companies such as Teradata acquiring Hadoop consultancy Think Big Analytics17 and more recently, a strategic alliance announced between Cloudera and Deloitte.18

The slow ramp up of Hadoop in the enterprise may be a blessing in disguise, enabling vendors of commercial Hadoop distributions, as well as the open source community and app developers building on top of Hadoop, to solve for gaps in the technology that could hasten its adoption in the enterprise.

13 Laney.
14 Gantz; Reinsel.
15 Russom.
16 This is true in both North America and Europe. A recent big data skills workshop hosted by the European Commission noted: “Evidence already shows an emerging shortage of analytical and managerial skills necessary to make the most of Big Data.”
17 See “Teradata Acquires Think Big Analytics to Accelerate Growth of its Hadoop and Big Data Consulting Capability,” September 3, 2014. www.teradata.com/news-releases
18 See “Cloudera and Deloitte Announce Strategic Alliance to Advance Analytic Performance of Customers,” February 19, 2015. www.globenewswire.com/news-release


Hadoop vs. the Enterprise Data Warehouse - Friends or Frenemies?

The buildup of excitement and interest over the past few years regarding Hadoop has triggered some headlines positioning it as a challenger to the incumbent enterprise data warehouse. And yet as more organizations have experimented with and used Hadoop, they’re beginning to clarify the role of Hadoop as an element of their broader data infrastructure.

TDWI recently arrived at the following conclusion after conducting a series of qualitative interviews with data professionals earlier this year:

Few users are even contemplating a warehouse replacement. Instead, many are actively migrating some of their warehouse (defined as data) to other platforms, including Hadoop, as well as data warehouse appliances, columnar databases, NoSQL databases, clouds, and event-processing tools. They do this to get platforms better suited to advanced analytics with the migrated data (and other specialized workloads). In fact, this movement toward multi-platform data warehouse environments is one of the strongest trends in data architecture today.19

The future data processing and data management landscape will be a hybrid of EDWs and Hadoop, with each used where appropriate for the individual downstream analytic and BI use cases. The EDW is the best choice for structured and curated data. A Hadoop-based sandbox is the best choice when experimenting with a use case involving new types of data (web logs, text, email and machine data), which may not be well-qualified.

Depending on the use case, some organizations will find themselves combining data from both environments. Online product recommendations are a common example best served by this hybrid approach, as they combine consumer sentiment data (free text reviews) with structured data (pricing, SKU numbers, product descriptions). In the future, data warehouses will evolve to accommodate storage economics, new business use cases, data governance, latency, scalability, and diverse data structure requirements.

19 Russom.


Hadoop Realities – Experiences from the Field

With the overview of the current market for Hadoop as a backdrop, this section presents actual experiences of end-user organizations at various stages of Hadoop deployment, drawn from a set of extensive qualitative interviews. Some of these “realities” confirm the market perceptions described in the first section, but a few key findings contradict the market sentiment.

Reality 1: Hadoop is viewed as a key component of data-driven strategy

Respondent organizations’ primary reasons for using Hadoop are authentically strategic in nature, born of a specific need to improve their company’s ability to solve complex problems through advanced analytics applied to large untapped data sets. Making the decision to implement a Hadoop-based architecture is therefore a well-thought out component of a larger analytics strategy that is aimed at competing more effectively through greater share and/or revenue, or lower operating costs.

Organizations that rely upon qualified data to support their business decisions and have embraced advanced analytics are therefore best suited to utilizing Hadoop. With an ability to perform more nimble analyses across a large volume of data, Hadoop can help analytics teams bring to light new insights based on data relationships, trends, and new types of information not previously understood.

“[The primary goal for our Hadoop deployment is] to create common centers of excellence around analytics, plus reduce duplication, reduce errors, heighten data quality, and improve the quality of the insights we obtain.”

- Karl Moad, University of Pittsburgh Medical Center

Marketing analytics represents a common use case for Hadoop. Organizations seeking to improve customer service/customer retention and to attract and acquire new customers are finding Hadoop to be especially beneficial. By allowing their analysts to access all customer touch points throughout an organization - and make use of data points that haven’t previously been accessible - these companies look forward to gaining a competitive advantage. At a minimum, respondents view Hadoop as a tool that keeps them on par with competitors’ capabilities. Not deploying Hadoop carries the risk of falling behind.

“We have a tremendous amount of data and we’re trying to glean more customer-related information in order to improve sales and marketing. That has really been the key driver outside of IT.”

- Anonymous participant, health insurance company


Other industry-specific goals for Hadoop implementations raised in the interviews relate to areas such as:

• Simplified supply chain management across multiple manufacturing/assembly plants

• Improved ability to track and assess performance management (e.g., human/talent in consulting organizations; machine data/performance and human performance in manufacturing)

• Enhanced ability to perform risk analysis for insurance underwriting

While these goals pertain to specific industry sectors, the common thread is that these organizations see Hadoop as offering a unique solution to help them improve a process that is central to their organization’s health. For example, a consulting services organization struggling to manage turnover among its professional workforce seeks to implement predictive analyses atop the Hadoop infrastructure to identify key areas of burnout. When employees choose to leave, the organization wants to look at the patterns of behavior that could become indicators of flight risk and reshape their human resources strategy accordingly.

Reality 2: Hadoop requires re-thinking the organizational data architecture

As previously discussed, companies have started making adjustments to their data management processes by offloading existing data storage and processing from operational systems or data warehouses to Hadoop, and using Hadoop as a reservoir for storing and processing new data -- particularly unstructured or semi-structured data.

Our study participants confirmed that although Hadoop will be an integral part of the enterprise data architecture, it is typically not seen as a replacement for the enterprise data warehouse. It is a complementary system that is expected to co-exist for a specific purpose, at least for the near term. Participants articulated intentions to use Hadoop as a centralized data hub for downstream BI and analytics usage, but this hub is fed through existing operational systems and data warehouses. Longer-term, however, customers may eventually replace their relational database systems.

Use of data warehousing will continue as needed for BI and analytics, since it contains relevant and curated sets of structured data. However, as organizations face increasing requirements from business units to also analyze different types of data, they will invest in using Hadoop in a sandbox environment and pursue steps to prepare this data and apply analytical techniques.

“Through the years, the different business units have been quite siloed in running the business… now we are trying to operate strategically, more as an ecosystem. There is a need for us to be able to have full visibility across the offerings.”

- Christina Foo, Intuit

It can be unwieldy for large organizations to get all business units to “snap to” a particular type of data architecture that will allow the corporation to sift through, utilize, and apply learnings across business units. With a strong corporate priority placed on data management and analytics, these organizations are looking for ways to effectively access and manage all of these disparate data inputs accordingly.

“Like any large enterprise, we have a diverse set of upstream systems that generate data and we continue to work towards integrating them quickly. In the meantime, however, healthcare reforms like the Affordable Care Act necessitate taking a longitudinal and cross-sectional view of our members and that is an area where Hadoop can help.”

- Ravi Shanbhag, UnitedHealthcare

Reality 3: Hadoop value primarily seen in new analyses on unstructured data

Not only does Hadoop allow our respondents’ organizations to manage and analyze data across a variety of non-congruous inputs, Hadoop is welcomed as the vehicle that will allow companies to analyze unstructured data to garner incremental benefits and further support the strategic efforts around analytics. The amount of data currently managed within the Hadoop cluster varies drastically depending on where in the adoption cycle a particular organization is. But with the exploding growth of data coming from all sources, these organizations are expecting Hadoop will hold a significant portion of their longitudinal data as well as all unstructured data.

Healthcare organizations in particular are looking for ways to incorporate a mix of data that typically sits outside of the traditional databases: physician notes, lab notes, procedural documentation, images, and more, into one inclusive system to feed their analytics strategy.


“80% of our data is unstructured, and only 20% is really structured… [we have] 20 plus years of collecting all of our clinical data, all of our physician notes, all of our physician procedural documentation, all of that as well as lab notes and everything else.”

- Karl Moad, University of Pittsburgh Medical Center

While Hadoop is certainly being utilized to capture and store unstructured data, companies must next apply text analytics techniques in order to parse, categorize, and examine sentiments. For example, a major healthcare system can now effectively store a patient’s medication history, treatments, and doctors’ notes (stating qualitative aspects of a patient’s health), and then apply text analytics to the patient notes which can then be folded into advanced models to perform predictive alerts for physicians.

Of course, healthcare isn’t the only industry that benefits from systems that allow access to unstructured data. Any organization that interacts with and markets directly to consumers can now store and manage textual data referencing their company or brand from social media, online communities and call centers. Companies that develop systems to capture and then effectively analyze what is being said about their brand can get a continuous pulse on consumer sentiment, and develop predictive techniques to help guide their PR and advertising strategy, as well as provide alerts when their social media presence needs to be managed.

Another very common, nearly ubiquitous use case is the incorporation of call center notes into corporate data storage systems, which can be combined with traditional CRM systems to help companies manage customer complaints and the hidden challenges customers may be experiencing with sales, service, or use. A well-built analytics model on top of this system could use this information to provide additional context and insight into the relationships the company has with its customers. Ultimately, companies may be better equipped to identify customers who are at risk well before they decide to leave or discontinue using a product or service.

A final example of a company that is launching a Hadoop initiative in order to improve their ability to store and analyze a wide variety of inputs is a U.K.-based security software company that IIA interviewed for this study. The company provides a service to its clients to monitor and provide data on all security endpoints within an organization; yet they can’t effectively store and analyze these data using existing relational technologies in a timely manner. Their customers have thousands of systems collecting data points on employee communications and transmissions to prevent data leakage or other security failures. Security breaches are always a concern, but with highly publicized hacks such as those that took place recently with Target and Sony, their clients need to more effectively recognize behavioral patterns that foreshadow data leakage or other security breaches across an increasingly vast and diverse set of data points.


“At the moment we’re running into bottlenecks on our bigger customers who want to log everything forever with detail where the relational database just isn’t responsive. So the initial goal is to improve that, to make it more useful for the customer and allow them to collect more data, more endpoints.”

- IT executive, endpoint security software company

A possible use scenario for them would be to feed the data collected into a datastore for detailed inquiries, as well as providing a means for blending together unstructured and structured data for overall utility reports. In the event of a malware infection, they could imagine overnight jobs running against the datastore and digging into the incident in more depth, allowing them to include far more data points (and types of data) than they can handle today.

“If you just want to return a result very quickly, Hadoop is actually not optimized for that. So for instance, if a prospective customer is visiting our website and we want to know what products they are likely to purchase– that is what we consider to be real-time processing. Where one would use Hadoop-based analysis, is for example when building that model to predict what products the customer would purchase. There we need to go back and forth and look at all of the values from previous site visits over a period of several years, across 10,000 customers.”

- Adam McElhinney

What Hadoop is NOT used for

While Hadoop can offload serious data-intensive crunching for large-scale models, cutting the processing time from days to hours, it is not well-suited to real-time data processing with a relatively small number of records. Traditional relational databases are still viewed as the primary and best means for these types of queries.


Reality 4: Satisfactory business user access and experience is still in process

To date, key Hadoop benefits are focused around the management and processing of data, i.e., providing greater flexibility and cost savings in data storage; faster processing of increasingly larger volumes of data; the ability to extend traditional data warehouses; and providing methods for archiving and querying data for longer periods of time to allow greater longitudinal analyses.

Accordingly, within the organization, the starting point for interest in Hadoop is the corporate IT department, which holds primary responsibility for provisioning and maintaining the Hadoop environment for data storage prior to analysis (whether using a commercial solution or a free download).

Hadoop is thus delivering on the IT goal to provide a cost effective and scalable solution for storing older data that is less frequently accessed.

“Let’s say we have ten years’ worth of claim data, clearly we do not need all the ten years of data all the time. Not all the data is as heavily utilized. The goal is to make sure that the highly utilized data stays in the costliest applications and gets used the most because you want to get the most bang for your buck. These traditional data warehouse appliances can be effective but expensive. You do not want to store stale data in there which does not get used.”

- Ravi Shanbhag, UnitedHealthcare
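The tiering idea in this quote, keeping heavily used data on the expensive warehouse and moving stale data to cheaper Hadoop storage, can be sketched as a simple placement rule. Everything in the sketch is hypothetical: the `Partition` fields, the thresholds and the tier names are illustrative assumptions, not a policy described by any interviewee.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    """A slice of a data set, e.g. one year of claims (hypothetical structure)."""
    name: str
    last_accessed: date
    reads_last_90_days: int


def choose_tier(partition: Partition, today: date,
                max_idle_days: int = 365, min_reads: int = 10) -> str:
    """Keep recently and frequently read data in the warehouse; send the rest to Hadoop."""
    idle_days = (today - partition.last_accessed).days
    if idle_days <= max_idle_days and partition.reads_last_90_days >= min_reads:
        return "warehouse"
    return "hadoop"


if __name__ == "__main__":
    today = date(2015, 8, 1)
    partitions = [
        Partition("claims_2014", date(2015, 6, 1), 250),
        Partition("claims_2006", date(2012, 1, 15), 0),
    ]
    for p in partitions:
        print(p.name, "->", choose_tier(p, today))
```

The thresholds would in practice be tuned against actual query logs and storage costs.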

While IT and Hadoop power users are relatively satisfied with how their Hadoop systems are delivering on these goals, our study respondents said satisfaction is likely to be lower among business analysts and data analysts, most of whom have yet to leverage self-service data management and data exploration tools that can help them to move quickly and aggressively interact with the data stored in Hadoop. When business decision-makers can easily pull out insights that will guide their decisions, companies will start to realize the potential and achieve the goals they set out to obtain by investing in Hadoop.

Currently, data management and analytics projects involving Hadoop are limited to those that only a highly skilled team of engineers/scientists/developers can perform. This creates a bottleneck in getting projects completed. Hadoop also presents an HR challenge to find and employ the right people. In fact, some respondents indicate that even recent college grads aren’t necessarily trained to work in the Hadoop world yet. Moving away from the structured world of columns and rows is a difficult transition for many analysts, who are deeply rooted in the relational database world.


Further delaying the value from Hadoop is the fact that tools providing common high-level language support for data management and analytics professionals, and/or self-service tools for non-technical users, are only slowly evolving and becoming available in the marketplace. Most study participants said that Hadoop will reach greater adoption throughout their organizations when there are enterprise-class tools and simple end-user interfaces.

“There’s just a whole set of work that people have to do that’s in the exploration side of the house and the vendors really haven’t provided us a good method for doing that for the masses. It’s been more of an engineer type of a mentality. So that has to evolve in order to get more use out of it.”

- Analytics director, entertainment industry

In general, satisfaction with Hadoop implementations is reasonably high, but there is room for improvement. While the initial set-up is viewed as being relatively easy, people find that the open source structure contains some gaps that require workarounds.

“I would say I am fairly satisfied. I have found some things to be more difficult than I thought were necessary. I cannot responsibly go to my management and say, ‘Hey, let’s just stand up the open source variation of Hadoop and go wild with it’ without some type of support and management structure behind it.” - Karl Moad, University of Pittsburgh Medical Center

Recommendations

Taken together, the actual experiences of Hadoop users today temper the fervor of the various Hadoop related market segments. Hadoop will undoubtedly play a central role in the data and analytics architectures of the future, but can also carry with it expense, rapid change and frustration in the near-term. As the Hadoop ecosystem continues to develop, reality will come into line with the promise. Until then, we conclude with a set of end-user recommendations:

Identify and define use cases that deliver competitive advantage and are strategic in nature. The majority of end-users interviewed confirmed that while Hadoop is a new technology, its strategic value is also understood among senior leaders. Applying Hadoop to high-profile, valuable use cases that rely on leveraging new data types can quickly rationalize the costs of deployment.

Evaluate whether and how Hadoop fits into your existing data and analytics architecture. As has been noted, the data storage cost advantage of Hadoop can cause some to mistake it for a data warehouse replacement. Successful end-user organizations carefully plan the role Hadoop will play within the existing data architecture. For some less analytically mature organizations, it may be too early for Hadoop to actually be useful.

Augment Hadoop with data management, data discovery and analytics to deliver value. For a Hadoop deployment to be worth the effort, business analysts will eventually need to access Hadoop to do their own data analyses. While the deployment itself is critical, remember that success will be evaluated in the eyes of the ultimate consumers of the insights derived from Hadoop data.

Reevaluate your data integration and data governance needs. Use of Hadoop as a data reservoir or as a data hub does not eliminate the need for data integration and governance as part of your modern data architecture. It is important to evaluate your current and future data integration requirements (e.g., acquire, clean, refine, aggregate, federate) to address a variety of business problems, and to determine how the deployment will comply with data governance requirements.

Assess skills/talent gaps early and develop a plan to mitigate those gaps before deployment. Among the hurdles experienced by end-user organizations, most pointed to being surprised about the level of skill needed to fully run Hadoop in production. High-performers are assessing the skills necessary before embarking and developing a plan to fill those gaps before setting overly-high expectations with their organizations for project delivery.

Bibliography/Endnotes

Adrian, Merv. Hadoop Deployments: Slow to Grow So Far. December 5, 2014. Blog. http://blogs.gartner.com/merv-adrian/2014/12/05/hadoop-deployments-slow-to-grow-so-far/

Boulton, Clint. Hadoop Analytics Is Finding Favor With More CIOs, Deutsche Bank Says. Blog. January 11, 2015. http://blogs.wsj.com/cio/2015/01/11/hadoop-analytics-is-finding-favor-with-more-cios-deutsche-bank-says/

Gantz, John; Reinsel, David. Extracting Value from Chaos. Research Report. June, 2011. http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf

Gantz, John; Reinsel, David. The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. Research Report. December, 2012.

Graham, Bradley; Rangaswami, M.R. Do You Hadoop? A Survey of Big Data Practitioners. Research Report. October 29, 2013.

Gualtieri, Mike. Forrester’s Hadoop Predictions 2015. November 4, 2014. Blog. http://blogs.forrester.com/mike_gualtieri/14-11-04-forresters_hadoop_predictions_2015

Kelly, Jeff; Floyer, David; Finos, Ralph. Wikibon Big Data Analytics Adoption Survey, 2014-15. Wikibon. October 15, 2014. Online. http://wikibon.org/wiki/v/Wikibon_Big_Data_Analytics_Survey,_2014

Laney, Doug. Gartner Predicts Three Big Data Trends for Business Intelligence. Blog. February 14, 2015. http://www.content-loop.com/gartner-predicts-three-big-data-trends-business-intelligence/

Morgan, Timothy Prickett. Hadoop Finds Its Place In The Enterprise. EnterpriseTech Software Edition. October 29, 2014. Blog. http://www.enterprisetech.com/2014/10/29/hadoop-finds-place-enterprise/

Russom, Phillip. Can Hadoop Replace a Data Warehouse? Blog. January 27, 2015. http://tdwi.org/articles/2015/01/27/hadoop-replace-data-warehouse.aspx

Savvas, Antony. 70 percent of companies who have invested in big data have mostly done so for pilots, with only 12 percent using big data in full production environments. Online. October 21, 2014. http://www.computerworlduk.com/news/it-business/3581796/waiting-for-hadoop-is-like-waiting-for-godot-says-gartner/

Snaplogic. Enterprise IT Uncertainty Around Big Data Initiatives in 2015. February 17, 2015. Infographic. http://www.snaplogic.com/blog/_infographic-big-data-uncertainty-2015/

Statistic Brain. Average Cost of Hard Drive Storage. Data Table. November 11, 2014. http://www.statisticbrain.com/average-cost-of-hard-drive-storage

About IIA

The International Institute for Analytics (IIA) is the authority on analytics maturity and best practices and provides the advisory and support for organizations to leverage the power of analytics to drive business results. IIA encompasses a network of analytics experts committed to knowing and sharing the keys to success in an economy increasingly driven by data.

IIA guides mission driven organizations as they build and grow their analytics programs. With an in-depth research library, phone-based and in-person events, and custom training and advisory services, IIA is an extension to business leaders and implementers to provide the strategic guidance required to be an analytical competitor.

About SAS

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Customer analytics solutions from SAS offer the processes and technologies that allow marketers to plan, coordinate and evaluate the success of their marketing initiatives. By putting data in the hands of business users, marketing programs become more effective and the organization becomes more efficient in execution. Deeper insights from data gathered help organizations to become customer-centric by understanding their customers better and improving customer loyalty. Since 1976 SAS has been giving customers around the world THE POWER TO KNOW®.