hype cycle for information infrastructure, 2014...10/8/2014 hype cycle for information...

10/8/2014 Hype Cycle for Information Infrastructure, 2014

http://www.gartner.com/technology/reprints.do?id=1-22RSJA5&ct=141003&st=sb 1/59

Hype Cycle for Information Infrastructure,20146 August 2014 ID:G00261653

Analyst(s): Mark A. Beyer, Roxane Edjlali

VIEW SUMMARY

Innovation supporting information infrastructure is evolving rapidly, and with everincreasinginfrastructure capacity. This Hype Cycle sits mainly on the Peak of Inflated Expectations. We encouragedepartmentlevel experimentation without enterprise commitment over the next three to five years.

TABLE OF CONTENTS

CONTENTS

AnalysisWhat You Need to KnowThe Hype CycleThe Priority MatrixOff the Hype CycleOn the Rise

IMCEnabled MDMSparkCloud MDM Hub ServicesData LakesOpenSource Data Quality ToolsSelfService Data IntegrationFile AnalysisCloudBased Data Identification and Enrichment ServicesMDM AppletsGraph Database Management ServicesMultivector MDM Solutions

At the PeakMDM Professional ServicesInformation Capabilities FrameworkInformation Semantic ServicesiPaaS for Data IntegrationTableStyle Database Management ServicesEnterprise Metadata ManagementHadoop SQL InterfacesInformation Stewardship ApplicationsInternet of ThingsLogical Data WarehouseContextEnriched ServicesDocument Store Database Management SystemsComplexEvent Processing

Sliding Into the TroughBig DataKeyValue Database Management SystemsMultidomain MDM SolutionsInMemory Database Management SystemsData Quality Software as a ServiceEnterprise Taxonomy and Ontology ManagementDatabase Platform as a ServiceMDM of Supplier Data SolutionsHadoop DistributionsCrossPlatform Structured Data ArchivingEntity Resolution and AnalysisInMemory Data GridsData Warehouse Platform as a Service (dwPaaS)OpenSource Data Integration ToolsData Integration Tool SuitesSaaS Archiving of Messaging Data

Climbing the SlopeContent Integration



Table 1. Hype Cycle Phases

Table 2. Benefit Ratings

Table 3. Maturity Levels

Figure 1. Hype Cycle for Information Infrastructure, 2014

Figure 2. Priority Matrix for Information Infrastructure, 2014


MDM of Customer DataMDM of Product DataContent MigrationEnterprise Information ArchivingData Federation/VirtualizationData Quality ToolsInformation Exchanges

AppendixesHype Cycle Phases, Benefit Ratings and Maturity Levels

TABLES

FIGURES

AnalysisWhat You Need to KnowInformation infrastructure is being challenged by an entire series of forces that are introducing newapproaches and technologies. Organizations need to be aware of these forces and the solutions that areemerging, to meet the new demands on their infrastructures. The new definition of infrastructure hasgone beyond flexible and robust and now includes adaptation technology for multiple use cases, as wellas multiple information types and uneven skill levels.

Big data is getting a reality check. Traditional suppliers and vendors are incorporating the lessonslearned, and the acquisitions will soon begin. Advanced analytics are forcing "best fit engineering" — aconcept that will rise in the information infrastructure space through 2018. Best fit engineering rejectsthe concept that a single, integrated platform should force compromises in architectural choices andspecifically indicates using a bestofbreed approach when there are distinct advantages to usingspecific technology solutions.

Gartner defines information infrastructure as "the technology capabilities or practices that enableinformation producers and consumers to describe, organize, share, integrate and govern any type ofdata and content, anytime, anywhere." It is the enabling technology building block of the enterpriseinformation management framework (see "Gartner's Enterprise Information Management FrameworkEvolves to Meet Today's Business Demands"). Technology is introduced to the Hype Cycle when atechnology or practice trigger occurs. Within the information infrastructure, these consist of newprocessing approaches and physical technologies used in capturing, managing, manipulating andproviding access to information assets throughout an organization. The practices and technologiesincluded in this Hype Cycle represent fundamental changes in information management that werecognize in 2014.

Information is at the center of how organizations are preparing their transition into a digital business,changing the way business is done. Information is the context for delivering enhanced social and mobileexperiences, and for connecting the IT and operational technology (OT) worlds. Mobile devices are aplatform for effective social networking and new ways of working, social links people to their work andeach other, in new and unexpected ways, and cloud enables the delivery of information andfunctionality to users and systems. The Nexus of Forces intertwines to create a userdriven ecosystemof modern computing. The digital business expectations for 2020 are also driving a demand forprocessing and analyzing streaming data more efficiently.

The Hype Cycle for Information Infrastructure, 2014 helps CIOs, information management leaders andarchitects understand the evolutionary pace of established information management technologies,creating awareness of emerging technologies and approaches. This will help them to make the most oftheir investments in data management and planning for optimal adoption and deployment strategies.

The Hype CycleAs specific new hardware technology is developed, it creates more speed, more space or more capacityfor parallel processing. This, in turn, drives the evolution and innovation of information managementsoftware and processing to take advantage of this new capacity. As the information processingcapabilities advance, they, in turn, enable the capture of even finergrained and higher volumes ofinformation. This new stress on the information infrastructure then overwhelms existing hardwareadvances and new information management practices are developed.

The process continuously repeats, regardless of isolated business drivers because the overall pattern ofinformation management is to deliver an everincreasing number of methods to capture moreinformation. Inmemory database management systems represent one example of this cycle, while theuse of hyperthreading capability in processor cores represents another. These hardware advances



require new processing and software design approaches to take full advantage of the much largercapacity of this innovation.

Information infrastructure supports the ongoing "production use" of data, while also being influenced by(and required to evolve to support) the organization's nextgeneration data demands (see "InformationManagement in the 21st Century" and "Top 10 Technology Trends Impacting InformationInfrastructure, 2013") — including those technologies related to what Gartner calls the "digitalbusiness." Significant innovation continues in information infrastructure technologies and practices. Thekey factors driving this innovation are the explosion in the volume, velocity and variety of information— and the huge amount of value (and potential liability) locked inside all this ungoverned and underutilized information. This is a very dense Hype Cycle primarily because it represents a wider marketmovement to address information management in a more holistic manner — yet, the innovativeconcepts emerge separately before they can be consumed in a more cohesive infrastructure.

Several factors are creating opportunities while at the same time introducing evolutionary challenges:

The pressure to manage very large volumes of data at greater speed for a variety of types,formats and sources of information continues to grow.

The expectation that information will be used to drive business insight and agility, creating theneed for a more holistic approach to managing information — effectively moving informationmanagement out of the IT department.

The growing regulatory demands for information governance.

The evolution of technologies for data persistence and integration, including the management ofdata deployed in alternative approaches, such as platform as a service (PaaS) and SaaS.

The is a need to support enterprise growth via information technology (including priorities like businessintelligence and analytics, mobile and cloud computing). In addition, new hype surrounding the conceptof the digital business will increase the confusion regarding infrastructure decisions. This is actually amanifestation of a "skills versus technology" decision, which is itself related to the innovation dilemma.Organizations are finding it increasingly important to use technology to enable skilled individuals, whileat the same time providing robust, repeatable technology solutions for those less skilled.

"Old is new again" is the battle cry for attempts to revitalize existing concepts, often with new names.However, new technologies are also making old ideas truly new again as they overcome previousbarriers. Not all of these concepts are misguided marketing attempts to create new markets with newdemands.

Organizations are increasingly seeing the haphazard, uncoordinated and reactive data managementapproaches they have used over the years as impediments to progress in business agility and efficiency(evidenced by interest in more than double the enquiries for MDM and logical data warehouse solutions,as well as increases in data quality and information architecture calls).

Importantly, the budgeted mediocrity in legacy management practices associated with vendors andtools is close to failing. Most have been implementing separate data and content infrastructure, but arenow striving to become more informationcentric using information infrastructure as a means ofevolving to a single endpoint for data management. In 2012, Gartner predicted that through 2015,business analytics needs will drive 70% of investments in the expansion and modernization ofinformation infrastructure — and this trend has continued.

These factors, and others, are applying new or increased demands on the underlying informationinfrastructure. New, nextgeneration, capabilitiesdriven information infrastructure concepts likeGartner's Information Capabilities Framework or architectural approaches like the logical datawarehouse are of key importance (see "Introduction to Gartner's Information Capabilities Framework,""The Importance of 'Big Data': A Definition" and "Understanding the Logical Data Warehouse: TheEmerging Practice"). More importantly, both inhouse staff and professional services implementers arepursuing this modernization (especially the three cited practices) under various names.

This analysis will focus on information infrastructure technologies, while "Hype Cycle for EnterpriseInformation Management" focuses on disciplines. Note that "Hype Cycle for Big Data, 2014" has someoverlap with the information infrastructure technology capabilities required to support big data coveredin this analysis. This Hype Cycle includes master data management as a central piece of informationmanagement, both from a discipline and technology perspective (and is covered here regarding thetechnology aspect) and in the Hype Cycle for Information Management (from a discipline perspective).

"Big data" is crossing the Peak of Inflated Expectations, and progressing to a point where it will becomean embedded and stateoftheart practice by 2018, with some big data practices and technologieslagging and taking until 2020 before they are considered to be in a state of productivity. It is becomingmore likely that big data management and analysis approaches will be incorporated into a variety ofexisting solutions in existing markets. As a result, we are seeing many of the specific technologiesbecoming part of the information infrastructure and justifying to be tracked separately — but as usecases, benefit and adoption can vary substantially.

The "data lake" is a notable addition in 2014. It is a more recent incarnation of previously failedapproaches to create a multivariate data management approach for analytics. It has a high potential formisdirection (with three distinct descriptions used in the market already), but also has the potential tosucceed if the scope is carefully managed to match the intended use cases.

Finally, governance of information is becoming a major concern for organizations, even more so asmore information is being managed. The topic of governance and practices is as equally significant astechnology infrastructure. As a result, Hype Cycle for Enterprise Information Management containsnearly as many technology profiles as this Hype Cycle.




Source: Gartner (August 2014)

The Priority MatrixOrganizations should consider how each of the information infrastructure technology capabilities helpsto address their enterprise information management objectives over the following years.

The big news is that the hype regarding big data is getting a reality check. With issues of datasovereignty (national laws and regional practices) and security, as well as priority management andintegration with traditional systems emerging, big data is over the peak and heading for the Trough ofDisillusionment.

It is important to remember that the definition of big data is not just the technology, but the innovativenature of that technology — which began causing disruption in traditional data management marketsaround 2010 and is now being answered in mid2014 by more traditional suppliers. As new globalregions and new verticals begin to identify more use cases for big data, the revenues for big datasolutions will continue to grow, but at this point, we will see traditional vendors deploying connectors,SQL engines, incorporating big data file systems (for example, HDFS) and enabling their already maturedistributed processing management tools to manage MapReduce and graph analytics jobs. At the sametime, Spark is introducing the concept of realtime analytics for streaming data.

There's plenty of hype left in big data, but innovation will give way to developing maturity in theenvironment. The race is on between new vendors to mature their products and offerings beforetraditional vendors enable new processing models and new data types. The new suppliers will needsignificant research and development investment as well as channel development, but traditionalvendors already have this. In a classic "tortoise versus the hare" race, traditional vendors have beguntheir slow and steady progress toward incorporating the innovation from the fast but inconsistent newofferings. The trough is the forge or crucible of the Hype Cycle where many offerings go in, but whenbig data comes out, the survivor count will be much lower and the market will begin to pick the three orfour winning solutions in each addressable area.

On the horizon, the Internet of Things (IoT) will demand a new area of innovation, such as cloudcapabilities for information infrastructure that to this point have been laboratory concepts. Cloudbaseddata integration will demand a new type of distributed processing optimization, as well as datacollection optimization. Fundamentally, the IoT is an information management challenge that beginswith messageoriented middleware.

Importantly, not all of this infrastructure will be under a centralized control and management system.Digital businesses will demand capabilities that extend their information management and optimizationstrategies far beyond traditional, tightly managed data centers. Concepts around data governance anddata enrichment that are specifically targeted for use in the cloud will force new data governance andstewardship models.

Larger trends are having a significant impact on the Hype Cycle for Information Infrastructure, 2014.The increased emphasis on the efficient use of inmemory capabilities appears in multiple technologies.Inmemory data grids enable the use of distributed memory for massive inmemory processing forapplications and at the same time, are increasing the speed of both transactional data managementand analytics. In what appears to be a new type of information or data management technology, thehybrid transactional and analytic platform (HTAP), inmemory database solutions are speeding upcolumnvectored databases beyond the already significant optimization approaches that have alreadymade these "vertical" databases faster.

At the same time, a large class of technologies and practices on the Hype Cycle that are classified as



having moderate benefits introduce the concept of gradual, incremental, as well as cumulativeenhancements to the overall information infrastructure. Crossplatform structured data archiving isallowing for the retirement of data assets from missioncritical platforms into extended archiveenvironments at lower costs, but without losing access to the data.

Various "as a service" solutions are presenting alternative cost models for traditional solutions.Similarly, many new forms of databases are being introduced to provision specific engineering solutionsto old problems. All of this innovation at first seem to be incremental and even "unexciting," but havethe overall effect of giving organizations the option to deploy specific solutions that gradually lowercosts and promotes a more focused approach to using both human and hardware resources.

Figure 2. Priority Matrix for Information Infrastructure, 2014


Off the Hype CycleDatabase appliances, columnstore database management systems and opensource DBMSs alladvanced off the Hype Cycle as they entered into full market productivity.

On the RiseIMCEnabled MDMAnalysis By: Andrew White; Bill O'Kane

Definition: IMCenabled MDM solutions are those that persist the operational master data (forexample, golden record) hub inmemory (DRAM) for all uses — operations and analytics.

Position and Adoption Speed Justification: Demand for IMCenabled MDM solutions is justemerging and will likely grow in tandem with demand for MDM in general. The hype related to thistechnology, however, is running ahead of demand. Demand itself is very businessdriven, meaningthat, when the volume of data processing or analysis exceeds several million records in a realtimeformat, an ondisk approach has certain limitations that lead to complexity in deployment. However,there are not that many MDM implementations around the globe with such requirements. There aresome (we estimate less than 1% of the overall MDM installed base), and the number of them aregrowing, but such penetration, maturity and demand is very early. We do expect increases in demandsoon because certain industries have large volumes of master data that need to be accessible in realtime for many uses. Retail is a good example (millions of product records and millions of consumer



records). With the hype associated with inmemory computing, MDM is being dragged along with it.

In 2013, a number of software vendors in the MDM market started to offer versions with inmemoryenabled capability. The first vendors focused more on a realtime reporting capability from an inmemory copy of the operational hub that persisted on disk. That is, the operational DBMS was stillstored on disk, but a copy of that data was loaded into memory and maintained there for realtimeanalytical processing. There would be a delay between operational updates to the data on disk, andupdates to the "real time" inmemory analytical hub. Several vendors had announced plans to movetheir entire operational MDM solution into memory, but in 2013, only one vendor actually had an MDMcustomer live with an operational DBMS inmemory.

In 2014, we see increased hype, driven mostly by the passion for what inmemory computing can dofor IT delivery of its solutions to the business. Demand remains muted, but it is increasing slowly. Weexpect to hear of and see several more "live" implementations of IMCenabled MDM, and moredeployment of realtime reporting and analysis use cases. Some vendors will likely offer their singledomaincentric solutions inmemory; others will offer their multidomaincapable MDM solutions inmemory.

User Advice: If and when you begin mastering very large volumes of master data that may requirelarge volumes of data updates, processing or analysis, you may need to inquire your preferred MDMvendor about their inmemory computing strategy. Increasingly, vendors are aware of the possibility ofyour question; but too few have castiron visions that are practical for how they would meet yourneeds. Watch carefully for new product strategies and road maps in this area. Use and demand livereferences as the primary way to determine if a vendor has solved this capability.

Do not get caught up in the hype. Do not assume you need an IMCenabled MDM solution — at leastnot yet. Look for one only when your volumes, latency and/or performance requirements warrant it.

Lastly, be aware that operating a business application inmemory and an MDM hub inmemory could bejust as complex — in terms of data integration, harmonization and governance — as running each ondisk. The complexity is not derived from the location where the data is stored or processed (that is, ondisk or inmemory) but is derived by the number of different data models that have to be related toeach other. Thus, if business application "A" is inmemory and business application "B" is on disk, anMDM hub still has to govern information across three data models — even if the MDM hub is also inmemory. The challenge is not where the data resides; the challenge is in the semantic inconsistency inthe underlying data model.

Business Impact: MDM itself provides significant potential benefits. An IMCenabled MDM solutionoffers two benefit trajectories. The first is logically explained as "faster MDM." In other words, largerscale MDM projects can be supported in a realtime environment. This is not required in every MDMprogram. The other benefit trajectory is that IMCenabled MDM could in fact revolutionize MDM and,specifically, how organizations enforce information governance. The need to interject with businessprocesses, analytics and business applications, in order to enforce information policy, takes time. If allthe technology needed to sustain this is in one inmemory space, the possibility that the effort toautomate MDM becomes virtually transparent to the business is intriguing. However, this will take manyyears to appear in the market, and it will require a lot of change in many other areas such as businessintelligence, business applications and business process management.

Benefit Rating: High Market Penetration: Less than 1% of target audience

Maturity: Emerging

Sample Vendors: Riversand Technologies

SparkAnalysis By: Merv Adrian; Roxane Edjlali; Nick Heudecker

Definition: Apache Spark is an opensource inmemory computing environment that can be used withthe Hadoop stack. It will better utilize parallel processing power for highperformance computing (forexample, machine learning, optimization, simulation, graph analysis). It is being widely evaluated as amore flexible replacement for MapReduce's acyclic data flow model. Position and Adoption SpeedJustification: Spark started at UC Berkeley as a research project in 2009 and in 2014, it became anApache toplevel project.

A major release (1.0) became available in June. Leading Hadoop distributors are including it in theirofferings: DataStax supports it with Apache Cassandra, SAP with Hana, and it is available via AmazonWeb Services for use with S3 data. Databricks is providing certification, training and evangelism thatmirror the early Hadoop model, and partnering with most of the players in that market to leverage theirexisting investments and momentum.

Additionally, Databricks recently announced Databricks Cloud, a cloudhosted environment meant tomake it easy to get started with Spark. Spark is an example of the growth of ambitions in the big datacommunity as it moves beyond batch processing, coarsegrained workloads, simple extraction,transformation and loading (ETL) use cases and largevolume analytics runs.

The emergence of YARN in the Hadoop stack is an enabler, but Spark is not confined to the Hadoopcommunity, and Databricks and its partners hope to find a market beyond that set of users. Itsaccelerated market cycle benefits from, but is not constrained by, the growth of the Hadoop ecosystemand use cases. Its current availability for interactive use with Python or Scala will initially appeal to datascientists frustrated by the limitations of early Hadoop models. User Advice: Organizations that arelooking at big data challenges — including collection, ETL, storage, exploration and analytics — shouldconsider Spark for its inmemory performance and the breadth of its model. It supports advanced



analytics solutions on Hadoop clusters, including the iterative model required for machine learning andgraph analysis. Despite its significant commercial momentum and an emerging set of applications tosupport or build atop it, Spark is in its early stages.

As of July 2014, not all commercial distributions are yet generally available. There are advantages in itshosted model, both for setup and operational cost and in its support for widely used tools such asPython and Scala in addition to SQL, but it will require significant skills and is likely to have stabilitychallenges in its early releases. Moreover, alternative proposals exist — for example, Google favors itsown stack around Cloud Dataflow, and SAS is leveraging its own Lasr highperformance computinglayer. It can also be expected that the vibrant creativity of the opensource community will raise otheralternatives in the next few years. For now, its use in most organizations should be viewed asexperimental, but very promising.

Business Impact: Spark will facilitate much faster data processing on Hadooplike infrastructures. Inaddition to speed, users can also expect a maturing programming model, application libraries (forexample, for machine learning, optimization and simulation) and continuing adoption for highperformance analytics, data mining, filtering and other uses.

Benefit Rating: High Market Penetration: Less than 1% of target audience

Maturity: Emerging

Sample Vendors: 0xdata; Apache Software Foundation (Hadoop); Cloudera; Databricks; DataStax;Hortonworks; MapR; Pivotal; SAP

Recommended Reading: "Taxonomy, Definitions and Vendor Landscape for InMemory ComputingTechnologies"

Cloud MDM Hub ServicesAnalysis By: Bill O'Kane

Definition: Cloud master data management (MDM) hub services are either multitenant data masteringservices offered as software as a service (SaaS) or for assembly and development as platform as aservice (PaaS). MDM hubs are the physical result of an MDM implementation, and can take the form ofmultiple implementation styles.

Position and Adoption Speed Justification: At present, nearly all MDM software is implemented onpremises, and many industries (financial services, for example) are reluctant to place important,frequentlyshared information (such as customer or financial master data) outside of a firewall. Inaddition, there is resistance to the idea that the governance and stewardship of one's primaryinformation assets can be operated from outside of the firewall, even if the policy formulation itselfremains work that takes place onpremises by business users. Privacy issues have also dampenedadoption of SaaS in some regions such as Europe.

However, onpremises MDM solutions are increasingly being integrated with SaaSbased businessapplications, such as those of salesforce.com, and various cloudsourced data services (such as dataquality or data integration as a service) that can help with onpremises MDM implementations. Inaddition, a few of the established MDM vendors have recently brought cloud versions of their MDMofferings to market in anticipation of future demand, generating increased hype, if not yet actualuptake.

Gartner is seeing an increase in interest in cloud MDM hub services in 2014 from end userorganizations, although they are still a very small component of the overall MDM market spend. Lessthan 4% of our MDM inquiries in the previous year had a cloudbased element (up from 3% a yearago). Service providers and vendors, including Cognizant, IBM, Informatica, Oracle, OrchestraNetworks, and Tibco are among those with cloud MDM hub services, though some of these offer lessfunctionality than their onpremises counterparts. Several other MDM vendors, including RiversandTechnologies and Stibo Systems, have cloud MDM hub service implementations — or offer such supportonly as a proof of concept — but are not aggressively marketing them. Finally, Dell Boomi (a leader inthe integration platform as a service market) has also recently introduced a cloud MDM hub servicesoffering, with no onpremises option.

User Advice: As cloud MDM hub services become more mature, and if you are an early adopter,consider implementing these solutions if you don't have the inhouse skills, if funding out of operatingexpenses is more acceptable than out of capital expenses, and if the planned MDM capability isrestricted in scope to mainly one functional group within your organization (or is initiativespecific andsuited to a consolidationstyle implementation mainly used for analysis and reporting purposes). If youare implementing an enterprisewide operational MDM capability that requires tight, realtime,potentially transactional integration with existing and new business applications, or that requirescomplex workflow and application interaction patterns, then execute the MDM solution onpremises orin a private cloud behind the firewall.

Organizations with complex requirements should not assume that they will significantly lower their totalcost of ownership, or reduce complexity, by moving to cloud MDM hub services. However, organizationswith tight capital budgets, those that have constrained IT resources, or those that want to deploysomething simple quickly, should consider these services, as long as they mitigate their risks withappropriate governance controls.

Business Impact: The availability of cloud MDM services potentially offers organizations new fundingmodels, deployment flexibility and improved time to value. However, in common with other cloudofferings, cloud MDM services may become more expensive as the years go by because the operatingexpense does not decrease in subsequent years. These services will also be potentially helpful to



companies that do not have the IT resources to deploy and maintain onpremises software.

Cloudbased technologies often lead to greater fragmentation of the enterprise's software infrastructureand applications portfolio into a hybrid IT model. In contrast to the vision of an onpremisesapplications portfolio dominated by a single vendor's suite, many organizations are moving toward ahybrid mix of onpremises and cloudbased business applications. Endtoend business processes will,however, need to be optimized across these multiple platforms and systems and, because master datais the most heavily shared information in an organization, there is often a need to build a great deal ofcomplex integration into the MDM system, the contributing data sources and the endtoend processes.

Due to the level of complexity involved in a typical MDM implementation, best practices around cloudintegration should be strictly followed. Organizations with sensitive data may well feel that master datacan't be put in a public cloud, but a private cloud is fine because the data is effectively within thefirewall. In either case, there are real benefits to be realized in the areas of rapid prototyping anddevelopment, reduced infrastructure costs, and the relegation of MDM software upgrades (which areoften quite complex and risky) to the service provider.

Benefit Rating: Moderate

Market Penetration: Less than 1% of target audience

Maturity: Emerging

Sample Vendors: Cognizant; Dell Boomi; IBM; Informatica; Oracle; Orchestra Networks; SumTotalSystems; Tibco Software

Recommended Reading:

"The Impact of CloudBased Master Data Management Solutions"

"The Advent of Master Data Management Solutions in the Cloud Brings Opportunities and Challenges"

"Three Trends That Will Shape the Master Data Management Market"

"Platform as a Service: Definition, Taxonomy and Vendor Landscape, 2013"

"Data in the Cloud: The Changing Nature of Managing Data Delivery"

Data LakesAnalysis By: Mark A. Beyer; Nick Heudecker

Definition: A data lake is a collection of storage instances of various data assets additional to theoriginating data sources. These assets are stored in a nearexact, or even exact, copy of the sourceformat. The purpose of a data lake is to present an unrefined view of data to only the most highlyskilled analysts, to help them explore their data refinement and analysis techniques independent of anyof the systemofrecord compromises that may exist in a traditional analytic data store (such as a datamart or data warehouse).

Position and Adoption Speed Justification: The data lake concept is a reconstituted approach to adata collection or enterprise data hub. It has emerged as an alternative to traditional analytics datamanagement solutions, such as data marts and data warehouses. It is intended to mirror the "schemaon read" approach pursued by big data processing.

The data lake concept has attracted extreme hype in less than a year as organizations seek alternativesto more canonical approaches to consolidation (such as logical models, data warehouses andstandardized message queues). Most of the interest is currently in North America, more specifically theU.S., but the concept is starting to attract hype in EMEA as well. As traditional vendors incorporate theirown ability to combine schemaonread with schemaonwrite solutions, the hype will decline rapidly.

Some vendors argue that a data lake can be virtual, but that is data federation.

The concept of persisting metadata about objects in a data lake, and data in its native formatting andstructure, has value. If the current hype could be redirected toward introducing appropriate use cases,instead of attempts to replace existing technology and practices, the data lake could finally become asolid value proposition. A "data lake" is a new term for a longstanding practice. It is intended to offeran alternative analysis environment isolated to innovation areas and, more specifically, to data minersand data scientists. If used appropriately in these areas and by these roles, data lakes will be ofconsiderable benefit and eventually reach the Plateau of Productivity. However, if data lakes are insteadpromoted for wider use by lessskilled analysts, they will be abandoned due to difficulties in managingand accessing them.

User Advice:

The fundamental assumption behind the data lake concept is that everyone accessing the datalake is moderately to highly skilled at data manipulation and analysis. Implementers and vendorspromoting data lakes recognize this by recommending that someone in the business coordinatedata lake navigation but offer scant details of how this will support businesscentric timetodelivery requirements.

The number of people capable of using a data lake will be small. Not all users should useadvanced tools and open architectures. Follow the approach of introducing candidates, developingcontenders and, when appropriate, moving from data lake to data warehouse with compromisemodels. Advanced practitioners such as data scientists and data miners will develop multiple



candidate analytics, advanced business analysts will choose contender models, and the majority ofusers will be expected to use the best compromise that emerges.

By definition, data stored in data lakes lacks semantic consistency and governed metadata.Dispensing with these makes data analysis a highly individualized condition (a consumerizationofIT goal) at the expense of any easy comparison or contrast of analytic findings. Because usersgive highly individual and personal contextualization to the data, there is little or (more likely) noleveraging of experience from user to user.

There are certain SLA expectations that can be served by data lakes (namely that advanced userswill be able to conduct data discovery and data science investigations) and are compatible with bigdata solutions, but the majority of enduser SLAs rely on repeatability, semantic consistency andoptimized delivery. Explore alternative information management architectures, such as the logicaldata warehouse, to rationalize how information is stored with how it is used.

Business Impact: The data lake concept has potential to have a high impact on organizations, but itsimpact is only moderate at present. To get full value from a data lake, its users must possess the skillsof system analyst, data analyst and programmer, as well as significant mathematical and businessprocess engineering skills. The greatest benefits will therefore accrue only when data scientists and/orhighly skilled data miners use a data lake to enable the practice known as "data fusion" (see "DataFusion Fuels Integration: The Information Reactor of the Distant Future" [note that this document hasbeen archived; some of its content may not reflect current conditions]), as well as data discoverytechniques.

There are several possible implementation options for data lakes. In the most basic, data is simply"landed" on lowcost commodity storage and any data refactoring or rationalization is deferred untildata is actually accessed for a schemaonread approach. This is, to some extent, expedient for the ITorganization, but it requires programmer and system analyst skills from business personnel, as well astools and skills to navigate the lake.

Depending on the method of implementation, a data lake can be a lowcost option for massive datastorage and processing. Processed results can be moved to an optimized data storage and accessplatform, based on business requirements and tool availability. However, the potentially high impact ofthis will be diluted by vendors seeking to use the term "data lake" merely as a means of gaining entryto the highly mature analytics and data management markets, with the intention merely of convertingthe data lake into a data warehouse architecture under a different name.



Maturity: Adolescent

Sample Vendors: DataStax; Pivotal; SAS


"The Confusion and Hype Related to Data Lakes"

"Harness Data Federation/Virtualization as Part of Your Enterprise's Comprehensive Data IntegrationStrategy"

"Best Practices for Securing Hadoop"

"The Logical Data Warehouse Will Be a Key Scenario for Using Data Federation"

OpenSource Data Quality ToolsAnalysis By: Ted Friedman

Definition: Opensource data quality tools are the latest opensource addition to the informationmanagement family of products. They provide functionality under an opensource license agreement,which typically includes a subset of the key data quality operations: data profiling, cleansing, matching,monitoring and standardization.

Position and Adoption Speed Justification: This market segment is still largely in its infancy. Only ahandful of opensource projects are underway that provide various data quality offerings, from freelydownloadable community editions with limited functionality to more fully featured commercial editionsbundled with vendor support. Most opensource data quality tools offer limited functional capabilitiesrelative to the leading commercial packages. For example, many focus on a single aspect of dataquality, such as data profiling or matching or address standardization. However, some opensourcevendors already provide a comprehensive feature set, including profiling, cleansing, matching,enrichment and monitoring.

Awareness of the available opensource data quality offerings is low, as evidenced by infrequentinquiries on this topic from Gartner clients. The overall adoption by enduser organizations is stilllimited. Consequently, the position on the Hype Cycle remains fairly close to the Technology Trigger.

User Advice: Recognize that the opensource movement has not yet had any effect on the data qualitymarket, as its market share remains below 1%. The available toolsets are weaker in most functionalareas compared with commercial offerings, and they are not being commonly adopted in ITorganizations at this point. Because data quality is a complex problem area, a combination of solidtechnology, services and the availability of skills is critical to success. Today, only commercial offeringshave a significant combination of these elements.



Recognize that the overall cost of ownership of data quality tools includes more than just the purchaseprice of the software. Use your enterprise's overall opensource strategy and experience with otheropensource offerings to judge the potential opportunities and risks of deploying opensource dataquality tools in your enterprise. For organizations new to data quality tools, the opensource route withcommunity editions may be the least expensive path because there are no licenses to acquire —however, cost must be carefully weighed against functional capabilities and ease of deployment.

Business Impact: Opensource tools could reduce the cost of implementing data quality processes.Because data quality needs are pervasive, this would be applicable to enterprises in all industries, aswell as to small and midsize businesses that lack the budgets, infrastructure and skills for morecomprehensive data quality tool suite deployments. However, given the state of opensource dataquality technology and the general lack of awareness in the market, it is unlikely that organizations willderive significant business value from these tools in the near term.

The significant appetite for lowercost IT infrastructure will create an opportunity for opensource dataquality vendors to raise the interest in their products. In addition, with the overall increase of interest ininformation governance competencies — including a strong focus on data quality — many providers ofvarious applications and services will want to embed data quality capabilities in their offerings. This willcreate another promising channel of demand for these tools.

Benefit Rating: Low


Maturity: Embryonic

Sample Vendors: Human Inference; Infosolve Technologies; SQL Power; Talend


"Who's Who in OpenSource Data Quality (2012 Update)"

"Human Inference Explores Open Source With DataCleaner Acquisition"

"Magic Quadrant for Data Quality Tools"

SelfService Data IntegrationAnalysis By: Rita L. Sallam; Roxane Edjlali

Definition: Selfservice data integration semiautomates the data loading, modeling, preparation,curation, profiling, data quality and data enrichment process for structured and unstructured data,making the data integration process accessible to business analysts in addition to traditional IT users.These platforms feature automated machine learning algorithms that visually highlight the structure,distribution, anomalies and repetitive patterns in data with guided businessuseroriented tools toresolve issues and enhance data.

Position and Adoption Speed Justification: Data preparation is one of most difficult and most timeconsuming challenges facing business users of business intelligence and data discovery tools. To date,data integration tools and techniques have either been separate and ITcentric; or basic functionalityfor data mashup has been bundled with data discovery capabilities. Data can be very large, diverse andcoming from both internal sources and lesswellknown external sources. Therefore, simplifying howusers assess and discover the shape of the data, clean the data and create reusable components can beof tremendous value to companies deploying businessuseroriented analytics as they try to makebusiness intelligence (BI) content and application development far more productive and agile, andthereby drive greater use. However, as the number of sources and variety of data types grow, datapreparation steps also grow in complexity.

Selfservice data integration requires that users master both the technical aspects and the businessrequirements of joining data together. Selfservice data integration goes beyond the capabilities of mostdata discovery tools that offer basic data mashup to help users prepare their data for analysis and canbe very timeconsuming (see "Magic Quadrant for Business Intelligence and Analytics Platforms").

There are three types of vendors providing this capability:

Standalone, vendoragnostic, selfservice data integration platform offerings, such as Paxata,Trifacta, Tamr, and ClearStory.

Standalone, selfservice data integration capabilities that extend ITcentric data integrationplatforms, such as those from Informatica (Data Harmonization), Microsoft (Power Query), IBM(Data Click) and Tibco (Clarity).

Purposebuilt, selfservice data integration capabilities bundled with data discovery and BIcapabilities that extend basic businessuser data mashup.

Some selfservice data integration tools may be positioned as vendoragnostic with a range of datastorage options, but others may be preferred by users of a certain BI platform (e.g., Alteryx's Tableau,with the ability to create Tableauspecific files). Many of these tools are cloudbased, which shouldenhance businessuser accessibility to data, but may pose challenges for organizations that don't wantto duplicate their data in the cloud. We expect selfservice data integration to experience a rapidascension along the Hype Cycle because it extends businessuser data mashup to address a perenniallydifficult and highvalue problem of data preparation.

User Advice: Business intelligence and analytics leaders and business analysts, particularly in



companies where data discovery platforms have been widely deployed, should look at selfservice dataintegration to extend businessuser data mashup to help business users enhance and streamline theirdata preparation time and effort while improving data sharing and reuse.

Business intelligence and analytics leaders should develop a deployment strategy for these tools, whichfactor in different user profiles, as selfservice data integration should be more tightly managed thandata discovery deployment. Because the market is emerging, BI leaders should consider a range ofvendors, including incumbent data integration platforms, data discovery and BI vendors that areextending their offerings, as well as a range of startups that are driving innovation.

Overall, data integration requirements have evolved beyond bulk/batch data loading by adding datasynchronization or data virtualization. Selfservice data integration tools do not offer the same breadthof capabilities as fullblown data integration solutions. As many organizations are already equipped withdata integration tools, they should identify how to best complement existing capabilities with selfservice data integration.

Finally, while selfservice data integration will encourage sharing and reuse of data for BI purposes, itwill not, by itself, replace the need for a proper data governance program.

Business Impact: Selfservice data integration is attempting to do for traditional ITcentric dataintegration what data discovery platforms have done for traditional ITcentric BI: to reduce thesignificant time and complexity users face in preparing their data for analysis (as much as 80% of theoverall analytics development effort), and to shift much of the activity from IT to the business user tobetter support governed data discovery.

Benefit Rating: High


Maturity: Emerging

Sample Vendors: Alteryx; ClearStory; Informatica; Microsoft; MicroStrategy; Paxata; Tamr; TibcoSoftware; Trifacta


"Magic Quadrant for Business Intelligence and Analytics Platforms"

"Magic Quadrant for Data Integration Tools"


File AnalysisAnalysis By: Alan Dayley

Definition: File analysis (FA) tools analyze, index, search, track and report on file metadata and, insome cases, file content. This supports taking action on files according to what was collected. FA differsfrom traditional storage reporting tools not only by reporting on simple file attributes, but also byproviding detailed metadata and contextual information to enable better information governance andstorage management actions.

Position and Adoption Speed Justification: FA is an emerging technology that assists organizationsin understanding the evergrowing repository of unstructured data, including file shares, emaildatabases, SharePoint, etc. Metadata reports include data owner, location, duplicate copies, size, lastaccessed or modified, file types, and custom metadata. Progressive and costconscious organizationsare moving past "throw more disk" at their storage problems and realizing they need a betterunderstanding of their data. The desire to optimize storage costs, implement information governanceand mitigate business risks (including security and privacy risks) are among the key factors in theadoption of FA. The determination of file ownership and the enablement of moreaccurate chargebackare also made available with FA, which can benefit all verticals.

User Advice: Organizations should use FA to gain a true understanding of their unstructured data,where it resides and who has access to it. Data visualization maps created by FA can be presented toother parts of the organization and be used to better identify the value and risk of the data, enabling IT,line of business, compliance, etc., to make moreinformed decisions regarding classification, informationgovernance, storage management and content migration. Once known, redundant, outdated and trivialdata can be defensibly deleted, and retention policies can be applied to other data.

Business Impact: FA tools reduce risk by identifying which files reside where and who has access tothem, allowing remediation in such areas as eliminating personally identifiable information, corrallingand controlling intellectual property, and finding and eliminating redundant and outdated data that maylead to business difficulties, such as multiple copies of a contract. FA shrinks costs by reducing theamount of data stored. It also classifies valuable business data so that it can be more easily found andleveraged, and it supports ediscovery efforts for legal and regulatory investigations.



Maturity: Emerging

Sample Vendors: Acaveo; Active Navigation; Aptare; CommVault Systems; EMC; HPAutonomy; IBM



StoredIQ; Index Engines; Novell; NTP Software; Nuix; Proofpoint; SGI; STEALTHbits Technologies;Symantec; Varonis; Whitebox Security; ZyLAB


"Innovation Insight: File Analysis Delivers an Understanding of Unstructured Dark Data"

"Does File Analysis Have a Role in Your Data Management Strategy?"

"Best Practices for Data Retention and Policy Creation Will Lower Costs and Reduce Risks"

"Best Practices for Storage Administrators: Staying Relevant in an InformationCentric Data Center"

"Use These Unstructured Data Management Best Practices to Manage Based on the Time Value of Data"

CloudBased Data Identification and Enrichment ServicesAnalysis By: Bill O'Kane

Definition: Cloudbased data enrichment services are provided by trusted information providers thatare callable through Web service interfaces in real time at record level. They enable organizations toidentify customers, products and suppliers better and enrich the data they hold on those entities interms of structuring into hierarchies and adding additional attributes.

Position and Adoption Speed Justification: Data identification and enrichment services have beenavailable for more than 20 years in the form of batch interfaces, where an organization sends a batch ofcustomer or other records to a marketing service provider (MSP) or data service provider (DSP), suchas Axciom, Dun & Bradstreet (D&B) and Experian.

These providers then clean and standardize the data, match it against their own internal data universeand enrich the organization's data with some combination of identifiers (for example, D&B's DataUniversal Numbering System [DUNS]), hierarchies (such as legal entity hierarchies) and additionalattributes (such as demographic data). MSPs and DSPs have also had limited realtime capabilities for anumber of years, but these products were not widely adopted.

With the general rise in cloudbased services, there is a new focus on the provision of cloudbased dataidentification and enrichment services. In this model, the identification and enrichment process is acallable service that is performed in real time, on a perrecord basis. It is paid for either as asubscription or on an ondemand basis. In addition, there are new developments in how elements ofthe trusted data universes are being created, for example, through crowdsourcing in the Jigsawelement of Data.com (from salesforce.com), as well as the inclusion of social data.

This focus on cloudbased identification and enrichment services or data as a service started in 2010and is still gaining maturity, driven in part by the increasing adoption of master data management,which allows organizations to more accurately match their internal data to that provided by theseservices. The most visible service providers on the customer data side are D&B (with its D&B360service), Acxiom and salesforce.com (with its Data.com service, which includes Jigsaw and D&B360).D&B now has partnerships with the leading customer relationship management vendors: Microsoft,Oracle, salesforce.com and SAP.

There are also a number of smaller specialists, including Fliptop, Loqate (which focuses on identificationof addresses and locations), NetProspex, Rapleaf and WorldCheck (which provide access to watchlistdata). In the product data area, 1Sync is moving toward businesstoconsumer public APIs for itsproduct data.

User Advice: If you have a business need for realtime validation and enrichment of customer masterdata against a trusted data source, then investigate these services. You will need to check out thematurity of different services, as well as the accuracy of the data provided, and you may find that youneed a combination of multiple services to meet your requirements.

You should also check out what this will cost you and the contractual terms and conditions. If you givehundreds of salespeople access to these services, either directly or indirectly via composite applications,and it is being charged on a perrecord basis, you need to know what the running costs will be andwhether you can afford it. On the question of contractual conditions, you need to know what happens tothe data if you let the subscription to the service lapse, because you may be legally required to deleteit.

Business Impact: This type of service provides the ability to enrich and correct key master data at thepoint of creation. It should lead to better and higher data quality and richer datasets for better businessprocesses, including improved sales performance and customer experiences. Cloudbased dataidentification and enrichment services will be key products in cloud services brokerage ecosystems andwill be employed by integration and customization brokerages to add value to cloudcentric solutions.


Market Penetration: 5% to 20% of target audience


Sample Vendors: 1Sync; Acxiom; Dun & Bradstreet; Experian; salesforce.com (Data.com)

MDM AppletsAnalysis By: Bill O'Kane



Definition: Master data management (MDM) applets are callable, modular components of businessand/or presentation logic provided by the MDM solution vendor. They can be embedded in businessapplications, often operating in the context of the application's UI style, and are used to access andmanipulate the master data in the organization's MDM solution.

Position and Adoption Speed Justification: MDM applets are technically still at an emerging stage,with vendors such as Informatica, IBM and Oracle originally introducing these capabilities in 2010.Although these were initially freely available with several MDM vendor solutions, recognition of thevalue of MDM applets is increasingly leading to their pricing as individual components. Once there is anestablished MDM solution, MDM applets are relatively easy to deploy from a technical perspective, andwe believe that they will be increasingly adopted and mature once organizations grasp their inherentvalue and become proficient at integrating them into their environments.

The position continues to remain unchanged because, although the technology has fully emerged,vendors continue to struggle with commercializing this concept through educating potential buyers onthe value proposition. This dissonance has also inhibited major marketing by the vendors that havedeveloped these offerings.

Following the initial excitement around this capability, vendors have not actively pursued widespreadadoption. We believe that this is partially because many large clients with requirements for MDMapplets have built them internally already or have very specific functional needs.

Additionally, some MDM vendors do not fully understand business application environments, so they donot always grasp the potential for technology like this. On this basis, we continue to hold the positionon the Hype Cycle until more progress in this area is realized. The hype around this technology willincrease and eventually peak as MDM itself becomes more widely adopted, both by large and midsizeenterprises.

User Advice: Use MDM applets to reinvigorate existing applications that support important businessprocesses with data and/or services provided by the MDM solution at the center of an organization'sinformation infrastructure. Use MDM applets for readonly data exploration purposes; for example,enabling users to determine whether the business entity, such as a customer or product, being inquiredabout is already known to the business or is a new customer or product. They may also be employed toexplore business entity hierarchies and/or attributes together with the lineage and change history. Inaddition, use MDM applets to integrate master data into the flow of the operational applications andbusiness processes and to allow the (trusted) calling application to update the master data with what islikely to be the more current data available in the context of the current transaction being executed.

Business Impact: MDM applets enable business users across the organization, otherwise bound tolegacy applications, access to qualified and governed data as part of an MDM solution. They are a greatway to expose qualified and governed master data to remote locations, to distributed or independentbusiness users or units that can drive value from improved decision making or process effectivenessthrough visibility to an organization's master data. MDM applets effectively provide a costeffective wayto "renovate" established processes without requiring wholesale changes in legacy applications(depending on integration requirements).

There remains the issue of user behavior. Just offering an MDM applet for use by a customer servicerepresentative does not mean, for example, that the rep will take the extra time to check to see if thephysical client is a new customer. So challenges remain, as with MDM, but for those environmentswhere users explore and drive their business based on qualified information, the benefits will bemoderate — but at only incremental cost. The ROI will potentially be very attractive. These benefits willbe limited by the design and flexibility of the core MDM solution.



Maturity: Emerging

Sample Vendors: IBM; Informatica; Oracle; Software AG


"The Five Vectors of Complexity That Define Your MDM Strategy"

"The Important Characteristics of the MDM Implementation Style"

"Establishing Milestones to Optimize MDM Time to Value"

Graph Database Management ServicesAnalysis By: Nick Heudecker

Definition: Graph database management systems (DBMSs) represent relationships among entities andsupport complex network traversal operations. Most graph databases use basic graph theory, makingthem suitable for generalpurpose use cases, such as processing complex manytomany connectionslike those found in social networks. Other systems use triplets or network databases for morespecialized applications.

Position and Adoption Speed Justification: Despite their maturity as an emerging technology,graph DBMSs are positioned early on the Hype Cycle due to relative early adoption in the market.Graph DBMSs are only just being adopted by enterprises for their relationshipbased data models. Thegraph data model lends itself to representing rich domains, such as financial fraud detection,



telecommunications network analysis and master data management.

There are currently opensource and commercially licensed graph DBMSs. Some mainstream vendors(such as IBM and Oracle) also include graph analytics or capabilities in their DBMS products. Manygraph DBMSs are triplestores that use resource description framework (RDF) data model approaches— but they do not have to be triplestores. This debate over the best approach to graph databases inthe market will resolve over the next five to seven years, and lengthen the time for the solution toreach the Plateau of Productivity.

Some graph DBMSs also include layered functionality and specific features for data science or advancedanalytics tasks (for example, the R2DF framework, which utilizes RDF weights to the path ranking).Some frontend graph analytics leverage configurable storage to create a graph data model (forexample, OQGraph, which can use MySQL or MariaDB for storage), while others run as embeddedservers (like OpenLink's Virtuoso).

Much of the hype around graph DBMSs revolves around ad hoc discovery of relationships. However, themajority of graph use cases use data where relationships are already defined. Graph databases cannavigate these relationships more efficiently than their relational counterparts.

User Advice: The core advantage of graph DBMSs are the relational data models they support. Thesedata models allow users to describe dynamic and complex domains more naturally and efficiently thanis possible in a relational database management system (RDBMS). Assess graph DBMS capabilities ifRDBMS performance for highly nested or relational data falls outside of the SLA.

Business Impact: The overall impact of graph DBMSs is moderate. While graph DBMSs represent asubstantial shift in how information is organized and used, this radical change will also slow adoptionuntil industryspecific use cases emerge and skills become readily available in the market.



Maturity: Emerging

Sample Vendors: IBM; Neo Technology; Objectivity; Oracle; YarcData


"A Tour of NoSQL in Eight Use Cases"

"Cool Vendors in DBMS, 2014"

"Who's Who in NoSQL DBMSs"

Multivector MDM SolutionsAnalysis By: Bill O'Kane; Saul Judah; Andrew White

Definition: Multivector master data management (MDM) solutions provide an integrated set of facilitiesfor ensuring the uniformity, accuracy, stewardship, governance, semantic consistency andaccountability of an enterprise's official shared master data assets. These meet the needs of thebusiness across all vectors of MDM complexity including industries, data domains, use cases,organizational structures and implementation styles.

Position and Adoption Speed Justification: Multivector MDM solutions provide support for all fivevectors of MDM complexity:

Industries — for example, productcentric industries, service industries and government

MDM data domains — for example, customer, supplier, partner, location, product, item, material,asset, ledger, account, person and employee

MDM use cases — for example, design/construction, operational and analytical

Organizational structures — for example, centralized, federated and localized organizations

MDM implementation styles — for example, registry, consolidation, coexistence and centralized

Multivector MDM solutions contain comprehensive facilities for data modeling, data quality, datastewardship, data governance, data services and data integration in workflow and transactional usagescenarios. Also, they provide high levels of scalability, availability, manageability and security.

Comprehensive, integrated multivector MDM solutions are still new and very immature. Most vendorsthat seek to address this segment really only offer multidomain MDM offerings with some limitedcapability to support parts of the other vectors with the same solution. Or, vendors offer multiple MDMofferings in an attempt to meet as many multivector needs as possible.

Vendors may advertise a convergence road map, but they are several years away from comprehensivemultivector MDM coverage in either a single or integrated suite of mature products. Other vendorsclaim to do everything in a single MDM product. However, on closer inspection, they fall short in termsof the range of use cases, data domains, industries or implementation styles they comprehensivelysupport.

For this reason, progress is slow, so we have reassessed the position of multivector MDM. We havemoved it forward slightly again in 2014 because hype is continuing to increase (occasionally underother names, such as "multistyle MDM"), though its time to plateau remains long. Our view is that this



gives a better view of the inherent challenges in achieving this capability, the current state of progressand the time that it will take for vendors' capabilities to mature.

This technology represents the likely "end state" for MDM solutions. However, it sometimes getsconfused with "multidomain MDM" because a multidomain capability is often wrongly depicted as theend state — this ignores the need to additionally support multiple industries, use cases, organizationalstructures and implementation styles.

User Advice: Comprehensive multivector MDM solutions are still at the emerging stage as firstgeneration (singledomain MDM) and secondgeneration (multidomain) MDM offerings mature andevolve further. Conceptually, multivector MDM solutions should satisfy the overall needs of your MDMvision, but practically, your shortterm and midterm MDM needs will only be met by leveraging acombination of first and secondgeneration MDM solutions on the market.

Create an MDM vision and strategy that aligns with and enables your organization's business vision andstrategy. Ensure that the vision meets your organization's longterm MDM needs in terms of includingall necessary use cases, data domains, industryspecific requirements, organizational structures andimplementation styles.

Perform a gap analysis between your longterm MDM requirements and the capabilities and road mapof MDM vendors that provide a good fit for your MDM requirements. Balance criteria — such as longterm viability and fit with wider information management and application strategies — against MDMcapabilities. If you are comfortable with the implications, then invest on the basis that your chosenmultivector MDM solution vendor will improve the consistency, breadth and depth of its MDM offeringsover time.

If necessary, invest in additional MDM products and vendors on a tactical basis to fill the gaps that needbusiness solutions in a time period within which your multivector MDM solution vendor cannot deliver. Itis a good working assumption that, for most large and/or complex enterprises, two or more MDMsolutions will be needed during the next three or four years, until vendors achieve the level of maturitynecessary to meet customer requirements with one integrated solution.

Business Impact: The successful implementation of a comprehensive multivector MDM solution inlarge, complex organizations with fragmented and inconsistent master data will potentially create atransformational business impact because it provides for sufficient organizational and technicalfunctionality and flexibility to fully optimize an enterprise's master data assets.

As with single and multidomain MDM implementations, the benefits realized will generally be in termsof improved growth in revenue and profits, cost optimization and efficiency — as well as riskmanagement and regulatory compliance. However, it is quite likely that benefits will increasesignificantly over time as organizations migrate from first and secondgeneration MDM offerings tomultivector MDM solutions.

Benefit Rating: Transformational


Maturity: Emerging

Sample Vendors: IBM; Informatica; Oracle; SAP; Tibco Software


"MDM Products Remain Immature in Managing Multiple Master Data Domains"


"The Seven Building Blocks of MDM: A Framework for Success"

"The Important Characteristics of the MDM Implementation Style"

At the PeakMDM Professional ServicesAnalysis By: Bill O'Kane

Definition: Master data management (MDM) professional services provide strategic, tactical andoperational support to organizations engaging in MDM programs. These services include the formulationof information strategies, business cases, road maps, business and technical metrics, governanceorganizations and processes, organizational recommendations, information life cycle discovery andplanning, and technology selection. These services may also be used to facilitate the installation andconfiguration of MDM software.

Position and Adoption Speed Justification: External service providers (ESPs) — large and small —have rapidly entered the MDM professional services market as its engagements can be quite lucrative,depending on how much exposure a client is willing to have to a single ESP in the context of its MDMprogram. Anecdotally, we estimate that the total market size for MDM professional services is betweendouble and triple that of the MDM software market, even at this early stage, and it continues to movesteadily forward on the Hype Cycle curve.

For the past two years, our annual MDM Summit surveys suggest that between 40% and 50% ofimplementations engage with a thirdparty ESP; the rest rely primarily on, or use to some extent, the



professional services component within MDM software vendor organizations. Vendors — which range insize from the "big four" consulting firms to boutique outfits specializing in one industry, region, orsector — are now fully engaged in the market with capabilities that are essential to large enterpriseslooking to achieve "critical mass" in one or more data domains and within a single budget cycle.

MDM professional services engagements focused on strategy typically involve a relatively small group(in comparison with the technical team supporting the actual MDM implementation) of highlyexperienced senior data strategists. Many of the smaller boutique MDM consulting firms will view theinitial phases of the strategy engagement as their opportunity to prove their ability to execute duringthe later phases. Although verification and other retroactive assessment activities will certainly be ofinterest to them, smaller firms tend to avoid the actual implementation of MDM software, leaving asignificant opportunity for the larger ESPs.

User Advice: Regardless of the size of the ESPs under consideration, clients should give preference tothose firms with several verifiable references from organizations with similar operational environmentsand challenges and, if possible, within the same industry. In the case of MDM implementationengagements, clients should be sure to determine the degree of expertise that each candidate companyhas in related implementation types, such as ERP if operational MDM is being implemented, or datawarehousing if the use case is more analytical.

Clients should also require named resources in all proposed statements of work, and should thoroughlycheck the credentials of those resources. Additionally, great care should be taken not to reduce thestrategy development phases, whether resourced in house or externally, in order to fund theimplementation phases for MDM.

In addition, clients are strongly advised to separate the strategy portion of the MDM program(including, but not limited to, technology selection) from the implementation phase. In fact, it is oftenpreferable to engage a strategy provider that does not perform implementations, unless the client'sindustry or geographical profile makes this impractical. The relative immaturity of the MDM consultingmarket, coupled with the large number of tools and approaches available to address these programs,has caused many (if not most) MDM strategy firms that also perform implementations to specialize inone or two MDM software vendors' offerings. This bias in implementation expertise can easily taint themanner in which a firm approaches an overall MDM strategy.

Business Impact: Engaging the expertise of an ESP with significant levels of MDM strategy ortechnology implementation experience in your industry and master data domains can be an invaluableasset for ensuring the success of these critical efforts. In an MDM strategy engagement, the levels ofexperience that these firms generally possess in specialized areas can significantly accelerate an MDMprogram and reduce implementation risks. These areas include engaging the business for funding andstarting the data governance organization and process, as well as innovative methods of identifyingprocess pain points relating to master data and the formulation of metrics around them to prove theongoing value of a program.

From a technology implementation perspective, engaging the expertise of an ESP with significant levelsof MDM implementation experience in your selected technology solution and master data domains canbe invaluable for ensuring the success of these critical efforts, including their successful alignment withinternal technical disciplines such as those for business continuity planning and SLAs. These MDMstrategy and implementation skill sets can also be transferred to internal client members of both theMDM and information governance teams, bringing modern and tested techniques for informationmanagement to enterprises that likely did not possess them in the past.




Sample Vendors: Accenture; ByteManagers; Capgemini; Cognizant; Datum; Deloitte; HighPointSolutions; HP; Hub Solution Designs; IBM Global Business Services; InfoTrellis; PwC; Tata ConsultancyServices (TCS); Wipro


"Best Practices in MDM: How to Evaluate and Engage External Service Providers"

"Market Guide: External Service Providers for Master Data Management"



"A Master Data Management Initiative Needs Effective Program Management"

Information Capabilities FrameworkAnalysis By: Ted Friedman

Definition: The Gartner Information Capabilities Framework (ICF) is a conceptual model that describesthe set of information management technology capabilities needed to define, organize, integrate, shareand govern an organization's information assets in an applicationindependent manner to support itsenterprise information management goals.

Position and Adoption Speed Justification: Organizations increasingly attempt to move from



architectures of tightly coupled systems and selfcontained applications to modular softwarecomponents, reusable services and multipurpose content. These transitions expose informationmanagement infrastructure vulnerabilities, such as poor data quality, lack of metadata transparency,unknown or conflicting semantics, inconsistent business intelligence and analytics, conflicting masterdata and the lack of an integrated view across the content continuum. Organizations have technologiesand processes to address such challenges, but they are scattered throughout the organization.

For most enterprises, the current approaches to information management technology areheterogeneous and complex, often with information silos affecting data sources, databases andapplication environments, as well as legacy data. At the technological heart of new approaches toinformation management is an information environment, a series of codependent services,repositories, tools and metadata management that enable the description, organization, integration,sharing and governing of all types of information in an applicationneutral way, giving users theinformation and tools they need for a specific use case through improved reuse and consistency.

Innovators are aware that the optimal path to adding capacity and capabilities is no longer through thesimple addition of storage, applications and databases, without consideration of how the information willmove throughout the supporting infrastructure and a sense of interlocking and interactive managementservices.

There is now a focus on more transparency and optimization (via rich metadata capabilities), as well asstandardization and reusability of functions commonly required across informationintensive use cases.However, through 2015, 85% of enterprises will fail to adapt their infrastructures for informationmanagement to align with these ideals. Emerging architectures such as the logical data warehouserequire application of the principles supported by the ICF model — and earlystage adoption of theseprinciples is already happening.

User Advice: Organizations must rethink their approaches to delivering information managementinfrastructure, with a focus on capabilities that are required across multiple use cases and independentof specific applications and physical representations of data. By viewing information as a strategic asseton a par with applications and business processes, they can develop stronger competencies in thegovernance of information assets and derive greater value from their use, while also increasingconsistency, shareability and reuse.

The ICF is a vision for how these goals can be achieved. Organizations should begin to work toward thisby identifying opportunities to align and standardize various information management capabilities insupport of closely related initiatives, while also filling in capability gaps that may already exist in theirenvironment.

The ICF concept does not dictate specific architectural approaches, implementation tactics, or tools andtechnologies. Rather, organizations should use it as a guiding description of the set of capabilities that,when properly aligned and integrated, can enable the fulfillment of EIM principles. It covers:

Management of information in an applicationindependent manner

Provision of type and sourceneutral views of, and interaction with, information assets

Support for a range of use cases, and consistency of capabilities across them

Enablement of consistent reuse, sharing and governance of information for exponential increasesin value

Appropriate costbenefit choices when deploying information management technologies

The ICF addresses critical components of information management. Organizations should adopt itsprinciples to promote a better understanding of the meaning and value of information assets, and toexpose and share them in a variety of formats and contexts in conformance with informationgovernance policies.

In effect, organizations should adopt the ICF's concepts as a vision for how to fulfill their informationmanagement infrastructure requirements in a strategic manner.

Business Impact: Organizations will move toward the vision articulated by the ICF at different speedsand in different ways. However, evolution toward a cohesive information infrastructure is inevitable.Enterprises are beginning to recognize that information management technologies should beapproached as a coherent set of capabilities that operate on the enterprise's information assets. Gartnerbelieves that, through 2015, organizations integrating highvalue and diverse new information typesand sources into a coherent information management infrastructure will outperform their industry peersfinancially by more than 20%.

Furthermore, the gap between leading organizations in information management practices and otherswill expand rapidly. Those failing to adopt the principles articulated in ICF concepts will continue to fallbehind in terms of the extra cost, productivity drain, and lack of ability associated with outmodedinformation infrastructure. This gap's increase will ensure the eventual dominance of topperformingorganizations.

Organizations that apply this approach can capture a range of specific benefits, such as:

Enabling business growth by improving the timeliness and quality of their decision making throughaccess to a more comprehensive set of information sources

Improving the agility of their processes for introducing new contextaware products and services

Improving their ability to predict new opportunities or challenges through pattern seeking,matching and discovery

Reducing/managing risk by improving their compliance with regulations and policies through



improved information quality and governance

Reducing the cost of storing, locating and integrating information across the informationcontinuum



Maturity: Emerging


"Introduction to Gartner's Information Capabilities Framework"

"New Information Use Cases Combine Analytics, Content Management and a Modern Approach toInformation Infrastructure"

"How to Use (and Not Use) Gartner's Information Capabilities Framework"

"Predicts 2014: Why You Should Modernize Your Information Infrastructure"

Information Semantic ServicesAnalysis By: Mark A. Beyer; Frank Buytendijk

Definition: Information semantic styles are programming code representations of agreements for howto govern the interdependence between application flows and repositories. Information semanticservices convert those agreements from embedded code to callable services, which include taxonomicand ontological recognition of how a business process uses data.

Position and Adoption Speed Justification: Information semantic services depend significantly onmetadata becoming much more functional (see "Defining the Scope of Metadata Management for theInformation Capabilities Framework"). Information semantic services use many different types ofmetadata to facilitate the orchestration of various services that run throughout the informationinfrastructure.

Processing for different data types is often embedded within an application, frequently within the singlecontext of the single application involved. In other cases, independent services, such as data qualityservices for names and addresses and XML parsers, run outside applications, accepting inputs in variousforms and providing output in similar various forms.

There are many levels of semantics: format, content, use case definition, interfacing, redefinition andso on. In this practice, finegrained services are built, catalogued and then deployed in multipartorchestrations. It is the abstraction level that enables the various functions required to manage data tobe isolated in these finegrained services.

It is the level of registration, sequencing, prioritization and variable pathways that make thedeployment of this services approach so complex. This complexity often results in architects deciding tocreate services that are less abstract and tuned more specifically to data management functions withina welldefined context (for example, ERP services that manage master data based on the supportedprocesses or "somewhere specific," rather than the abstracted principles of using assets or participants"almost anywhere").

Serviceenabled application code (which is closer to modular code running on an application server thanan actual orchestration of freestanding services) will persist for a long time. We refer to this as adedicated semantic style. Information semantic services represent a more fluid approach, one thatrequires significantly abstract design to achieve; it represents the extreme architectural approach(orchestration) at the opposite end of the spectrum from dedicated semantic styles in Gartner'sInformation Capabilities Framework (see "The Information Capabilities Framework: An Aligned Visionfor Information Infrastructure").

Adoption of semantic services has increased slightly, driven by the increased acceptance of standardssuch as the Resource Description Framework (RDF) and Web Ontology Language (OWL) and theadoption of semantic technologies in areas such as social media, search, business processmanagement, analytics, security, content management and information management. In addition, theintroduction of semantic interpreters will encourage extensibility to include big data.

Significant barriers to this more complex and flexible style include legacy applications, issues withinformation abstraction (such as ontology and taxonomy resolution) and a reluctance by organizationsto adopt formal business process modeling (to demonstrate reusable application flows). Importantly, weemphasize more business benefits this year as a fresh perspective on the technology, which requires a"demystification" of this largely metadata function.

User Advice: Organizations should not try to implement sweeping replacements of legacy systemswith loosely coupled, independent services in their information management architectures. Rather, theyshould pursue a targeted approach of experimentation with each of the six semantic styles and withcombinations of them.

For now, continue to build the data service layer with an orientation toward the file management,structured and unstructured repositories and message queues. Proper abstraction of information assetsand the appropriate management of metadata will enable a more flexible future architecture.

Pursue a more formal business process design and documentation standard in the organization to



promote identification of shared application processing flows. When business processes cross, theirinformation and information processing flows also cross one another. The advice here refers toreviewing and modeling business process flows and not enterprise data objects.

Evaluate application development platforms, and data management and integration businessapplications for their ability to share metadata, call external services, the commonality of thedeveloper's interface, and their capabilities for specifying business logic through explicit models, insteadof code. Interoperability of development tools should be a highly rated selection criterion.

Business Impact: A business's ability to model processes and to have toolbased change detectionprocesses in place to inform IT will decrease the time to delivery for new information processingdemands. Similarly, by placing process modeling at the center, the true owner of information assets(the process, not people, applications or databases) begins to push requirements simultaneously to theinformation design and the application design. The result is that the sometimes personal agendas ofindividual business managers will be more easily identified, which may at times be contrary to theirwishes and even their business goals.

Shared application flows also mean that shared ontology is equally important, including the ability toidentify when assumed ontological sharing is incorrect. This will force the business to identify when itsprocesses are attempting to share data, which actually hinders the various processes from collectingnew data points when needed.

This architectural approach and its incumbent design demands help businesses identify gaps in theirknowledge of business processes and inappropriate linking. As such, the design of information assetsbecomes an exercise in business process clarification.

There is a high probability that information semantic services will become embedded in othertechnologies.



Maturity: Emerging

Sample Vendors: Collibra; Software AG


"Information Management in the 21st Century Is About All Kinds of Semantics"

"The Nexus of Forces Is Driving the Adoption of Semantic Technologies, but What Does That Mean?"

"Stop the Madness of Ad Hoc Information Infrastructure!"

"Toolkit: Improve Architectural Decisions With BusinessDriven Information Infrastructure"

iPaaS for Data IntegrationAnalysis By: Eric Thoo

Definition: As a subset of integration platform as a service (iPaaS), data integration functionality isoffered as a cloud service for capabilities such as replication, extraction, transformation and loading.The iPaaS technology for data integration can be used as an alternative to onpremises solutions andcommonly supports cloud service integration. Service providers can leverage iPaaS tooling to build dataintegration solutions to be offered as a service.

Position and Adoption Speed Justification: Data integration initiatives continue to grow inimportance and are generally addressed by onpremises implementations of packaged tools or customcoded solutions. However, information management leaders are beginning to examine data integrationtechnology capabilities that meet "as a service" demands as they diversify their deploymentapproaches. Approximately 7% of participating organizations in a 2Q14 study indicated that iPaaSofferings were used for data integration, an increase compared to approximately 4% in 2013. Dataintegration tools offered as iPaaS are evolving to address common integration tasks involving cloudbased endpoints, especially for organizations with deployment time constraints and limited resources.

They are also reducing the barrier and complexity for tool deployment. In many cases, usage of iPaaSfavors offerings that support both data and application integration within a single toolset. Vendors areincreasingly becoming active in this area by providing purposebuilt cloud offerings to solve specificdata integration problems, or by offering an iPaaS rendition of existing onpremises data integrationtools.

Some data integration tool vendors, though without an iPaaS offering, are offering options for theirsoftware to be deployed in public cloud environments. Midsize organizations — and business analystsoutside of IT in larger organizations — are beginning to adopt these capabilities to move data betweenpopular cloudbased applications and onpremises databases, or to federate views that include clouddata sources. Likewise, providers of SaaS applications are beginning to look toward simple cloudbaseddata integration services as a way to ease the challenge of onboarding new customers; they oftenembed thirdparty or their own iPaaS capabilities in their SaaS offerings. IT groups in largerorganizations are also beginning to look at public and private cloud infrastructures as a way to provisionnonproduction environments — development, test and quality assurance — for their chosen dataintegration tools. Largescale production deployments of iPaaS for data integration remain scarce, whilesustained interest and adoption have continued over the past year. However, as the adoption of cloud



delivery models continues to increase, data integration tool vendors will continue to expand theirofferings of functional capabilities as cloud services to customers.

User Advice: Consider using iPaaS for data integration as an extension of the organization's dataintegration infrastructure to manage cloudrelated data delivery, and to address the growing necessityfor data sharing between SaaS and onpremises applications. Be aware that some offerings may bepurposebuilt to solve data integration problems involving cloud data, where either the source or thetarget (sometimes both) are known and static in terms of structure.

Consider these offerings, too, for usage that does not require significant configuration andcustomization. In supporting data integration workload for provisioning the runtime environment,evaluate iPaaS as a way of accelerating time to integration, minimizing costs and resource requirementsrelative to onpremises models, and simplifying the deployment of data integration infrastructure withan adaptive or incremental approach.

However, recognize that deployments of iPaaS for data integration rarely address the full scope andcomplexity of broader data integration requirements. In addition, user concerns about the reliability ofcloudbased data integration processes (in relation to data loss and the security of data moving acrossorganizational boundaries, for example) will need to be qualified or addressed. Migrating data from onepart of the cloud to another, or into internal applications, may face data integrity issues (includingincomplete or inaccurate data) that aren't addressed in iPaaS. Plan to ensure data integration processesare governed, traceable and reconfigurable enough to support requirements such as legal andregulatory policies.

Business Impact: iPaaS for data integration has potential for most organizations. Early benefits arerelieving some of the common challenges in data integration initiatives, particularly in the areas offlexibility, scalability and cost. Resourceconstrained organizations apply solutions to specific issues,such as the synchronization of data between onpremises and offpremises applications, and thecomposition of integrated views of data sources residing both inside and outside the firewall.Organizations with experience in dealing with SaaS and other cloud services are expanding theirtechniques to create a moredynamic computing environment for data integration workloads (such assimplifying deployment and expanding access to data sources in the cloud).

With increasing interest in cloudbased solutions for data warehousing and analytics, requirements fordata aggregation have the opportunity to leverage iPaaS for data integration involving the access anddelivery of datasets utilizing the cloud environment. However, cloud delivery of data integrationcapabilities will not reduce the necessity for planning, governance, managing risk and compliance, andalignment of comprehensive integration activities across the organization. As cloud infrastructuresmature, iPaaS for data integration will become an increasingly common component of organizations'information management infrastructures, complementing established onpremises platforms or insupport of the approach of a hybrid integration platform.




Sample Vendors: Actian; Attunity; Dell Boomi; Informatica; Jitterbit; MuleSoft; SAP; SnapLogic


"Magic Quadrant for Enterprise Integration Platform as a Service"

"Who's Who in Integration Platform as a Service"

"Data in the Cloud: Harness the Changing Nature of Data Integration"

"How to Use Hybrid Integration Platforms Effectively"

TableStyle Database Management ServicesAnalysis By: Nick Heudecker

Definition: Tablestyle database management systems (DBMSs) store rows of data in tables, makingthem the most conceptually similar to their relational DBMS (RDBMS) counterparts. However, tablestyle DBMSs do not have relationships between rows. Additionally, tablestyle DBMSs support flexibleschema definitions. These traits make tablestyle DBMSs popular for storing semistructured data, likelog or clickstream data.

Position and Adoption Speed Justification: The attributes of tablestyle databases enable them tostore semistructured, or sparse, data in a massively distributed fashion. This makes them ideal forstoring log or timeseries data, highfrequency counters or a search engine back end. Additionally,several tablestyle DBMSs integrate with Apache Hadoop, a popular distributed processing framework.Hadoop can use these databases for access to reference data to support MapReduce tasks, as well as asource or target for processing tasks.

The popularity of tablestyle databases has grown as enterprises collect and process increasingamounts of structured and unstructured data. For example, these databases are ideal stores forcollecting sensor and machine data from multiple sources in several formats. While several tablestyleDBMSs originated from opensource roots, vendors quickly emerged to provide tools, support andtraining. Several of these vendors, such as Cloudera, Hortonworks and MapR Technologies, werealready offering support for Apache Hadoop. Extending their support to Apache HBase, which is built on



several components from Hadoop, was a natural move.

Commercial support for Apache Cassandra, another tablestyle DBMS with opensource origins, issupported by DataStax. Cassandra differs from HBase in a few key ways. Unlike HBase, Cassandra isnot based on Apache Hadoop components. Additionally, it does not have different types of nodes. Allnodes in Cassandra are the same, which greatly simplifies deployment.

In addition to the semistructured data store, tablestyle databases replicate data between nodes. This issupported either natively, in the case of Cassandra, or through the underlying Hadoop Distributed FileSystem (HDFS), in the case of HBase. This allows for durability of data, as well as of the overall cluster,in the event of node failures. Today, tablestyle databases do not offer traditional atomicity,consistency, isolation and durability (ACID) transactions.

Instead, they provide eventual consistency through the optimistic BASE model, which is defined as:

Basic availability

Softstate

Eventual consistency

Instead of enforcing consistency for every operation, BASE enables systems to scale faster by allowingconsistency to be in a state of flux. Relaxing consistency constraints enables systems to be moreavailable to service requests. Applications can specify different levels of consistency on a perinteractionbasis with most tablestyle DBMSs.

Despite their relative maturity when compared with other NoSQL DBMSs, tablestyle DBMSs arecomplicated by a number of components that must be installed, configured and monitored.Comprehensive management tools and operational best practices are emerging from vendors, but thereis a pronounced lack of support from third parties. Vendors continue to add enterprise features aroundsecurity, analytics and operations management.

Inquiries from Gartner clients indicate increased interest in these DBMSs, but with increasing confusionon how and when to use them. However, adoption accelerated in 2013 and early 2014. This wassparked by a need to store large amounts of log, event and sensor data in scalable footprints. Adoptionshould continue to accelerate as use cases are documented and publicized.

User Advice:

The right hardware and deployment environment for tablestyle databases depends on theexpected workload and data volume. Gain an accurate understanding of these factors beforedeployment.

The distributed nature of tablestyle DBMSs puts additional stress on network resources whenhandling read and write requests, as well as when performing replication. It is essential to ensurethat your network has no bottlenecks to efficient operation. Additionally, conflict resolutionrequires time synchronization across the cluster as well as between applications interacting withindividual nodes.

Some solutions allow for tradeoffs to be made between consistency and latency, even on a peroperation basis. Decisions about how these tradeoffs will affect application interaction patternsmust be factored into application deployments.

Although the two prominent tablestyle DBMSs — HBase and Cassandra — are open source, usersof these technologies should engage with recognized vendors before moving to the productionstage.

Business Impact: The current business impact of tablestyle DBMS is moderate. The ability to storemassive amounts of semistructured data in a distributed manner addresses several existing andemerging use cases, particularly in support of Internet of Things (IoT) initiatives. Vendors offeringproducts, training and support for both Apache Hadoop and tablestyle databases are driving adoptionand accelerating development. It is uncertain whether this type of NoSQL database will gain muchtraction beyond supplementing big data efforts. This uncertainty increases as vendors of keyvalueDBMSs adopt features from their tablestyle DBMS counterparts.



Maturity: Emerging

Sample Vendors: Cloudera; DataStax; Hortonworks; MapR Technologies



"Does Your NoSQL DBMS Result in Information Governance Debt?"


"Decision Point for Selecting the Right NoSQL Database"

Enterprise Metadata ManagementAnalysis By: Andrew White; Roxane Edjlali



Definition: Gartner defines enterprise metadata management (EMM) as the business discipline forgoverning the important information assets of an organization in support of enterprise informationmanagement (EIM). EMM connects what would otherwise have been siloed information initiatives.Gartner defines metadata as "information that describes various facets of an information asset toimprove its usability throughout its life cycle" (see "Gartner Clarifies the Definition of Metadata").

Position and Adoption Speed Justification: Most organizations find that the greatest challenge inusing EMM across the organization is dealing with the cultural and political issues involved in thesharing of metadata and information assets. It can be a real drain on analytical and enduser resourcesto set up and maintain an EMM program — resources that might be better used to address otherbusiness opportunities and threats. Therefore, justifiable ROI metrics are necessary for whatever EMMimplementation strategy is selected. That is why for most organizations, this discipline will move slowlyon the Hype Cycle, rather than speed its way to the Plateau of Productivity.

Repositories can be used to publish reusable assets (such as application and data services) and browsemetadata during life cycle activities (design, testing, release management and so on). However,implementing the technologies capable of managing the enterprisewide variety, volume, velocity andcomplexity of metadata about vital information assets can be costprohibitive for most organizations —generally far more than $1 million.

New technology innovations, such as Informatica's newly announced intelligent data platform, arecreating new interest in linking information silos to improve the value of informationbased businessoutcomes. As such, EMM hype is again increasing. These innovations are bringing with them a greaterneed to govern information assets across multiple information management investments, creating newdemand for EMM and EMMenabled systems. The increasing demand is related to factors such asmoving or accessing information stores and business applications in the cloud, analyzing socialinformation inmemory, and integrating wholly new mobile application layers.

Such factors are creating pressures to reduce complexity in the information infrastructure. This willdrive the need to eliminate the semantic differences between legacy and new information sources, andso governing and enforcing some level of stewardship in those shared information assets is againgaining momentum. New interest from vendors to help end users address the needs of EMM, due inpart to the hype related to big data and inmemory computing, has forced the hype for EMM to reverseslightly. A second wave of hype is thus ramping up, more than in the original cycle.

The emergence of initiatives such as MDM and information governance will be needed to justify therelated necessary investments in EMM. Increasingly, the need for EMM is being identified as a way torespond to shortterm reporting and privacy compliance mandates from government agencies.

User Advice: Only explore EMM when you have multiple, disparate information management programs(each with their own metadata and management) that are neither aligned nor leveraging consistentinformation between them. Use EMM to help govern the metadata and information assets betweenthese programs (see "Defining the Scope of Metadata Management for the Information CapabilitiesFramework," "Metadata Management Is Critical to MDM's LongTerm Success" and "Understanding theLogical Data Warehouse: The Emerging Practice").

EMM is only sought when the organization needs to align its information management programs into amore mature EIM framework. For example, you will manage metadata in support of a data warehousesupporting business intelligence. You will also manage metadata in context to an operational masterdata hub in support of application integration. You will have other uses of metadata for many differentinformation hubs. If the goal is to align the information across these elements, use EMM to govern theshared metadata. You can start with one "connector" and grow the "connections" between the efforts asneeded, over time.

As described in "CIO Critical Capabilities: Metadata Made Simple," one of the key responsibilities of aCIO is ensuring the effective management and leveraged use of the organization's information assetsvia EMM. Usually, the CIO delegates the implementation and management of the discipline to others,such as information managers, enterprise architects, and those responsible for governance, risk, andcompliance (GRC).

For many, the best place to start is to identify and publish which key information assets and metadataare currently being managed by primary stakeholders and shared with others (see "The Eight CommonSources of Metadata" and "Six Common Approaches to Metadata Federation and Consolidation").

There needs to be an EMM strategy or plan for how to improve the situation by leveraging otherplanned initiatives — which may involve the participation of individuals from different organizationalunits. The plan should include those in roles related to information management, enterprisearchitecture, business process management (BPM), serviceoriented architecture (SOA) and GRC, alongwith other business stakeholders.

Business Impact: EMM helps extend the benefits of other investments, such as MDM, BPM and SOA,by supporting the reconciled semantics in the information sources used by those programs. Metadatamanagement will have taken place within those programs individually, but EMM is needed to achievesynergy across them, by reconciling the semantics and governing the resulting enterprise metadata andmodel. Equally, when managing and governing information assets within the context of a pacelayeredstrategy, EMM is used to align the semantics across those layers (see "Metadata Management for PaceLayering").

To sustain an EMM program, you will need to account for people and process issues as well astechnology issues and choices, including those related to identifying the best metadata to use (see"How to Tell Which Metadata Is Valuable" and "Toolkit: Calculate the Value of Your Metadata"), theviability of the technology housing the metadata (see "Decision Framework for Evaluating Metadata



Repositories" and " "Toolkit: Sample RFI and Vendor Rating Spreadsheet for Evaluating MetadataRepositories"), and approaches to federating or consolidating metadata across technologies.

Many service providers perform training and consulting in the discipline of EMM, but we have chosen tolist vendors (see the Sample Vendors section) selling EMMenabling software (specifically, metadatarepositories). Not only do they provide training and consulting, but they do so in a more coordinatedand customized way with their EMM tools. Many also have large user groups with years of pragmaticexperience in successfully implementing the EMM discipline.



Maturity: Emerging

Sample Vendors: Adaptive; ASG Software Solutions; Data Advantage Group; IBM; Informatica;Oracle; SAP; Software AG


"Gartner Clarifies the Definition of Metadata"

"How Metadata Improves Business Opportunities and Threats"

"Metadata Management for Pace Layering"

"The Eight Common Sources of Metadata"

"Six Common Approaches to Metadata Federation and Consolidation"

"Defining the Scope of Metadata Management for the Information Capabilities Framework"

"Information Management in the 21st Century Is About All Kinds of Semantics"

"Decision Framework for Evaluating Metadata Repositories"

"Toolkit: Sample RFI and Vendor Rating Spreadsheet for Evaluating Metadata Repositories"

"How to Tell Which Metadata Is Valuable"

"Toolkit: Calculate the Value of Your Metadata"

"Metadata Management Is Critical to MDM's LongTerm Success"

"Magic Quadrant for Enterprise Architecture Tools"

"Gartner Assessment of Enterprise Architecture Tool Capabilities"

"Metadata Will Improve the Return on Your Video Investments"

Hadoop SQL InterfacesAnalysis By: Nick Heudecker

Definition: SQL interfaces for the popular Apache Hadoop framework allow enterprises to interact withHadoop clusters using familiar SQL syntax. This interface also allows a range of data management andanalytics tools to morereadily access data stored in Hadoop.

Position and Adoption Speed Justification: Data stored in Hadoop is commonly accessed usingMapReduce. This APIbased approach requires a programmer to write application code to execute tasksagainst a Hadoop cluster. While effective for data manipulation tasks like extraction, transformation andloading (ETL), this method has proven cumbersome for data exploration and business intelligence (BI).The requirement to write code removes most business analysts and their tools from the pool ofresources able to leverage data stored in Hadoop. Addressing these problems has led numerousvendors to introduce SQL interfaces for Hadoop.

The earliest version of an SQL interface for Hadoop was the opensource Apache Hive. Queries executedthrough Hive (HiveQL) are converted into MapReduce jobs. This conversion and translation steptypically results in poor performance for most applications. These performance problems, as well asHive's limited data types and SQL functionality, have limited adoption. Vendors, realizing the value of abetterperforming SQL interface to Hadoop, reacted quickly to introduce new offerings. Clouderaannounced Impala, a massively parallel processing (MPP) query engine. MapR Technologies continuesits work on Apache Drill, which is designed to support interactive queries against a variety of datastores, but is currently focused on data stored in Hadoop. MapR also supports Impala and Hive.Hortonworks recently announced completion of its "Stinger Initiative" project, aimed at improvingApache Hive performance. IBM offers Big SQL as part of its BigInsights platform. Pivotal HD's HAWQ,effectively a port of mature Greenplum capabilities, is available as part of Pivotal Big Data Suite.Teradata Aster's SQLH provides direct access to Hadoop. Teradata's QueryGrid allows access toHadoop, among other data sources, from Teradata Database.

Development of SQL interfaces for Hadoop has been rapid, but there are still gaps to effectivedeployment. The Apache Hadoop framework lacks a robust data definition language (DDL), whichmakes SQL such a powerful tool in the relational database management system (RDBMS) space.Apache HCatalog, another opensource project, is attempting to meet this demand but this should also



be considered immature.

Driven by hype introduced from multiple vendors, Hadoop SQL interfaces advance to the Peak ofInflated Expectations for 2014. The technologies are still very immature, with incomplete or nascentfeatures. Compatibility with industrystandard Transaction Processing Performance Council (TPC)benchmarks lags. However, the potential impact of this technology, as well as market demand to getmore value from big data, will force it to progress through the Hype Cycle quickly — with a goal ofreaching the Plateau of Productivity within two to five years.

User Advice:

Recognize that SQL interfaces for Hadoop are evolving rapidly and each variant will have differentcapabilities and performance characteristics. Advanced features, like costbased optimizers, arestill early and may not provide substantial performance improvement.

Don't rely on benchmarks as an indicator of performance characteristics. Test SQL queries againstreal data and workloads.

Understand that implementations are unlikely to implement the popular ANSI SQL standards intheir entirety. This may impact integration efforts with thirdparty BI tools and applications.

Realize that vendor lockin is very probable because SQL integration exists at the top of theHadoop stack, which is where vendors traditionally try to add value through distributionspecificextensions.

Business Impact: By making data stored on Hadoop clusters accessible to a broader base of businessanalysts and BI tools, the business impact of SQL interfaces for Hadoop is moderate. The impact willincrease as implementations encompass additional data sources, such as relational databases andfederated data services, but this integration is currently several years away.

Allowing business users to conduct ad hoc analysis and data exploration across several disparate datasources will enable new forms of process improvement and customer intimacy. However, at present,businesses should treat this technology as immature and developing.



Maturity: Emerging

Sample Vendors: Cloudera; Hadapt; Hortonworks; IBM; Simba Technologies

Recommended Reading: "Choosing Your SQL Access Strategy for Hadoop"

Information Stewardship ApplicationsAnalysis By: Andrew White; Debra Logan; Saul Judah

Definition: Information stewardship applications are business solutions used by business users in therole of information steward (those who enforce information governance policies on the informationassets for which they are responsible). These developing solutions represent, for the most part, anamalgam of a number of disparate ITcentric tools already on the market, but organized in such a waythat business users can use them for the few minutes a week it takes for them to "do" informationstewardship.

Position and Adoption Speed Justification: The emergence of consolidated tools (as a fullblownapplication) for information stewardship will soon challenge established implementation efforts, becausethe majority of legacy and new information management (IM) programs lack this level of capability. Thefunctionality spans monitoring, enforcement, and root cause analysis related to issues identified inpolicy violations spanning data quality and consistency, security, privacy, retention and standards. Forexample, too many master data management (MDM) programs never address the needs of monitoringthe performance of the MDM program, and so the requirements for stewardship go unmet, even if MDMis hailed as a "successful implementation."

This puts any information governance program at risk, since the business has no need to drive ongoingimprovement and has no solution to support the operational responsibilities of governance andstewardship. In the long term, perhaps five to 10 years, these solutions will accompany many, if not all,information governance programs and efforts that need to steward information, spanning MDM,enterprise content management (ECM), content management, records management, data warehousing,business intelligence, big data, analytics, application integration, cloud and so on.

The work of information governance is very much based on people, not technology. That being said, theoutput of information governance needs to be operationalized — and enforced day to day — bybusiness people, for business people. That is where information stewardship solutions come in. Theyemerged a few years ago, mostly oriented toward MDM programs. Today there are a wide range ofstewardship dashboard in many areas (records management is one example), but these are reallyforerunners and earlier versions of the more complete stewardship solutions we refer to here. Almost allsolutions covered by this tech profile started out as dashboards, but evolved as business users (i.e.,information stewards) needed and demanded more functionality to do their work.

User Advice: Recognize the general lack of maturity (and the wide range of different capabilities) intechnology offerings related to governing data across multiple hubs and application data stores. Today,most solutions are best suited for ITfocused users (while they need to be consumable by businessusers) and for specific scenarios and business applications (for example, data quality projects related toan application migration).



Some applications focus on stewardship of content (such as those offered by RDS) and others onstructured data (such as those offered by Collibra). For organizations focused on master data, thoseMDM solutions offer rudimentary capabilities, but have led to the greatest interest and hype in this newinformation stewardship technology. For the next two to three years, most information governanceimplementations will focus on tools to manually define and manage governance with limited help fromtechnology vendors across IM systems or the enterprise as a whole. Work with your technologyproviders to help them understand what must be made operational in the tools. If you have need tosteward other data outside an information governance program, tread more carefully, as the lack of aunifying driver such as MDM or ECM could possibly lead to fewer vendor options.

Business Impact: The governance of information is a core component of any enterprise informationmanagement (EIM) discipline. Information governance cannot be sustained and scaled without anoperational information stewardship role and function. At worst, the lack of effective governance willlead to the failure of EIM initiatives. At best, it will result in lowerthandesired benefits; the businesscase for EIM, for example, won't be realized. A successful stewardship routine will lead to sustainableand persistent benefits from programs like EIM, such as increased revenue, lower IT and businesscosts, reduced cycle times (in new product introductions, for example) and increased business agility.



Maturity: Emerging

Sample Vendors: BackOffice Associates; Collibra; IBM; Informatica; RDS; SAP


"How Chief Data Officers Can Help Their Information Stewards"

"Governance of Master Data Starts With the Master Data Life Cycle"

Internet of ThingsAnalysis By: Hung LeHong

Definition: The Internet of Things (IoT) is the network of physical objects that contain embeddedtechnology to communicate and sense or interact with their internal states or the external environment.

Position and Adoption Speed Justification: Enterprises vary widely in their progress with the IoT.At a simple level, adoption can be classified into three categories. But even within an enterprise, therecan be groups that have different levels of progress with the IoT — therefore, the enterprise wouldexhibit a combination of these categories:

Enterprises that already have connected things but want to explore moving to an IoT — Theseenterprises are no strangers to the benefits and management of connected things/assets. Theyare experienced in operational technology, which is an industrial/business internal form of digitalmodernization. However, they are unfamiliar with the new Internetbased, bigdatabased,mobileappbased world. They can be equally optimistic and hesitant to move their assets (and toadd new connected assets) to this unfamiliar Internet world.

Enterprises that are unfamiliar with the IoT, but are exploring and piloting use cases — Most ofthese enterprises are focused on finding the best areas to implement the IoT while trying tounderstand the technology.

Product manufacturers that are exploring connecting their products to provide new value andfunctionality to their customers — It seems that every week there is a new story about aconsumer or industrial product that is now connected. However, the large enterprises often waitand see how the startups are doing before moving forward.

Standardization (data standards, wireless protocols, technologies) is still a challenge to morerapidadoption of the IoT. A wide number of consortiums, standards bodies, associations andgovernment/region policies around the globe are tackling the standards issues. Ironically, with so manyentities each working on their own interests, we expect the lack of standards to remain a problem overthe next three to five years.

In contrast, dropping costs of technology, a larger selection of IoTcapable technology vendors and theease of experimenting continue to push trials, business cases and implementations forward.

Technology architecture for the IoT is evolving from one where the thing/asset contains most of thecomputing resources and data storage to an architecture in which the thing/asset relies on the cloud,smartphone or even the gateway for computing and connectivity capabilities. As the IoT matures, weexpect to see enterprises employ a variety of architectures to meet their needs.

User Advice: Enterprises should pursue these activities to increase their capabilities with the IoT:

CIOs and enterprise architects:

Work on aligning IT with OT resources, processes and people. Success in enterprise IoT is foundedin having these two areas work collaboratively.

Ensure that EA teams are ready to incorporate IoT opportunities and entities at all levels.

Look for standards in areas such as wireless protocols and data integration to make betterinvestments in hardware, software and middleware for the IoT.



Product managers:

Consider having your major products Internetenabled. Experiment and work out the benefits toyou and customers in having your products connected.

Start talking with your partners and seek out new partners to help your enterprise pursue IoTopportunities.

Strategic planners and innovation leads for enterprises with innovation programs:

Experiment and look to other industries as sources for innovative uses of the IoT.

Information management:

Increase your knowledge and capabilities with big data. The IoT will produce two challenges withinformation: volume and velocity. Knowing how to handle large volumes and/or realtime datacosteffectively is a requirement for the IoT.

Information security managers:

Assign one or more individuals on your security team to fully understand the magnitude of howthe IoT will need to be managed and controlled. Have them work with their OT counterparts onsecurity.

Business Impact: The IoT has very broad applications. However, most applications are rooted in fourusage scenarios. The IoT will improve enterprise processes, asset utilization, and products and servicesin one of, or in a combination of, the following ways:

Manage — Connected things can be monitored and optimized. For example, sensors on an assetcan be optimized for maximum performance or for increased yield and up time.

Charge — Connected things can be monetized on a payperuse basis. For example, automobilescan be charged for insurance based on mileage.

Operate — Connected things can be remotely operated, avoiding the need to go onsite. Forexample, field assets such as valves and actuators can be controlled remotely.

Extend — Connected things can be extended with digital services such as content, upgrades andnew functionality. For example, connected healthcare equipment can receive software upgradesthat improve functionality.

These four usage models will provide benefits in the enterprise and consumer markets.



Maturity: Emerging

Sample Vendors: Atos; Axeda; Bosch; Cisco; Eurotech; GE; Honeywell; IBM; Kickstarter; LogMeIn;Microsoft; QNX; Schneider Electric; Siemens


"Uncover Value From the Internet of Things With the Four Fundamental Usage Scenarios"

"The Internet of Things Is Moving to the Mainstream"

"The Information of Things: Why Big Data Will Drive the Value in the Internet of Things"

"Agenda Overview for Operational Technology Alignment With IT, 2013"

Logical Data WarehouseAnalysis By: Mark A. Beyer

Definition: The logical data warehouse (LDW) is a growing data management architecture for analyticsthat combines the strengths of traditional repository warehouses with alternative data management andaccess strategy — specifically federation and distributed processing. It also includes dynamicoptimization approaches and multiple use case support.

Position and Adoption Speed Justification: In early 2014, Gartner clients reported an increasedinterest in the LDW, now between 4% to 9% of data management and integration inquiries — up fromjust over 3% in 2013 — and is now approximately 30% of data warehouse inquiries (see"Understanding the Logical Data Warehouse: The Emerging Practice"). The inquiries are not specific toany vertical industry but more attuned to aggressive analytics and data warehouse modernizationattempts in all verticals. At the same time, as many as 23% of data warehouse implementationsboasting large and aggressive user populations have also deployed data federation to combine big datawith traditional data warehouses or with virtualized source data.

Large and midsize banks and investment services firms were already pursuing the approach but arenow using the nomenclature as well. The big data phenomenon has many rethinking traditionalstrategies, recognizing that a centralized repository approach alone cannot meet the full range ofdiverse requirements for data accessibility, latency and quality. Vendors have incorporated the LDW



naming convention into their marketing and messaging. The LDW stepped backward to the Peak ofInflated Expectations early in 2014 as organizations began to seek practical implementation advice andas some successes "rehyped" the practice.

During the next three years, organizations will begin to encounter the more difficult issues of managingSLAs for each delivery type possible under the LDW. Many will fail to manage the prevalentperformance and availability issues of virtualization and batch distributed processes running on serverclusters external to the warehouse. These difficulties will lead the LDW into the Trough ofDisillusionment.

At the same time, the emergence of the highly publicized "data lake" (which is a programmer's answerto federation) will create a temporary distraction as the "data lake" concept is inappropriately promotedand applied to moderately skilled analysts and casual user use cases. In the middle of 2014, the uptakeof federation/virtualization as a semantic tier and the prevalence of interest in deployment practicesappear to indicate the LDW is becoming a new best practice.

Gartner sees clear indications that the current technologies and expected advances in dynamicmetadatadriven services engines will advance the LDW from the Trough of Disillusionment and ontothe Plateau of Productivity by the beginning of 2019 — a slight modification from previously stated "endof 2018" in previous iterations of this profile. (Note: The Plateau of Productivity implies more than 20%adoption in the market.)

User Advice:

Centralize governance and metadata strategies — information about the data — into a singlesemantic tier, including taxonomic, ontological, performance metrics and servicelevel quantifiermetadata. Do not confuse data centralization with metadata and governance centralization.

Gain experience in managing diverse data and analytic needs by identifying and conducting a pilotfor a single area of analytics that requires a combination of three information access andmanagement approaches — traditional repository approaches, realtime access to operationalsystems or copies of untransformed source data (to pilot virtualization), and an embedded use ofdistributed processing of large datasets, graph analysis of networks of information or contentanalytics.

Reexamine data strategies that are focused on the creation of a central data warehouse todetermine the nearly 80% of use cases' traditional practices are capable of meeting. At the sametime, develop a strategy for combining it with new and emerging business demands for big datastyle data and analytics (see "How to Design and Implement the NextGeneration DataWarehouse").

Assess current queries and analytics to determine how the current system performs (the currentwarehouse, mart or federated views) and what data/information is included. Use the results of thisanalysis to identify when users are leaving the warehouse to obtain data from other informationresources.

Business Impact: The LDW is effectively an evolution and augmentation of data architecturepractices, not a replacement. It reflects the fact that not all analytical, query and reporting needs canbe supported by a traditional centralized repository style data warehouse. It implies that a muchbroader and more inclusive data management solution for analytics is about to emerge.

Benefits of the LDW include:

Elimination (or minimization) of the need to compromise across comprehensive data needs,performance optimization and timetodelivery cycles. The pairing of virtualization and distributedprocesses with traditional data warehouses makes it possible to select the deploymentarchitecture based on the driving servicelevel expectation, instead of defaulting to existingpractices.

The ability to satisfactorily respond to new analytical or reporting demands with short timetodelivery requirements, even if the analytic model and source identification model are in flux. TheLDW allows a large number of datasets to be made available via query tools and applications. Insome verticals, compliance may continue to force centralized data repositories.

Accelerates data warehouse modifications and provides a rapid deployment capability for new datasources that can be matured over time (a term used in the market is "late binding").

Avoiding the traditional challenges of inconsistent, standalone data marts.



Maturity: Emerging

Sample Vendors: Cisco; Cloudera; IBM; Oracle; Pivotal; Teradata


"How to Design and Implement the NextGeneration Data Warehouse"

"The Future of Data Management for Analytics Is the Logical Data Warehouse"

"Understanding the Logical Data Warehouse: The Emerging Practice"

ContextEnriched Services



Analysis By: Gene Phifer

Definition: Contextenriched services are those that combine demographic, psychographic andenvironmental information with other information to proactively offer enriched, situationaware,targeted, personalized and relevant content, functions and experiences. The term denotes services andAPIs that use information about the user and the session to optionally and implicitly finetune thesoftware action to proactively push content to the user at the moment of need, or to suggest productsand services that are most attractive to the user at a specific time.

Position and Adoption Speed Justification: Context enrichment refines the output of services andimproves their relevance. Contextenriched services have been delivered since the early days of portalsin the late 1990s. With the advent of mobile technologies, additional context attributes like locationhave become highly leveraged. And now, with digital marketing, a wide array of contextual attributes isused to target users with advertisements, offers and incentives.

The most recent thrust of contextenriched services are websites, portals and mobile apps that areconsumerfacing — in mobile computing, social computing, identity controls, search and ecommerce —areas in which context is emerging as an element of competitive differentiation.

Enterprisefacing implementations, which use context information to improve productivity and decisionmaking by associates and business partners, have slowly begun to emerge, primarily in offerings fromsmall vendors (see "ContextEnhanced Performance: What, Why and How?"). While personalization isnot a new concept (portals have used a level of personalization for many years), contextenrichedservices extend that model beyond portal frameworks into a multitude of Web and mobile applications.Contextenriched services are typically delivered via proprietary approaches, as an industrystandardcontext delivery architecture (CoDA) has not yet been created.

The focus on big data has created a favorable environment for the development of contextenrichedservices. Many big data use cases are focused on customer experience, and organizations areleveraging a broad range of information about an individual to hyperpersonalize the user experience,creating greater customer intimacy and generating significant revenue lift. Examples include:

Walmart — Its Polaris search engine utilizes social media and semantic search of clickstream datato provide online customers with moretargeted offers (leading to a 10% reduction in shoppingcart abandonment).

VinTank — This website analyzes over 1 million winerelated conversations each day, to predictwhich customers will be interested in specific wines at specific price points, and combines that withlocation information and alerts wineries when a customer who is likely to be interested in theirwines is nearby.

Orbitz — This site utilizes behavioral information from user history and search to developpredictive patterns that would increase hotel bookings by presenting users with hotels that moreclosely match their preferences. This project resulted in the addition of 50,000 hotel bookings perday — a 2.6% increase (see "Orbitz Worldwide Uses Hadoop to Unlock the Business Value of 'BigData'").

The term "contextenriched services" is not popular in the industry vernacular, but its capabilities areubiquitous and pervasive. Contextenriched services have moved slightly forward this year, indicatingthat there is room for more maturity, and more hype. The advent of new locationbased services, suchas beacons, will add a new source of hype. We expect that the continued focus on big data analytics willdrive significant movement in 2016 and beyond.

User Advice: IT leaders in charge of information strategy and big data projects should leveragecontextual elements sourced both internally and externally for their customerfacing projects. Contextisn't limited to customers, however; employeefacing and partnerfacing projects should also beconsidered as targets for contextenriched services. In addition, investigate how you can leveragecontextual services from providers such as Google and Facebook to augment your existing information.

Business Impact: Contextenriched services will be transformational for enterprises that are lookingto increase customer engagement and maximize revenue. In addition, context enrichment is the nextfrontier for business applications, platforms and development tools. The ability to automate theprocessing of context information will serve users by increasing the agility, relevance and precision of ITservices. New vendors that are likely to emerge will specialize in gathering and injecting contextualinformation into business applications. New kinds of business applications — especially those driven byconsumer opportunities — will emerge, because the function of full context awareness may end upbeing revolutionary and disruptive to established practices.




Sample Vendors: Adobe; Apple; Facebook; Google; IBM; Microsoft; Oracle; SAP; Sense Networks


"ContextAware Computing Is the Next Big Opportunity for Mobile Marketing"

"An Application Developer's Perspective on ContextAware Computing"

"Drive Customer Intimacy Using ContextAware Computing"



"The Future of Information Security Is ContextAware and Adaptive"

Document Store Database Management SystemsAnalysis By: Nick Heudecker

Definition: Document store database management systems (DBMSs) contain objects stored in ahierarchical, treelike format. The documents contained within these stores typically lack a formallydefined schema and do not have references to other documents within the collection. Documents arecommonly described as JavaScript Object Notation (JSON) or XML. This allows for easy mapping to Webapplications that must frequently scale dynamically and change rapidly.

Position and Adoption Speed Justification: Interest in document stores has continued to increase.Developers are motivated by perceived productivity gains and ease in scaling for large datasets.Vendors continue to add large customers and projects, demonstrating the viability of the technology inspecific use cases. Vendors are expanding partnerships with traditional enterprise vendors, as well asfellow startups in related areas such as data discovery, analytics and Hadoop vendors.

In addition to removing the requirement for a predefined, fixed schema, document store DBMSstypically do not provide the traditional relational DBMS (RDBMS) notion of transaction atomicity,consistency, isolation and durability (ACID). Instead, these databases achieve their scalability byadopting the optimistic BASE model: basic availability, softstate and eventual consistency. The BASEmodel reflects a need in Webscale applications to be always available while sacrificing some level ofconsistency.

These tradeoffs have not hampered adoption. Technology startups, particularly those in mobile,gaming and advertising, have been quick to use document store DBMSs, and established enterprisesare also adopting the technology for new projects. According to Gartner inquiries, flexible data schemasand application development velocity are cited as primary factors influencing adoption. Secondaryfactors attracting enterprises are global replication capabilities, high performance and developerinterest.

Awareness of the benefits of document stores continues to grow in the enterprise sector, and documentstore DBMS vendors are rounding out their enterprise features. Management tools are improving, asare security, backup and restore capabilities. Traditional vendors and opensource databases are alsoadding support for JSON data types. PostgreSQL introduced a JSON data type in 2013 and Teradataadded support for JSON in version 15 of its DBMS. IBM also added JSON support to DB2 and partneredwith MongoDB to establish data access standards around the MongoDB protocol and query language.Additionally, IBM acquired Cloudant, a distributed document store database platform as a service(dbPaaS). Other vendors will likely add support for JSON throughout 2014.

Document store DBMSs are advancing along the Hype Cycle quickly due to rapid adoption and someinstances of negative hype in the marketplace. Owing to their ease of use, document store DBMSs arefrequently the first NoSQL DBMS enterprises use, regardless of how well they fit a given use case. Thisoften leads to performance problems and replacement of the technology with something more suitable.

User Advice:

For Webscale applications requiring large data stores with high performance, especially when thetransactions are readonly or not complex and can be supported by a nonACID model, the moremature document store DBMSs may be used. For transactions that do not require ACID propertiesand have complex, mixed data types, these databases can be very effective.

The core advantage of document store DBMSs, schema flexibility through the JSON data type, isalready being adopted by traditional database vendors. It will increasingly be possible to get thisadvantage with existing vendors.

Document store DBMSs differ widely and skills are not typically transferrable between products.

Business Impact: The overall impact of document store DBMSs is moderate. In the short term, thetechnology will be generally limited to specialized, largescale Web applications. Additional usecases may emerge as tool support improves with regard to operational concerns (such as backupsand monitoring) and data integration with data warehousing tools.



Maturity: Emerging

Sample Vendors: Cloudant; Couchbase; MarkLogic; MongoDB



"Does Your NoSQL DBMS Result in Information Governance Debt?"


"Decision Point for Selecting the Right NoSQL Database"

ComplexEvent ProcessingAnalysis By: W. Roy Schulte; Nick Heudecker; Zarko Sumic



Definition: Complexevent processing (CEP), sometimes called event stream processing, is acomputing technique in which incoming data about what is happening (event data) is processed as itarrives to generate higherlevel, moreuseful, summary information (complex events). Complex eventsrepresent patterns in the data, and may signify threats or opportunities that require a response fromthe business. One complex event may be the result of calculations performed on a few or on millions ofbase events (input) from one or more event sources.

Position and Adoption Speed Justification: CEP is transformational because it is the only way to getinformation from event streams in real time. It will inevitably be adopted in multiple places withinvirtually every company. However, companies were initially slow to adopt CEP because it is so differentfrom conventional architecture, and many developers are still unfamiliar with it. CEP has moved slightlyfurther past the Peak of Inflated Expectations, but it may take up to 10 more years for it to reach itspotential on the Plateau of Productivity.

CEP has already transformed financial markets — the majority of equity trades are now conductedalgorithmically in milliseconds, offloading work from traders, improving market performance, andchanging the costs and benefits of alternative trading strategies. It is also essential to earthquakedetection, radiation hazard screening, smart electrical grids and realtime locationbased marketing.Fraud detection in banking and credit card processing depends on correlating events across channelsand accounts, and this must be carried out in real time to prevent losses before they occur. CEP is alsoessential to future Internet of Things applications where streams of sensor data must be processed inreal time.

Conventional architectures are not fast or efficient enough for some applications because they use a"saveandprocess" paradigm in which incoming data is stored in databases in memory or on disk, andthen queries are applied. When fast responses are critical, or the volume of incoming information isvery high, application architects instead use a "processfirst" CEP paradigm, in which logic is appliedcontinuously and immediately to the "data in motion" as it arrives. CEP is more efficient because itcomputes incrementally, in contrast to conventional architectures that reprocess large datasets, oftenrepeating the same retrievals and calculations as each new query is submitted.

Two forms of stream processing software have emerged in the past 15 years. The first were CEPplatforms that have builtin analytic functions such as filtering, storing windows of event data,computing aggregates and detecting patterns. Modern commercial CEP platform products includeadapters to integrate with event sources, development and testing tools, dashboard and alerting tools,and administration tools.

More recently the second form — distributed stream computing platforms (DSCPs) such as AmazonWeb Services Kinesis and opensource offerings including Apache Samza, Spark and Storm — wasdeveloped. DSCPs are generalpurpose platforms without full native CEP analytic functions andassociated accessories, but they are highly scalable and extensible so developers can add the logic toaddress many kinds of stream processing applications, including some CEP solutions.

User Advice:

Companies should use CEP to enhance their situation awareness and to build "senseandrespond"behavior into their systems. Situation awareness means understanding what is going on so thatyou can decide what to do.

CEP should be used in operational activities that run continuously and need ongoing monitoring.This can apply to fraud detection, realtime precision marketing (crosssell and upsell), factoryfloor systems, website monitoring, customer contact center management, trading systems forcapital markets, transportation operation management (for airlines, trains, shipping and trucking)and other applications. In a utility context, CEP can be used to process a combination ofsupervisory control and data acquisition (SCADA) events and "last gasp" notifications from smartmeters to determine the location and severity of a network fault, and then to trigger appropriateremedial actions.

Companies should acquire CEP functionality by using an offtheshelf application or SaaS offeringthat has embedded CEP under the covers, if a product that addresses their particular businessrequirements is available.

When an appropriate offtheshelf application or SaaS offering is not available, companies shouldconsider building their own CEPenabled application on an iBPMS, ESB suite or operationalintelligence platform that has embedded CEP capabilities.

For demanding, highthroughput, lowlatency applications — or where the event processing logicis primary to the business problem — companies should build their own CEPenabled applicationson commercial or opensource CEP platforms (see examples of vendors below) or DSCPs.

In rare cases, when none of the other tactics are practical, developers should write custom CEPlogic into their applications using a standard programming language without the use of acommercial or opensource CEP or DSCP product.

Business Impact: CEP:

Improves the quality of decision making by presenting information that would otherwise beoverlooked.

Enables faster response to threats and opportunities.

Helps shield business people from data overload by eliminating irrelevant information andpresenting only alerts and distilled versions of the most important information.

CEP also adds realtime intelligence to operational technology (OT) and business IT applications. OT ishardware and software that detects or causes a change through the direct monitoring and/or control of



physical devices, processes and events in the enterprise. For example, utility companies use CEP as apart of their smart grid initiatives, to analyze electricity consumption and to monitor the health ofequipment and networks.

CEP is one of the key enablers of contextaware computing and intelligent business operations. Much ofthe growth in CEP usage during the next 10 years will come from the Internet of Things, digitalbusiness and customer experience management applications.




Sample Vendors: Amazon; Apache; EsperTech; Feedzai; IBM; Informatica; LG CNS; Microsoft;OneMarketData; Oracle; Red Hat; SAP; SAS (DataFlux); ScaleOut Software; Software AG; SQLstream;Tibco Software; Vitria; WSO2


"Use ComplexEvent Processing to Keep Up With Realtime Big Data"

"Best Practices for Designing Event Models for Operational Intelligence"

Sliding Into the TroughBig DataAnalysis By: Mark A. Beyer

Definition: Big data is highvolume, velocity and variety information assets that demand costeffective,innovative forms of information processing for enhanced insight and decision making.

Position and Adoption Speed Justification: Big data has crossed the Peak of Inflated Expectations.There is considerable debate about this, but when the available choices for a technology or practicestart to be refined, and when winners and losers start to be picked, the worst of the hype is over.

It is likely that big data management and analysis approaches will be incorporated into a variety ofexisting solutions, while simultaneously replacing some of the functionality in existing market solutions(see "Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016").The market is settling into a more reasonable approach in which new technologies and practices areadditive to existing solutions and creating hybrid approaches when combined with traditional solutions.

Big data's passage through the Trough of Disillusionment will be fast and brutal:

Tools and techniques are being adopted before expertise is available, and before they are matureand optimized, which is creating confusion. This will result in the demise of some solutions andcomplete revisions of some implementations over the next three years. This is the very definitionof the Trough of Disillusionment.

New entrants into this practice area will create new, shortlived surges in hype.

A series of standard use cases will continue to emerge. When expectations are set properly, itbecomes easier to measure the success of any practice, but also to identify failure.

Some big data technologies represent a great leap forward in processing management. This isespecially relevant to datasets that are narrow but contain many records, such as those associated withoperational technologies, sensors, medical devices and mobile devices. Big data approaches toanalyzing data from these technologies have the potential to enable big data solutions to overtakeexisting technology solutions when the demand emerges to access, read, present or analyze any data.However, inadequate attempts to address other big data assets, such as images, video, sound and eventhreedimensional object models, persist.

The larger context of big data is framed by the wide variety, and extreme size and number, of datacreation venues in the 21st century. Gartner clients have made it clear that big data technologies mustbe able to process large volumes of data in streams, as well as in batches, and that they need anextensible service framework to deploy data processes (or bring data to those processes) thatencompasses more than one variety of asset (for example, not just tabular, streamed or textual data).

It is important to recognize that different aspects and varieties of big data have been around for morethan a decade — it is only recent market hype about legitimate new techniques and solutions that hascreated this heightened demand.

Big data technologies can serve as unstructured data parsing tools that prepare data for dataintegration efforts that combine big data assets with traditional assets (effectively the firststagetransformation of unstructured data).

User Advice:

Focus on creating a collective skill base. Specifically, skills in business process modeling,information architecture, statistical theory, data governance and semantic expression are requiredto obtain full value from big data solutions. These skills can be assembled in a data science lab ordelivered via a highly qualified individual trained in most or all of these areas.

Begin using Hadoop connectors in traditional technology and experiment with combining



traditional and big data assets in analytics and business intelligence. Focus on this type ofinfrastructure solution, rather than building separate environments that are joined at the level ofanalyst user tools.

Review existing information assets that were previously beyond analytic or processing capabilities("dark data"), and determine if they have untapped value to the business. If they have, makethem the first, or an early, target of a pilot project as part of your big data strategy.

Plan on using scalable information management resources, whether public cloud, private cloud orresource allocation (commissioning and decommissioning of infrastructure), or some otherstrategy. Don't forget that this is not just a storage and access issue. Complex, multilevel, highlycorrelated information processing will demand elasticity in compute resources, similar to theelasticity required for storage/persistence.

Small and midsize businesses should address variety issues ahead of volume issues whenapproaching big data, as variety issues demand more specialized skills and tools.

Business Impact: Use cases have begun to bring focus to big data technology and deploymentpractices. Big data technology creates a new cost model that has challenged that of the data warehouseappliance. It demands a multitiered approach to both analytic processing (many contextrelatedschemasonread, depending on the use case) and storage (the movement of "cold" data out of thewarehouse). This resulted in a slowdown in the data warehouse appliance market while organizationsadjusted to the use of newly recovered capacity (suspending further costs on the warehouse platform)and moving appropriate processing from a schemaonwrite approach to a schemaonread approach.

In essence, the technical term "schema on read" means that if business users disagree about how aninformation source should be used, they can have multiple transformations appear right next to eachother. This means that implementers can do "late binding," which in turn means that users can see thedata in raw form, determine multiple candidates for reading that data, determine the top contenders,and then decide when it is appropriate to compromise on the most common use of data — and to load itinto the warehouse after the contenders "fight it out." This approach also provides the opportunity tohave a compromise representation of data stored in a repository, while alternative representations ofdata can rise and fall in use based on relevance and variance in the analytic models.




Sample Vendors: Cloudera; EMC; Hortonworks; IBM; MapR; Teradata


"Big Data Drives Rapid Changes in Infrastructure and $232 Billion in IT Spending Through 2016"

"'Big Data' Is Only the Beginning of Extreme Information Management"

"How to Choose the Right Apache Hadoop Distribution"

"The Importance of 'Big Data': A Definition"

KeyValue Database Management SystemsAnalysis By: Nick Heudecker

Definition: Keyvalue stores keep data as a binary object and lend themselves to use cases with manyreads and writes. They evolved to support rapid scaling for simple data collections by automating the"sharding" process — splitting the data and distributing it across multiple nodes in a massively parallelenvironment. In keyvalue systems, data is added and read, but rarely updated. There are no "fields" toupdate — rather, the entire value, other than the key, must be updated if changes are to be made.

Position and Adoption Speed Justification: Interest in keyvalue database management systems(DBMSs) dropped off in late 2013 and early 2014. While keyvalue DBMSs are ideal for certain usecases, such as caches or Web sessions, hype has shifted to NoSQL DBMSs capable of addressingbroader use cases. This is driving the rapid evolution of keyvalue DBMSs. They are adopting featuresnormally found in tablestyle and documentstore DBMSs in order to appeal to a wider set ofapplications. For example, keyvalue stores have not been wellsuited to ad hoc queries and analytics.

This evolution of keyvalue DBMSs to include features of other NoSQL DBMSs continues theadvancement along the Hype Cycle toward the Trough of Disillusionment.

User Advice:

Modeling applications using keys and values is a substantial departure from the relational model.Application access patterns should be considered before deciding whether a keyvalue store is aviable option.

The various keyvalue DBMSs have little in common. Developing an application with one platformdoes not guarantee that it will be transferable to another. Features available in one product maynot be available or function the same way in other products.

Some keyvalue DBMSs are available in lightweight or embedded footprints, making them ideal foruse on devices with a constrained footprint, such as mobile devices, wearables or sensors.

Business Impact: The advantages of keyvalue stores have a moderate impact on enterprises. When



applied to appropriate use cases, keyvalue stores provide a lowcost approach to storing largeamounts of data. They also perform exceptionally well when handling large amounts of reads andwrites. The caveat is that applications must fit into the access models defined by the underlyingproduct.

Enterprises should expect vendors of keyvalue DBMSs to continue to expand the range of addressableuse cases. Vendors can also be expected to improve management and development tools.



Maturity: Emerging

Sample Vendors: Aerospike; Amazon Web Services; Basho Technologies; Oracle




Multidomain MDM SolutionsAnalysis By: Saul Judah; Andrew White; Bill O'Kane

Definition: Multidomain master data management (MDM) is a technologyenabled discipline thatsupports the management of any number of master data domains for a given implementation style.

Position and Adoption Speed Justification: Multidomain MDM moves forward slightly toward thetrough on the Hype Cycle curve this year. Despite the target market becoming more aware of thebenefits that this technologyenabled discipline can offer, the rate of adoption has accelerated slowlycompared with singledomain MDM.

The first generation of MDM technology offerings focused on provision of singledomain MDM (e.g.,customer MDM, product MDM). Multidomain MDM represents secondgeneration technology that:

Can be implemented in a single instance

Results in a single, uniform data model that is interoperable for different data domains, withcapabilities for managing crossdomain intersections (e.g., rules hierarchy across both productand customer data)

Has a workflow and user interface elements that are uniform or interoperable

Supports at least one use case, implementation style (e.g., centralized, registry), and organizationand governance model for specific industry scenarios

Although multidomain MDM is one of the most hyped topics on the MDM landscape, its meaning is notconsistently understood by either users or vendors. Many vendors continue to use the term"multidomain MDM" to indicate that they can meet all of an organization's MDM needs, whatever thenumber of products that vendors may need to install. Furthermore, many users are seeking to fulfill awide range of divergent data requirements. We often see such conversations breaking down when anorganization cannot decide whether a specific technology needs to be used and extended — such aswith singledomain MDM — or if several singledomain MDM solutions (called "multipledomain" MDM)are needed, or, indeed, if true multidomain MDM is the right response to an organization's needs.

We see a growing number of enduser organizations deploying what they call "multidomain MDM." Insome cases, this is in fact two different hubs, each from a different vendor, and each focused on aspecific data domain (commonly customer/party and product/service). We actually call this "multipledomain MDM," since there are two different systems. Another set of selfnamed programs are, in fact,really multidomain MDM, in that there is one system — but the complexity of the objects in the systemvaries greatly. For example, the system may include the customer (most complex object), supplier,employee and location. This is, by definition, multidomain. However, it does not convey the intent theend user is seeking, which is "support for any and all of my most complex objects."

User Advice: Multidomain MDM solutions remain less mature than their singledomain counterparts.Although several vendors market and sell a multidomain message, in most cases the implementedsolution is closer to a multipledomain MDM, in that different vendors address specific MDM domainneeds, or the same vendor implements multiple single domains. For an end user, the adoption of amultidomain MDM solution will be dependent on:

A specific vendor's ability to meet the necessary complexity across different data domains andprovinces for desired use case, implementation style and industry

The organizational ability to establish and operate MDM design capabilities (e.g., multidata domainmodeling, crossdomain business rules, crossbusiness area workflows) that serve businessasusual operational needs for performance, reliability and security over and beyond the basicfunctionality that is provided out of the box

Governance maturity, political or cultural readiness, and the ability to establish organizationalleadership across multiple business areas spanning multiple data domains

As such, users should continue to be cautious in their approach to selecting multidomain MDM as theresponse to their business needs until they are satisfied that their organization is ready for it. Vendorswill be hyping their capabilities (see point 1, above); users need to explore their business and IT



readiness as well as their MDM maturity to ensure the right kind of technology is matched to theirorganizational culture and business goals.

Business Impact: Multidomain MDM solutions offer the enterprise a consistent, trusted semantic viewof all its key master data domains. This offers significant advantages that accrue from strategic,enterpriselevel information governance, management of consistent business data rules acrossorganizational units, alignment of business data definitions and the effective execution of informationstewardship.

As such, it supports more advanced, enterprisewide information strategies that seek to support marketdifferentiation using information as a business driver. However, technology adoption alone does notassure success, since greater effort is needed in design, governance, process and organizational changemanagement.



Maturity: Emerging

Sample Vendors: IBM; Informatica; Orchestra Networks; Riversand; SAP; Stibo Systems; TibcoSoftware


"Mastering Master Data Management"



InMemory Database Management SystemsAnalysis By: Roxane Edjlali

Definition: An inmemory DBMS (IMDBMS) is a DBMS that stores the entire database structure inmemory and accesses all the data directly, without the use of input/output instructions to store andretrieve data from disks, allowing applications to run completely inmemory. This should not beconfused with a caching mechanism, which stores and manages disk blocks in a memory cache forspeed. IMDBMSs are available in both rowstore and columnstore models, or a combination of both.

Position and Adoption Speed Justification: IMDBMS technology has been around for many years(for example, IBM solidDB, McObject's eXtremeDB and Oracle TimesTen). However, we have seenmany new vendors emerging during the past three years. SAP has been leading with SAP Hana, whichnow supports hybrid transactional/analytical processing (HTAP). Other major vendors (Teradata, IBMand Microsoft), except Oracle, have added inmemory analytic capabilities as part of their DBMSs.Oracle is due to deliver it in 2014, and Microsoft SQL Server 2014 has also added inmemorytransactional capabilities. Small, innovative vendors also continue to emerge — both in the relational(MemSQL, for example) as well as in the NoSQL area (Aerospike, for example).

The adoption by all major vendors demonstrates the growing maturity of the technology and thedemand from customers looking at leveraging IMDBMS capabilities as part of their informationinfrastructure. While SAP Hana is leading the charge, with 3,000 customers with hundreds inproduction, the addition of inmemory capabilities by all major players should further accelerateadoption of IMDBMS technology during the next two years.

Many use cases are supported by IMDBMS. For example, solidDB and TimesTen were originallydeveloped for highspeed processing of streaming data for applications such as fraud detection, withthe data then written to a standard DBMS for further processing. Others, such as Altibase, Aerospikeand VoltDB, focus on highintensity transactional processing. Some IMDBMSs — such as Exasol,ParStream or Kognitio — are dedicated to inmemory analytical use cases. Finally, the ability to supportboth analytical and transactional (aka HTAP) use cases on a single copy of the data is gaining traction inthe market — led by SAP and now Microsoft, along with smaller emerging players such as Aerospike orMemSQL.

The promise of the IMDBMS is to combine, in a single database, both the transactional and analyticaluse cases without having to move the data from one to the other. It enables new business opportunitiesthat would not have been possible previously, by allowing realtime analysis of transactional data. Oneexample is in logistics, where business analysts can offer customers rerouting options for potentiallydelayed shipping proactively, rather than after the fact; hence, creating a unique customer experience.

Another example comes from online gambling, whereby computing of the handicap could occur as amatch is ongoing. To support such use cases, both the transactional data and the analytics need to beavailable in real time. While analytical use cases have seen strong adoption, for most organizationsIMDBMS for HTAP technology remains three years away.

User Advice:

Continue to use IMDBMS as a DBMS for temporary storage of streaming data where realtimeanalysis is necessary, followed by persistence in a diskbased DBMS.

IMDBMS for analytic acceleration is an effective way of achieving increased performance.

The single most important advancement is HTAP as a basis for new, previously unavailableapplications — taking advantage of realtime data availability, with IMDBMS for increased



performance and reduced maintenance. Organizations should monitor technology maturity andidentify potential business use cases to decide when to leverage this opportunity.

Vendor offerings are evolving fast and have various levels of maturity. Compare vendors fromboth the technology and pricing perspectives.

Business Impact:

These IMDBMSs are rapidly evolving and becoming mature and proven — especially for reliabilityand fault tolerance. As the price of memory continues to decrease, the potential for the business istransformational.

The speed of the IMDBMS for analytics has the potential to simplify the data warehouse model byremoving development, maintenance and testing of indexes, aggregates, summaries and cubes.This will lead to savings in terms of administration, improved update performance, and increasedflexibility for meeting diverse workloads.

The high performance implies that smaller systems will do the same work as much larger servers,which will lead to savings in floor space and power. While the cost of acquisition of an IMDBMS ishigher than a diskbased system, the total cost of ownership of an IMDBMS should be less over athree to fiveyear period because of cost savings related to personnel, floor space, power andcooling.

HTAP DBMSs will enable an entire set of new applications. These applications were not possiblebefore, because of the latency of data moving from online transaction processing systems to thedata warehouse. However, this use case is still in its infancy.




Sample Vendors: Aerospike; Exasol; IBM; Kognitio; McObject; MemSQL; Microsoft; Oracle;ParStream; Quartet FS; SAP; Teradata; VoltDB


"Who's Who in InMemory DBMSs"

"Cool Vendors in InMemory Computing, 2013"

"Taxonomy, Definitions and Vendor Landscape for InMemory Computing Technologies"

"SAP's Business Suite on Hana Will Significantly Impact SAP Users"

Data Quality Software as a ServiceAnalysis By: Ted Friedman

Definition: Data quality software as a service (SaaS) refers to data quality functionality (such asprofiling, matching, standardization and validation) delivered using a model in which an externalprovider owns the infrastructure and provides the capabilities in a scalable, "elastic" and multitenantcloud environment used by its customers. These capabilities can be used as alternatives to inhousedeployments of data quality software or the development of customcoded solutions.

Position and Adoption Speed Justification: The impact of the cloud on infrastructure capabilities isaccelerating and will increasingly impact how data quality operations are executed. The model throughwhich data quality capabilities have been delivered by IT organizations has largely been a deploymentof data quality packaged tools that reside on enterprises' internally owned computing infrastructures.

However, the impact of cloud services on the data quality domain, typically in the form of SaaSdelivery, continues to gain interest as an alternative to onpremises data quality tool deployments or asan augmentation of data quality efforts. Data quality capabilities and technologies are increasinglybeing procured and deployed as a service (such as payperuse and low or zeroupfrontcost models).Buyers are interested in such licensing models because it helps them procure these capabilities as anoperational rather than a capital expense.

Data quality SaaS solutions are emerging, largely to ensure the quality of customer data for discreteoperations such as postal address validation and cleansing, email address validation, and telephonenumber validation, as well as various forms of data enrichment. However, they generally don't cover anorganization's entire range of data quality needs (such as data profiling).

Common data quality issues, often in the domain of customer data, are increasingly being resolvedusing SaaS as organizations diversify their deployment models. Demand for data quality SaaS is alsorising in a range of other data domains, and capabilities are gradually becoming available to addressother forms of master data, such as for ensuring the compliance of product data with specificstandards.

Some data quality SaaS capabilities are also becoming available as part of data integration platform asa service (PaaS) offerings, and the market for data integration tools and data quality tools continues toconverge. Vendors will widen the functional capabilities available as services as adoption of SaaSmodels and the maturity of cloudbased infrastructure and information risk models grow.

This capability is now approaching the Trough of Disillusionment; rapidly increasing demand will



pressure providers and capabilities to move beyond the limitedscope data enrichment servicescurrently available toward a broader range of data quality capabilities that includes data profiling,monitoring and generalized data cleansing. At the same time, adoption is accelerating — a recent studyon adoption and usage of data quality capabilities showed that dataqualityasaservice usage doubledfrom 2012 to 2013, to approximately 14% market adoption.

However, while we believe there will be widening adoption for data quality SaaS in the next five years,it will be most often a simplistic approach of simply executing base data quality processes on data ineither a SaaS or PaaS environment — more advanced data quality and governance options will lag inthe cloud just as they do onpremises.

User Advice: Organizations should evaluate whether SaaSbased data quality capabilities can providebenefits in advancing or augmenting their current approaches. IT leaders, information governancestakeholders and data stewards responsible for data quality will seek to reduce the complexity oftechnology deployments to help justify and achieve improvements in data quality.

Organizations should consider using data quality SaaS not only to avoid the cost and challenge oftraditional deployments, but also to reduce the cost of reliance on specialized skills. Some aspects ofdata quality improvement, such as the cleansing and validation of customers' addresses, are alreadyhandled successfully by the SaaS model because they have a welldefined scope and are largelyrepeatable across enterprises. Organizations can take advantage of data quality SaaS as a rest point intheir overall governance strategy and pursue it as a chance to extend the governance reach in theorganization and information infrastructure.

Providers offer an increasing range of functionality (including matching and enrichment) via servicemodels, but as yet, few data quality SaaS offerings have the capabilities to address all facets of thedata quality discipline. Buyers should ensure that SLAs with providers of data quality SaaS protect thesecurity and privacy of data that crosses organizational boundaries.

Business Impact: Early benefits, in the form of cost reductions and faster implementation of dataquality capabilities, will result from resourceconstrained organizations applying data quality SaaS tospecific issues, such as the cleansing of customer contact data and ensuring conformance of productspecification data with industry standards. As SaaS and cloudbased infrastructure matures, dataquality services will become increasingly common components of organizations' informationmanagement infrastructures. Additionally, businesses adopting SaaS will increasingly need to ensuredata quality rules are consistently applied across SaaS applications and onpremises applications.




Sample Vendors: Acxiom; Experian QAS; Informatica; Loqate; ServiceObjects; StrikeIron


"Assessing the Cloud's Impact on Data Quality Capabilities"

"How Data Management Technologies and Providers Are Adapting to Meet the Demands of Data in theCloud"


"The State of Data Quality: Current Practices and Evolving Trends"

Enterprise Taxonomy and Ontology ManagementAnalysis By: Mark A. Beyer

Definition: Taxonomy and ontology is a guiding set of unifying principles that allows for therepresentation of information. The ontological method determines "how alike" things are. Thetaxonomic design process determines "how different things are" inside a category. In relationalsystems, they are expressed as entities and attributes (taxonomy) in a single model (ontology).

Position and Adoption Speed Justification: Taxonomy and ontology management exhibits a difficultand slowmoving maturity curve. As a result, not much has changed since the preceding Hype Cycle. Atthe beginning of 2014, we see continued slow momentum in developing business glossaries anddeploying technology to solicit business user inputs, which indicates that the technology is nowexpected to work. Yet, despite the apparent progress, successes are limited to individual teams or, atmost, departmentlevel efforts. Certain aspects of the consumerization of IT could accelerate a widerapproach, but for all the promise, significant barriers to expansion and adoption remain.

Physical data models and the formats holding data will become increasingly inconsequential asgovernance, taxonomy and ontology become the new core of data and information practices. While stillvery early in any form of wide adoption, gradually increasing demand for combined metadatamanagement with taxonomy and ontology management is inevitable. Importantly, while modeling anddatabase design tools have addressed table and column management for decades, the ability tomanage the formalized rationalization process of taxonomy to taxonomy, ontology to ontology andtaxonomy to ontology management has only recently gained interest.

Taxonomies and ontologies are hard to develop and use, but they do have an indispensable aspect forcertain highvalue, lowfaultrate situations. Applications that automate taxonomy development "behind



the scenes" (transparent even to the developer's concerns) can provide significant support to taxonomymanagement (for example, Digital Harbor). Scientific disciplines and situations that require nearabsolute certainty of communications (like national laboratory and testing projects) use taxonomymodels to good effect. Content taxonomies have similar effectiveness in the pharmaceuticals sector,case management and publishing. IT data modelers will rarely get a chance to engage in taxonomymodeling, and such activities have limited business value given the costs, but continued progresstoward a simpler means of modeling and maintaining taxonomies will prove useful to data search andretrieval, as well as data visualization.

Early indications that glossaries and other rudimentary taxonomy/ontology approaches were starting toshow adoption gave the false impression that these technologies were advancing. Any advance hasproven to be almost imperceptibly slow, due to the difficulty of this kind of work and the constraintspresented by the current generation of tools. Ultimately, the expectations for significant introspectionfrom these tools are overly optimistic. The model that must be pursued includes tools that presentuseful metadata to human users, and the bulk of ontological work will remain humandriven. When thisunderstanding is reached, the tools will do less, but adoption will increase and maturity will finally bepossible. We see the market developing less in the direction of generalpurpose, crossplatformsemantic modeling and more toward ontological and taxonomic tools that will be used in specific casesand for specific projects, like extraction, transformation and loading, and master data management(MDM).

User Advice: Information architects troubled by taxonomy and ontology issues should:

Understand that business glossary tools have started to see success. These tools permitpredominant definitions, but also localized and domainspecific definitions and terminologymanagement. Determine whether significant crosspurpose communication is taking place in theorganization and consider using a glossary tool to at least quantify the problem. For the moreadvanced features, it is unfortunate that vendors' solutions are currently ahead of demandbecause they could exhaust their capital before the market is ready to adopt them.

Educate designated or selected business personnel in roles for the creation of information assetsand in the importance of metadata as a precursor to introducing these practices. Exercise extremecaution, however, as end users should not be subjected to the rigor or terminology involved inmetadata management. The focus here is ensuring the business understands the process, thebenefits and users' level of commitment. Vendors should focus on these early benefits and deliverfunctionality that serves this less complex market demand.

Initiate text mining/analytics against metadata containing various business definitions usingshared data models as a taxonomic guide, with the aim of aligning relational data, newinformation types (such as social and media data) and content analysis metadata.

Look for opportunities to federate various metadata support efforts that are being established indifferent disciplines — for example, business intelligence and MDM — and begin to build a registryof shared taxonomies and ontologies. This is one possible workaround for the limitations ofcurrent approaches. At least begin to evolve the semantic disciplines within your organization.

Business Impact: Enterprise metadata taxonomy and ontology management will bring tighterintegration between business process changes and IT system changes. It will also enable betterassessment by business analysts of the risks and benefits that accrue in the business regarding themaintenance and security of information assets.

The pursuit of taxonomy and ontology management will begin the process of aligning risk managementwith operations management, finally bridging the gap between compliance and margin management.For example, identifying the use of differing terminology will allow for more consistent reporting acrossmultiple compliance bodies and regulators. In managing taxonomy and ontology resolution, data qualityefforts will be easier to manage (for example, nonshared terms will be resolved to each other, andsources will recognize common models for data quality resolution and MDM support).



Maturity: Emerging

Sample Vendors: Attunity; Global IDs; IBM (Rational); Mondeca; Pragmatic Solutions; SAS


"The Nexus of Forces Is Driving the Adoption of Semantic Technologies, but What Does That Mean?"

Database Platform as a ServiceAnalysis By: Donald Feinberg

Definition: A database platform as a service (dbPaaS) is a database management system (DBMS) ordata store engineered as a scalable, elastic, multitenant service, with a degree of selfservice and soldand supported by a cloud service provider. We do not restrict the definition to relational DBMSs onlyand include NoSQL DBMSs that are cloudenabled and based on nonrelational file structures.

Position and Adoption Speed Justification: DbPaaS products are growing in number, due to thedemand and the maturing cloud platforms, and many offerings currently available are relatively new.There are fully relational dbPaaS offerings with atomicity, consistency, isolation and durability (ACID)properties, but many of the newer dbPaaS offerings are nonACID dbPaaS offering support for eventualconsistency, which restricts them to less complex and normally singleuser transactions.



NonACID dbPaaS technology is becoming more widely used for Web 2.0 development projects, wheresharing data among multiple tenants in the cloud is not a requirement of the application. Redis Labs(formerly Garantia Data) is an example of such NoSQL dbPaaS and further, is an inmemory DBMS.

Most DBMS engines are available on a cloud infrastructure, such as Amazon's Elastic Compute Cloud,but these are not dbPaaS according to our definition (see "What IT Leaders Need to Know AboutApplication PaaS Models and Use Patterns"). Standard DBMSs are not specifically engineered to takeadvantage of the cloud, which includes Amazon's Relational Database Service (available for MySQL andOracle implementations), IBM's DB2, Microsoft's SQL Server, Oracle's DBMS, and many others.

These are offered as hosted services, not as cloud services, since the data store software in questionmakes no provision for elasticity or other cloud capabilities and users are expected to manage theDBMS instances and infrastructure as a service (IaaS). In addition, users normally purchase thelicenses separately from the IaaS.

Many Web 2.0 application users may be experimenting with some of these services, but most still relyon noncloudbased DBMS implementations. One exception is where all the data already exists in thecloud where it is desirable to have the application with the data, for example, with SaaS applicationdata. One advantage of dbPaaS is that it doesn't use licensebased pricing, but rather "elastic" pricing(the more you use, the more you pay; the less you use, the less you pay) or fixed subscription pricing(a flat price per user). This flexibility is an advantage as long as the "rental" price does not exceed thestandard licensing cost.

The rate of adoption of dbPaaS will depend on the acceptance of cloud system infrastructure in generaland the maturation of dbPaaS offerings. It will also depend on the usage model and whether therelaxed consistency model can be used by an application. Gartner believes additional dbPaaS productswill become available as true cloud services during the next few years, in line with what Microsoft hasdone with the Azure SQL Database.

This increase in maturity will enable the replacement of lesscritical, smaller workloads currently onpremises. The time to the Plateau of Productivity for dbPaaS has changed over the past few years as itis not widely used for production databases. This has caused us to keep it in the two to fiveyear rangewith no movement on the curve. As more products become available and their maturity increases, weexpect to see usage grow, although this will be closer to the fiveyear horizon.

Currently, dbPaaS is used primarily for the development and testing of applications — where databasesizes are smaller and issues of security and sharing with multiple users are not a major concern.Recently, we have seen examples of applications using dbPaaS in production applications deployed inthe cloud on Microsoft Azure SQL Database, Database.com, DynamoDB and others.

This growing use for development and production, coupled with the growing number of offerings,moves the technology closer to the Trough of Disillusionment.

User Advice: Advice for users in the next two years:

Use dbPaaS to develop and test systems, such as smaller production systems with a low numberof users, hosting Webspecific content or for file storage in the cloud. This is especially importantwhen the time to delivery is short and resources and funding are scarce.

Be cautious about dbPaaS, as there may still be issues with security and reliability — and withsome nonrelational DBMSs, there are issues with concurrent user control.

Exercise care with systems with high levels of data transfer — most cloud infrastructure vendorscharge for movements of data in and/or out of the cloud.

Recognize that latency is another data transfer issue — the time needed to transfer large amountsof data to the cloud (for example, to support a data warehouse in the cloud) can be restrictive.

Business Impact: Initially, dbPaaS had an impact on software vendors (especially smaller ones)requiring a less expensive platform for development. Increasingly, Gartner's clients report similar usefor application development within IT organizations. As dbPaaS gains maturity (especially in scalability,reliability and security), implementations used for shortterm projects (such as small departmentalapplications and rapid development platforms) will show some marked cost reductions, compared withimplementations within IT departments.

These cost savings will be primarily based on the ability to set up a dbPaaS environment without capitalexpenditure and the use of expensive IT personnel. The speed of setup will be a primary driver of therapid deployment of systems — without the usual requirements and planning necessary for IT projectswithin the IT department. This will also reduce the need for IT staff to respond to shortnotice andshortduration projects, thus reducing overall IT costs.

Some vendors, such as Microsoft, now offer both dbPaaS and cloud hosting (SQL Server on MicrosoftAzure Virtual Machines), allowing customers to decide where to locate applications and to have theflexibility to move them as desired. This does require careful attention to functionality (not allfunctionality is available in both), but it does allow customers to use dbPaaS and then decide later (astheir requirements grow) to move to hosted cloud, if desired (or vice versa).

Elasticity is a requirement for a DBMS to be classified as a dbPaaS and to deliver the benefits expectedof a cloud platform service. Elastic resource allocation for the virtual machines and the storage must beprovided by the DBMS for both simple and complex transactions. This can have an impact on overallcost as usage requirements change over time, especially if usage is seasonal (as, for example, in theretail sector). Elasticity allows the database to grow and contract as needed and so has the same effecton cost, growth and contracting.



As dbPaaS offerings mature during the next two to five years, it will be possible for an organization tohost its entire DBMS infrastructure as dbPaaS, with potential reductions in the cost of servers, storage,DBMS licenses, maintenance and support, storage management and database administration. This willbe of interest, particularly for financial managers monitoring costs and keen to reduce the overall costof IT.



Maturity: Emerging

Sample Vendors: Amazon; EnterpriseDB; Google; IBM; Microsoft; Oracle; Redis Labs; salesforce.com


"Platform as a Service: Definition, Taxonomy and Vendor Landscape, 2013"

MDM of Supplier Data SolutionsAnalysis By: Deborah R Wilson

Definition: Master data management (MDM) of supplier data solutions involves standalone enterpriseapplications dedicated to housing and governing supplier master data for use in multiple systems. MDMof supplier data solutions is often implemented with links to external information services, such as Dun& Bradstreet, for ongoing data enrichment and/or validation.

Position and Adoption Speed Justification: MDM of supplier data solutions has rapidly moved pastthe Peak of Inflated Expectations, as a result of many implementations delivering modest businessvalue. While semantically consistent, governed supplier master data is valuable, and business usersoften find that MDM of supplier data, by itself, is insufficient to deliver a return on investment.

We label MDM of supplier data as obsolete before plateau because long term, singledomain MDMsolutions (including ones for supplier data) are waning in favor of broader, multidomain MDM systemsthat handle multiple master data domains. Today, this would typically mean using MDM for product orcustomer data solutions that can accommodate supplier records in their data model. The shift tomultidomain MDM solutions will take some time, but five years or more out, we expect most MDM ofsupplier data implementations to leverage a multidomain MDM hub, which will eventually accommodatethe MDM of supplier data out of the box.

For now, most organizations choose to manage supplier master data through one or more of thefollowing approaches:

Leveraging governance and access control to ERPbased master files to keep supplier records asclean as possible

Using a spend analysis tool for afterthefact supplier deduplication to clean up transaction datafor spend analysis

Deploying a supply base management solution to collect the data needed to master a supplierrecord

User Advice: Scope an MDM of supplier data technology strategy in the context of the businessproblems to be addressed. Ensure that your program will deliver the desired business impact. MDM ofsupplier data adds value when it is implemented as an enabler for business initiative such as suppliercollaboration, supplier risk management and category management.

If evaluating MDM for supplier data, include multidomain MDM solutions in your assessment.

Business Impact: MDM of supplier data solutions can ensure the accuracy, uniformity, accuracy,stewardship, semantic consistency and accountability of supplier data through data cleansing andgovernance. Successful MDM of supplier data programs improves reporting and analytics, and enablesmuch more scalable multienterprise collaboration.




Sample Vendors: Oracle; Verdantis



"Magic Quadrant for Strategic Sourcing Application Suites"

Hadoop DistributionsAnalysis By: Nick Heudecker

Definition: Hadoop distributions are commercially packaged and supported editions of Apache Hadoop,a popular opensource framework for processing large volumes of data in parallel on clusters ofcompute nodes. Its popularity has spawned several competing vendors, each offering Hadoop



distributions with varying attributes and maturity levels.

Position and Adoption Speed Justification: Interest in, and adoption of, Hadoop has grown sincelast year's Hype Cycle. The Apache Software Foundation now lists approximately 30 organizations thatdistribute Hadoop, although a number of new entrants have created Hadoop distributions that includeadditional Apache projects and alternatives, including replacements for Hadoop Distributed File System(HDFS). Complete file system replacements are available from numerous vendors, such as MapRTechnologies and IBM. These new file systems typically provide better performance, enhanced securityfeatures, or better stability and availability than HDFS.

Cloud infrastructure providers are providing options as well. Amazon Web Services (AWS) offers ElasticMapReduce (EMR), which integrates with the various data services it provides, such as Amazon SimpleStorage Service (S3) and WebHDFS. Additionally, several Hadoop vendors make their distributionsavailable on AWS. Microsoft's Windows Azure cloud platform offers HDInsight, a Hadoopbased platformthat also integrates with other Microsoft tools, such as Power Pivot. The emergence of Hadoopasaservice providers, like Qubole, Altiscale and Xplenty, help address skills challenges but may complicategovernance and regulatory concerns.

The number of vendors offering additional features for "enterprisereadiness," such as security andadministration, performance tuning and high availability, complicates the selection process. Enterprisereadiness remains uneven and is an issue for adoption by large enterprises looking at deployment intotheir data centers. However, overall market understanding of Hadoop's value and adoption has beenprogressing for early adopters, but an unclear value proposition restrains broader market adoption.

The Hype Cycle position for Hadoop distributions remains unchanged from 2013. While enterprisefeatures have improved and the release of Hadoop 2.0 signifies an important maturity milestone, thedefinition of "Hadoop" is changing. As competing vendors introduce new, lessmature components intotheir offerings, the hype has remained constant over the last 12 months. These components includeApache Spark for inmemory processing, Apache Falcon for data lineage, and Apache Knox Gatewayand Apache Sentry for security.

Complicating positioning on the Hype Cycle is the shifting marketing of Hadoop by various vendors.Introducing nascent concepts like data lakes and data hubs has confused potential business adopters.Furthermore, Gartner research indicates that the most fundamental problem of determining businessvalue persists.

User Advice: While the various Hadoop distributions are changing rapidly, most vendor offerings aremore similar than different. Key considerations when evaluating Hadoop distributions are:

Integration with existing information management and security infrastructure

Availability of alternative data processing and access methods, like SQL and search interfaces, insupport of business objectives

Maturity of tools to configure, monitor and manage the cluster

Business Impact: Hadoop distributions enable enterprises to process large volumes of disparatesources of structured and unstructured data using commodity hardware to conduct new types ofanalysis. Additionally, Hadoop distributions support data warehousing operations by providing a lowercost storage alternative for infrequently used data, while still making it available for processing andanalytics. The other key use of Hadoop is as a data integration tool to improve performance in reducingvery large amounts of data to just what is needed in the data warehouse. However, integration andmanagement of Hadoop in data integration scenarios is not standardized across the vendor landscape— if, indeed, it is available at all.

Many enterprises will find it challenging to deploy Hadoop solutions due to the highly technical nature ofthe various distributions, the shortage of skilled personnel and limited enterprisereadiness. Vendorsand third parties are working to close the skill gap by offering managed solutions, improved toolsupport and integration across product suites.




Sample Vendors: Amazon Web Services; Apache Software Foundation (Hadoop); Cloudera;Hortonworks; IBM; MapR Technologies; Pivotal


"Hadoop Evolves to Face New Challenges"

"How to Choose the Right Apache Hadoop Distribution"

CrossPlatform Structured Data ArchivingAnalysis By: Garth Landers

Definition: Crossplatform structured data archiving software moves data from custom orcommercially provided applications to an alternate file system or DBMS while maintaining seamlessdata access and referential integrity. Reducing the volume of data in production instances can improveperformance; shrink batch windows; and reduce storage acquisition costs, facilities requirements, thecost of preserving data for compliance when retiring applications, and environmental footprints.



Position and Adoption Speed Justification: Structured data archiving tools have been available formore than a decade and have historically seen more adoption in larger enterprises. These productsprovide functionality to identify old or infrequently used application data and manage it appropriately.Although ROI can be very high, developing policies for retaining and deleting old application data isdifficult and not seen as a priority. Organizations generally tend to add more database licenses or usenative database capabilities, such as purging and partitioning, to address application growth. Thetechnology has long been seen as a cost avoidance measure used to contain operational and capitalexpenditures related to data growth as well as improve factors like application performance. The marketis changing and growing due to growth in data, application retirement, information governance and bigdata analysis opportunities.

Today's data archiving products are mature and will face challenges as various distributions of Hadoopadd capabilities such as retention management. At the same time, new approaches to applicationretirement and curbing structured data growth are happening through areas like virtualization and copydata management. Although the market is seeing growth, awareness remains low. Applicationretirement continues to be a significant driver. Organizations are looking for ways to cut costsassociated with maintaining nolongerneeded legacy applications while preserving application data forcompliance or its historical value. Data center consolidations, including moving to the cloud, andmergers and acquisitions are contributing to the interest in structured data archiving solutions to reducethe number of enterprise applications.

Competition often comes from internal resources who want to build it themselves and fromimprovements in storage technology that transparently improve performance while reducing storageacquisition and ownership costs — more specifically, autotiering, SSDs, data compression and datadeduplication. Nonetheless, the allure of tools that can support multiple applications and underlyingdatabases and the added capabilities these tools provide for viewing data as business objectsindependent of the application are driving administrators to consider them as viable solutions. Newcapabilities — such as better search and reporting, retention management, support for databasepartitioning, and support for SAP archiving — are broadening their appeal.

User Advice: The ROI for implementing a structured data archiving solution can be exceptionally high,especially to retire an application or to deploy a packaged application for which vendorsuppliedtemplates are available to ease implementation and maintenance. Expect that the planning phase maytake longer than the implementation. Among the roadblocks to implementation are requiring theconsulting services, gaining application owner acceptance (especially through testing access to archiveddata), defining the archiving policies and building the initial business case. All vendors in this space canprovide good references, and organizations should speak with references that have similar applicationportfolios and goals for managing their data.

Gartner has found that some clients have attempted to build their own approaches, but these systemsfall flat in terms of overall cost and lack capabilities relative to data access/security and masking.

Business Impact: Creating an archive of less frequently accessed data and reducing the size of theactive application database (and all related copies of that database) improves application performanceand recoverability and lowers costs related to database and application license, server andinfrastructure costs. Transferring old, rarely accessed data from a disk archive to tape can furtherreduce storage requirements. Most vendors in this space are supporting cloud storage as the repositoryfor archived data. Retiring or consolidating legacy applications cuts the costs and risks associated withmaintaining these systems. Overall, organizations can experience better information governance,including reduced risk associated with litigation.



Maturity: Mature mainstream

Sample Vendors: Delphix; HP; IBM; Informatica; OpenText; RainStor; Solix Technologies


"Magic Quadrant for Structured Data Archiving and Application Retirement"

"Application Retirement Drives Structured Data Archiving"

"Build a Leaner Data Center Through Application Retirement"

"Best Practices for Storage Management: Developing an Information Life Cycle Management Strategy"

Entity Resolution and AnalysisAnalysis By: Mark A. Beyer

Definition: Entity resolution and analysis (ER&A) is the ability to resolve multiple labels for individuals,products or other noun classes of information into a single resolved entity in order to analyzerelationships between resolved entities. Advanced forms of entity analytics include the abilities toreverse a resolution, when appropriate, and to communicate such a reversal to all client systems.

Position and Adoption Speed Justification: As a technology and practice, ER&A forms a bridgebetween expectations and practices regarding the input and management of data across varioussystems and even different information channels (for example, social versus backoffice channels, andcustomerdriven versus operations management portals).



Given the rise of cloud deployments, the proliferation of data sources external to the organization, andthe onpremises operations applications present in many large organizations, the difficulty ofdetermining whether many instances of the same individual are a single entity is increasing. At thesame time, demand for this type of resolution is becoming even more important to business. It is alsobecoming more difficult to determine whether networks of individuals are composed of multiple partiesor fictitious personas and aliases. This situation is complicating fraud detection and the identification ofindividuals and networks of people with a view to improving communication, marketing and customerservice.

In 2013, master data management (MDM) practices and data quality solutions began to take on therole of entity resolution. As a result, some vendors have positioned their products as accomplishingER&A simply through data quality practices and MDM, but they have failed to complete the workflowexpectation of resolving networks into individuals (and vice versa), and to track changes in all clientsystems. Analytics technology that extracts identifying information — used to reconcile entities rangingfrom audio, video and images to other types of unstructured data — is needed for ER&A technology tofulfill its potential.

ER&A is a vital functionality, but it will be obsolete before it reaches the Plateau of Productivity becauseof the effort required to achieve the business value it promises to deliver. The less potent ER&Afunctionality will be subsumed into data quality and MDM practices over time. The more functionally richrole of being able to forwardrationalize and then derationalize resolved entities will be lost.

User Advice: Social media sites and social media in general are fraught with misidentification and evenintentional misdirection, which could be prevented by ER&A or omnichannel analysis (the technicalresolution of many channels into one resolved delivery channel, which took Amazon, for example, 15years to achieve).

Organizations involved in fraud detection and other criminal investigations can use ER&A solutions toenhance their use of available information. Data quality stewards and data governance experts can usethem as forensic tools.

However, through 2017, commercial applications of this technology will be challenged because dataquality problems will diminish interest in the analytics aspects of ER&A, and only half the problem ofER&A will be solved.

Use of this technology should involve data stewards and dedicated analysts to interpret and process theresults. It is important to understand that the greatest benefit from this type of technology emergesfrom both entity resolution and analytics, and that the use of entity resolution tools only sets the stagefor analytics. In fact, failure to include analytics (specifically the absence of network awareness) willcause ER&A technology to drop into the Trough of Disillusionment, while entity resolution moves up theSlope of Enlightenment as a muchreduced, "battle scarred," standalone solution.

Business Impact: Leading organizations in all industries should start adopting ER&A technology, eventhough it will eventually be subsumed by other data management techniques. Consumers will becomeincreasingly intolerant of service models that fail to recognize their identity and will migrate tocompetitors that exhibit a higher level of recognition. Identity resolution and analytics will be a criticalingredient of informationaware organizations in the global market as these technologies contribute tocustomer satisfaction, customer acquisition and the exclusion of undesirable customers. Resolvingidentity issues will enable organizations to significantly improve relationships with their clients andbusiness partners.

Benign uses of ER&A — for alternative billofmaterials analysis and consumer reference networks, forexample — will produce rapid results by quickly identifying data issues and suppliers, and enablingbetter communications with service partners and distribution partners. Additionally, after resolvingindividual identities and their networks of activity, it will be possible to detect the presence of "shadow"processes that, in practice, replace an organization's formal or approved processes. A potentially bigbenefit lies in the analysis of social networks.

Unfortunately, criminals will exercise their creativity in circumventing these systems. Linking ER&Atechnology with voice and speech recognition solutions, as well as text analysis, will help organizationsstay ahead of criminal countermeasures.




Sample Vendors: IBM; Infoglide; Informatica




InMemory Data GridsAnalysis By: Massimo Pezzini

Definition: IMDGs provide a distributed, reliable, scalable and consistent inmemory data store — thedata grid — that is shareable across distributed applications. These applications concurrently performtransactional and/or analytical operations in the lowlatency data grid, thus minimizing access to high



latency, diskbased data storage. IMDGs maintain datagrid consistency, availability and durability viareplication, partitioning and ondisk persistency. Although they are not full application platforms, IMDGscan host distributed application code.

Position and Adoption Speed Justification: Numerous commercial and opensource IMDG productsare available, and vendors are experiencing fast revenue growth and installedbase expansion. Mergerand acquisition activity has slowed down, but some notable pureplay IMDG vendors are still in themarket.

Established products, some in the market for more than 10 years, continue to mature, and extend infunctionality, manageability, high availability, and support for multiple technology platforms,programming environments and computing styles. IMDGs are traditionally used to support largescale,highperformance transaction processing applications.

However, their adoption for big data analytics applications has been growing rapidly over the past threeyears because of the addition of analyticsoriented capabilities (such as Hadoop and NoSQL DBMSsintegration, SQL query features and MapReduce functionality) that enable the use of IMDG topreprocess (that is, absorb, filter, aggregate, process and analyze) highvolume, lowlatency data andevent streams that must be stored in traditional databases for further processing.

IMDGs are increasingly used as performance and scalability boosters in packaged applications, as wellas application infrastructure packages, such as portal products, application servers, enterprise servicebuses (ESBs), complexevent processing (CEP) platforms, business process management (BPM) suites,business activity monitoring (BAM) tools, analytical tools and packaged business applications. IMDGsare also increasingly used in SaaS, PaaS, social networks and other cloud services.

Consequently OEM activity has grown, as established application infrastructure and packagedapplication vendors and cloud/SaaS/PaaS/social network providers secured access to this key, yet hardtodevelop, technology. A relatively small, but growing, ecosystem of system integrators (SIs) and ISVpartners supports the most popular IMDG products.

A growing number of organizations in multiple vertical segments successfully leverage IMDG to enablelargescale production, businesscritical transactional systems, big data analytics and hybridtransaction/analytical processing (HTAP) applications. IMDG adoption will continue to grow because ofthe everincreasing need for greater scalability and performance, push from large middleware vendors,and the maturation of opensource products, as well as IMDGs being bundled into larger softwarestacks.

Products will continue to mature rapidly, and adoption will further accelerate as users kick off newinitiatives such as digital business, mobile apps, intelligence business operation, context awareness andInternet of Things. However, bundling IMDG technologies into software products and cloud services willlikely continue to be the main contributor to IMDG installedbase expansion. Nevertheless, IMDGs willremain a visible and growing standalone market that serves the most demanding requirements andusage scenarios.

Factors that continue to slow IMDG adoption include:

The still relatively small size of the available skills pool for both individual products and the IMDGtechnology overall

The need to reengineer, at times significantly, established applications to leverage IMDGtechnology for improved performance and scale

Limited SI and ISV support

Complexity in deployment, configuration and management

Vendors are aggressively trying to address these issues, and the recent announcement of the JCacheJSR 107 (Java Temporary Caching API) specification by the Java Community Process is a step, if partial,in the direction of standardizing IMDG capabilities. As these efforts come to fruition, they will furtherencourage adoption.

User Advice: Mainstream organizations should consider IMDGs when they are:

Looking to retrofit for IMC (in a minimally invasive fashion) established dataintensive applicationsto boost their performance and scalability, or to offload legacy system workloads

Developing native IMCapplications of the following type:

Webscale

Highvelocity big data analytics

Low latency

Hybrid transaction/analytical processing

Business Impact: Through the use of IMDG, organizations can enhance the productivity of theirbusiness processes, speed operations, support lowlatency requirements, improveuser/customer/supplier/employee satisfaction, extend application reach to larger user constituenciesand enable realtime decision support processes, while protecting investments in their establishedapplication assets.

IMDGs enable organizations to develop highscale, highperformance, lowlatency applications thatcannot otherwise be supported by traditional platforms alone. The use of IMDGs to support HTAP willenable organizations to explore new business opportunities and introduce innovation in their businessprocesses.






Sample Vendors: Alachisoft; CloudTran; Couchbase; Fujitsu; GigaSpaces Technologies; GridGain;Hazelcast; Hitachi; IBM; Microsoft; Oracle; Pivotal; Red Hat (JBoss); SanDisk; ScaleOut Software;Software AG; Tibco Software


"Taxonomy, Definitions and Vendor Landscape for InMemory Computing Technologies"

"InMemory Data Grids Enable GlobalClass Web and Mobile Applications (While Not Bankrupting YourCompany)"

"Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation"

"Cool Vendors in InMemory Computing Technologies, 2014"

Data Warehouse Platform as a Service (dwPaaS)Analysis By: Roxane Edjlali

Definition: Data warehouse platform as a service (dwPaaS) is a specific use case of database platformas a service (dbPaaS) supporting the needs of data warehousing. It is a cloud service that offers thefunctionality of a data warehouse DBMS. It is engineered as a scalable, elastic, multitenant service andsold and supported by a cloud service provider. The provider's services must also include databasesupport, such as database administration.

Position and Adoption Speed Justification: There are now a few solutions for dwPaaS (see "MagicQuadrant for Data Warehouse Database Management Systems"). Some of the vendors have been in themarket for some time (Kognitio or 1010data, for example), while others are more recent (such asAmazon Redshift, in 2012) and others are now entering the market (such as Teradata).

We have seen an acceleration in the adoption of dwPaaS with the entrance of Amazon Redshift.However, takeup overall continues to be between 5% and 20% of the target audience.

With the growing diversity of offerings, we can expect faster adoption in the next two years. However,dwPaaS continues to suffer from confusing vendor marketing around cloud, and challenges indifferentiating between platform as a service (PaaS), infrastructure as a service (IaaS) and hosting.Specific differences should be noted:

Managing a database on a cloud infrastructure (IaaS) removes any of the hardware procurementaspects, but not any of the database deployment or administration.

Managed services (hosting) from IBM and HP (Vertica) are not direct competitors to dwPaaSvendors, but customers view them as an equal alternative from more established vendors. One ofthe major differences in the case of a managed service will be in the pricing model — wheresoftware licenses and hardware are owned by the customer but are being hosted and managed bythe provider. These products have experienced growing acceptance and penetration in the market.

While governance, security and reliability continue to challenge dwPaaS adoption, we are seeing greateradoption driven by the ease of implementation and the flexibility of the pricing model — going from acapital expenditure (capex) to an operating expenditure (opex) model. Moreover, managing data in thecloud is becoming the norm, with many organizations running SaaS applications. However, initial dataload and data integration continues to challenge dwPaaS adoption. As cloud computing matures, forexample, with data integration PaaS, we believe that additional vendors will begin to offer dwPaaS.Finally, for many small or midsize businesses (SMBs) or Webnative companies, cloud services are thepreferred architecture — driving greater adoption of dwPaaS offerings.

Customer acceptance of the solutions currently available is progressing (as evidenced by our customerinteractions), and the solutions will continue to gain in popularity as customers realize that they offermore rapid deployment than may be available from internal IT organizations. Budgetary constraintsalso make dwPaaS more attractive, because the ticket entry price is less than an onpremises datawarehouse.

User Advice:

dwPaaS is useful for new initiatives such as data marts, where the lack of expertise and initialinvestment can be an issue. This applies particularly to SMBs or departmental projects.

Organizations considering dwPaaS need to decide if the technology can consistently meet theproject requirements, especially issues of governance, security and latency. This assessment willneed to be revisited as initial projects mature.

dwPaaS is useful for specific applications that an organization may not want to acquire, develop orsupport itself, or in which a vendor has greater expertise.

Business units that contract for these services must consider the political implications for their inhouse IT support. It is important to secure the backing of the IT organization for these services orthe business will face relationship issues with IT in the longer term.

Clients considering dwPaaS should look at the overall cost over a three to fiveyear period whencomparing it to onpremises options, because the overall cost of dwPaaS may not be cheaper.



Business Impact: There are three primary benefits to dwPaaS: ease and elastic deployment,simplicity of its administration, and the pricing model. (Currently, dwPaaS can be used for thedeployment of data warehouses or data marts.) It is used not only for pilot projects or proofs ofconcept; some examples of dwPaaS have been in the market (with production deployments) for morethan 10 years. Some organizations are managing their full data warehouse with dwPaaS.

This technology will be of particular interest either to small organizations lacking the resources to createtheir own data warehouse infrastructure, or to organizations starting greenfield data warehouse DBMSprojects. For organizations with existing onpremises deployments, adoption of dwPaaS will be slowerbecause moving a full data warehouse to the cloud will be challenging.



Maturity: Early mainstream

Sample Vendors: 1010data; Amazon; Kognitio; Microsoft; Teradata


"Magic Quadrant for Data Warehouse Database Management Systems"

"The State of Data Warehousing in 2014"

OpenSource Data Integration ToolsAnalysis By: Ted Friedman

Definition: Opensource data integration tools provide various styles of integration technology, such asextraction, transformation and loading (ETL); data federation/virtualization; messaging; and datareplication and synchronization. They conform to the opensource model for software development,deployment and support. They enable programmers to read, redistribute and modify their source code.

Position and Adoption Speed Justification: Various opensource data integration tools exist and areincreasingly being considered by enterprises for a variety of use cases. Most continue to be used in apure ETL context, but opensource data integration technologies for federation and replication have alsoentered the market. Increasingly, we also see organizations using opensource data integration tools tosynchronize data across disparate operational sources, collecting and disseminating master data, and insupport of various other use cases.

Despite the cost benefits expected to be derived from using these tools, their adoption as a standard byIT organizations in larger enterprises remains relatively low compared with that of commercial dataintegration tools. This is because of the functional and user experience limitations of most opensourceofferings, relative to market leaders. For example, the maturity of metadata management capabilitiesand the robustness of development environments are lacking in comparison with those of commercialtools. This often relegates the usage of opensource data integration tools to workloads wherelimitations are addressed elsewhere in the architecture.

However, as general interest in opensource software builds, more organizations are asking aboutopensource data integration tools. An increasing number of enterprises are deploying such tools toaugment their enterprise standards, for cases where they are budgetconstrained or require morefocused functionality. As deployment of opensource software for other infrastructure areas (such asoperating systems and database management systems) grows, adoption of opensource dataintegration tools will also increase — particularly for smallscale deployments and data migrationscenarios.

User Advice: Recognize that the opensource movement still has modest but rapidly growing impacton the overall data integration tools market. The initial hype around opensource data integration toolshas waned, as organizations realize that comprehensive deployments — even if open source — stillrequire significant budgets. This is because expertise, practices and shared skills are delivered viapeople, as opposed to being embedded in the tools. However, opensource data integration tools arenow more frequently evaluated as viable options in vendor selection processes, and their functionality isviewed as mature enough for a range of scenarios.

Although opensource data integration tools continue to improve, they are generally much weaker insome functional areas than commercial offerings — for example, metadata management and prebuilttransformations (especially those guided by industry standards such as Health Insurance Portability andAccountability Act [HIPAA], and SWIFT). As a result, at present they are comparatively rarely adoptedin IT organizations for large and highly complex scenarios. More importantly, most opensource dataintegration vendors offer both opensource community and commercially licensed versions with moresignificant functionality.

Data integration is a complex problem requiring a combination of solid technology, services andavailable skills. Today, few opensource offerings bring a significant combination of these elements.Recognize that the overall cost of ownership of data integration tools includes more than just thepurchase price of the software. Data integration is at the heart of the modern informationinfrastructure, and any weaknesses in capabilities (whether in opensource or commercially licensedtools) will result in additional cost and pressure on other infrastructure capabilities and components.Use your organization's overall opensource strategy and experience with other opensource offeringsas a way to judge the potential opportunities and risks of deploying opensource data integration tools.



Business Impact: Opensource tools can reduce the softwarerelated costs of data integration processdeployment, and represent a growing trend in this market toward pricing models that align to buyerinterest in lowercost, "good enough" capabilities. More vendors are evolving toward "commercial opensource," offering a community version of the product that is freely available but has limitedfunctionality, while the commercial version is licensed in a manner more similar to other marketofferings.

In addition, the increase in interest in Hadoop will likely cause adoption of opensource data integrationtools to progress as more providers of Hadoop distributions add the tools into their offerings. Weanticipate that more vendors with purely commercial licensing models will begin to adopt the principlesof opensource providers, by likewise evolving toward hybrid licensing models where they offer lowcostand/or opensource capabilities as well as retain their current traditional licensing and pricingapproaches.




Sample Vendors: CloverETL; Javelin; Jitterbit; JumpMind; Pentaho; Red Hat; Talend



"Critical Capabilities: Data Delivery Styles for Data Integration Tools"

"Data Integration Enables Information Capabilities for the 21st Century"

Data Integration Tool SuitesAnalysis By: Ted Friedman

Definition: Data integration tool markets have traditionally been of a siloed nature. Separate marketsand vendors have existed for various styles of data integration technology such as extraction,transformation and loading (ETL), data federation/virtualization, messaging, and datareplication/synchronization. Convergence continues in data integration technology submarkets asvendors organically extend their capabilities by adding other data integration styles and as largervendors acquire technology to address the range of capabilities.

Position and Adoption Speed Justification: Buyers of data integration tools increasingly seek a fullrange of capabilities to address multiple use cases because their data integration requirements havebecome very diverse. Converged tools will include the core elements of data integration (connectivity,transformation, movement and so on) but with the added ability to deploy these elements in a range ofdifferent styles, driven by common metadata and modeling, design and administration environments.The goal is to model integrated views and data flows once and to be able to deploy them in variousruntime styles — from batch to real time, from bulk to granular, from physical to virtualized.

The move toward data integration tool suites is being driven by buyer demand as organizations realizethey need to think about data integration holistically and have a common set of data integrationcapabilities that they can use across the enterprise. Organizations need to provide an architecture thatcan deliver data at all latencies, granularities and degrees of virtualization across a range of use cases— including data integration for the purposes of master data management, business intelligence,operational system data consistency, migrations, modernization and interenterprise data sharing. Asthe breadth of usage of the tools broadens in this way, strong metadata management functionality toprovide transparency and reuse becomes critical.

As emerging demands to support big data scenarios (which may involve extreme volumes and greaterdiversity of data source types) and the logical data warehouse (which will drive growing needs for datasynchronization and federation) evolve, this will be an increasingly important requirement. To completetheir portfolios, larger vendors continue to acquire small specialists that support a single style of dataintegration. At this point, over 30 vendors offer tool suites that support more than one data deliverystyle.

However, progress toward tightly integrated suites (where all the components share metadata, designenvironments, administrative tooling and more) remains slow. In recent studies on data integration toolusage, deployments continue to be biased toward a single style of data integration (most commonly,bulk/batch data delivery). Alternative and complementary styles of data integration are less commonlyused, but beginning to accelerate.

The more progressive offerings in the market typically combine bulk/batch data delivery capabilities(ETL) with one other style (such as granular changecapture and propagation for data synchronizationpurposes, or data federation/virtualization). However, it is still rare to find single tools supporting agreater diversity of delivery styles and deeply integrated tool suites achieving the same effect, drivenby common metadata and with the ability to reuse models across delivery styles. In most of these toolsuites, common metadata is addressed in one of two ways: unified metadata architectures, in which asingle metadata store is directly accessed by the various parts of the tools suite, or distributedmetadata architectures, in which components of the suite share metadata among themselves viaimport/exportlike approaches.

New entrants in the market hold the promise of leapfrogging the incumbents by delivering technologybased on the idea of a single code base to support many integration styles. Such offerings are starting



to gain traction in the market.

Vendor consolidation is progressing rapidly, but deep technology convergence is occurring at a moremoderate pace. This is largely due to the optimized architectures and parochial representation ofmetadata that are common in singlepurpose tools. Data integration tool suites will continue toapproach the Trough of Disillusionment as user demand for seamless integration across the range ofdata integration styles exceeds vendor ability to meet that demand.

In addition, many organizations continue to deploy data integration capabilities in standalone, singlestyle fashion, which does not enable them to capture the full benefits provided by these tool suites.New requirements emerging from the use of Hadoop distributions and inmemory DBMS, and theincreasing adoption of logical data warehouse architectures, also challenge the current data integrationstrategy of organizations and requires them to rethink the role of the data integration tool suite as partof their information infrastructures. These tools support the "Integration" and "Share" commoncapabilities described in Gartner's Information Capabilities Framework — a conceptual model for amodern information infrastructure.

User Advice: Organizations must recognize that data integration is a strategic discipline and thathaving the right set of technologies at the core of their information infrastructure is critical to success.There is value — in the form of leverage, reuse and consistency — in applying an integration tool suiteto a range of data integration problems.

Assess the range of data integration requirements in your business, and use your findings to measurevendors' breadth of coverage. Analyze the strategies and product road maps of data integration toolvendors to ensure they recognize the convergence trend and are taking steps (through organic productdevelopment, partnerships or acquisitions) to build a comprehensive data integration tool suite thataddresses multiple styles of data integration.

However, simply having a set of tools is not enough. Organizations should seek tight integration acrossthe components supporting the various styles (because this is the key to optimal value) and balance thevalue of tight integration against the completeness of each capability taken separately. They shouldoptimize the skills and resources used to drive data integration projects and potentially establish asharedservices structure for data integration.

In addition, as they evaluate vendors and capabilities in the data integration tools market, buyers mustbe aware of the ongoing convergence with the data quality tools market and ideally seek solutionsoffering both sets of capabilities in a complementary fashion. While toolsets offering comprehensive andwellintegrated functionality for both data integration and data quality will be readily available andincreasingly adopted in the future, we expect vendors to continue to sell the two sets of capabilitiesseparately, enabling customers to choose to deploy them in a modular and focused way.

Business Impact: Data integration tool suites with broad applicability will bring value to many types ofinitiative, including business intelligence (supporting batchoriented, eventdriven andfederated/virtualized delivery of data to business intelligence tools and applications), master datamanagement, system migration and consolidation activity, interenterprise data sharing, and thesynchronization of data between operational applications (potentially in real time through interactionwith message queues and the enterprise service bus). The cost savings, productivity improvements,quality enhancements and flexibility provided by these tools will bring substantial benefits to thoseorganizations adopting them.




Sample Vendors: Adeptia; IBM; Informatica; Information Builders; Oracle; SAP (BusinessObjects);SAS; Stone Bond Technologies



"The State of Data Integration: Current Practices and Evolving Trends"

SaaS Archiving of Messaging DataAnalysis By: Alan Dayley

Definition: SaaS archiving of messaging data includes email, instant messaging and social media.Compliance and regulatory requirements drive retention of messaging data, with hosted archivingincreasingly becoming the repository of choice. Capture of messaging content occurs at either the timeof creation or as it enters the organization's communications systems, where it can be stored onimmutable write once, read many (WORM) storage.

Position and Adoption Speed Justification: SaaS archiving solutions are mature. Many users findadministration tools for SaaS archiving solutions to be more userfriendly than those available from onpremises solutions. As the journaling feature is turned on in the email administration console, capture isas simple as pointing the journaled email to the hosted provider's site. Instant message archiving is asmature as email and is usually stored in an email format in the archive repository. Social mediaarchiving is newer, and capture is usually through APIs provided by the social media applications.Though social media data can be stored in an email format in the archive, the industry trend is to storeit in native format.



Unlike backup or disaster recovery as a service, archive users are less concerned about latency andmore about accurate capture of metadata and chainofcustody of data; therefore, the speed ofInternet connections is not a major concern. This, coupled with easytouse administrative andsupervision tools, has led many organizations to choose a hosted solution, enabling archiveexpenditures to shift to an operating expenditure (opex) model and away from a capital expenditure(capex) model.

As government and industry regulations proliferate, hosted archiving vendors have been nimble inquickly updating the compliance requirements of offered solutions. Most SaaS archiving vendors offerend users access to messaging data through either a search interface or, in some cases, a nativeapplication folder view. Basic ediscovery capabilities of hosted solutions receive high marks fromcustomers and are noted as another reason for adoption.

User Advice: Organizations in highly regulated industries will find hosted message archiving solutionsto be mature, secure and reliable enough to meet the most stringent requirements. Any organizationwith message archiving needs will find the hosted option easy to administer, priceattractive and anopportunity to optimize internal IT resources. Most organizations do not face internal or externalrequirements or regulations that require that data reside onpremises, so the willingness to considercloud revolves primarily around company culture regarding risk, security, data sovereignty and costs.

When considering a solution, focus on indexing, search and discovery capabilities to ensure needs aremet either within the offering or through integration with a thirdparty ediscovery vendor. Themigration of legacy email archives, including into and out of a hosted solution, can be expensive andshould be scoped during the selection phase. When determining the costs versus the benefits of SaaSarchiving, include soft expenses associated with an onpremises solution for personnel and ITinvolveddiscovery requests.

Business Impact: Organizations switch capex for opex costs when selecting a hosted archive solution.Pricing is typically based on a permailbox or peruser basis paid as a monthly subscription. ITdepartments are relieved of updating legacy onpremises archive systems when hardware and softwareneed to be refreshed. Compliance and legal personnel within organizations directly access the hostedsolution without IT involvement and more easily provide access to the hosted archive message data tooutside parties as required.




Sample Vendors: ArcMail; Bloomberg; Global Relay; Google; HP (Autonomy); Microsoft; Mimecast;Proofpoint; SilverSky; Smarsh; Sonian; Symantec


"Magic Quadrant for Enterprise Information Archiving"


"How to Determine Whether Your Organization Needs Website Archiving"

"Five Factors to Consider When Choosing Between Cloud and OnPremises Email Archiving Solutions"

Climbing the SlopeContent IntegrationAnalysis By: Gavin Tay

Definition: Content integration refers to the consolidation of enterprise content, typically dispersedthroughout enterprises in a myriad of repositories, into a single view. Integration tools may sit abovethese repositories as data integration middleware, or above workflow and business processmanagement systems, to provide a unified interface with federated content.

Position and Adoption Speed Justification: Content integration became obsolete before reachingthe Plateau of Productivity. This was largely because the longterm prospects for custom connectorswere limited, partly due to the difficulty of maintaining them and partly due to the emergence of Webservices and representational state transfer application programming interfaces (APIs).

Other integration options, such as the Java Specification Request (JSR) 170/283 standard, also did nottake off. Many of the enterprise content management (ECM) suites that used connectors were foldedinto their own products to integrate among solutions that vendors themselves acquired, and were nolonger offered for commercial use. Examples include IBM's Content Integrator, which federates contentwithin the IBM portfolio, and EntropySoft, which had original equipment manufacturer agreements withIBM and EMC (Documentum) but was acquired by salesforce.com.

Content integration continues to be hyped, however, because the vast majority of enterprises continueto have multiple content repositories — increasingly with the combination of onpremises ECM suitesand cloud content repositories such as enterprise file synchronization and sharing (EFSS) solutions.EFSS vendors have brought about a resurgence and continued use of Content ManagementInteroperability Services (CMIS) or proprietary connectors, as a major advantage of amalgamatingmultiple repositories for the purpose of mobile access to enterprise content.



There is a potential impact from CMIS, the most important industrysponsored standard, having gainedthe support of IBM and Alfresco but also emerging with most of the other major ECM vendors or EFSSproviders. Many enterprises are also considering using user experience platforms (UXPs), portals andfederated or contextual search as options for the virtual consolidation of frequently used content atdifferent levels of abstraction.

User Advice: Enterprises should pick content management vendors that have standardized and easilyaccessible repositories. Longer term, the focus should be on CMIS version 1.1, which was approved as astandard in December 2012 by the Organization for the Advancement of Structured InformationStandards (OASIS). As with all standards in their infancy, it will take a while before all vendors becomecompliant.

The preliminary aim with CMIS is to provide information sharing across CMISenabled repositories, butthe value may ultimately increase by allowing those repositories to coexist — even as they feed searchengines, portals and UXP applications with more information at lower cost and with less complexity.One immediate benefit may be a single view into content repositories via a CMISenabled "contentclient" that is richer than what has typically been delivered by ECM vendors. Mobileenabled CMISapplications or browsers have not gained as much traction even as organizations look to bring theircontent and connectivity out into the field.

Enterprises must look beyond both JSR 170/283 and Web Distributed Authoring and Versioning(WebDAV), which either did not bear fruit or are very old. Integration architectures from vendors suchas IBM, Oracle (Context Media) and Adobe, and thirdparty offerings such as those of TSystems'Vamosa have also become obsolete or defunct. Most system integration partners would also havetoolkits to connect the products they support with multiple repositories and business applications, suchas ECMG, which has built connectors.

Business Impact: Content integration technology was slated to have improved interoperabilitybetween a company's content, its contentcentric processes and related data. Despite this promise, theECM market underwent consolidation itself, so these tools became increasingly unnecessary. Many ofthe content integration solutions were subsequently acquired by these large ECM vendors, to fulfillinteroperability among their own solution offerings and those of the newly acquired solutions — but arenot being resold for use on their own.

Connecting content to structured data and to end users in a more engaging manner has taken over, buthas many implications for commercial applications. Content analytics is becoming an alternative tohardwired integration approaches. As a result, this will support both governance and costreductioninitiatives by optimizing information assets for availability.




Sample Vendors: Adobe; Alfresco; EMC; HP; IBM; Nuxeo; OpenText; Oracle


"New Information Use Cases Combine Analytics, Content Management and a Modern Approach toInformation Infrastructure"

"The Emerging User Experience Platform"

MDM of Customer DataAnalysis By: Bill O'Kane; Saul Judah

Definition: Master data management (MDM) of customer data enables business and IT organizationsto ensure the uniformity, accuracy, stewardship, governance, semantic consistency and accountabilityof an enterprise's official shared customer master data assets. Such implementations enable theauthoring of customer master data in workflow, batch or transactionoriented processes that conformto one or more MDM implementation styles (or a hybrid of those styles).

Position and Adoption Speed Justification: The market for packaged (as opposed to custombuilt orcoded) technology supporting the discipline of MDM of customer data has grown rapidly since thediscipline of customer data integration emerged in 2003. This supporting technology continues tomature slowly (just slightly exceeding 5% market penetration), with many vendors increasingly focusedon fleshing out facilities for data stewardship and data governance. They are also taking their first stepswith cloudbased offerings, integration with social networks, big data and mobile initiatives.Additionally, some more mature enterprises continue to report failures risked by attempts to overloadMDM environments with nonmaster data, or failures to operationalize data governance successfully.

Megavendors (IBM, Oracle, and SAP) see MDM of customer data as key to their overall vision forinformation management and as an enabler for applications and business processes supported bysystems such as those for CRM and ERP. These megavendors are successfully selling to their extensivecustomer bases, and they continue to command over 50% of the market for MDM of customer data. Inaddition to the megavendors and the ongoing market consolidation, there are large independentvendors, such as Informatica and Tibco Software, that are wellestablished in the market and othervendors, such as Information Builders, Software AG and Teradata, that maintain a presence. There arealso viable smaller vendors, such as Ataccama, Kalido, Orchestra Networks, Talend and VisionWare,and newer entrants like Pitney Bowes and Dell Boomi.



User Advice: Large and midsize organizations with heterogeneous IT portfolios containing customerdata fragmented across many systems should think in terms of implementing MDM of customer data ina style that integrates with established source systems and becomes the system of record for customermaster data. MDM of customer data programs typically focuses on improving business processes in theupstream, operational environment, but this can also have beneficial effects for the downstream,analytical environment. If you are looking to provide realtime, inprocess analytics (like businessactivity monitoring), MDM could potentially help improve that as well.

Success with an MDM program is not just about having the right technology. You also need to create aholistic, businessdriven MDM of customer data vision and strategy that focuses on key businessproblems. It is important to keep the longterm MDM vision in mind and to approach the individualprojects of an MDM of customer data program based on business priorities. An MDM of customer datastrategy should be part of a wider multivector MDM strategy. A multivector strategy adds additionalcapabilities to the multidomain approach — the ability to meet requirements spanning multiple usecases, implementation styles and industries, as well as any governance and organization modelssupporting MDM.

This may include multipledomain MDM (multiple, separate bestofbreed singledomainsolutions/vendors) or a tradeoff between a bestofbreed, purposebuilt singledomain offering and ageneralist multidomain MDM offering. An MDM program is also a key part of a commitment toenterprise information management, enabling greater enterprise agility and simplifying integrationactivities.

Evaluate MDM of customer data solutions based on a set of objective, balanced criteria, includingfacilities for data modeling, data quality, integration, data stewardship and governance, businessservices and workflow, measurement, and manageability. Additionally, consider any multivector MDM,cloud and social data interface capabilities that may be important now or in the future.

Business Impact: Trusted customer data and a single view of the customer are fundamental to thesuccess of a CRM strategy or any other customercentric strategy. The ability to identify customerscorrectly and to draw on a trusted, accurate and comprehensive single customer view in customercentric processes and interactions is valuable for marketing, sales and service, and other functions thatinteract with customers.

It can help organizations deliver the appropriate customer experience, crosssell (between products andmarkets), retain customers and execute endtoend processes in an efficient and effective manner. Itcan also help them manage risk and enable regulatory compliance. As we enter the era of socialnetworks and other big data, MDM of customer data is key to managing the links between thefragments of customer data in these new data sources, thereby enabling a trusted understanding ofcustomers' sentiment and behavior.




Sample Vendors: Ataccama; IBM; Informatica; Information Builders; Kalido; Oracle; OrchestraNetworks; SAP; SAS; Software AG; Talend; Teradata; Tibco Software; VisionWare


"Magic Quadrant for Master Data Management of Customer Data Solutions"



"The Impact of CloudBased Master Data Management Solutions"

MDM of Product DataAnalysis By: Andrew White; Bill O'Kane

Definition: Master data management (MDM) of product data enables the business and the ITorganization to ensure the uniformity, accuracy, stewardship, governance, semantic consistency andaccountability of an enterprise's official, shared product master data assets. Such implementationsenable the authoring of product master data in workflow, batch or transactionoriented processes thatconform with one or more MDM implementation styles.

Position and Adoption Speed Justification: More and more enduser organizations are eitherassuming, without any real ROI or business case, that they need to adopt MDM, or finding that theirinitial MDM efforts are failing to deliver the value they expect. This situation is typical for an importanttechnologyenabled effort stuck in the Trough of Disillusionment. MDM of product data is becomingmore recognized as a "good thing to do," but organizations are rushing to select technology beforedoing their "homework" — that is, formulating a business case, establishing governance anddetermining the necessary change management.

In terms of additional hype, 2014 has seen further emphasis on the linking of product data to socialdata that provides, for example, insight into consumer sentiment about a new product release.

Other big datarelated phenomena are playing out beyond the sphere of social data, as productinformation is once again central to many other heavily hyped technologies. These include technologies



in the mobile sphere, where consumers may, for example, compare products and services acrossdifferent channels when out shopping. Other examples are in the healthcare sector, where productinformation can help identify and even track medical devices and compounds. Additionally, public sectororganizations are adopting MDM of product data solutions (and other MDM solutions) in support ofopendata and opengovernance mandates.

Megavendors — IBM, SAP and Oracle — claim the largest share of this maturing market, but thereremain many smaller vendors that are continuing to grow. They include Riversand, Stibo Systems andTibco Software. The past year saw two important acquisitions. Informatica, a specialist in MDM ofcustomer data, acquired a third MDM offering with its purchase of Heiler Software, to bolster itsmultidomain credentials. SAP purchased hybris primarily for its ecommerce solutions. This acquisitionis likely to lessen the desire of hybris (though it still operates as a separate company) to invest heavilyin its own productcentric offering for other master data domains, given SAP's corporate strategy in thisarea.

User Advice: Large and midsize organizations with heterogeneous IT portfolios containing product datafragmented across many systems should think in terms of buying or building an MDM of product datasolution that integrates with established source systems and becomes the system of record for productmaster data:

Make MDM of product data part of your overall MDM strategy.

Focus on business outcomes when seeking to govern product master data.

Review your organization's capabilities and readiness, and compare your findings with current andpotential challenges.

Create a vision for what can be achieved.

Focus on key business problems and build a business case based on benefits.

Monitor vendor capabilities for multidomain and multivector MDM, as well as for informationstewardship. Evaluate vendors based on references, not hype.

Ensure MDM of product data systems have rich, tightknit facilities, including a comprehensivedata model, information quality tools, workflow engine and integration infrastructure.

Think big and act small, by delivering early and often.

An MDM of product data strategy needs to be part of a wider multivector MDM strategy. A multivectorstrategy adds to a multidomain approach the concept of meeting requirements that span other vendors— for example, use cases, implementation styles and industries, as well as governance/organizationmodels. This may include multidomain MDM — that is, two separate, bestofbreed, singledomainsolutions or vendors — or a tradeoff between a bestofbreed, purposebuilt, singledomain offeringand a generalist multidomain offering.

An MDM program is a key part of a commitment to enterprise information management. It helpsorganizations and business partners break down operational barriers, enabling greater enterprise agilityand simplifying integration activities.

Business Impact: Inconsistent product master data can be very costly, but for many organizations itrarely leads to bankruptcy or failure on its own. Such data will slowly eat away at an organization'sability to perform and achieve its goals. Eventually, so much of the IT organization's budget will bespent coping with bad data that there will be insufficient funding to support innovation.

By establishing the necessary governance and stewardship of product data, enterprisewide, a customeror consumeroriented strategy has a much greater chance of succeeding.

Effective implementation of an MDM of product data solution will help deliver a range of benefits:

Increased revenue through better upselling and crossselling, by knowing what products havebeen acquired.

Improved customer service through better management of product rules, and integration acrossmultiple channels.

Reduced time to market for new products.

More efficient business process optimization within ERP.

More effective product performance, analysis — that is, business intelligence — and resultingmarketing and product strategy, since analytics will use, and insight will derive from, globallyconsistent data.

Reduced costs.

Regulatory compliance, in industries where this is required.

Improved risk management.

MDM of product data impacts all business applications and intelligence data stores because itbecomes the centralized governance framework for all data stores.




Sample Vendors: Agility Multichannel; IBM; Informatica; Kalido; Oracle; Orchestra Networks;Riversand; SAP; Software AG; Stibo Systems; Talend; Teradata; Tibco Software




"Magic Quadrant for Master Data Management of Product Data Solutions"


"Consider Three Specific Scenarios for MDM of Product Data"

"Software Vendors That Augment Your MDM of Product Data Program"


Content MigrationAnalysis By: Gavin Tay

Definition: Content migration refers to the process of consolidating and transferring content, itsrelated metadata, user permissions, compound structure and linked components that are storedpermanently in one or more enterprise content management (ECM) repositories, to a new environment.During the migration process, enterprises typically choose to cleanse content management repositoriesby archiving old and outdated content.

Position and Adoption Speed Justification: As organizations reevaluate their existing ECMinvestments due to market consolidation and commence supporting "easy content management," bothdemand for and the complexity of content migration has increased sharply. Such complexities aretwofold: Old deployments increasingly employ hybrid content architectures featuring a linked repositoryof records across a myriad of content repositories, and new deployments in enterprise filesynchronization and sharing (EFSS) want to move content to the cloud or back to a regulated, onpremises environment.

Content migration to ECM alternatives uses connectors for the onetime, oneway transfer of largevolumes of content currently stored on file servers or in obsolete repositories — most often as part ofupgrades to Microsoft SharePoint. Migration tools are occasional and typically oneway bulk loaders,although they are increasingly becoming more granular in approach.

User Advice: Enterprises should pick content management vendors that have standardized and easilyaccessible repositories. At present, migration tools that support the movement of large volumes ofcontent from expensive or poorly managed network drives or endoflife repositories to newertechnologies, such as those in the cloud with EFSS, are the biggest story in the market. These can addimmediate value.

Evaluate the opportunity to employ inhouse IT expertise with the acquisition of such migration tools, orto hire a system integrator that would use its own migration frameworks. Using inhouse staff mayabsorb the cost involved, but external expertise should ease the rigorous efforts required. It is essentialto realize that investment in a migration tool will be shortlived once the migration has been completed,and that such expenditure would be better allocated to the improvement of evolving contentgovernance or an organizational taxonomy driven by business outcomes.

Business Impact: Content migration technology can accelerate a migration process overall, facilitatecontent availability, and provide a single view of a single source of truth for customer, employee ororganizational informational assets. This can be done by enabling volumes of stored content to movefrom legacy silos to more strategic — and more often consolidated — repositories. Of considerableinterest now is the pushing of information — and users — toward cloud platforms such as MicrosoftOffice 365/SharePoint Online, Google Apps, IBM SmartCloud or EFSS solutions. At the same time,hybrid content architectures are trying to manage the security risks associated with the integration ofpublic cloud or hosted repositories (as they have done with hosted email).

Content migration tools have improved significantly during the past year to cater for onpremisestocloud migration. However, the nonexistence of standards and uncertainty of service availability makecloudtocloud migrations complex — although few organizations are embarking on such an efforttoday. Highly regulated industries such as financial services and health sciences, which typically accountfor all their content, will benefit from a stringent, auditable, yet automated migration process usingcontent migration tools. Organizations that have conducted an inventory of all their content repositoriesand found that less than half their content is outdated or casual should not waste their IT budget onsuch purchases.


Market Penetration: More than 50% of target audience


Sample Vendors: AvePoint; Axceler; Casahl; Dell; Metalogix; MetaVis Technologies; TSystems(Vamosa); Tzunami


"Strategic Best Practices for SharePoint 2013 Migration"

"Five Ways to Ensure Your Taxonomy Initiative Yields the Desired Cost Savings"

"Redefine Microsoft's Role in Your Web Strategy as SharePoint Moves to the Cloud"



Enterprise Information ArchivingAnalysis By: Alan Dayley

Definition: Enterprise information archiving (EIA) solutions provide tools for capturing all or selecteddata in a distributed or centralized repository for efficient storage and access. EIA supports multipledata types (including email, file system, social media, Web, mobile and Microsoft SharePoint). Thesetools provide access to archived data via a stub or pointer or via browserbased access to the archive,and some manage the data in place. EIA tools support operational efficiency, compliance, retentionmanagement and ediscovery.

Position and Adoption Speed Justification: The number of vendors offering EIA solutions continuesto increase, with most offering functionality and deployment models appropriate for the markets theytarget. Market growth remains healthy, particularly as the utilization of archiving as contributingtechnology for compliance and ediscovery gains favor with organizations implementing informationgovernance programs. Archiving software as a service (SaaS) for messaging data, including email andsocial media, has gained significant traction as an alternative to onpremises deployments (and is nowgrowing at a faster pace).

Support for the capture and supervision of social media (for example, Twitter, Facebook and LinkedIn)has become a requirement in the regulated financial services industry (and is interesting to otherindustries). File system archiving as a component of enterprise information archiving is evolving with aneven stronger focus on storage management as unstructured data grows in volume.

Overall, enterprise information archiving products that support multiple content types are replacingapplicationspecific archiving solutions. Some companies are looking to replace their current archivingproducts with others (particularly as public cloud solutions gain traction), and more and more consultingcompanies are offering migration services. In addition, there is growing interest in managing thecompliance and retention of data "in place" instead of moving to a different repository.

Companies with large volumes of data and long retention periods overtax the system so that it mightnot be scalable or reliable, thus requiring improved index methods and, in some cases, majorarchitectural changes. The appetite for emailonly archiving solutions remains, but most organizationsare looking to vendors with existing solutions or a road map for enterprise information archivingproducts.

User Advice: As requirements to store, search and discover old data grow, companies shouldimplement an enterprise information archiving solution now, starting with email as the first managedcontent type. Mailbox size for onpremises email implementations continues to grow, creating bothstorage and compliance concerns. Many organizations are alternatively looking to migrate to cloudemail and productivity solutions such as those offered by Microsoft and Google, and when migrating,associated compliance and regulatory retention requirements need to be considered.

Consolidating archived data into regional repositories, a centralized repository or the cloud can supporta quick response to discovery requests and will facilitate a quick implementation of the organizationalretention policies. Migrating personal stores, such as PSTs, to the archive should be part of thedeployment of an email archive system.

Business Impact: Enterprise information archiving improves application performance, deliversimproved service to users, and enables a timely response to legal discovery and business requests forhistorical information. Archived data can be stored on less expensive storage, with the opportunity totake some data offline or delete it. Moving old data to an archive also reduces backup and recoverytimes.

Archiving is designed to keep the active data stores as small as possible, improve applicationperformance and reduce recovery times. Email remains the predominant content type archived as partof an enterprise information archiving implementation. In this case, the need for users to maintainpersonal stores is eliminated, and established stores can be migrated to the archive, leading to less riskassociated with loss or theft of devices housing these personal archives.

Archiving offered via SaaS is increasing in popularity because of the benefits associated with offloadinglowbusinessvalue tasks, such as the management of aging data, to a third party, as well as reducedcapital and operational expenses. SaaSbased message data archiving (namely email, but withincreased interest in social media and Web content) is leading the way because it is currently priced ona per user, per month (PUPM) basis with no storage overages. Over time, as cost structure andintegration issues are ironed out, look for more file system data and application data to be archived inthe cloud.

Enterprise information archiving has become an important part of ediscovery, providing functionalityidentified as part of the information management category of the Electronic Discovery Reference Model(EDRM). Features such as legal hold, retention management, search and export are used to meetdiscovery and compliance requirements. Supervision tools for sampling and reviewing messages (email,instant messages and, in some cases, social media content) are available with many enterpriseinformation archiving products in response to requirements specific to the regulated portion of thefinancial industry. To meet the requirements of mobile workers, enterprise information archiving offersa way for organizations to have the option of keeping data compliant within an archive while providingaccess to it via mobile devices.






Sample Vendors: ArcMail; Barracuda Networks; Bloomberg; C2C Systems; CommVault Systems;dataglobal; EMC; Global Relay; Google; Gwava; HPAutonomy; IBM; MessageSolution; MetalogixSoftware; Microsoft; Mimecast; OpenText; Proofpoint; SilverSky; Smarsh; Sonian; Symantec; ZLTechnologies


"Magic Quadrant for Enterprise Information Archiving"


"How to Determine Whether Your Organization Needs Website Archiving"

"Five Factors to Consider When Choosing Between Cloud and OnPremises Email Archiving Solutions"

Data Federation/VirtualizationAnalysis By: Ted Friedman

Definition: Data federation and virtualization technology is based on the execution of distributedqueries against multiple data sources, federation of query results into virtual views, and consumption ofthese views by applications, query/reporting tools or other infrastructure components. It can be used tocreate virtualized and integrated views of data in memory (rather than executing data movement andphysically storing integrated views in a target data structure) and provides a layer of abstraction abovethe physical implementation of data.

Position and Adoption Speed Justification: Data federation/virtualization is a specific style of dataintegration, supported by features from tools in the data integration tools market. This technology isnot new — its origins are in distributed relational query technology, developed two decades ago,combined with more contemporary serviceoriented virtualization capabilities. Architectural componentsof this technology include adapters to various data sources and a distributed query engine that acceptsqueries and provides results in a variety of ways (for example, as an SQL row set, XML, RESTful or Webservices interface). Data federation/virtualization technology has an active metadata repository at itscore that permits the logical mapping of physical data stores to a more applicationneutral model.

While still not reaching mainstream adoption, an increasing number of organizations are exploring anddeploying this technology. This is reflected in a steady, continued increase in the volume of Gartnerclient inquiries about data federation/virtualization concepts and providers of the technology. Inaddition, recent Gartner primary research studies for data integration tool deployments indicate thatusage is increasing. Implementations currently tend to be of limited scope, typically confined to a singletype of use case (such as business intelligence) and with modest numbers of data sources, applicationsand users. Various other use cases exist, including in operational processes and in support of serviceoriented architecture (SOA). Performance of federated views across large and diverse sources, as wellas limitations in the richness of transformation and tools capability, continues to create challenges inbroader and more complex deployments.

This style of data integration offers limited ability to deal with data quality issues across participatingsources. Deployments with broader use and greater scale are growing more common, but are generallyseen only in more aggressive organizations. An increasing use of this technology is in support ofprototyping and proofofconcept work (where requirements for integrated views of data are beingrefined), as well as for the creation of accessoriented data services as part of organizations' movementtoward SOA.

In the latter case, the abstraction properties of the technology are used to create semanticallystandardized views of data, independent of the underlying physical/syntactical representation, whichcan then be exposed directly to consuming applications or possibly provisioned via other infrastructuretechnologies, such as an enterprise service bus (ESB). This style of technology will increasingly be usedin combination with other types of integration tooling, such as extraction, transformation and loading(ETL) and ESB, to create more agile and loosely coupled data flows.

As new architectural approaches, such as registrystyle master data management implementations(where master data remains distributed and a single view is created in real time upon request) and thelogical data warehouse (where data assets for analysis are likewise distributed across multiplerepositories) grow in interest, data federation/virtualization will play a more prominent role in theinformation infrastructure.

User Advice: The potential of data federation/virtualization technology is compelling. In theory, it cancreate an abstraction layer for all applications and data, thereby achieving flexibility for change,pervasive and consistent data access, and greatly reduced costs because there is less need to createphysically integrated data structures. The result is greater agility from and freer access to anorganization's data assets. Among other benefits, this style of technology offers an opportunity fororganizations to change and optimize the manner in which data is physically persisted while not havingan impact on the applications and business processes above.

As with most introductions of nonmainstream technology in current market conditions, enterprises willlimit their deployments to projects where the technology solves an immediate tangible need and whererisks can be minimized. Common contemporary use cases include a virtualized approach to data martconsolidation or federated data warehouses, deployment of data access services in SOA, andcomposition of integrated "single views" of master data objects. Newer use cases include datamigrations, federation of data between cloud and onpremises environments, and "sandboxes" and datapreparation capabilities for analyticsfocused business roles.



Additionally, data federation technology can support enterprise mashup enablement as part of anorganization's efforts to embrace Weboriented architecture. Organizations need to carefully considerperformance implications, security concerns, availability requirements for data sources and data qualityissues during their design and deployment of federated views of data. Consider federation capabilitiesas useful components of an overall data integration technology portfolio, and seek ways in which it cancomplement or extend existing data integration architectures such as those suited to physical datamovement and persistence.

Business Impact: A variety of ways exist in which data federation/virtualization technology can addvalue as part of the data integration and sharing capabilities of information infrastructure. Most of theseopportunities involve augmenting the physically integrated data structures (for example, extending datawarehouses and data marts with realtime access to operational data) and providing consistent serviceoriented approaches for applications and business services to access data.

In particular, this style of technology will be used to deliver federated data services — services thatformulate an integrated view of data from multiple databases and enable this view to be accessed via aservice interface. The value of this technology will increase further as vendors expand their capabilities(for example, adding strong metadata management, caching and security management) and as enduser organizations work to deal with an information landscape that is growing more complex anddistributed, with data sources of varying degrees of structure, both internal to the enterprise and in thecloud.




Sample Vendors: Attunity; Cisco (Composite Software); Denodo Technologies; IBM; Informatica;Information Builders; Oracle; Red Hat; SAS; Stone Bond Technologies


"Harness Data Federation/Virtualization as Part of Your Enterprise's Comprehensive Data IntegrationStrategy"

"The Logical Data Warehouse Will Be a Key Scenario for Using Data Federation"

"Critical Capabilities: Data Delivery Styles for Data Integration Tools"


Data Quality ToolsAnalysis By: Ted Friedman

Definition: The term "data quality" refers to the process and technology for identifying and correctingflaws in data that supports operational business processes and decision making. Data quality tools arepackaged software that provides critical capabilities that enable an organization to address their dataquality issues and deliver fitforpurpose information to business consumers in the organization.Functionalities typically provided include: profiling, parsing, standardization, cleansing, matching,enrichment and monitoring.

Position and Adoption Speed Justification: The majority of organizations continue to neglect dataquality as a critical factor for successful business intelligence and analytics, data migration, CRM, B2Band many other informationintensive initiatives. And its impact on critical business processes ismassive — inefficiencies, risks, and loss of value are prevalent. Organizations with more matureinformation management processes are making the connection between accurate data and gooddecision making, process efficiencies, reduced risk and increased revenue. And contemporary interest inbig data initiatives even further thrusts data quality into the spotlight — and demands automatedsupport for data quality control given the volumes, complexity and speed of movement of the data ofinterest.

As a result, more organizations are beginning to focus strongly on data quality as a major component oftheir information governance work. Recent Gartner client interactions and surveys of IT leaders haveshown that data quality issues are considered significant inhibitors to gaining value from big datainvestments. In addition, the substantial interest and investment in master data management (MDM)initiatives, in which data quality is a fundamental requirement, further drives adoption of these tools.For all these reasons, the speed at which these tools are adopted continues to increase, and they willreach the Plateau of Productivity within two years. The data quality tools market is the fastest growingof all enterprise infrastructure software markets per Gartner's forecasts, with a projected CAGR ofnearly 16% through 2017.

User Advice: When evaluating offerings in this market, organizations must consider not only thebreadth of the functional capabilities (for example, data profiling, parsing, standardization, matching,monitoring and enrichment) relative to their requirements, but also the degree to which thisfunctionality can be readily understood, managed and exploited by business resources — rather thanjust IT resources. In addition, they should consider how readily an offering can be embedded intobusiness process workflows or other technologyenabled programs or initiatives, such as MDM andbusiness intelligence. In keeping with significant trends in data management, business roles such asthat of the data steward will increasingly be responsible for managing the goals, rules, processes andmetrics associated with data quality improvement initiatives.



Other key considerations include the degree of integration of this range of functional capabilities into asingle architecture and product, and the available deployment options (traditional onpremises softwaredeployment, hosted solutions and SaaS or cloudbased). Finally, given the current economic andmarket conditions, buyers must deeply analyze the nontechnology characteristics — such as pricingmodels and total cost — as well as the size, viability and partnerships of the vendors. Small and midsizeorganizations should seek application solution providers that offer data quality capabilities or arealigned with data quality tools providers in order to simplify and reduce the cost of embedded dataquality operations into the environment.

Business Impact: In a 2013 Gartner study on data quality usage and adoption (of nearly 400companies in various industries and geographic regions), participating organizations estimated theywere losing an average of $14.2 million annually as a result of data quality issues. The study resultsreflect a trend in the past few years of a growing awareness of data quality issues in all industries.

Organizations are increasingly identifying datarelated issues, including poor quality, as a root cause oftheir inability to optimize the performance of people and processes, manage risk and reduce costs. As aresult, they are more actively seeking best practices for data quality improvement, as evidenced by anincreasing volume of Gartner client inquiries on the topic. Among the best practices is the effective useof data quality tools to proactively measure, monitor and track data quality issues, as well as to provideautomated support for remediation of data quality flaws.




Sample Vendors: Datactics; DataMentors; Human Inference; IBM; Informatica; Innovative Systems;Microsoft; Oracle; Pitney Bowes; RedPoint Global; SAP BusinessObjects; SAS; Talend; TrilliumSoftware; Uniserv; X88



"Make Your Information Infrastructure GovernanceReady"

"Toolkit: RFP Template for Data Quality Tools"

"The State of Data Quality: Current Practices and Evolving Trends"

Information ExchangesAnalysis By: Andrew White

Definition: Information exchanges (IEs) and global data synchronization (GDS) synchronize masterand other data between businesstobusiness organizations through the use of a central sharedinformation "common" or authority model.

Position and Adoption Speed Justification: Organizations continue to struggle to align their owninformation supporting their interactions with trading partners. At a specific, pointtopoint level, thismight seem like a simple task, but when you consider the vast range and complexity of information(content as well as highly structured data) the number of data objects (which can be thousands), thenumber of trading partner pairs (millions), and the range of business requirements using thatinformation (increasingly real time), you can quickly realize how complex this can be.

In the early 2000s an effective IT solution was architected called the information exchange that at leasttackles the core needs of information synchronization. Several information exchanges have formed,mostly around specific industry sectors. Some instances are led by data and process industry standardssupport, such as healthcare information exchanges (HIEs) in the healthcare sector and the Global DataSynchronization Network (GDSN) in the consumer goods and retail sector. Others are less standardsbased and driven more by vendors seeking to support the same idea. In the chemicals sector, forexample, Elemica hosts product data exchanges for some of its customers.

In the healthcare sector, HIEs are used to distribute and share semantically aligned informationbetween independent healthcare organizations. In the consumer goods and retail sector, systemshosted by 1WorldSync manage and distribute a single definition of basic product data. These help toalign business process integrity — for example, promotion management, forecasting and replenishmentin the case of GDSN. This is achieved by the technology as well as supported by data standards. Thereare growing links between the two sectors, as GS1 continues to promote its product data standards tothe healthcare industry with the introduction of a unique device identification system intended to tagmedical devices.

Information exchanges and GDS technologies are categorized under one heading due to theirsimilarities, though there are differing industry nuances. The GDSN is more mature in some areas(where basic and core items attribute data synchronization by region) and immature in others (a lack ofglobal implementations or more complex data types). The GDSN is operational and acts as a "steppingstone" to the synchronization of more complex data, even if the more valuable use cases (such assynchronizing product price or rich product content, remain difficult to perform).

In the U.S., HIEs acting as patient registries have been operating for various healthcare value chains forseveral years under the strict regulatory control of the U.S. Health Insurance Portability andAccountability Act.



These technologies are on the Slope of Enlightenment collectively because they are relatively mature,even if specific industry implementations vary in terms of hype and position. Interest in them has beenpicking up, as has demand. In 2014, growing hype related to open data as well as cloud computingalternatives (including data as a service) are creating more hype for information exchange formats. Awholly new Technology Trigger will very likely replace this technology as the need to synchronize richer,unstructured content, consumerlevel information will increase.

User Advice:

Connect to regional data pools and information exchanges on the basis of clear customer demand,requests and mandates, and take account of the degree of partner maturity — informationexchanges require and offer a mature understanding for data integration technologies andautomation.

Evaluate your internal capability to publish and consume highquality master data and enricheddata, and to introduce an appropriately implemented master data management program as aprerequisite to achieving a "single view" of master data in your enterprise and among your supplychain partners.

If you are not demanding of your suppliers, or not being demanded to by customers, evaluatethese technologies as part of an overall corporate value chain information strategy.

Note that some vendors offer technology that is used behind a firewall to gather data and preparefor synchronization; some vendors move data between organizations; and other vendors provideservices to support the effort. A few vendors offer all these capabilities. Evaluate yourrequirements in terms of performance, price, availability and standards compliance.

Do not expect any information exchange to handle all your needs. New data element requirementsare inevitably ahead of the standards development process and must be dealt with usingadditional "out of network" technologies and processes.

Business Impact: Users have reported benefits from using information exchange and datasynchronization technologies. These remain targeted (when the business case is clearly identified, as inthe case of healthcare) and elusive (as in the case of the GDSN, where the benefits have yet to blossomin relation to syncing real and valuable information, such as product pricing or rich product content).

Depending on your industry and role in the supply chain, these technologies can yield the followingcommercial benefits:

Increased revenue by reducing instances of stock shortage (for example, right product, rightplace, right time).

Improved customer service (for example, healthcare quality).

Better collaboration and joint business planning with partners (for example, improved patientsafety).

Lower procurement and planning costs (for example, material procurement).

Reduced chargebacks for nonreconciled billing/invoicing.

Improved risk mitigation (for example, reduced fraud and misuse).

Shorter time to market for new product introductions.

Improved marketing and brand messaging.

Ability to meet compliance and regulatory reporting requirements.

The benefits will increase as multienterprise business processes grow in popularity, using datasynchronized by these technologies.




Sample Vendors: 1WorldSync; Covisint; Elemica; GHX; GS1; Harris Computer Systems; IBM;OpenText; Orion Health; RelayHealth


"Avoid the Pitfalls of HIE Selection by Using New Market Definitions"

"Best Practices: Checklist for Issues to Consider in Multienterprise Collaboration"

"Gain Value Sooner With a Clear Understanding of the Global Data Synchronization Road Map"

"Information Commons: Emerging Online Centers of Gravity and the Impact on Business Strategy andEnterprise Architecture"

Appendixes




Source: Gartner (July 2013)

Hype Cycle Phases, Benefit Ratings and Maturity Levels

Table 1. Hype Cycle Phases

Phase Definition

InnovationTrigger

A breakthrough, public demonstration, product launch or other event generates significantpress and industry interest.

Peak ofInflatedExpectations

During this phase of overenthusiasm and unrealistic projections, a flurry of wellpublicizedactivity by technology leaders results in some successes, but more failures, as the technologyis pushed to its limits. The only enterprises making money are conference organizers andmagazine publishers.

Trough ofDisillusionment

Because the technology does not live up to its overinflated expectations, it rapidly becomesunfashionable. Media interest wanes, except for a few cautionary tales.

Slope ofEnlightenment

Focused experimentation and solid hard work by an increasingly diverse range of organizationslead to a true understanding of the technology's applicability, risks and benefits. Commercialofftheshelf methodologies and tools ease the development process.

Plateau ofProductivity

The realworld benefits of the technology are demonstrated and accepted. Tools andmethodologies are increasingly stable as they enter their second and third generations.Growing numbers of organizations feel comfortable with the reduced level of risk; the rapidgrowth phase of adoption begins. Approximately 20% of the technology's target audience hasadopted or is adopting the technology as it enters this phase.

Years toMainstreamAdoption

The time required for the technology to reach the Plateau of Productivity.


Table 2. Benefit Ratings

Benefit Rating Definition

Transformational Enables new ways of doing business across industries that will result in major shifts inindustry dynamics

High Enables new ways of performing horizontal or vertical processes that will result insignificantly increased revenue or cost savings for an enterprise

Moderate Provides incremental improvements to established processes that will result in increasedrevenue or cost savings for an enterprise

Low Slightly improves processes (for example, improved user experience) that will be difficult totranslate into increased revenue or cost savings


Table 3. Maturity Levels

Maturity Level Status Products/Vendors



Embryonic In labs None

Emerging Commercialization by vendors

Pilots and deployments by industry leaders

First generation

High price

Much customization

Adolescent Maturing technology capabilities and processunderstanding

Uptake beyond early adopters

Second generation

Less customization

Early mainstream Proven technology

Vendors, technology and adoption rapidly evolving

Third generation

More out of box

Methodologies

Maturemainstream

Robust technology

Not much evolution in vendors or technology

Several dominant vendors

Legacy Not appropriate for new developments

Cost of migration constrains replacement

Maintenance revenuefocus

Obsolete Rarely used Used/resale market only


© 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproducedor distributed in any form without Gartner’s prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines forGartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims allwarranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. Thispublication consists of the opinions of Gartner’s research organization and should not be construed as statements of fact. The opinions expressed herein are subject tochange without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its researchshould not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities coveredin Gartner research. Gartner’s Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its researchorganization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see“Guiding Principles on Independence and Objectivity.”

About Gartner | Careers | Newsroom | Policies | Site Index | IT Glossary | Contact Gartner