Smart Consolidation for Smarter Warehousing
A Key IBM Strategy for Data Warehousing and Analytics


Written by: George Davies Jr., IBM Netezza Strategic Marketing, IBM Software Group, Information Management

Table of Contents

Overview
Introduction
Executive Summary: Smart Consolidation in a Nutshell
The Centralized Enterprise Data Warehouse
Rethinking Data Warehousing and Analytics: The Logical Data Warehouse
  Smart Consolidation's Guiding Principles
  Foundational Technology Requirements
  Next Steps
Tenet One: Consolidate Infrastructure to Simplify Analytics
  Technology Requirement
  Technology Proof Points
  Use Cases
Tenet Two: Process Workloads on "Fit for Purpose" Nodes
  Technology Requirements
  Technology Proof Points
  Use Cases
Tenet Three: Coordinate Management and Data Across the Logical Warehouse
  Technology Requirements
  Technology Proof Points
  Use Case
Technology Requirements
  Workload-Optimized Systems and Appliances
  Seamless Data Flow
  Data Virtualization and Query Redirection
  Sophisticated Enterprise Data Management Tools
  Performance
The Components of a Logical Data Warehouse
Smart Consolidation Entry Points
Conclusion

White Paper: Smart Consolidation for Smarter Warehousing

Overview

High-Performance Analytics and the Logical Data Warehouse

Business intelligence is built on data warehouses. These systems commonly integrate data from internal data sources, which are often transactional systems that record an organization’s interaction with customers, prospective customers, suppliers, business partners, competitors and regulators. An enterprise data warehouse is commonly implemented as a centralized system: a single data-management technology running on a single computer system. This approach leads to very large databases—the warehouse is commonly an organization’s largest database.

Data warehouses are usually built on database management systems that were originally designed for processing online transactions, not online or offline analyses of data generated by web applications and social media. Online transaction processing systems typically manage much smaller data sets than those used in analytic processing, which suggests that a database management system optimized for processing online transactions is not always the best choice for analyzing data.

The quest for information—and competitive advantage—extends business intelligence beyond reports and dashboards to analytic applications that deliver predictions and insight to guide decision making. Simple reporting and advanced analytic applications create very different workloads: the former is fulfilled by relatively straightforward read operations expressed in SQL, while the latter demands heavy computation in languages such as C, C++, Java, and others.
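The contrast is easy to see in miniature. Below is a hypothetical sketch (invented data, not tied to any particular warehouse): the first function has the shape of a reporting workload, the second the shape of an analytic one.

```python
import random
from collections import defaultdict

# Hypothetical sales records standing in for warehouse rows.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 50.0},
]

def report_total_by_region(rows):
    """A reporting workload: a straightforward read-and-aggregate,
    the kind of work a single SQL GROUP BY expresses."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def simulate_demand(mean, volatility, trials, seed=42):
    """An analytic workload: heavy iterative computation
    (here, a toy Monte Carlo demand forecast)."""
    rng = random.Random(seed)
    outcomes = [mean + volatility * rng.gauss(0, 1) for _ in range(trials)]
    return sum(outcomes) / trials

print(report_total_by_region(sales))        # one pass over the data
print(simulate_demand(100, 15, 10_000))     # tens of thousands of computations
```

The first function touches each row once; the second performs orders of magnitude more arithmetic per input value, which is why a system tuned for one tends to struggle with the other.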

New data sources (such as email, log files, sensor data, and user-generated content on social media websites) and new data types (such as images and audio), unconsidered at the inception of data warehouses in the 1990s, are now driving growth opportunities for business intelligence and analytics.


Introduction

Executive Summary: Smart Consolidation in a Nutshell

At its BI Summit on May 2, 2011, Gartner observed that the traditional enterprise data warehouse vision has, in general, not been achieved. These industry analysts refer instead to a logical data warehouse. In June, IBM announced a strategy, Smart Consolidation for Smarter Warehousing, that adopts Gartner’s terminology and recommends an evolutionary change in direction from a single, centralized physical system to a distributed architecture, where computation is provided by individual systems, with each node optimized for specific workloads.

In this paper, we elaborate on the Smart Consolidation strategy, emphasizing fresh use cases and proof points along the way.

As organizations grow their business intelligence portfolios, deploying analytic applications to derive greater value from their expanding data stores, their data warehouse systems assume ever more importance. These systems, originally conceived to support offline reporting and rudimentary analytic queries, are now expected to support enormous and growing data volumes, new unstructured and semi-structured data sources, and expanding communities of knowledge workers running a wide range of analytic workloads. For many organizations, the traditional model of a single, centralized enterprise data warehouse (EDW) system has become too rigid, unable to keep pace with modern demands for mixed workloads, extreme performance, and advanced analytic applications.

This paper describes an evolutionary strategy, a smarter vision for harvesting business advantage from large and varied data stores with advanced analytics: Smart Consolidation and the logical data warehouse.

Smart Consolidation’s Guiding Principles

1. Consolidate infrastructure to simplify analytics.

2. Process workloads on “fit for purpose” platforms.

3. Coordinate system management and data governance across the enterprise.

Smart Consolidation Technology Requirements

• Workload-optimized systems and appliances

• Seamless data flow

• Data virtualization and query redirection

• Sophisticated enterprise data management tools

• Performance


The Centralized Enterprise Data Warehouse

A Sample Road Map to Smart Consolidation

• Consolidate sprawling data marts by offloading analytics workloads from the EDW to workload-optimized systems.

• Introduce queryable archiving to provide cost-effective analytics on massive data sets.

• Accommodate new—particularly “Big Data”—sources into the analytic infrastructure.

• Consolidate enterprise data management and logical warehouse management.

Before expanding on these topics, let’s review the case for change.

Many large enterprises have adopted centralized enterprise data warehouses as their analytic infrastructure. While this model served reasonably well into the mid 2000s, the growth in data volume, variety, and complexity has combined with the exploding demand for analytics to severely limit the EDW’s utility as an enterprise solution.

Figure 1: The EDW as Originally Envisioned

Vision: All enterprise data storage, analytic and operational processing takes place in one central data warehouse.

Reality:

• Many single EDWs cannot handle today's volume, velocity, and variety of data and workloads.
• Lack of agility, increasing latency.
• Business needs are not being met.

(Figure: data sources—CRM, ERP, external sources—feed through data integration into a traditional, centralized enterprise data warehouse.)


While this depiction of a comparatively bleak "reality" does not describe all installations—many organizations have performed quite well with a "monolithic" warehouse in place—the accelerating growth of business analytics requirements, and its effect on the single-store vision, is not in doubt.

In 2010, Gartner estimated that over 70% of EDWs had performance issues (Data Warehouse Magic Quadrant, 2010). In a November 2010 global database survey, Forrester reported that 65% of enterprises found it difficult to deliver performance with their existing architectures. With these levels of performance dissatisfaction as a starting point, adding complex, high-volume unstructured “big data” handling, while satisfying the growing demand for still more sophisticated analytics, would appear to deliver an insurmountable challenge to the centralized EDW model.

Note: It is worth emphasizing that if your workloads and data stores can indeed be satisfied by a single, high-performance warehouse system, you have already achieved Smart Consolidation. That is, some organizations will find that they can satisfy all of Smart Consolidation’s guiding principles on a single system.

As data volumes, variety, and complexity continue to grow, and analytic workloads multiply, a single computer system, running a single database management system, may begin failing to meet expected service levels. Workload performance declines. When the centralized data warehouse does not deliver, line-of-business users take predictable action.

The typical first response to long-running workloads and underperforming queries is to tune and partition the system. These actions may work for a time—for selected workloads— but inevitably, they draw valuable technical staff into an endless cycle of warehouse care and feeding. It is hard to overstate the negative business impact of diverting highly skilled technical resources away from business-driving innovation and applying them instead to ineffective system maintenance.

When tuning fails to bring EDW performance in line, frustrated user communities react quite logically: they begin to extract data subsets and move them to secondary systems or data marts. But attempting to solve one problem creates multiple new ones: data silos limit enterprise-wide analytics, creating blind spots; governance becomes impossible; data extract-and-offload operations create additional load on the already teetering EDW; and costs and complexity escalate.


Figure 2: The Return of Data Mart Sprawl

A single, centralized EDW is simply unable to handle today's volume and variety of data. Lines of business resort to ad hoc solutions, creating data mart sprawl resulting in:

• Limitations to enterprise-wide analytics and visibility;
• A lack of true governance;
• Increased strain on the EDW, shortening its lifespan;
• An inability to scale; and
• Escalating cost and complexity.

Ultimately, the complexity and cost of a single EDW outweigh the business benefits.

The spreadmart warehouse topology depicted above is too complex to administer, too reliant on tuning, too inefficient at analytics, and too costly to maintain. Data governance is impossible, analytic performance is unreliable, and analytic innovation is effectively derailed. In short, the costs of this installation have come to outweigh its value to the business. Unfortunately, at many sites, with large IT investments in play, this realization does not come quickly or easily.

The sprawl-makers' ad hoc efforts at distributed processing are not entirely mistaken, but while a distributed approach does have merit, enterprise-level success requires some important architectural adjustments and investments in new software tools. We call the required tools and adjustments Smart Consolidation, and the resulting infrastructure a logical data warehouse (LDW).

(Figure: data sources—CRM, ERP, external sources—feed the centralized EDW through data integration, with ten dependent data marts sprawling around it.)


Rethinking Data Warehousing and Analytics: The Logical Data Warehouse

Figure 3: Evolving to a Logical Data Warehouse

Key Tenets

1. Consolidate infrastructure to simplify analytics.

2. Process workloads on "fit for purpose" platforms.

3. Coordinate system management and data governance across the enterprise.

(Figure: the traditional, centralized enterprise data warehouse evolves into a distributed logical data warehouse.)

A very nice picture, but how do we get there?

Smart Consolidation's Guiding Principles

This paper suggests a way forward from a single physical system to a logical data warehouse, implemented as a distributed-computing infrastructure integrated by software utilities. This evolution is guided by three architectural tenets, which illuminate the evolutionary pathway and suggest a roadmap for stepwise action:

1. Consolidate infrastructure to simplify analytics. Appliances and specialized systems reduce complexity by consolidating sprawling data marts into a small number of workload-optimized systems.

2. Process workloads on "fit for purpose" platforms. Computation is mapped to appliances and systems specifically designed for well-understood workloads. These specialized systems offer optimal performance at affordable prices, their simplicity accelerates time-to-value, and their deployment frees the EDW to assume a more focused role as orchestration engine and data management hub.

3. Coordinate system management and data governance across the enterprise. Centralize data management, not data and compute resources. IBM’s industry-leading software portfolio of data management, governance, replication, and integration tools makes logical data warehouse management easy and affordable.

Foundational Technology Requirements

Each of these guiding principles implies several additional design principles, or technology requirements:

• Workload-Optimized Systems and Appliances—Consolidated infrastructure and distributed data/compute nodes demand high-performance, cost-effective processing platforms to handle the assigned workloads.

• Seamless Data Flow—Data flows smoothly between warehouse nodes. Synchronization operations—replication, change data capture (CDC) updates, and so on—are integrated, automated, and reliable.

• Data Virtualization and Query Redirection—Systems and data are transparently distributed. Warehouse topology is invisible to business users and applications. Data virtualization and automated query redirection hide system complexity, and data is accessed through a discrete set of well-defined access points—browsers, application clients, and APIs.

• Sophisticated Enterprise Data Management Tools—Data integration, data governance, and their related disciplines and sub-disciplines (master data management, change data capture, data quality, data cleansing, and so on) require (a) an orchestration platform, and (b) enterprise-level applications with cross-system visibility.

• Performance—Everything has to be fast.
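The query redirection requirement is easiest to see in miniature. The sketch below is a hypothetical router, not an IBM interface: the node names, keywords, and routing rules are all invented, and real redirection operates at the query-optimizer level rather than on keyword matching.

```python
# Hypothetical node catalog for a three-node logical warehouse.
NODES = {
    "operational": "edw.example.com",
    "analytics": "appliance.example.com",
    "archive": "archive.example.com",
}

def classify(query: str) -> str:
    """Crude workload classification: historical scans go to the
    queryable archive, deep analytic queries to the analytics
    appliance, and everything else stays on the operational hub."""
    q = query.lower()
    if "history" in q or "archive" in q:
        return "archive"
    if any(kw in q for kw in ("group by", "over (", "corr(")):
        return "analytics"
    return "operational"

def route(query: str) -> str:
    """Return the node a query is redirected to. Callers see a single
    logical warehouse; the topology stays invisible to them."""
    return NODES[classify(query)]

assert route("SELECT name FROM customers WHERE id = 7") == "edw.example.com"
assert route("SELECT region, SUM(x) FROM f GROUP BY region") == "appliance.example.com"
```

The point of the sketch is the separation of concerns: applications call one entry point, and placement decisions live entirely behind it.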

Next Steps

Later, we will describe a variety of entry points, or adoption strategies, for Smart Consolidation. Here is just one practical pathway, and the step order is not required:

Step: Consolidate sprawling data marts, and offload analytic workloads from the EDW to workload-optimized systems. The EDW is now the central locus for data governance and metadata management. Use simple, effective tools for data flow planning, data movement, and governance. This step delivers high-performance analytics and simplifies infrastructure and its management.


Step: Introduce queryable archiving to provide cost-effective analytics on massive data sets. IBM Netezza has announced the C1000 family of High Capacity Appliances, which minimize the per-terabyte cost of storing large historical data sets while keeping the data accessible for on-demand user queries.

Step: Accommodate new—particularly "Big Data"—sources into the analytic infrastructure, using systems with real-time streaming analytic engines and Hadoop platforms to undertake pre-processing and initial analysis, and to ingest data into the logical warehouse. Data can then flow from those platforms to other analytics appliances and systems for further downstream processing.

Step: Consolidate enterprise data management and logical warehouse management, incorporating data integration, cleansing, governance, metadata management, and distribution of data flows to the appropriate analytical platforms. (This longer-term “step” is actually a series of steps that, while highly desirable, can be viewed as an ideal end state rather than a strict requirement.)

The remainder of this paper expands on Smart Consolidation’s guiding principles, technology requirements, processing nodes, and adoption strategies.

Tenet One: Consolidate Infrastructure to Simplify Analytics

Because unsatisfied analytic demands drive the rise and proliferation of data marts, consolidating this portion of your architecture may provide the quickest, highest value for business users. Appliances and specialized systems reduce complexity by consolidating sprawling marts to a small number of workload-optimized systems. That is, offload analytics from the EDW to an environment optimized for analytics—the data warehouse appliance.

Figure 4: Evolving to a Logical Data Warehouse: Consolidate Infrastructure

Smart Consolidation is an evolutionary strategy, not a disruptive one, particularly for clients who have already built EDW-centric architectures.

Consolidate infrastructure with purpose-built appliances and systems:

• Reduce data mart sprawl.
• Offload analytics from the EDW to appliances optimized for performance.
• Achieve true data governance.
• Reduce stress on the EDW.
• Lower total cost of ownership.
• Simplify queries and analytics against historical data.

(figure continued on following page)


By consolidating a sprawl of ungovernable data marts into far fewer purpose-built analytic appliances, IT teams can deliver the best price-performance for analytical queries, while streamlining administrative effort. This frees valuable technical staff to develop and deploy new BI and analytics applications.

Technology Requirement

• Cost-effective, high-performance, workload-optimized systems and appliances

Technology Proof Points

• IBM Netezza Data Warehouse Appliances—IBM Netezza data warehouse appliances are well suited to the Smart Consolidation model. Each appliance is purpose-built for advanced analytics—the ultimate workload-optimized system.

• IBM Smart Analytics System for Operational Intelligence—IBM Smart Analytics Systems are designed for warehouses that support mixed workloads, including analytics and operational decision support. Starting at under $50K—including integrated Cognos reporting—Smart Analytics System models are available on Power Systems, System z, or System x.

(Figure 4, continued: data sources—CRM, ERP, external sources—flow through data integration to the EDW and on to workload-optimized nodes for operational analytics, BI and ad hoc analytics, and a queryable archive, under unified data governance, security, and lifecycle management.)


• IBM DB2 Analytics Accelerator (IDAA)—The System z mainframe accelerator comprises one or more IBM Netezza data warehouse appliances, to which (a) selected data are synchronized, and (b) deep analytic queries are redirected automatically and invisibly to end users. Users have no direct interaction with the IBM Netezza node(s). Data synchronization and query redirection are fully automated. Installation and deployment are quick and easy, requiring no professional services.
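The synchronization half of this pattern can be sketched in a few lines. This is a toy version-based change-data-capture loop with invented table shapes—not the accelerator's actual replication protocol—showing only the idea of shipping deltas rather than full copies.

```python
# Illustrative CDC loop: row_id -> (data, version). All names are invented.
source = {1: ("alice", 3), 2: ("bob", 5)}   # system-of-record table
replica = {1: ("alice", 3)}                  # accelerator copy, lagging behind

def capture_changes(src, dst):
    """Return rows whose version on the source is newer than (or missing
    from) the replica — the delta a CDC feed would ship."""
    changes = {}
    for row_id, (data, version) in src.items():
        if row_id not in dst or dst[row_id][1] < version:
            changes[row_id] = (data, version)
    return changes

def apply_changes(dst, changes):
    """Apply the shipped delta to the replica."""
    dst.update(changes)

delta = capture_changes(source, replica)
apply_changes(replica, delta)
assert replica == source   # replica is now synchronized
```

Shipping only changed rows is what keeps synchronization cheap enough to run continuously between warehouse nodes.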

No other vendor can bring the same level of directed hardware innovation or the same software connectivity portfolio to its logical warehouse vision.

Use Cases

• Banking—A major U.S. bank deployed the IBM Smart Analytics System for credit risk analysis, operational BI, custom reporting, and data cleansing and management— cost-effective, high-performance support for multiple workloads.

• Healthcare—A large healthcare alliance must capture, integrate, manage, and analyze diverse high-volume data sources—clinical, financial, and operational—and share the resulting data stores and analysis with clients, partners, and practitioners. The customer uses a variety of IBM software and systems. Key components include IBM Smart Analytics and DB2 Data Warehouse Edition software for data integration, consolidation, and operational BI, and IBM Netezza data warehouse appliances for complex analytic workloads—an excellent example of consolidating workloads on “fit for purpose” processing nodes.

• Financial Services—A financial institution must calculate value-at-risk for an equity options desk. The IBM Netezza platform was able to run a Monte Carlo simulation on 200,000 positions with 1,000 underlying stocks (2.5 billion simulations) in under three minutes. Leveraging an in-database analytics approach allowed the financial institution to analyze the data where it resides, rather than build a parallel data-processing platform to run the simulation. Faster query response time—and eliminating the time required to move data between two platforms—allowed the company to add variables to investment strategy simulations, and to run the risk analysis more frequently.

• Yale/FINRA Limit Rules Backtesting—Yale researchers evaluated an IBM Netezza data warehouse appliance against cloud-based (Amazon EC2) data storage while analyzing 24 billion historical stock transactions. Bringing computation to the data with IBM Netezza In-Database Analytics yielded a 43% performance gain over the cloud-based solution—with no system tuning.
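The value-at-risk use case above rests on Monte Carlo simulation. As a rough illustration of the technique—with invented portfolio figures and a simplified one-factor model, not the institution's actual risk engine:

```python
import random

def value_at_risk(position_values, volatility, trials, confidence=0.95, seed=7):
    """Toy one-day Monte Carlo VaR: simulate portfolio losses under
    random shocks, then take the loss exceeded in only (1 - confidence)
    of scenarios. All inputs here are illustrative."""
    rng = random.Random(seed)
    portfolio = sum(position_values)
    losses = sorted(
        -portfolio * volatility * rng.gauss(0, 1) for _ in range(trials)
    )
    # The sorted list's 95th-percentile entry is the 95% VaR.
    return losses[int(confidence * trials)]

var_95 = value_at_risk([1_000_000, 500_000], volatility=0.02, trials=10_000)
print(f"95% one-day VaR: ${var_95:,.0f}")
```

At production scale—billions of scenarios rather than thousands—the win the use case describes comes from running this arithmetic where the position data already lives instead of moving the data to a separate compute farm.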


Tenet Two: Process Workloads on "Fit for Purpose" Nodes

Data and compute resources are assigned to appliances and other systems specifically designed for well-understood workloads. These specialized systems offer optimal performance at affordable prices, their simplicity accelerates time-to-value, and their deployment frees the EDW to assume a more focused role as orchestration engine and data management hub.

Unburdened of the analytic processing for which it was not designed, the central data warehouse regains computational resources that can be focused on operational and orchestration activities, including data integration and data quality oversight. Eventually, this shift sees the enterprise data warehouse evolving into a new role as the enterprise data hub, mediating data flow, coordinating data integration, and distributing data to the appropriate analytics engines with a simple, appliance-based approach. Centralizing data management reduces complexity and costs and simplifies the pursuit of rigorous data governance.

Further opportunities to offload data management and computation include an appliance for operational analytics; an appliance as a queryable archive; a stream-processing system for real-time analysis of data on-the-wire (for example, from digital sensors or an email network); and a grid running the Hadoop Distributed File System for analyses of big data such as web-click streams and call-detail records. The logical data warehouse is a dynamic system: nodes may come and go, and inter-node connections permit the results of big data analyses to be moved downstream to analytic appliances for deeper, more advanced analytic processing.

Obvious first steps include offloading existing analytic workloads, absorbing rogue data marts back into the logical warehouse, and deploying a queryable archive. At many sites, however, there is a new strategic priority: Move to accommodate new (predominantly Big Data) sources into the analytic infrastructure, using systems with real-time streaming analytic engines, time series data processing, and Hadoop platforms to undertake preprocessing and initial analysis, and to ingest data into the logical warehouse. Data can then flow from those platforms to other analytics appliances and systems for further downstream processing. Data from new sources may, by design, pass initially through very different data integration and governance filters. Once data flows beyond the ingestion platform, the EDW applies governance, data flow, and life cycle rules. As new data paradigms and analytic platforms emerge, they, too, can be similarly integrated.


Figure 5: Evolving to a Logical Data Warehouse: Distribute Data and Computation

Distribute data and compute to the LDW node that best meets the requirements of the application or workload—price-performance, availability, data sensitivity, and so on.

Agile architecture for introducing new data types and analytic models:

• Offload data and analytics from the EDW to workload-optimized nodes.
• Extend the data warehouse by adding big data and real-time analytic processing.
• Add analytics for new data types.

(Figure: traditional sources—CRM, ERP, external—and new sources—Internet/social media, sensor and meter data, event data—flow through data integration to fit-for-purpose nodes: operational analytics, BI and ad hoc analytics, big data processing, real-time analytics, time series processing, and a queryable archive, under unified data governance, security, and lifecycle management.)


First, identify workloads that can be isolated and offloaded cleanly—analytic applications against discrete data sets are good candidates, as are the Big Data workloads mentioned above. Whether offloaded from the central warehouse (structured data analytics) or deployed on purpose-built nodes from the start (Big Data and queryable-archive workloads), these workloads can now run on platforms that are more efficient and less expensive than the subset of EDW resources that would otherwise be required to serve them.

Technology Requirements

• Cost-effective workload-optimized systems and appliances

• Seamless data flow

• Data virtualization and query redirection

• Enterprise-level orchestration engine

Technology Proof Points

• IBM DB2 Analytics Accelerator—The IBM Netezza accelerator for System z demonstrates automated inter-system data synchronization and query redirection. Advanced analytic workloads are offloaded from the mainframe to a fast, efficient appliance purpose-built for analytics.

• IBM InfoSphere BigInsights—IBM developed InfoSphere BigInsights to analyze unstructured data such as text, video, audio, and social media. The software, developed in part by IBM Research, is based on Hadoop and more than 50 IBM patents. IBM is also rolling out 20 new analytic services, including:

- Server and storage optimization tools for faster implementation time.

- Data center life cycle cost analysis to cut expenses and bolster sustainability efforts.

- Security analytics to automate handling of critical events.

IBM has also developed the Jaql query language and released it to the open source community. Jaql is a high-level declarative language for processing both structured and nontraditional data. Its SQL-like interface means a quick ramp-up for developers familiar with SQL, and it also makes it easier to operate on relational databases. Jaql is highly extensible, and IBM has used this capability to include pre-built Jaql modules in BigInsights that enable integration with IBM Netezza data warehouse appliances and text analytics systems such as IBM Content Analytics.

• IBM InfoSphere Streams—InfoSphere Streams is designed for real-time capture and analysis of data streams such as Tweets, sensor data, stock market data, blog posts, video frames, EKGs, and GPS data. Software advancements have improved previous-generation performance by roughly 350%. Note that BigInsights complements Streams by applying analytics to massive historical data sets after the first-level, real-time analysis performed by Streams.

White Paper: Smart Consolidation for Smarter Warehousing

• IBM InfoSphere Warehouse—The DB2-powered data warehouse software at the heart of the IBM Smart Analytics System is designed for warehouses that support mixed workloads, including analytics and operational decision support. It is available across a broad set of operating systems and hardware platforms to support a wide range of custom solutions.

• IBM Netezza High Capacity Appliance—Introduces queryable archiving, providing analytics on massive data sets at an economical price point.

Use Cases

• Insurance—A global insurance company uses DB2 Analytics Accelerator to offload claims analysis queries from DB2 for z/OS to a tightly integrated IBM Netezza-based “accelerator” system. In addition to accelerating complex analytic query speeds, this transparent solution also reduces load and improves performance on the System z mainframe.

• Banking—A large international bank uses InfoSphere Streams to capture Internet and ATM data streams, while a high-performance central warehouse stores consolidated, integrated data for deep analytics to drive targeted marketing efforts and derive new insights from customer data, behavior, and credit card transactions.

• BlueKai and Intuit: Hadoop and IBM Netezza—These companies stream tens of billions of clickstream data points to a Hadoop grid for first-level analysis, then forward selected data subsets to an IBM Netezza data warehouse appliance, where advanced analytics decipher consumer behavior for ad targeting and marketing campaign optimization.

• Department of Energy, Pacific Northwest National Laboratory—An InfoSphere Streams grid captures high-speed, high-volume log streams and discards “uninteresting” status messages before forwarding exceptions and other outliers to IBM Netezza for advanced analytics.

• BigInsights and IBM Netezza—A large retailer performs web log analysis with BigInsights for site optimization, followed by segmentation analysis on per-customer clickstreams.

As expected, using Streams and/or BigInsights to process extreme data, and then uploading filtered data or data subsets to IBM Netezza data warehouse appliances and IBM Smart Analytics Systems for advanced analytics, is rapidly becoming a common use case. Note that stream-captured and other big data can also be supplemented or enriched before it is forwarded to an advanced analytics appliance. For example, by adding information to stock market ticks, such as industry class or interested trader lists, even more sophisticated analysis and fine-tuned alerting can be achieved. Furthermore, in real time, Streams can help determine which bits of the incoming data stream should be routed to longer-term database storage and subsequent analysis. As a result, gains in analytic insight are compounded by reduced storage and administration costs.
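The filter-and-enrich pattern just described can be sketched in a few lines of Python. This is an illustrative stand-in for what Streams operators do, not Streams code; the field names, the enrichment lookup table, and the “severity” tag are all hypothetical:

```python
# Hypothetical sketch of first-level stream processing: drop routine
# status messages, enrich the survivors (e.g., add an industry class to
# a stock tick), and forward only that subset to the warehouse node.
# Field names and the lookup table are illustrative, not a real schema.

INDUSTRY_CLASS = {"IBM": "technology", "XOM": "energy"}

def is_interesting(record):
    """Keep exceptions and outliers; discard routine status messages."""
    return record.get("severity", "status") != "status"

def enrich(record):
    """Supplement a record with extra context before forwarding it."""
    out = dict(record)
    out["industry"] = INDUSTRY_CLASS.get(record.get("symbol"), "unknown")
    return out

def first_level(records):
    """Return the enriched subset destined for advanced analytics."""
    return [enrich(r) for r in records if is_interesting(r)]

stream = [
    {"symbol": "IBM", "severity": "status"},                 # discarded
    {"symbol": "IBM", "severity": "outlier", "px": 181.2},   # forwarded
    {"symbol": "XOM", "severity": "exception", "px": 77.9},  # forwarded
]
forwarded = first_level(stream)
```

Only `forwarded` ever reaches the downstream appliance, which is the sense in which gains in analytic insight are compounded by reduced storage and administration costs.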


Crucially, this data warehouse requirement—to serve as a data management control point—does not fall away as we evolve from traditional EDW to logical data warehouse. Fortunately, IBM’s exceptional portfolio of software systems for enterprise management and data governance makes logical data warehouse management simple and cost-effective. These tools and policies manage metadata, provide data governance, distribute/replicate data, and manage the data life cycle, and they do so across heterogeneous systems, including those from multiple vendors.

Figure 6: Evolving to a Logical Data Warehouse: Coordinated Management and Enterprise Data

Smart Consolidation allows clients to evolve from a complex monolithic architecture to a more agile, logical, and decentralized architecture. Computation is managed centrally but executed on LDW nodes optimized for specific workloads, which improves performance, governance, scalability, and agility. (The figure maps traditional data sources—CRM, ERP, external sources—and new data sources—Internet/social media, sensor and meter data, event data—to nodes for big data processing, queryable archive, real-time analytics, time series processing, operational analytics, and BI + ad hoc analytics, all under data governance, security, and life cycle management.)

Tenet Three: Coordinate Management and Data Across the Logical Warehouse


Rather than shackling data in a central repository in the name of governance, we want to centralize the management—not the distribution and processing—of enterprise data. The goal is to simplify the administration, provisioning, and scalability of the extended analytics infrastructure by centralizing those functions. Move data management into the hub, and move analytics out to workload-optimized nodes.

Technology Requirements

• Seamless data flow

• Enterprise-level orchestration engine

• Sophisticated enterprise data management tools

Technology Proof Points

• IBM InfoSphere Change Data Capture (CDC)—InfoSphere CDC offers cross-system change data replication for focused data synchronization between disparate systems, and the supported platform list continues to grow.

• IBM Netezza Replication Module, IBM InfoSphere Replication Server—More key pieces of the enterprise data puzzle. Reliable, full-scale replication options are required to meet high availability and disaster recovery standards, and also to enable queryable archiving, for which demand is rising dramatically.

• DataStage-BigInsights Integration—This combination streamlines bulk data integration between HDFS and a data warehouse system. Users can launch a BigInsights analytics job from the DataStage user console.

• IBM InfoSphere Blueprint Director—Blueprint Director is positioned to take a larger role in logical data warehouse configuration.


Figure 7: A Conceptual View of Proposed Logical Data Warehouse Configuration Tools

• Optim Data Management and IBM Netezza High Capacity Appliance Integration—Another of IBM’s enterprise-level software systems, optimized to manage the new IBM Netezza C1000 series of High Capacity Appliances.

Use Case

• InfoSphere CDC—A large financial institution synchronizes several million transactions per day between internal financial data systems and customer-facing query and transactional systems, in near real time. InfoSphere CDC performs real-time data integration automatically, with high availability and single-console setup and management.


Technology Requirements

Previous sections referred to some core technology requirements associated with the Smart Consolidation model’s three main tenets. Here, we’ll describe these requirements in a little more detail. They provide a good basis for tracking and evaluating new and existing technologies, from IBM and other vendors, and IBM’s product portfolio is evolving rapidly in these critical areas. Note that these requirements overlap, as do the domains of the hardware, software, and networking components designed to address them.

Workload-Optimized Systems and Appliances

Consolidated infrastructure and distributed data/compute nodes demand high-performance, cost-effective processing platforms to handle the assigned workloads. This premise is self-evident, and IBM’s Netezza acquisition, workload-optimized Smart Analytics System designs, Informix TimeSeries, and InfoSphere Streams and BigInsights “Big Data” platforms all testify to full commitment in this domain.

Simplicity—To make adding a logical warehouse node simple and cost-effective, ease of use and roughly linear scaling become non-negotiable requirements. Weeks or months of tuning and load testing a new processing node diminish the value of a distributed system, where agility and adaptability should be primary benefits.

Seamless Data Flow

Fast, reliable data flow is required for the various data load, synchronization, and distribution tasks, and for the orchestration hub itself—the various forms of “glue” that bind the logical data warehouse nodes into a coherent system. Data movement enablers and optimizations span the InfoSphere software portfolio and include the BigInsights-IBM Netezza connector, DataStage ETL/ELT connectors, CDC, and the IBM Netezza Replication Module. These critical components exist today, all “under the same roof” at IBM, and they are being enhanced, refined, and recombined to serve the Smart Consolidation model. The latest DataStage IBM Netezza and DataStage BigInsights optimizations are prime examples, and others will follow.

Keep in mind that in a multi-node logical warehouse, data flow traffic may not be fully predictable, nor is it one-way. Some nodes may serve primarily as pre-analysis data sources or repositories, while others function primarily as offload targets for data to be analyzed. However, data synchronization or multi-level analysis may involve data flow into, out of, or around the hub. In some cases, this may include back-and-forth traffic, or variable node-to-node flow patterns driven by dynamic rules and policies, or by data virtualization and run-time query redirection. The newly announced IBM DB2 Analytics Accelerator exemplifies the seamless data movement principle, binding IBM System z and IBM Netezza data warehouse appliances transparently with a 10GbE Private Service Network.



Minimizing Data Movement—Although frictionless data flow is an important objective, truly “frictionless” movement is not possible, and data movement always incurs a cost. In general, it is preferable to move the computation to the data, not the other way around, and each of our remaining technology requirements includes minimizing data movement as an unspoken benefit.

Data Virtualization and Query Redirection

The logical data warehouse presents a single, cohesive, logical face to business users and applications, which do not need to know where a particular datum physically resides. In particular, queries can be fired at a logical warehouse with no need to know which nodes are collaborating to provide the analytic service. The IBM DB2 Analytics Accelerator illustrates this concept: The DB2 for z/OS accelerator comprises one or more IBM Netezza data warehouse appliances, to which (a) selected data are synchronized, and (b) deep analytic queries are redirected automatically.

Figure 8: IBM DB2 Analytics Accelerator

A two-node logical warehouse with a single management interface. Use virtualization to optimize resource usage automatically, reducing costs and gaining new agility. Consolidate the ever-growing proliferation of data marts onto a single, easily managed platform.

Here, the result is a simple two-node logical warehouse with a single management interface, exemplifying the virtualization design principle: The logical warehouse structure—node topology, connectivity, data flow tools and services—is transparent to both applications and end users. Infrastructure complexity is hidden from business users and applications, who access data through well-defined access points: browsers, application clients, and APIs.



Virtualizing access to a distributed infrastructure frees data consumers from concerns about where data is managed and processed, with the advantage that queries can be redirected transparently when new computational nodes or appliances are added to the infrastructure.
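The redirection principle can be illustrated with a deliberately simple router sketch in Python. The keyword heuristics and node names are hypothetical; the actual DB2 Analytics Accelerator decides via the DB2 optimizer, not string matching:

```python
# Illustrative sketch of transparent query redirection in a two-node
# logical warehouse. The hint list and node names are hypothetical;
# real systems decide via the optimizer and cost estimates.

ANALYTIC_HINTS = ("GROUP BY", "AVG(", "SUM(", "OVER(", "CUBE")

def route(query, nodes):
    """Pick a node for a query submitted to the logical access point.
    Deep analytic queries go to the accelerator when one is registered;
    everything else (and every query, when no accelerator exists)
    stays on the transactional node."""
    q = query.upper()
    if "accelerator" in nodes and any(h in q for h in ANALYTIC_HINTS):
        return "accelerator"
    return "transactional"

nodes = {"transactional": "DB2 for z/OS", "accelerator": "IBM Netezza"}

deep = route("SELECT region, AVG(claim_amt) FROM claims GROUP BY region", nodes)
short = route("SELECT balance FROM accounts WHERE id = 42", nodes)
```

Adding an accelerator changes only the `nodes` registry; the application keeps submitting queries to the same access point, which is the sense in which the infrastructure change is transparent to it.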

Sophisticated Enterprise Data Management Tools

Data integration, data governance, and their related disciplines and sub-disciplines (master data management, changed data capture, data quality, data cleansing, and so on) require (a) a common control platform, and (b) enterprise-level applications with cross-system visibility.

A product portfolio that can supply this “glue” is critical. Business professionals queried about business analytics by Computerworld in 2009 identified “data integration with multiple source systems” and “data quality” as the top two challenges they have faced, or expect to face, in achieving successful business analytics.

Replication—In addition to its fundamental role in HA/DR scenarios, replication, in its several forms, addresses a special class of data management requirements that might be called synchronization. In a multi-node logical warehouse, moving and synchronizing data between nodes for multi-level analytics, master data management, and change data capture takes on special importance. Several new and existing IBM products address these challenges: IBM Netezza Replication Module and InfoSphere Replication Server for large-scale synchronization, InfoSphere CDC for smaller-scale synchronization, Blueprint Director for replication set-up, and so on.

Data Integration—ETL/ELT data integration tools and platforms can present significant challenges, especially at large sites. ETL/ELT subsystems often include home-built scripts and protocols, which complicate tool and process migration efforts. As a result, migrating data integration workloads from pre-existing platforms to one or more LDW nodes is typically planned as a multi-phase effort with a conservative time window.

Performance

Performance, as always, underlies any requirements set in this space. We define performance to include BI and analytic query speed, which is paramount, but also simplicity/ease-of-use efficiencies, including fast time-to-value, minimized admin/tuning/maintenance, and innovative agility. Performance without these ease-of-use factors confers little advantage.

The Components of a Logical Data Warehouse

Think of a logical data warehouse as a set of data storage and/or processing nodes. The connective tissue, or “glue,” that binds these nodes takes the form of software, services, and networking hardware and software. Unlike a typical LAN or distributed computing grid, a logical warehouse is a profoundly heterogeneous entity that may include a very diverse set of nodes—likely from multiple vendors—running an extremely diverse set of tasks. Note that node types correspond quite closely with workload types.

A sampling of the workloads expected of a modern data warehouse:


• ETL/ELT/Data Integration—Data staging, bulk and trickle-feed data loading, ETL, ELT.

• Data Governance—Master data management (MDM), changed data capture (CDC), data quality (DQ), etc.

• Operational Intelligence—Low-latency, real-time query and operational BI support; BI reporting and dashboard updating.

• Complex Event Processing—Real-time event processing for data compliance, data security, fraud detection, etc.

• Analytics/Advanced Analytics—Light-to-moderate or heavy decision support, data mining, complex in-database analytics.

• Line of Business Marts/Warehouses—Data warehouse appliances for specific LoB applications: retail analytics, ERP, etc.

• Big Data Processing—InfoSphere BigInsights (Hadoop) grid to analyze massive unstructured data sets.

• Real-Time Analytics (Big Data)—InfoSphere Streams system for high-volume stream capture and analysis.

• Time Series Data Processing—Informix TimeSeries for optimized storage and processing of time series and time interval data.

• Queryable Archiving—High-capacity federated storage for data to which future or intermittent access is required.

• Backup/Recovery—High-capacity, write-only systems for non-queryable archiving and/or disaster recovery.

• Exploration Sandbox—Replicated data for use in data exploration and nonproduction analytics.

• Test/Dev/Prototyping—Nonproduction systems for application development, prototyping, and testing.

• Short-Request/Transactional—OLTP or other short-request query activity.


Figure 9: Logical Data Warehouse Nodes

Many of these common node types have already been discussed, but Figure 9 inspires some additional observations:

Existing EDW—If your current enterprise data warehouse is performing as desired, some capacity planning may be in order, but leave the system in place. If the required set of virtual warehouse nodes can be encapsulated in your current system, you have already deployed the superior solution. To repeat, Smart Consolidation is a flexible, evolutionary strategy, and site-specific data stores and workloads will determine how to proceed.

(Figure 9 nodes shown: InfoSphere Warehouse enterprise data; IBM InfoSphere Streams real-time analytics; IBM Smart Analytics System operational analytics; IBM Netezza High-Capacity Appliances queryable archive; IBM Netezza 1000 BI + ad hoc advanced analytics; IBM Informix TimeSeries time series processing; IBM InfoSphere BigInsights big data processing; data integration; enterprise reporting/BI; data mart; test/dev/prototype sandbox; Oracle/Teradata ODS and operational BI; IBM DB2 Analytics Accelerator transactional + deep analytics.)


Big Data (BigInsights, Streams) Nodes—At present, text dominates big data generation and consumption, making natural language processing the pre-eminent form of big data processing, which puts IBM’s Watson at the apex of unstructured data analysis. Also, note some of the node-to-node connections in Figure 9. A logical warehouse resembles a hub-and-spoke system, but inter-node relationships are more flexible, and the multi-layered analytics enabled by these node-to-node connections will drive analytic sophistication to heights not previously attained.

Finally, although the trend in structured data processing is toward permanent storage and access for all data, some extraordinary big data volumes preclude this simple solution for unstructured data. Filtering “interesting” data from high-speed streams is already commonplace, and the concept of data value decay remains relevant. Data management policies (data security, life cycle management, etc.) will have to reflect the node-by-node variations in data value and longevity.
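As a sketch of how such policies might encode data value decay, the following Python fragment demotes data to a queryable archive once its relative value falls below a threshold. The half-lives, the exponential decay model, and the threshold are illustrative assumptions, not IBM guidance:

```python
# Hypothetical life cycle policy driven by data value decay. The
# half-lives and archive threshold are invented for illustration.

HALF_LIFE_MONTHS = {
    "mission_critical": 600,   # value is effectively flat
    "quarterly_finance": 24,
    "clickstream": 3,
    "sandbox": 1,
}

def relative_value(data_class, age_months, initial=10.0):
    """Relative data value (0-10) under simple exponential decay."""
    return initial * 0.5 ** (age_months / HALF_LIFE_MONTHS[data_class])

def placement(data_class, age_months, archive_below=2.0):
    """Keep high-value data on the analytics node; demote the rest to
    the queryable archive tier."""
    if relative_value(data_class, age_months) < archive_below:
        return "queryable_archive"
    return "analytics_node"
```

Under these assumptions, a year-old clickstream lands in the archive while mission-critical data stays put, mirroring the node-by-node variation in value and longevity described above.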

Figure 10: Data Value Decay

Not all data has equal value, or longevity. The “business value” curves for data take a variety of shapes, with most trending downward over time. (The chart plots relative data value, 1–10, over 96 months for five data classes: Mission Critical, Quarterly Finance, Online Survey, Clickstream, and Sandbox.)


Advanced Analytics Node—Like “Data Mart” or “Enterprise Reporting,” advanced analytics nodes can take many forms. Common examples include InfoSphere Warehouse software or an IBM Netezza data warehouse appliance running in-database analytics software—SPSS, IBM Netezza Analytics, FuzzyLogix, or SAS. Other workload-optimized analytics platforms include the IBM Content Analytics system, IBM Smart Analytics System, Solutions for Retail with IBM Netezza Customer Intelligence Appliance, Solutions for Communications with IBM Netezza Network Analytics Accelerator, and so on.

Dependent or Independent Data Marts—Smart Consolidation will never eliminate data marts completely. The goals are to consolidate the vast majority onto analytics appliances, and to pull the remainder into enterprise data management regimes.

Queryable Archive—IBM Netezza has announced the C1000 family of High Capacity Appliances. Deploying a high-capacity queryable archive yields these benefits:

• Lowers TCO by minimizing the per-terabyte cost to store large historical data sets—data to which access may be required, but with something less than real-time immediacy.

• Frees the EDW, or other sub-optimal warehouse nodes, from having to house historical or other less frequently accessed data, thereby increasing the performance and capacity of those nodes.

• Increases data accessibility for historical data and other less frequently queried data sets.

• Removes the write-only “data tomb” problem.
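The TCO benefit above is simple arithmetic. The sketch below works through it with entirely hypothetical prices and volumes, chosen only to make the shape of the calculation concrete:

```python
# Back-of-the-envelope TCO sketch for a queryable archive tier. All
# dollar figures and volumes are hypothetical assumptions.

def annual_storage_cost(terabytes, cost_per_tb_year):
    """Annual cost of keeping a data set on a given storage tier."""
    return terabytes * cost_per_tb_year

historical_tb = 80             # cold data currently housed on the EDW
edw_cost_per_tb = 20_000.0     # hypothetical premium EDW $/TB/year
archive_cost_per_tb = 2_500.0  # hypothetical high-capacity $/TB/year

before = annual_storage_cost(historical_tb, edw_cost_per_tb)
after = annual_storage_cost(historical_tb, archive_cost_per_tb)
savings = before - after       # plus freed EDW capacity for hot data
```

The spread between the two per-terabyte rates drives the savings, and the freed EDW capacity is an additional, unpriced benefit.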

EnterpriseDataManagement/OrchestrationHub—In its idealized end state, a logical data warehouse includes one or more “master nodes” to coordinate data integration, node synchronization, system-wide monitoring and error reporting, and LDW topology. Note that this “hub” can incorporate a full-scale data warehouse, which may include operational analytics and mixed workload support, for example, or other virtual nodes from the list above. Ultimately, orchestration software running on such a system will be able to combine with replication and MDM tools to configure and manage key aspects of a logical data warehouse.

In principle, what we have labeled “enterprise data management” or “enterprise data hub” might take a variety of forms, from a large Smart Analytics System or custom InfoSphere Warehouse system to an Oracle Database or Teradata database system. As elsewhere in the Smart Consolidation model, flexibility prevails. However, there are some general guidelines: When evolving from a large existing EDW, consolidate data marts and offload analytics processing to workload-optimized appliances first, preserving the newly unburdened EDW as the focus of enterprise data management. When building a new logical data warehouse, IBM currently recommends an IBM Smart Analytics System or custom InfoSphere Warehouse system as the best price-performance host for IBM’s diverse data governance/data management/data integration software portfolio.

Smart Consolidation Entry Points

The Smart Consolidation model offers a flexible, stepwise adoption path. Each forward step confers substantial benefits, as does partial adoption. The logical warehouse can include one or more nodes from a diverse (and growing) set of node types, and new nodes can be added, or old ones removed, with minimal impact on applications. Together, these facts suggest numerous entry points on the path to Smart Consolidation, many of which will already look familiar:

• Stay with Traditional EDW—A single, high-powered EDW remains an optimal solution for smaller clients. A node set of one remains the cleanest possible warehouse configuration, delivering optimal ease of use, minimal admin costs, fastest time-to-value, and lowest total cost. Nodes are added only as required. Like FEMA’s Incident Command System (ICS), whose full structure is required only by the largest incidents, Smart Consolidation is a flexible, extensible framework that can be scaled up or down to match installation size. Few emergencies are large enough to require every component of the ICS, but the framework is there, ready and waiting.

• Build a New Logical Warehouse—The guidance in this paper should help you avoid the pitfalls of a large monolithic system. If your priorities include analytics, no single system, however modern, can credibly absorb the diverse set of workloads directed at a sizable analytics infrastructure.

• Offload an Overburdened InfoSphere Warehouse, IBM Netezza, or Other EDW—For fast time-to-value, offload all advanced analytics processing to one or more modern analytic appliances. To synchronize an analytics node with the central EDW, consider IBM’s InfoSphere Change Data Capture (CDC) software, which is designed to simplify this task.

• Upgrade an Existing System z Warehouse with High-Performance Analytics—Add the DB2 Analytics Accelerator (a fully integrated IBM Netezza analytics appliance) to an IBM System z, creating what is, in effect, a two-node LDW.

• Consolidate Data Mart Sprawl—Again, wherever possible, consolidate data marts onto one or more modern analytic appliances.

• Add a Purpose-Built Analytics Appliance, or Add Analytic Processing to an Existing Warehouse—As reported in TDWI’s “Big Data Analytics” Best Practices Report in Q4 2011, 40% of survey respondents practice advanced analytics without big data. From the TDWI report: “Much of the action in big data analytics is at the department level,” and “Analytic applications are departmental by nature.”

• Add Queryable Archiving—Frequently, this step is taken to satisfy high availability, disaster recovery, or data regulatory requirements. The advantages of massive, cost-effective storage that can be queried on demand are obvious to anyone familiar with traditional tape, cartridge, disk, or off-site archiving.

• Add “Big Data” Processing—Add InfoSphere BigInsights/Hadoop, InfoSphere Streams, Informix TimeSeries, or IBM Content Analytics nodes. The Smart Consolidation model confers many benefits, but perhaps the most significant is that it helps clients quickly deploy or reposition analytic resources in response to new data sources, new processing technologies, or new business requirements.



• Add an ODS or Operational BI Node (InfoSphere Warehouse/IBM Smart Analytics System)—The IBM Smart Analytics System integrates InfoSphere Warehouse, which is designed for mixed-workload, high-throughput environments such as those associated with operational BI deployments.

• Support Geographic Expansion—Add one or more regional data warehousing and analytics centers, and coordinate data governance policies from the existing EDW.
Conclusion

In IBM’s Smart Consolidation for Smarter Warehousing strategy, a logical data warehouse replaces a monolithic system with a distributed computing architecture. Data governance (a “global” consideration) and analytic applications (often LoB-oriented) are isolated from each other as separate nodes in a grid-like ecosystem. This modular approach is smarter computing. Each task is provisioned with exactly the hardware, software, and application services it requires.

Many organizations have already found a way to succeed by moving analytics processing from an already overburdened central warehouse to data warehouse appliances. IBM’s Smart Consolidation strategy, designed around the logical data warehouse concept, builds on these successes, simplifying data management, automating replication, and adding governance controls. Separating data management from its exploitation accelerates time-to-value, reduces cost, and creates a flexible architecture to add new processing nodes that will run future analytic technologies as these become available.

Virtualizing access to this distributed infrastructure frees users from concerns about where data is managed and processed, with the advantage that queries can be redirected transparently as new computational nodes or appliances are added to the infrastructure.

Over the coming months, IBM will continue to share its evolutionary road map of products and features underlying Smart Consolidation. This strategy will allow our clients to maximize the value of their investments in data warehousing and analytics, while scaling to support new data types, higher data volumes, and more complex applications, all flexibly and with appliance simplicity.

Keep in mind that the flexible three-point call to action—consolidated infrastructure, distributed data/compute, and coordinated enterprise data management—need not be pursued in a particular order. Furthermore, logical warehouse nodes should be viewed as modular units that can be added, repurposed, repositioned, or removed as your analytic requirements evolve, or as the competitive landscape dictates.



Figure 11: IBM’s Logical Data Warehouse

Extensible, modular logical warehouse construction offers a road map to continued expansion, including future support for new data and workload types.

Features

• Application and workload optimized appliances and systems

• Seamless data movement

• Data governance and life cycle management

• Continuously available

• Framework for integrated management

• Transparent, virtualized access for end users, via well-defined access points (APIs and browser clients, for example)


Netezza Corporation26 Forest StreetMarlborough, MA 01752

+1 508 382 8200 TEL+1 508 382 8300 FAX www.netezza.com

© 2011 Netezza Corporation, an IBM Company. All rights reserved. All other company, brand and product names contained herein may be trademarks or registered trademarks of their respective holders.

About Netezza Corporation: Netezza, an IBM Company, is the global leader in data warehouse and analytic appliances that dramatically simplify high-performance analytics across an extended enterprise. Netezza’s technology enables organizations to process enormous amounts of captured data at exceptional speed, providing a significant competitive and operational advantage in today’s data-intensive industries including digital media, energy, financial services, government, health and life sciences, retail, and telecommunications. Netezza is headquartered in Marlborough, Massachusetts, and has offices in North America, Europe and the Asia Pacific region. For more information about Netezza, please visit www.netezza.com.