evolving data warehouse architectures

Philip Russom April 15, 2014 Evolving Data Warehouse Architectures In the Age of Big Data

Page 1: Evolving Data Warehouse Architectures

Philip Russom April 15, 2014

Evolving Data Warehouse Architectures

In the Age of Big Data

Page 2: Evolving Data Warehouse Architectures

TDWI would like to thank the following companies for sponsoring the 2014 TDWI Best Practices research report: Evolving Data Warehouse Architectures

This presentation is based on the findings of that report.

STAY TUNED

At the end of this webinar, learn how to download a free copy of the report.

Page 3: Evolving Data Warehouse Architectures

Agenda

• Definitions of Data Warehouse Architectures

• Drivers of Change

• Benefits & Barriers

• From EDWs to DWEs

• Role of Hadoop

• Analytics versus Reporting

• Trends among Architectural Components and Practices

• Top Ten Priorities

PLEASE TWEET @pRussom, #TDWI, #EDW, #DataWarehouse, #DataArchitecture, #Analytics, #Hadoop

Page 4: Evolving Data Warehouse Architectures

Upcoming Points

• There isn’t one single architecture for all data warehouses (DWs)

– Each org is different

• Expect multiple architectures

– A well-designed DW has multiple architectural layers

– Architectural approaches get mixed together into hybrids

– A DW architecture interacts with architectures for data integration, reporting, analytics, operational applications, etc.

• The warehouse is still vital, even central

– But it’s evolving into a multiple platform environment

– Architecture is more important than ever, but now as a logical design that’s deployed over multiple physical platforms

• Please don’t ask me to draw a Reference Architecture for DWs

– Given the current diversity, there isn’t just one. But I’ll describe many.

Page 5: Evolving Data Warehouse Architectures

What do you think data warehouse architecture is? Select all that apply.

Source: TDWI survey run in late 2013. Based on 1197 responses from 538 respondents. 2.2 responses per respondent, on average.

Page 6: Evolving Data Warehouse Architectures

Logical versus Physical DW Architectures

And Other Architectural Components that Coexist

• Logical architecture – mostly about data models and their relationships, with a focus on how these represent organizational entities and processes

– Data standards – including standards for data modeling, data quality metrics, interfaces for data integration, programming style, format standards, etc.

• Physical architecture – mostly a plan for deploying data and data structures based on the workload and platform requirements of each

– System architecture – a topology of hardware servers and software servers, plus the interfaces and networks that tie them together

Today’s Focus
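The split between the logical and physical layers described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the entity names (`customer`, `order`), and the platform labels (`relational_dw`, `hadoop`) are hypothetical and do not come from the TDWI report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalEntity:
    """A business entity in the logical layer; it names no platform."""
    name: str
    related_to: tuple = ()


@dataclass(frozen=True)
class PhysicalPlacement:
    """One deployment of an entity's data, chosen for a given workload."""
    entity: str
    platform: str  # hypothetical platform label
    workload: str


def plan_by_platform(placements):
    """Group placements by platform to show the physical deployment plan."""
    plan = defaultdict(list)
    for placement in placements:
        plan[placement.platform].append((placement.entity, placement.workload))
    return dict(plan)


# Logical layer: entities and their relationships (assumed example data).
LOGICAL = [
    LogicalEntity("customer"),
    LogicalEntity("order", related_to=("customer",)),
]

# Physical layer: the same logical entity may live on more than one platform.
PLACEMENTS = [
    PhysicalPlacement("customer", "relational_dw", "reporting"),
    PhysicalPlacement("order", "relational_dw", "reporting"),
    PhysicalPlacement("order", "hadoop", "advanced_analytics"),
]

plan = plan_by_platform(PLACEMENTS)
```

Here the logical layer stays the same regardless of how many platforms exist, while the physical plan records where each entity is deployed and why.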

Page 7: Evolving Data Warehouse Architectures

Drivers of Change

Does your primary enterprise data warehouse have an architectural design?

Yes 79%

No 18%

Don’t know 3%

Source: TDWI survey run in late 2013.

Based on 538 respondents.

Is the architecture of your data warehouse environment evolving?

Yes – moderately 54%

Yes – dramatically 22%

No – except with DW updates 22%

Don’t know 2%

What technical issues or practices are driving change in your DW architecture?

Advanced analytics 57%

Increasing data volumes 56%

Real-time operations 41%

Business performance mgt 38%

OLAP 30%

Non-relational data 25%

Virtualization of data 23%

Cloud adoption 21%

Streaming data 15%

What business issues or practices are driving change in your DW architecture?

Competitiveness 45%

Fast-paced business processes 43%

Compliance 29%

Funding 29%

Sponsorship 26%

Reorganizations 25%

Centralizing business control 30%

Departmental power struggles 19%

Mergers and acquisitions 18%

Page 8: Evolving Data Warehouse Architectures

Benefits of Multi-Platform Architecture

In priority order, based on survey responses

• All data analytics, in general (61%)

– Many new platforms are built for analytics: DW appliances, columnar databases, NoSQL databases, Hadoop.

– With a multi-platform portfolio, users can match an analytic workload to the best platform.

• A diverse platform portfolio can handle a diverse range of data types.

– This is key to embracing the unstructured and schema-free data types found in most big data.

– Enables broad data exploration and discovery (43%)

• A more diverse platform portfolio can aid a business

– Additional platforms are key to addressing new business requirements (36%), especially data-oriented ones like analytics (61%), more numerous business insights (34%), business optimization (30%)…

• Handling data in real time usually requires an additional purpose-built system.

– Traditional relational databases and batch-oriented Hadoop systems were not built for real-time operations (33%), though many organizations need faster business processes (26%).

• Adding low-cost platforms to a DW environment makes big data more affordable.

– DW appliances, columnar RDBMSs, Hadoop & NoSQL all lower the cost of data staging for data warehousing (20%) and data archiving (16%).

Source: TDWI survey run in late 2013.

Based on 538 respondents.

Page 9: Evolving Data Warehouse Architectures

Barriers to Multi-Platform Architecture

In priority order, based on survey responses

• Inadequate staffing or skills (47%) is the most prominent barrier.

– Immaturity with new data types and sources (23%), plus new technologies for Hadoop, event processing, and so on, leaves organizations unprepared for the complexity of multi-platform designs (25%).

• As usual, organizational and business issues should be settled first.

– Data ownership and other politics (43%), a lack of business sponsorship (38%), a lack of a compelling business case (25%)

• A number of data management issues should be addressed.

– Data integration complexity (36%), poor data quality (34%), lack of data architecture (29%), and data security, privacy, and governance issues (25%)

• As with any new IT initiative, proper funding is key.

– Account for the cost of acquiring multiple platforms (25%) and the cost of administering multiple platforms (27%)

Source: TDWI survey run in late 2013.

Based on 538 respondents.

Page 10: Evolving Data Warehouse Architectures

WHY CAN’T A DATA WAREHOUSE DO EVERYTHING?

“Square Peg” Workloads may not fit “Round Hole” DW Architectures

• Most data warehouses were designed and optimized for common deliverables and methods:

– Standard reports, dashboards, performance mgt, online analytic processing (OLAP)

– This is a design and architectural decision made by users, not a failing of vendor platforms

• Can/should all DW & analytic workloads run on your EDW?

– If your EDW can handle multiple mixed concurrent workloads with performance and without impeding other workloads, then run all workloads (including analytics) on the EDW, for simplicity’s sake

– If not, you may need additional data platforms for some workloads

Page 11: Evolving Data Warehouse Architectures

Multi-Platform Data Warehouse Environments

• Many enterprise data warehouses (EDWs) are evolving into multi-platform data warehouse environments (DWEs).

• Users continue to add standalone data platforms to their warehouse tool and platform portfolio.

• The new platforms don’t replace the core warehouse, because it is still the best platform for the data that goes into standard reports, dashboards, performance management, and OLAP.

• Instead, the new platforms complement the warehouse, because they are optimized for workloads that manage, process, and analyze new forms of big data, non-structured data, and real-time data.

Page 12: Evolving Data Warehouse Architectures

Ramifications of a Multi-Platform DW Environment

• Workload-centric DW architecture

– Assumes that some workloads and their data are best offloaded from the core DW and taken to a platform more suited to them

– Workloads and data for advanced analytics (not OLAP), SQL-based analytics, unstructured data, massive big data, real time

• Distributed DW architecture

– This simply means that data and data structures (as defined in a logical architectural layer) are distributed across multiple physical data platforms

– Again, the logical layer is the “big picture” needed with many platforms

• A distributed DW architecture is both good and bad

– Good if it serves the unique requirements of multiple workloads and the users that depend on them

– Bad if platforms proliferate like the dreaded data marts of yore
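The workload-centric idea above can be sketched as a small routing table: workloads stay on the core DW unless they are explicitly offloaded. This is a minimal sketch, not a recommended design. The workload keys and platform labels are hypothetical, and the slide does not prescribe which platform suits which workload.

```python
CORE_DW = "core_dw"

# Hypothetical offload rules: workload type -> platform assumed to suit it better.
OFFLOAD = {
    "advanced_analytics": "analytic_platform",
    "unstructured_data": "hadoop",
    "real_time": "real_time_platform",
}


def choose_platform(workload: str) -> str:
    """Return the offload platform for a workload, or the core DW by default."""
    return OFFLOAD.get(workload, CORE_DW)


def platform_count(workloads) -> int:
    """Count distinct platforms in use, a rough signal of platform proliferation."""
    return len({choose_platform(w) for w in workloads})
```

For example, `choose_platform("standard_reports")` falls through to the core DW, while `choose_platform("real_time")` is offloaded. Tracking `platform_count` over time is one simple way to notice when a distributed architecture starts to sprawl.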

Page 13: Evolving Data Warehouse Architectures

Growing Complexity in DW System Architectures

• The technology stack for DW, BI, analytics, and data integration has always been a multi-platform environment.

• What’s new? The trend toward a portfolio of many data platforms has accelerated.

[Figure: components accumulating in a DW system architecture over the passage of time. Labeled components: data warehouse (star or snowflake schema), detailed source data, data staging areas, OLAP cubes, OLAP DBMSs, multidimensional data models, metrics for performance mgt, federated data marts, customer mart or ODS, real-time ODS, DW from a merger, DW appliances, columnar DBMSs, NoSQL databases, Hadoop Distributed File System, MapReduce, analytic sandbox, data federation & virtualization, streaming data tools, complex event processing.]

Page 14: Evolving Data Warehouse Architectures

Which of the following best describes your extended data warehouse environment today?

• Pure, central, monolithic EDWs are relatively rare (15%, far left)

• Environments without a true EDW are equally rare (15%, far right)

• EDWs mix well in hybrid environments (68%, middle three)

Chart categories, from EDW (far left) to DWE (far right):

Central monolithic EDW with no other data platforms 15%

Central EDW with a few additional data platforms 37%

Central EDW with many additional data platforms 16%

Many workload-specific data platforms; EDW is present but not the center 15%

No true EDW, but many workload-specific data platforms instead 15%

Other 2%

Source: TDWI survey run in late 2013.

Based on 538 respondents.

Page 15: Evolving Data Warehouse Architectures

Which of the following best describes your organization’s strategy for evolving your DW environment and its architecture, relative to big data?

• The largest group of survey respondents plans to extend an existing DW (41%, far left)

• Fewer will deploy new data platforms (25%)

• 29% have no strategy for DW evolution or addressing big data

Source: TDWI survey run in late 2013.

Based on 538 respondents.

Extend existing core DW to accommodate big data and other new requirements 41%

Deploy new data management systems specifically for big data, analytics, real time, etc. 25%

No strategy for DW architecture, though we need one 23%

No strategy for DW architecture, because we don't need one 6%

Other 5%

Page 16: Evolving Data Warehouse Architectures

Hadoop is a Useful Addition to DW Architectures

IT COMPLEMENTS AND EXTENDS DATA WAREHOUSES

• HDFS extends DW Architectures

– Managing multi-structured data

– Repository for detailed source data

– Processing big data for analytics

– Advanced forms of algorithmic analytics

– Data staging on steroids

– ELT push-down processing

– Inexpensive compared to average DW

• Hadoop also contributes outside DWs

– Imagine HDFS as shared infrastructure, similar to SAN & NAS

– Imagine a huge, live archive

– Imagine content mgt on steroids
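The "ELT push-down processing" item in the list above refers to loading raw data first and then transforming it inside the data platform, rather than in a separate ETL tool. The sketch below illustrates only that general pattern. It uses Python's built-in `sqlite3` as a stand-in engine (an assumption made so the example is runnable; it is not Hadoop), and the table and column names are hypothetical.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows untouched, then transform them inside the engine with SQL."""
    conn = sqlite3.connect(":memory:")
    # Extract + Load: stage the data as-is, with messy text values intact.
    conn.execute("CREATE TABLE staging_sales (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)
    # Transform: cleansing and aggregation are pushed down to the engine.
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT UPPER(TRIM(region)) AS region,
               SUM(CAST(amount AS REAL)) AS total
        FROM staging_sales
        WHERE TRIM(amount) <> ''
        GROUP BY UPPER(TRIM(region))
        """
    )
    result = conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


# Assumed sample data: inconsistent casing, stray spaces, one empty amount.
raw_rows = [(" east ", "10.5"), ("East", "4.5"), ("west", "7"), ("west", "")]
summary = run_elt(raw_rows)
```

The detailed rows remain available in the staging table, which is the point of the pattern: the raw source data is preserved and the transformation runs where the data already lives.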

Page 17: Evolving Data Warehouse Architectures

Reporting and Analytics have Different Requirements for Data and DW Architecture

• Reporting is mostly about entities and facts you know well, represented by highly polished data that you know well.

• Carefully modeled and cleansed data with rich metadata and master data that’s managed in a data warehouse.

• Most users designed their DWs first and foremost as a repository for reporting and similar practices such as OLAP, performance management, dashboards, and operational BI.

• Advanced analytics enables the discovery of new facts you didn’t know, based on the exploration and analysis of data that’s probably new to you.

• Unlike the pristine data that reports operate on, advanced analytics works best with detailed source data in its original (even messy) form, using discovery-oriented technologies such as ad hoc queries, search, mining, statistics, predictive algorithms, and natural language processing.

Page 18: Evolving Data Warehouse Architectures

Commitment & Growth Components relative to DW Architecture

• Analytics is driving most adoption of new platforms & features.

– In-memory analytics (36%), analytic sandboxes (29%)

• Managing non-relational big data is also a pressing need for many organizations.

– HDFS (34%), open-source MapReduce (32%), vendor-built MapReduce (25%), NoSQL databases (24%)

• Real-time is just as important as analytics and big data.

– In-memory database (34%), in-database analytics (29%), solid-state drives (25%), real-time data (24%)

• Relational technology is more relevant than ever, but in updated forms.

– Columnar DBMSs (27%), DW appliances (23%)

Some components are poised for aggressive adoption by users.

Page 19: Evolving Data Warehouse Architectures

Top Ten Priorities for DW Architecture

These are recommendations, requirements, or rules that can guide you.

1. Recognize that successful data warehouse architectures have integrated logical and physical layers, plus other components.

2. Determine the business and technical drivers in your organization, and let those determine the evolution of your DW architecture.

3. Beware that the leading barrier to successful DW architecture is inadequate staffing and skills.

4. Address other barriers for sponsorship, funding, and improvements to data management infrastructure.

5. Turn on unused features in existing platforms.

6. Establish DW architectures and standards, but be open to exceptions.

7. Be open to hybrids and alternate standards.

8. Consider Hadoop as a DW complement.

9. Remember that analytics and reporting have different data and DW architectural requirements.

10. Don’t expect the new stuff to replace the old stuff.

Page 20: Evolving Data Warehouse Architectures

Download a free copy of the report that this Webinar is based on

• Download the report in a PDF file at: tdwi.org/bpreports

• Feel free to distribute the PDF file of any TDWI Best Practices Report

EVOLVING DATA WAREHOUSE ARCHITECTURES IN THE AGE OF BIG DATA

Page 21: Evolving Data Warehouse Architectures

Philip Russom

Research Director for Data Mgt, TDWI

[email protected]

www.bit.ly/PhilipRussom

@pRussom on Twitter

linkedin.com/in/philiprussom

Q & A