real world data warehouses

Data Warehousing - in the real world - Dr. Thomas Zurek November 2014 Big Data und Analytische Applikationen

Page 1: Real World Data Warehouses

Data Warehousing - in the real world -

Dr. Thomas Zurek
November 2014

Big Data und Analytische Applikationen

Page 2: Real World Data Warehouses

Real-World Data Warehouses / Thomas Zurek

Who am I?

• Vice President of Development @ SAP for
  – Business Warehouse (BW)
  – Business Planning & Consolidation (BPC)
  – HANA Analytics
• 17 years at SAP
• PhD in Computer Science
• Universities of Karlsruhe and Edinburgh

Page 3: Real World Data Warehouses

Agenda

1. Examples
2. Business Intelligence (BI) + Data Warehouses (DW)
3. Data Warehouses
4. Layered Scalable Architecture (LSA)
5. In-Memory Databases + Data Warehousing
6. Summary

Page 4: Real World Data Warehouses

EXAMPLES

Page 5: Real World Data Warehouses

Examples of Business Intelligence Scenarios

• fraud detection
  - retail company
  - point-of-sales data & given discounts
  - huge amounts of data
  - a prototypical BI question

• long tail analysis
  - e-commerce companies like Amazon, Ebay, iTunes, Netflix, …
  - translate sales of popular products into (additional) sales in the long tail
  - BI integrated into operational processes

Page 6: Real World Data Warehouses

Long Tail Analysis (1) – An Amazon Example

Page 7: Real World Data Warehouses

Long Tail Analysis (2)

Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html

Page 8: Real World Data Warehouses

Long Tail Analysis (3)

Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html

Page 9: Real World Data Warehouses

BUSINESS INTELLIGENCE + DATA WAREHOUSES

Page 10: Real World Data Warehouses

Business Intelligence and Data Warehouses

• Business Intelligence (BI)
  An environment in which business users conduct analyses that yield overall understanding of where the business has been, where it is now, and where it will be in the near future (i.e. planning).

• Data Warehouse (DW)
  - An implementation of an informational database used to collect, integrate and provide sharable data sourced from multiple operational databases for analyses.
  - Provides data that is reliable, consistent, understandable.
  - It typically serves as the foundation for a business intelligence system.

Page 11: Real World Data Warehouses

Business Intelligence and Data Warehouses

Business Intelligence: OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …

Data Warehouse: connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …

Operational System: ERP, CRM, SCM, HR, …

Meta Data: security, models, …

Page 12: Real World Data Warehouses

Business Intelligence and Data Warehouses

Business Intelligence: OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …

Data Warehouse: connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …

Focus today!

Operational System: ERP, CRM, SCM, HR, …

Meta Data: security, models, …

simply remember:
(1) BI and DW
(2) BI ≠ DW

Page 13: Real World Data Warehouses

DATA WAREHOUSES

Page 14: Real World Data Warehouses

Multiple Data Sources

Why are there so many DBs at an enterprise?
• business processes → data captured in some DB
• organisation reflected in system landscape
• geography reflected in system landscape
• smaller systems easier to manage than big systems
• mergers and acquisitions
• external data: market data, supplier data, …
• …

Page 15: Real World Data Warehouses

A Typical Example for Business Processes in an Enterprise


source: http://thebankwatch.com/2006/09/13/simplifying-the-business-model/

Page 16: Real World Data Warehouses

A Typical Data Warehouse Architecture

[Figure: layered architecture diagram. Labels: End-user access / Presentation; Reporting / Analyses / Planning; Business transform; Data Propagation; Harmonization; Data Acquisition; Provide data; Corp. Memory; ODS; BI Layer; Data Warehouse; Source 1 to Source 5; Project Governance; IT Governance]

Layer descriptions shown in the figure:

• Main Service: Spot for apps / Delta to app / App recovery
  Transform: Enriched || General business logic
  Content: Data source || Business domain specific
  History: Determined by rebuild requirements of apps
  Store: DSO (can be logically partitioned)

• Main Service: Decouple, fast load and distribute
  Transform: 1:1
  Content: 1 data source, all fields
  History: 4 weeks
  Store: PSA, DSO-WO

• Main Service: Integrated, harmonized
  Transform: Harmonize, quality assure (in flow || lookup)
  Content: Defined fields
  History: Short or not at all || Long term
  Store: Info source || IO / DSO / Z-table

• Main Service: Make data available for reporting & planning tools
  Transform: Application specific / (dis-)aggregate / lookup
  Content: Application specific
  History: Application specific
  Store: IC, DSO, Info Set, Virtual Provider, Multi Provider

Page 17: Real World Data Warehouses

Challenge 1: RELIABLE

• typical: data from 50-100 data sources
• availability of data sources not given
  – system downtimes
  – network failures
  – example (assuming independent failures):
    • availability per data source = 98%
    • all 100 data sources available = 0.98**100 = 13%
    • at least 1 out of 100 data sources not available = 1 – 0.13 = 87%

all data in one place ensures reliable data access
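The arithmetic in the example can be checked with a few lines of Python. This is a minimal sketch that assumes the data sources fail independently of each other, which the example implies but does not state; the function name is illustrative.

```python
def prob_all_available(per_source_availability: float, n_sources: int) -> float:
    """Probability that every source is reachable, assuming independent failures."""
    return per_source_availability ** n_sources


p_all = prob_all_available(0.98, 100)
p_any_down = 1 - p_all

print(f"all 100 sources available: {p_all:.0%}")       # 13%
print(f"at least one source down:  {p_any_down:.0%}")  # 87%
```

Even with highly available individual sources, a query that needs all of them at once will fail most of the time, which is the case for copying the data into one place.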

Page 18: Real World Data Warehouses

Challenge 2: CONSISTENT

• Assume: each data source is consistent!
• Is the union of all data sources consistent?

NO!

In a DW, data gets synchronised and harmonized to provide a consistent view spanning multiple data sources.

Page 19: Real World Data Warehouses

Examples Challenge 2: Transformation, Cleansing

• Jun 1, 2011 = 1.6.2011 = 06/01/11 = …

• VW Touareg = VW TOUAREG = [product] 87654 = …

• currency and unit conversions:
  – box → kg
  – €, $, £, ¥, … → €

• resolve ID clashes:
  product 123 [in subsidiary A] ≠ product 123 [in subsidiary B]

• enrich data:
  add attributes from source A to data from source B
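Two of these cleansing steps, date harmonisation and ID clash resolution, can be sketched in Python. This is only an illustration: the source system names, the format mapping and the function names are assumptions made for the example and do not come from the slides.

```python
from datetime import date, datetime

# Hypothetical mapping: the date format each (made-up) source system delivers.
SOURCE_DATE_FORMATS = {
    "us_erp": "%b %d, %Y",  # Jun 1, 2011
    "de_erp": "%d.%m.%Y",   # 1.6.2011
    "legacy": "%m/%d/%y",   # 06/01/11
}


def harmonise_date(raw: str, source: str) -> date:
    """Parse a source-specific date string into one common representation."""
    return datetime.strptime(raw, SOURCE_DATE_FORMATS[source]).date()


def qualified_product_id(source: str, local_id: int) -> str:
    """Prefix the local ID with its source so equal numbers no longer clash."""
    return f"{source}:{local_id}"


dates = [
    harmonise_date("Jun 1, 2011", "us_erp"),
    harmonise_date("1.6.2011", "de_erp"),
    harmonise_date("06/01/11", "legacy"),
]
print(dates)
print(qualified_product_id("subsidiary_a", 123))
print(qualified_product_id("subsidiary_b", 123))
```

Note that a string such as 06/01/11 can only be parsed correctly if the source's convention is known, which is why the format is tied to the source rather than guessed from the value.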

Page 20: Real World Data Warehouses

Examples Challenge 2: History / Time-Dependency

• data is time-dependent, e.g.
  – employee A worked in department X in 2012
  – employee A worked in department Y in 2013
  – currency exchange rates
  – current view vs historic view analysis

• versioning of meta data
  – models change
  – development → test → production
  – auditing
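One common way to model time-dependent data is to store a validity interval with each record and look values up by date. The sketch below uses the employee example from the slide; the class, field and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    employee: str
    department: str
    valid_from: date
    valid_to: date  # inclusive


ASSIGNMENTS = [
    Assignment("A", "X", date(2012, 1, 1), date(2012, 12, 31)),
    Assignment("A", "Y", date(2013, 1, 1), date(2013, 12, 31)),
]


def department_on(employee: str, day: date) -> Optional[str]:
    """Return the department the employee belonged to on the given day."""
    for a in ASSIGNMENTS:
        if a.employee == employee and a.valid_from <= day <= a.valid_to:
            return a.department
    return None


# Historic view: attribute a 2012 transaction to the department valid at that time.
print(department_on("A", date(2012, 6, 30)))
# A later key date yields the later assignment.
print(department_on("A", date(2013, 6, 30)))
```

A historic view evaluates each fact with the attribute values valid on the fact's own date, while a current view evaluates all facts with the values valid today.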


Page 21: Real World Data Warehouses

© SAP AG 2009. All rights reserved. / Page 21 Public

Automated data quality checks in the form of a plausibility gate

Source 1, Source 2, Source ..., Source n

Plausibility Gate
• UNSPSC code present?
• RVO with BVO reference?
• DUNS number present?
• Order of magnitude of BVO/RVO?

Single Point of Truth

Harmonised analyses

Business-level validation of the data reduces the administration effort and the subsequent "trouble".

real customer example
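A plausibility gate of this kind can be sketched as a list of named checks that every incoming record must pass before it is loaded. The sketch below is hypothetical: the field names and sample values are made up, and only the two presence checks (UNSPSC code, DUNS number) are shown because the slide does not define the RVO and BVO abbreviations.

```python
from typing import Callable

Record = dict[str, object]
Check = Callable[[Record], bool]


def has_field(name: str) -> Check:
    """Build a check that passes when the field is present and non-empty."""
    def check(record: Record) -> bool:
        value = record.get(name)
        return value is not None and str(value).strip() != ""
    return check


# Hypothetical field names; the real customer checks are not given in detail.
CHECKS: dict[str, Check] = {
    "UNSPSC code present": has_field("unspsc_code"),
    "DUNS number present": has_field("duns_number"),
}


def plausibility_gate(records: list[Record]):
    """Split records into accepted ones and rejected ones with their failed checks."""
    accepted, rejected = [], []
    for record in records:
        failed = [name for name, check in CHECKS.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


sample = [
    {"unspsc_code": "43211500", "duns_number": "123456789"},  # made-up values
    {"unspsc_code": "", "duns_number": "123456789"},
]
accepted, rejected = plausibility_gate(sample)
print(len(accepted), "accepted;", [failed for _, failed in rejected])
```

Returning the names of the failed checks with each rejected record gives the business side something concrete to correct at the source.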

Page 22: Real World Data Warehouses


Challenge 3: UNDERSTANDABLE

• texts for cryptic numbers
• multi-language support
• data provenance: know where the data originated
• auditing: track changes
• relevance: show the user data from his "realm of command"

Page 23: Real World Data Warehouses

LAYERED SCALABLE ARCHITECTURE (LSA)

Page 24: Real World Data Warehouses

A Typical Data Warehouse Architecture

[Figure: the architecture diagram from page 16, shown again]

Page 25: Real World Data Warehouses

Yet Another, Arbitrary Example …

Source: http://www.zentut.com/wp-content/uploads/2012/10/stand-alone-data-mart.jpg

Page 26: Real World Data Warehouses

The Layered Scalable Architecture (LSA)

• reference architecture for DW

• term introduced by SAP, but not SAP-specific

• layers:– each layer has a certain task

– each layer has an associated service-level

– layers describe the step-wise refinement of data

• not every DW needs all LSA-layers

• modern technology allows layers to be removed or merged, as fewer or no performance-motivated services are required

• more: http://tinyurl.com/sap-lsa

Page 27: Real World Data Warehouses

LSA Reference Layers

BI Applications/Analytics layers:
• Virtualization Layer: abstraction
• Reporting Layer: reporting, analysis ready
• Business Transformation Layer: apply business logic

EDW layers (application neutral, corporate owned, granular):
• Data Propagation Layer: digestible, integrated, unified data, ready to consume
• Quality & Harmonisation Layer: create harmonised view, guarantee quality
• Corporate Memory: source system service level, long term, comprehensive, complete, master the unknown
• Data Acquisition Layer: 1:1 from extraction, temporary

Operational Data Store: near real time, operational like

Page 28: Real World Data Warehouses

IN-MEMORY DATABASES + DATA WAREHOUSING

Page 29: Real World Data Warehouses

Why In-Memory Databases?

Type of Memory    Size             Latency (~)
L1 CPU Cache      64K              1 ns
L2 CPU Cache      256K             5 ns
L3 CPU Cache      8M               20 ns
Main Memory       GBs up to TBs    100 ns
Disk              TBs              >1.000.000 ns

need cache-conscious data-structures and algorithms!

SAP HANA is an example for an in-memory DBMS

(from 2011)
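The gaps between the levels are easier to appreciate as ratios. The following sketch simply divides the approximate latencies from the table; the disk value is the table's lower bound, so the disk ratios are lower bounds as well. The dictionary and function names are illustrative.

```python
# Approximate latencies in nanoseconds, taken from the table above.
LATENCY_NS = {
    "L1 CPU cache": 1,
    "L2 CPU cache": 5,
    "L3 CPU cache": 20,
    "main memory": 100,
    "disk": 1_000_000,  # the table gives ">1.000.000 ns", so this is a lower bound
}


def slowdown(slower: str, faster: str) -> float:
    """How many times longer one access takes on `slower` than on `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


print(f"disk vs main memory:     at least {slowdown('disk', 'main memory'):,.0f}x")
print(f"main memory vs L1 cache: about {slowdown('main memory', 'L1 CPU cache'):,.0f}x")
```

The second ratio is the reason for the remark on cache-conscious data structures: once the data lives in main memory, cache misses become the dominant cost.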

Page 30: Real World Data Warehouses

The Data Warehousing Quadrant

                   number of data models, sources, …
data volume        modest           huge
huge               Very Large DW    Big DW
modest             Data Mart        Enterprise DW

Page 31: Real World Data Warehouses

The Data Warehousing Quadrant

Axes: data volume (modest to huge); number of data models, sources, … (modest to huge)

• VLDW
  – internet scale business process (e.g. Ebay, Amazon, …) generating huge amounts of (sensor) data
  – fairly modest challenges regarding semantics, consolidation, harmonization, integration with other data
  – few data sources

• EDW
  – mix of scenarios with small and large amounts of data
  – many (1000s to 10000s) of data models
  – many (100s) different data sources

• Data Mart
  – data mart type of setup or operational (OLTP) analytics
  – modest number of tables
  – modest (need for) integrations between data models

• BDW

Arrow labels: more scenarios, more combinations of scenarios; more granular data, sensor / big data; more scenarios

Product labels: SAP HANA, SAP BW

Page 32: Real World Data Warehouses

SUMMARY

Page 33: Real World Data Warehouses

What You Should Take Away

1. Difference: BI vs DW

2. What are the problems that a DW handles?

3. How are those problems tackled?
