A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE PART LX: PROJECT MANAGEMENT ANDREAS BUCKENHOFER, DAIMLER TSS


Post on 03-Jun-2020


A company of Daimler AG

LECTURE @DHBW: DATA WAREHOUSE

PART LX: PROJECT MANAGEMENT

ANDREAS BUCKENHOFER, DAIMLER TSS


ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer

Senior DB [email protected]

Since 2009 at Daimler TSS

Department: Big Data

Business Unit: Analytics


ANDREAS BUCKENHOFER, DAIMLER TSS GMBH

Data Warehouse / DHBW, Daimler TSS

“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”

Data has always been my main focus throughout my many years of work in data integration. I work for Daimler TSS as a Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.

I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.

I'm responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as ACE Associate. I hold current certifications such as "Certified Data Vault 2.0 Practitioner (CDVP2)", "Big Data Architect", "Oracle Database 12c Administrator Certified Professional", "IBM InfoSphere Change Data Capture Technical Professional", etc.


NOT JUST AVERAGE: OUTSTANDING.

As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.

Our objective: We make Daimler the most innovative and digital mobility company.


INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As a subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision-making process, ability to react quickly)

LOCATIONS

Daimler TSS Germany

7 locations

1000 employees*

Ulm (Headquarters)

Stuttgart

Berlin

Karlsruhe

Daimler TSS China

Hub Beijing

10 employees

Daimler TSS Malaysia

Hub Kuala Lumpur

42 employees

Daimler TSS India

Hub Bangalore

22 employees

* as of August 2017

WHAT YOU WILL LEARN TODAY

After the end of this lecture you will be able to

• Understand the lifecycle of DWH projects

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

[Figure: logical standard Data Warehouse architecture. Labels recoverable from the diagram:]

• Areas: Backend, Data Warehouse, Frontend

• Sources: internal data sources (OLTP), external data sources

• Layers: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer, Reporting Layer)

• Cross-cutting components: Metadata Management, Security, DWH Manager incl. Monitor

• Design directions: Top Down (Inmon), Bottom Up (Kimball)

TOP-DOWN VS BOTTOM-UP APPROACH

Top-Down (Inmon)

• Comprehensive approach regarding available data

• Design the Core Warehouse Layer (the integrated data model) first, considering all requirements

• Design the data marts afterwards

Bottom-Up (Kimball)

• Approach focusing on fast delivery of first results

• Design one data mart first

• Further marts are modeled afterwards, usually following the Kimball architecture

• Conformed dimensions integrate the different data marts / fact tables
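The role of conformed dimensions in the bottom-up approach can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module; the tables `dim_customer`, `fact_sales` and `fact_support` and all values are hypothetical and not taken from the lecture. Two fact tables, one per data mart, share the same customer dimension, which is what allows their measures to be reported side by side.

```python
import sqlite3

# Hypothetical example schema: two fact tables (one per data mart) that
# share a single conformed customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales   (customer_key INTEGER, revenue REAL);
    CREATE TABLE fact_support (customer_key INTEGER, tickets INTEGER);

    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_sales   VALUES (1, 100.0), (1, 50.0), (2, 70.0);
    INSERT INTO fact_support VALUES (1, 3), (2, 1), (2, 4);
""")

# Aggregate each fact table separately, then join on the shared dimension.
# Joining the raw fact tables directly would multiply rows (fan-out).
query = """
    SELECT d.region, s.revenue, t.tickets
    FROM dim_customer AS d
    JOIN (SELECT customer_key, SUM(revenue) AS revenue
          FROM fact_sales GROUP BY customer_key) AS s USING (customer_key)
    JOIN (SELECT customer_key, SUM(tickets) AS tickets
          FROM fact_support GROUP BY customer_key) AS t USING (customer_key)
    ORDER BY d.region
"""
rows = conn.execute(query).fetchall()
for region, revenue, tickets in rows:
    print(region, revenue, tickets)
```

If each mart kept its own, differently keyed customer table, this combined report would require an extra mapping step, which is the inconsistency risk usually attributed to uncoordinated bottom-up marts.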

TOP-DOWN VS BOTTOM-UP APPROACH: ADVANTAGES AND DISADVANTAGES

Top-Down (Inmon)

☺ Core Warehouse Layer is designed optimally

☺ Data from the Core Warehouse Layer is reused in many Marts

☹ Time-consuming approach with high preparatory effort

☹ High risk with changing requirements

Bottom-Up (Kimball)

☺ Early involvement of end users

☺ Fast results

☹ Focus on single Marts leads to the risk that the overall view is lost, esp. a properly designed Core Warehouse Layer

☹ Data is often not reused but inconsistently copied across Marts

THINK BIG, START LOCAL

Both approaches have their downsides

• Top-Down takes enormous initial effort to build the data model for the Core Warehouse Layer

• Bottom-Up is risky as the central / integrated focus is lost

→ Think big, start small

• Think big: Design a conceptual data model for the Core Warehouse Layer covering the whole enterprise

• Start small: Implement the physical data model for the Core and Mart Layer in iterations, one business department at a time

WHAT'S DIFFERENT IN DWH PROJECTS?

• A DWH is not a product

• DWH databases are more complex, with different layers and data models

• Data first, code is secondary

• Data quality is a major concern

• Data integration is a challenging objective

• The business need is difficult to justify quantitatively

WHY DO DWH PROJECTS FAIL?

AGILITY IN THE DWH: CASE STUDY @ BOSCH

Source: https://www.informatik-aktuell.de/management-und-recht/projektmanagement/eine-konkrete-geschichte-der-agilitaet-im-data-warehouse.html

EXERCISE

Define 3-5 criteria for the evaluation of an ETL tool.

How does a relational DBMS (like Oracle, DB2, MS SQL Server) meet these requirements?

EXERCISE - DEFINE 5 CRITERIA FOR THE EVALUATION OF AN ETL TOOL

• Supplier profile

• Support

• HW/SW requirements

• License / maintenance costs

• Usability

• Reliability

• Performance and scalability

• Multi-tenancy

• Interfaces

• Scheduling
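One common way to turn evaluation criteria like these into a comparable result is a weighted scoring matrix. The following is a minimal sketch of that technique, not a method prescribed by the lecture: the selection of criteria, the weights, the candidate names and all ratings are made-up assumptions for illustration.

```python
# Hypothetical weights (they should sum to 1.0) for five of the criteria.
WEIGHTS = {
    "support": 0.15,
    "license_cost": 0.25,
    "usability": 0.20,
    "performance": 0.25,
    "scheduling": 0.15,
}

# Made-up ratings on a scale from 1 (poor) to 5 (excellent).
RATINGS = {
    "etl_tool": {
        "support": 4, "license_cost": 2, "usability": 4,
        "performance": 4, "scheduling": 5,
    },
    "manual_sql": {
        "support": 3, "license_cost": 5, "usability": 2,
        "performance": 4, "scheduling": 2,
    },
}


def weighted_score(ratings, weights):
    """Sum of weight * rating over all weighted criteria."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


scores = {
    name: round(weighted_score(ratings, WEIGHTS), 2)
    for name, ratings in RATINGS.items()
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The outcome depends entirely on the chosen weights, so in practice they would be agreed with the stakeholders before any candidate is rated.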

EXERCISE - HOW DOES A RELATIONAL DBMS MEET THESE REQUIREMENTS?

• RDBMS provide many of the required functions, but additional programming is needed

• RDBMS are often used for ETL/ELT by programming with SQL, PL/SQL, SQLT, etc.

ETL Tool | Manual ETL

Informatica, Talend, Oracle ODI, etc. | SQL, PL/SQL, SQLT, etc.

Separate license | No additional license

Workflow, error handling, and restart/recovery functionality included | Workflow, error handling, and restart/recovery functionality must be implemented manually

Impact analysis and where-used (lineage) functionality available | Impact analysis and where-used (lineage) functionality difficult

Faster development, easier maintenance | Slower development, more difficult maintenance

Additional (tool) know-how required | Know-how often available
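To give an impression of what "must be implemented manually" means for workflow, error handling and restart/recovery, here is a minimal sketch in Python. The function `run_workflow`, the step names and the simulated failure are hypothetical; a real implementation would persist the checkpoint (for example in a control table) instead of keeping it in memory.

```python
from typing import Callable, Dict, List, Set


def run_workflow(steps: Dict[str, Callable[[], None]],
                 order: List[str],
                 completed: Set[str]) -> bool:
    """Run steps in order, skipping those already completed.

    Returns True if all steps finished and False if one failed.
    The `completed` set acts as the checkpoint between runs.
    """
    for name in order:
        if name in completed:
            continue  # restart/recovery: do not redo finished work
        try:
            steps[name]()
        except Exception as exc:  # error handling: stop and report
            print(f"step {name!r} failed: {exc}")
            return False
        completed.add(name)
    return True


# Demo with a hypothetical transform step that fails on its first attempt.
log: List[str] = []
attempts = {"transform": 0}


def extract() -> None:
    log.append("extract")


def transform() -> None:
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("source table locked")
    log.append("transform")


def load() -> None:
    log.append("load")


steps = {"extract": extract, "transform": transform, "load": load}
order = ["extract", "transform", "load"]
checkpoint: Set[str] = set()

first_run = run_workflow(steps, order, checkpoint)   # stops at transform
second_run = run_workflow(steps, order, checkpoint)  # resumes at transform
```

Even this toy version leaves out logging, parallel steps and dependency handling, which hints at why the comparison rates manual ETL as slower to develop and harder to maintain.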

THANK YOU

Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS

Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

PROJECT PHASES: SMALL ITERATIONS INSTEAD OF LONG PHASES

Feasibility study → Analysis → Design → Implementation → Test → Operations and maintenance

DATAOPS

Gartner:

"…a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization. The goal of DataOps is to create predictable delivery and change management of data, data models and related artifacts. DataOps uses technology to automate data delivery with the appropriate levels of security, quality and metadata to improve the use and value of data in a dynamic environment."

Source: https://blogs.gartner.com/nick-heudecker/hyping-dataops/
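The idea of automating data delivery "with the appropriate levels of ... quality" can be sketched as an automated quality gate that a batch must pass before it is delivered. This is only an illustration under assumptions: the functions `quality_report` and `may_deliver`, the two checks (missing values, duplicate keys) and the sample batch are hypothetical and not part of the Gartner definition.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def quality_report(rows: List[Row], required: List[str], key: str) -> Dict[str, int]:
    """Count missing required values and duplicate keys in a batch."""
    missing = sum(1 for row in rows for col in required if row.get(col) is None)
    keys = [row.get(key) for row in rows]
    duplicates = len(keys) - len(set(keys))
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def may_deliver(report: Dict[str, int]) -> bool:
    """Gate: deliver only if the batch is non-empty and clean."""
    return (report["rows"] > 0
            and report["missing"] == 0
            and report["duplicates"] == 0)


# Made-up batch with one missing customer and one duplicate order_id.
batch = [
    {"order_id": 1, "customer": "A", "amount": 10.0},
    {"order_id": 2, "customer": None, "amount": 5.0},
    {"order_id": 2, "customer": "B", "amount": 7.5},
]
report = quality_report(batch, required=["customer", "amount"], key="order_id")
print(report, "deliver:", may_deliver(report))
```

Running such a gate on every load makes delivery more predictable, because a bad batch is stopped and reported instead of silently reaching the consumers.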

HYPE CYCLE FOR DATA MANAGEMENT

DATAOPS IS ABOUT ORGANIZATIONAL CHANGE

• DataOps is a new way of working and collaborating (as with DevOps)

• Unlike in DevOps, DataOps collaboration typically occurs between technical and non-technical staff

• There is a language barrier between these two parties (e.g. skills mismatch)

• A core enabler such as data literacy is therefore required

• Data literacy is the ability to understand data, to build knowledge from data, and to communicate information/meaning to others

• DataOps can't be achieved by buying tools

Source: https://blogs.gartner.com/nick-heudecker/hyping-dataops/

BICC: BI CENTER OF EXCELLENCE

Organizational team that coordinates and standardizes DWH activities within an (end user) organization

• Define standards and create the BI portfolio (e.g. which tools/products to use)

• Create the DWH architecture and govern BI activities

• Establish processes for business and IT interaction

• Monitor the DWH/BI market for new trends

• Determine the skills and experience of business users

4-QUADRANT MODEL (RONALD DAMHOF)

Source: http://prudenza.typepad.com/files/english---the-data-quadrant-model-interview-ronald-damhof.pdf