2014 Data Vault ReConnect Event Then & Now DDVM


Uploaded by: hans-hultgren

Posted on: 02-Dec-2014


DESCRIPTION

From the June 5 Dutch Data Vault Masters (DDVM) event in Amsterdam. This CDVDM ReConnect/Recertification day included presentations from several Certified Data Vault Data Modelers. This presentation was part of a discussion on "then and now" for Data Vault in the Netherlands.

TRANSCRIPT

Page 1: 2014 Data Vault ReConnect Event Then & Now DDVM

© 2014 Genesee Academy, LLC

Data Modeling Data Vault Modeling Big Data Agile DW Ensemble Modeling Certification

CDVDM Recertification Event Data Vault: Then & Now

© 2014 Genesee Academy, LLC USA +1 303 526 0340 Sweden 072 736 8700 [email protected] www.GeneseeAcademy.com

CDVDM ReConnect 2014 gohansgo

Page 2

CDVDM ReConnect Event

Page 3

Page 4

Then & Now Presentation Agenda

• Looking Back & Progress
• Colors and Reverse Engineering
• Business Oriented Modeling
• Effective Dates
• Architecture Revisited
• Link Unique Specific Natural
• Thinking Differently
• Modeling Address
• Sourcing the Data Vault
• The L:L:L constructs
• Automation

Mini-Topics for 5x5 Updates

• Ensemble Modeling
• Core Business Concepts
• The Business Key
• Unit of Work & Possessive
• Raw versus Business
• Link & Why it's not an Event
• Satellite & Why it's not MV
• Big Data & Unstructured
• Successful Agile DV DW
• Industry Reference Models
• Ensemble Forms

AGENDA ITEMS

Page 5

Then and Now…

2007 * 2008 * 2009 * 2010 * 2011 * 2012 * 2013 * 2014

Page 6

Genesee Academy Activities

Seminars

Advising

Online

Conferences

Page 7

Genesee Academy Activities

[Chart: GA Activities. Seminars 38%, Advising 29%, Online 17%, Conferences 14%]

Genesee Academy, LLC – World Class Training

• Seminars – 1-4 day, on-location & in-company courses.
  – Certifications issued by GA.
  – Blended (hybrid) Pedagogy.
• Advising – DWBI Programs, Modeling Patterns, Enterprise Architecture, Agility, etc.
  – Reviews: Programs, Models, Architectures, etc.
• Online – Classroom studio, online, on-demand video lessons.
  – Multiple channels: DVA and TrainOvation.
• Conferences – Speaking, Presenting, and sometimes coordinating industry conferences around the globe.

Page 8

Unified Decomposition™

• With the EDW, we seek to break things out into component parts for flexibility, adaptability, agility, and generally to facilitate the capture of things that are either interpreted in different ways or changing independently of each other.

• At the same time a core premise of data warehousing is integration and moving to a common standard view of unified concepts. So we also want to tie things together – to Unify.

Page 9

Ensemble Modeling™

All the parts of a thing taken together, so that each part is considered only in relation to the whole.

• The constellation of component parts acts as a whole – an Ensemble.

• With Ensemble Modeling the Core Business Concepts that we define and model are represented as a whole – an ensemble – including all of the component parts. An Ensemble is based on all things defining a Core Business Concept that can be uniquely and specifically said for one instance of that Concept.

Page 10

The Data Vault Ensemble

• The Data Vault Ensemble conforms to a single key – embodied in the Hub construct.

• The component parts of the Data Vault Ensemble include:
  – Hub: the Natural Business Key
  – Link: the Natural Business Relationships
  – Satellite: all Context, Descriptive Data, and History
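To make the three component parts concrete, here is a minimal in-memory sketch in Python. It assumes a Hub stores only business keys, a Link stores only tuples of the business keys it relates, and a Satellite stores insert-only versions of descriptive attributes for its parent key. The class and method names (Hub, Link, Satellite, add_key, add_relationship, add_context) and the example keys are hypothetical illustrations, not part of the presentation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Hub:
    """Natural Business Keys for one Core Business Concept (hypothetical sketch)."""
    name: str
    keys: dict = field(default_factory=dict)  # business key -> first load datetime

    def add_key(self, business_key: str, load_dts: datetime) -> None:
        # A Hub only records that a key exists; it holds no descriptive data.
        self.keys.setdefault(business_key, load_dts)


@dataclass
class Link:
    """Natural Business Relationships between Hubs (hypothetical sketch)."""
    name: str
    hubs: tuple  # names of the Hubs this Link relates
    rows: dict = field(default_factory=dict)  # tuple of business keys -> first load datetime

    def add_relationship(self, keys: tuple, load_dts: datetime) -> None:
        if len(keys) != len(self.hubs):
            raise ValueError("expected one business key per related Hub")
        self.rows.setdefault(keys, load_dts)


@dataclass
class Satellite:
    """Context, descriptive data, and history for one parent key (hypothetical sketch)."""
    name: str
    history: dict = field(default_factory=dict)  # parent key -> list of (load_dts, attributes)

    def add_context(self, parent_key, load_dts: datetime, attributes: dict) -> bool:
        versions = self.history.setdefault(parent_key, [])
        # Insert-only: add a new version only when the attributes actually changed.
        if versions and versions[-1][1] == attributes:
            return False
        versions.append((load_dts, dict(attributes)))
        return True


t1, t2 = datetime(2014, 6, 5), datetime(2014, 6, 6)
h_cust = Hub("H_Cust")
h_cust.add_key("CUST-001", t1)
h_cust.add_key("CUST-001", t2)  # already known, so nothing changes
l_sale_cust = Link("L_Sale_Cust", ("H_Sale", "H_Cust"))
l_sale_cust.add_relationship(("SALE-42", "CUST-001"), t1)
s_cust = Satellite("S_Cust_Name")
s_cust.add_context("CUST-001", t1, {"name": "Acme"})
s_cust.add_context("CUST-001", t2, {"name": "Acme"})     # unchanged: skipped
s_cust.add_context("CUST-001", t2, {"name": "Acme BV"})  # changed: new version
print(len(h_cust.keys), len(s_cust.history["CUST-001"]))  # 1 2
```

Keeping history only in the Satellite is what lets the Hub and Link stay thin: a change in customer context never touches the key or relationship structures.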

Page 11

Data Vault means thinking differently

[Diagram: two views of Customer]

• The minimal construct for an “entity” such as “Customer”, then, is now a Hub with a set of Satellites.

Page 12

Data Vault means thinking differently

[Diagram: two views of Customer]

Page 13

DV versus 3NF

[Diagram: Data Vault model with many Satellites (Sat) compared with 3NF; labels: EDW, History, Operational]

Page 14

The Data Vault modeling approach

• As the scope of the EDW is expanded and new data sources added, the Data Vault can adapt to these changes without impacting the existing model. This is what allows the EDW to be built incrementally and to adapt to change without the need for re-engineering.

[Diagram: New Area absorbed; Hubs H_Cust, H_Sale, H_Empl, H_Store, H_Car]

Page 15

Data Vault Modeling Process

• The Modeling Process for creating a Data Vault model includes three primary steps:

1) Identify and model your Core Business Concepts
  • Business interviews are at the heart of this step: What do you do? What are the main things you work with?
  • Also find the best/target Natural Business Key
2) Identify and model your Natural Business Relationships
  • Specific, unique relationships
  • Be considerate of the Unit of Work and grain
3) Analyze and design your context Satellites
  • Consider rate of change and type of data, and also the sources of your data, during the design process
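Step 3 can be illustrated with a small Python sketch that groups a Hub's attributes into candidate Satellites by source system and rate of change. The attribute list, the "slow"/"fast" rate labels, and the S_<hub>_<source>_<rate> naming convention are hypothetical; type of data could be added as a third grouping key in the same way.

```python
from collections import defaultdict


def design_satellites(hub: str, attributes: list) -> dict:
    """Group (attribute, source, rate_of_change) triples into candidate Satellites.

    Each Satellite holds attributes that share one source and one rate of change,
    so frequent changes in one group do not create new versions of the others.
    """
    groups = defaultdict(list)
    for name, source, rate in attributes:
        groups[(source, rate)].append(name)
    return {
        f"S_{hub}_{source}_{rate}": sorted(names)
        for (source, rate), names in groups.items()
    }


# Hypothetical Customer attributes: (attribute, source system, rate of change)
customer_attributes = [
    ("name", "CRM", "slow"),
    ("birth_date", "CRM", "slow"),
    ("email", "CRM", "fast"),
    ("credit_limit", "ERP", "fast"),
]

for sat, cols in design_satellites("Cust", customer_attributes).items():
    print(sat, cols)
```

The grouping is only a starting point; the modeler still decides whether a split is worth an extra Satellite.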

Page 16

Anatomy of a Hub

Page 17

Anatomy of a Link

Page 18

Anatomy of a Satellite

Page 19

Sales DV Model - Backbone

[Diagram: Sample Model]

Page 20

Sample: Sales Data Vault Model

Page 21

Identifying the Core Business Concepts

Page 22

Business Key?

• The Business Key that forms the basis of the Hub should be:
  – Enterprise-wide unique
  – Aligned with the central business view

This means that:
  – It is not a “Technical Key” but rather a “Business Key”
  – It is not the source system primary key (id)
  – It is not driven by any one source system
  – It should be aligned with central business initiatives

In a data warehouse this means:
  – There will be clashes
  – There will be duplicates
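A minimal Python sketch of loading a Hub from two sources shows why the Business Key, not a source id, should drive the Hub: the same customer arrives from different systems with different ids, and keying on the business key collapses those duplicates into one Hub row. The source names (CRM, ERP), the row layout, and the trim-and-uppercase normalization are assumptions for illustration; real key-matching rules are a business decision.

```python
def load_hub(hub: dict, rows: list, record_source: str, load_dts: str) -> int:
    """Add unseen business keys to a Hub and return how many were new.

    The source system id is deliberately ignored: the Hub is keyed on the
    business key so the same concept from different sources lands on one row.
    """
    new_keys = 0
    for row in rows:
        # Assumed normalization so that " C-001" and "c-001" count as the same key.
        business_key = row["business_key"].strip().upper()
        if business_key not in hub:
            hub[business_key] = {"load_dts": load_dts, "record_source": record_source}
            new_keys += 1
    return new_keys


h_cust = {}
crm_rows = [{"id": 17, "business_key": "c-001"}, {"id": 18, "business_key": "C-002"}]
erp_rows = [{"id": 9001, "business_key": " C-001"}, {"id": 9002, "business_key": "C-003"}]

print(load_hub(h_cust, crm_rows, "CRM", "2014-06-05"))  # 2
print(load_hub(h_cust, erp_rows, "ERP", "2014-06-05"))  # 1 (C-001 was already known)
print(sorted(h_cust))  # ['C-001', 'C-002', 'C-003']
```

This only handles duplicates. Clashes, where one key value means different things in different sources, are not resolved by the sketch and still need an explicit business rule.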

Page 23

Starting with Stars

• Begins to get complicated…

[Diagram: Star 1 through Star n for Accounting, Finance, Logistics, and Sales; annotation: “Reach complexity and lack of agility level…”]

Page 24

Adapting & Expanding the EDW

• With Data Vault, scale easily – without re-engineering!

[Diagram: Star 1 through Star n for Accounting, Finance, Logistics, and Sales, with labels EDW and DV EDW; annotation: “Easily adapts to changes…”]

Page 25

Fundamental Architecture

[Diagram: Enterprise DWBI Solution. Layers: Staging, EDW (Raw and BDW), Data Mart Star Schemas, and Other Marts & Error Marts. Staging-to-EDW steps: Extract, Load, D/T Stamp, Integrate. EDW: Integrate, Align, Reconcile. Common Business Rules: Validate, Profile, Cleanse, Convert, Calculate, Transform, Integrate. Mart Specific Rules: Extract, Validate, Profile, Cleanse, Convert, Calculate, Transform, Load.]
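Reading this architecture as a Raw layer stamped with a load date/time (D/T Stamp), followed by a separate step where Common Business Rules derive the BDW, the separation can be sketched in Python: raw rows are stored exactly as received, and rules produce new derived rows without altering the raw ones. The function names and the two example rules are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def stamp_raw(rows: list, record_source: str, load_dts: Optional[datetime] = None) -> list:
    """Store source rows as received, adding only a load date/time and record source."""
    load_dts = load_dts or datetime.now(timezone.utc)
    return [{**row, "load_dts": load_dts, "record_source": record_source} for row in rows]


def apply_common_rules(raw_rows: list, rules: list) -> list:
    """Derive business rows; the raw rows themselves are never modified."""
    derived = []
    for row in raw_rows:
        out = dict(row)  # copy so the Raw layer stays auditable
        for rule in rules:
            out = rule(out)
        derived.append(out)
    return derived


# Hypothetical common business rules
def cleanse_country(row: dict) -> dict:
    return {**row, "country": row["country"].strip().upper()}


def convert_amount_to_cents(row: dict) -> dict:
    return {**row, "amount_cents": round(row["amount"] * 100)}


raw = stamp_raw([{"country": " nl ", "amount": 12.5}], "ERP",
                datetime(2014, 6, 5, tzinfo=timezone.utc))
bdw = apply_common_rules(raw, [cleanse_country, convert_amount_to_cents])
print(repr(raw[0]["country"]), bdw[0]["country"], bdw[0]["amount_cents"])
```

Because the raw values survive untouched, a business rule can be corrected later and the BDW rebuilt from the same Raw data.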

Page 26

Identifying relationships that are really Ensembles

• Rules and Guidelines

• Does the Link have its own Business Key?

• Does the Link represent its own Core Business Concept?

• Are there several Satellites on the Link?

• Are there many attributes to describe the Link?

• Are there relationships (Link to Link) with this Link?

If the answer to any of these questions is yes, then the Link is likely a Hub.

When a Link becomes a Hub
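The five questions translate directly into a checklist. In this Python sketch, the thresholds for "several" Satellites (two or more) and "many" attributes (ten or more by default) are assumptions, as are the LinkProfile fields; the function returns the reasons that apply, and any reason at all suggests the Link is likely a Hub.

```python
from dataclasses import dataclass


@dataclass
class LinkProfile:
    """Hypothetical description of a Link under review."""
    has_own_business_key: bool
    is_own_core_business_concept: bool
    satellite_count: int
    attribute_count: int
    has_link_to_link: bool


def reasons_link_is_likely_hub(p: LinkProfile, many_attributes: int = 10) -> list:
    """Return the rules that fire; an empty list means the Link can stay a Link."""
    reasons = []
    if p.has_own_business_key:
        reasons.append("has its own Business Key")
    if p.is_own_core_business_concept:
        reasons.append("represents its own Core Business Concept")
    if p.satellite_count >= 2:  # assumed meaning of "several"
        reasons.append("has several Satellites")
    if p.attribute_count >= many_attributes:  # assumed meaning of "many"
        reasons.append("has many descriptive attributes")
    if p.has_link_to_link:
        reasons.append("participates in Link-to-Link relationships")
    return reasons


contract = LinkProfile(True, True, 3, 25, False)
plain_link = LinkProfile(False, False, 1, 2, False)
print(reasons_link_is_likely_hub(contract))
print(reasons_link_is_likely_hub(plain_link))  # []
```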

Page 27

Applying the Data Vault Ensemble

• Mixing “color types of data” is not Data Vaulting but rather unvaulting

* A blended pattern has different dynamics

Thinking Differently

• Stay with the Ensemble Modeling Pattern. Continue practicing Unified Decomposition. Continue Vaulting. Be aware when you change patterns.

[Diagram: Option 1, Option 2, Option 3]

Page 28

Sourcing the Data Vault EDW

• Sourcing a Data Vault requires more joins (Hub to Sats, both sides of Links)

• Sourcing a Data Vault can be more efficient than sourcing other forms

• The primary path to efficient sourcing is thinking differently…

1. The ETL team needs to understand the DV model to be efficient
2. Automation and templates for repeatable patterns make this easier
3. Pulling context from a subset of Satellites eases the join impact
4. Hubs and Links are thin, short tables with no redundancy (fast)
5. Data Marts should not be based on creating another copy of the DW
6. Data Mart design should be agile, purpose-built, and business-driven
7. Data Marts should pass the virtualization test
8. Tune with PITs, Bridges, and other Mart Stage views (& materialized)
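Item 8 mentions PITs (Point-in-Time tables). A minimal Python sketch of the idea: for each Hub key and snapshot date, record the latest load date of each Satellite on or before that snapshot, so a mart query can join each Satellite on an exact date instead of searching its history. The in-memory layout and all names are hypothetical; in practice a PIT would be a table built in the database.

```python
from bisect import bisect_right
from datetime import date


def build_pit(hub_keys: list, satellites: dict, snapshots: list) -> list:
    """satellites maps Satellite name -> {hub key: sorted list of load dates}."""
    pit = []
    for key in hub_keys:
        for snap in snapshots:
            row = {"hub_key": key, "snapshot": snap}
            for sat_name, loads_by_key in satellites.items():
                loads = loads_by_key.get(key, [])
                # Position of the first load date after the snapshot.
                i = bisect_right(loads, snap)
                row[sat_name] = loads[i - 1] if i else None  # None: no context yet
            pit.append(row)
    return pit


sats = {
    "S_Cust_Name": {"C-001": [date(2014, 1, 1), date(2014, 3, 1)]},
    "S_Cust_Addr": {"C-001": [date(2014, 2, 1)]},
}
pit = build_pit(["C-001"], sats, [date(2014, 1, 15), date(2014, 3, 15)])
for row in pit:
    print(row)
```

Each PIT row turns "latest version as of this date" into a simple equality join per Satellite, which is where the tuning benefit comes from.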

Page 29

Link:Link:Link

• What does an L:L:L mean?
• Can a relationship have relationships to other relationships?

Whenever you see a Link:Link, take a moment to find the Hub you are missing. It is either there or not yet modeled.
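This guideline can be checked mechanically. The Python sketch below assumes the model is described as a mapping from each Link name to the names of the entities it references, and flags every Link that references another Link as a prompt to look for a missing Hub. The model and its names, including L_Delivery, are hypothetical.

```python
def find_link_to_link(links: dict) -> list:
    """Return (link, referenced_link) pairs where a Link points at another Link."""
    flagged = []
    for link_name, references in links.items():
        for ref in references:
            if ref in links:  # the referenced entity is itself a Link
                flagged.append((link_name, ref))
    return flagged


model = {
    "L_Sale_Cust": ["H_Sale", "H_Cust"],
    "L_Sale_Store": ["H_Sale", "H_Store"],
    "L_Delivery": ["L_Sale_Cust", "H_Car"],  # Link:Link, so a Hub may be missing
}
print(find_link_to_link(model))  # [('L_Delivery', 'L_Sale_Cust')]
```

The check only finds candidates; deciding whether the missing concept (here, perhaps a delivery) deserves its own Hub remains a modeling decision.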

• Automation:

Page 30

Benefits of Data Vault Modeling

Agility Auditability History Scalability Simplicity Loadability

Responds Faster & Costs Less

Page 31

• Financial Institutions • Telecommunications • Retail • Manufacturing • Technology • Energy & Utility • HealthCare • Consultancy • Transportation • Government • Gaming • Etc.


Applying Data Vault

Page 32