stopping the lake from becoming a swamp

31
Stopping your Lake becoming a Swamp Steve Jones – Global VP Big Data – Capgemini 4 May 2016

Upload: capgemini

Post on 06-Jan-2017

996 views

Category:

Technology


0 download

TRANSCRIPT

Stopping your Lake becoming a Swamp Steve Jones – Global VP Big Data – Capgemini 4 May 2016

#INFA16

The biggest complaint in Data Governance…

The business won’t engage

#INFA16

The biggest complaint in Data Governance…

Want to know why?

#INFA16

The biggest complaint in Data Governance…

Its because they never cared about Data Governance

#INFA16

The biggest complaint in Data Governance…

Because it didn’t actually govern the data in the way they wanted

#INFA16

Cor

pora

te

Ad-

hoc

LOB

m

anag

emen

t

Ope

ratio

ns

Market

Multiple internal views – consistently compromised

Ope

ratio

ns

LOB mart Spreadsheets

Line of business

Transactional systems

CRM ERP PLM

EDW Corporate

ODS

Web

#INFA16

Cor

pora

te

Ad-

hoc

LOB

m

anag

emen

t

Ope

ratio

ns

Market

Multiple internal views - you already have a swamp

Ope

ratio

ns

LOB mart Spreadsheets

Line of business

Transactional systems

CRM ERP PLM

EDW

Fit

Detail

Freshness

Fidelity

Corporate ODS

Web

#INFA16

IT’s answer – just make the EDW ‘bigger’- one view for all

#INFA16

Why the single view & the golden record fails

Div

isio

n 1

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Corporate

Now agree on everything D

ivis

ion

2

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Div

isio

n 3

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Div

isio

n 4

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Corporate KPIs

#INFA16

The biggest complaint in Data Governance…

But good news, Big Data is changing the game

#INFA16

Marketing

Business Meta-Data – recognizing how people work

Data Lake

Debt Recovery High Value

Customers

Explore the Lake

Prioritize

#INFA16

Marketing

Business Meta-Data – recognizing how people work

Data Lake

Debt Recovery High Value

Customers

Explore the Lake

Business Meta-Data Purpose: Debt

Recovery

Verify Purpose Exclude

Purpose: Mktg Exclude

#INFA16

So what do we need?

Govern where it matters

!  Focus on Meta-data, MDM and RDM !  Enforce only when sharing !  Treat Corporate as aggregation of Local.

Encourage local requirements

!  Let the business decide what they need !  Build from the bottom !  Enable traceability to source !  Disposable data views.

Distill on demand !  Select only what you want !  Business friendly tooling !  Re-usable information maps !  Rapid change cycle.

Store everything !  Store everything ‘as is’ !  Include structured and unstructured data !  Store it cheaply.

#INFA16

Why Data Lakes work

Archives Operational systems

Case Mgmt

Operations

Scheduling

Data Explosion

Geographic Data

One Place for Everything

Supply Chain Optimization

Fraud Detection

Next Best Action Marketing

R MADLib SAS

#INFA16

Keep governance small and focused – the MDM RADAR

Core

Sync

Share

Federated

#INFA16

Collaboration over quality

The cross-reference is the most important thing an MDM project can

deliver

#INFA16

Collaboration over quality

Quality is a side-effect – not a goal of the first phase of your MDM

project

#INFA16

Why the X-Ref matters

Local view

Corporate standards

Master data and reference data

Corporate view

Customer x-ref

Customer MDM

Invoices Orders

Customer

Invoices Orders

BU1

Info

rmat

ion

gove

rnan

ce

BU2 BU3

Customer

Invoices

Orders

Customer

Invoices

Orders

Customer

Invoices

Orders

#INFA16

The Quality mess

Internal systems

CRM Back Office Trading Market Data

Reports Reports Reports Reports

Trading Trading

ETL ETL ETL

#INFA16

The current situation – add a new System

Internal systems

CRM Back Office Trading Market Data

Reports Reports Reports Reports

Trading Trading

ETL ETL ETL

Reports

ETL

#INFA16

The current situation – Where does Data Quality go?

Internal systems

CRM Back Office Trading Market Data

Reports Reports Reports Reports

Trading Trading

ETL ETL ETL

Reports

ETL

Fix at Source

Patch Patch Patch Patch Patch

DQ DQ DQ DQ

#INFA16

Lets try another way – Fix at Source is still the ideal but

Internal systems

CRM Back Office Trading Market Data

Reports Reports Reports Reports

Trading Trading

Data Lake

Data Lake + Quality Data Quality & MDM

#INFA16

Then lets go further

Internal systems

CRM Back Office Trading Market Data

Reports Reports Reports Reports

Trading Trading

Data Lake

Data Lake + Quality Data Quality & MDM

Data Science

#INFA16

Why the Business Data Lake succeeds

Div

isio

n 1

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s

Now agree where it counts

Div

isio

n 2

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Div

isio

n 3

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Div

isio

n 4

Sales

Finance

Supply chain

Marketing

R&D

Pers

onal

KPI

s D

ivsi

onal

KPI

s

Corporate

Corporate KPIs

Div

sion

al K

PIs

#INFA16

Ingestion tier

Insights tier

Unified operations tier

System monitoring System management

Unified data management tier

Data mgmt. services

MDM RDM

Audit and policy mgmt.

Processing tier

Workflow management

Distillation tier

HDFS storage Unstructured and structured data

In-memory

MPP database

Real time

Micro batch

Mega batch

SQL NoSQL

SparkSQL

Query interface

s

SQL

Help me ingestion – you’re my only hope

Sources Action tier

Real-time ingestion

Micro batch ingestion

Batch ingestion

Real-time insights

Interactive insights

Batch insights

Ingestion The first and

LAST place you can have control

Distillation Think ‘Excel’ not EDW

#INFA16

If your business isn’t trivial – you’ll end up with multiple lakes

Client Data Lake Partner Data Lake

Geo specific Geo specific Geo specific

Corporate Data Lake

Analytics

Results Distillation Summary

Analytics Analytics

Results

Analytics

Results

#INFA16

Federation requires THREE things

The cross-reference •  If you can’t link data sets its impossible

Rationalization •  The lesson of Google & Amazon is less choice, not

more

Business Meta-Data •  Technical Meta-Data is no-longer enough

#INFA16

Building a Lake and the challenge of federation

Client specific Client specific

Geo specific Geo specific Geo specific

Corporate Data Lake To do this you need to standardize technology

#INFA16

Business Meta-Data and the x-ref – the two things that must be shared

Client specific Client specific

Geo specific Geo specific Geo specific

Corporate Data Lake

Business Meta-Data & x-ref

#INFA16

The new philosophy

Business Data Lake

Store everything

Encourage local

Govern only the common

Assume Federation

It’s all about insight at the point of action

#INFA16

But remember

Your next challenge will be analytical collaboration beyond

your company boundaries