stopping the lake from becoming a swamp
TRANSCRIPT
#INFA16
The biggest complaint in Data Governance…
Its because they never cared about Data Governance
#INFA16
The biggest complaint in Data Governance…
Because it didn’t actually govern the data in the way they wanted
#INFA16
Cor
pora
te
Ad-
hoc
LOB
m
anag
emen
t
Ope
ratio
ns
Market
Multiple internal views – consistently compromised
Ope
ratio
ns
LOB mart Spreadsheets
Line of business
Transactional systems
CRM ERP PLM
EDW Corporate
ODS
Web
#INFA16
Cor
pora
te
Ad-
hoc
LOB
m
anag
emen
t
Ope
ratio
ns
Market
Multiple internal views - you already have a swamp
Ope
ratio
ns
LOB mart Spreadsheets
Line of business
Transactional systems
CRM ERP PLM
EDW
Fit
Detail
Freshness
Fidelity
Corporate ODS
Web
#INFA16
Why the single view & the golden record fails
Div
isio
n 1
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Corporate
Now agree on everything D
ivis
ion
2
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Div
isio
n 3
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Div
isio
n 4
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Corporate KPIs
#INFA16
Marketing
Business Meta-Data – recognizing how people work
Data Lake
Debt Recovery High Value
Customers
Explore the Lake
Prioritize
#INFA16
Marketing
Business Meta-Data – recognizing how people work
Data Lake
Debt Recovery High Value
Customers
Explore the Lake
Business Meta-Data Purpose: Debt
Recovery
Verify Purpose Exclude
Purpose: Mktg Exclude
#INFA16
So what do we need?
Govern where it matters
! Focus on Meta-data, MDM and RDM ! Enforce only when sharing ! Treat Corporate as aggregation of Local.
Encourage local requirements
! Let the business decide what they need ! Build from the bottom ! Enable traceability to source ! Disposable data views.
Distill on demand ! Select only what you want ! Business friendly tooling ! Re-usable information maps ! Rapid change cycle.
Store everything ! Store everything ‘as is’ ! Include structured and unstructured data ! Store it cheaply.
#INFA16
Why Data Lakes work
Archives Operational systems
Case Mgmt
Operations
Scheduling
Data Explosion
Geographic Data
One Place for Everything
Supply Chain Optimization
Fraud Detection
Next Best Action Marketing
R MADLib SAS
#INFA16
Collaboration over quality
The cross-reference is the most important thing an MDM project can
deliver
#INFA16
Collaboration over quality
Quality is a side-effect – not a goal of the first phase of your MDM
project
#INFA16
Why the X-Ref matters
Local view
Corporate standards
Master data and reference data
Corporate view
Customer x-ref
Customer MDM
Invoices Orders
Customer
Invoices Orders
BU1
Info
rmat
ion
gove
rnan
ce
BU2 BU3
Customer
Invoices
Orders
Customer
Invoices
Orders
Customer
Invoices
Orders
#INFA16
The Quality mess
Internal systems
CRM Back Office Trading Market Data
Reports Reports Reports Reports
Trading Trading
ETL ETL ETL
#INFA16
The current situation – add a new System
Internal systems
CRM Back Office Trading Market Data
Reports Reports Reports Reports
Trading Trading
ETL ETL ETL
Reports
ETL
#INFA16
The current situation – Where does Data Quality go?
Internal systems
CRM Back Office Trading Market Data
Reports Reports Reports Reports
Trading Trading
ETL ETL ETL
Reports
ETL
Fix at Source
Patch Patch Patch Patch Patch
DQ DQ DQ DQ
#INFA16
Lets try another way – Fix at Source is still the ideal but
Internal systems
CRM Back Office Trading Market Data
Reports Reports Reports Reports
Trading Trading
Data Lake
Data Lake + Quality Data Quality & MDM
#INFA16
Then lets go further
Internal systems
CRM Back Office Trading Market Data
Reports Reports Reports Reports
Trading Trading
Data Lake
Data Lake + Quality Data Quality & MDM
Data Science
#INFA16
Why the Business Data Lake succeeds
Div
isio
n 1
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s
Now agree where it counts
Div
isio
n 2
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Div
isio
n 3
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Div
isio
n 4
Sales
Finance
Supply chain
Marketing
R&D
Pers
onal
KPI
s D
ivsi
onal
KPI
s
Corporate
Corporate KPIs
Div
sion
al K
PIs
#INFA16
Ingestion tier
Insights tier
Unified operations tier
System monitoring System management
Unified data management tier
Data mgmt. services
MDM RDM
Audit and policy mgmt.
Processing tier
Workflow management
Distillation tier
HDFS storage Unstructured and structured data
In-memory
MPP database
Real time
Micro batch
Mega batch
SQL NoSQL
SparkSQL
Query interface
s
SQL
Help me ingestion – you’re my only hope
Sources Action tier
Real-time ingestion
Micro batch ingestion
Batch ingestion
Real-time insights
Interactive insights
Batch insights
Ingestion The first and
LAST place you can have control
Distillation Think ‘Excel’ not EDW
#INFA16
If your business isn’t trivial – you’ll end up with multiple lakes
Client Data Lake Partner Data Lake
Geo specific Geo specific Geo specific
Corporate Data Lake
Analytics
Results Distillation Summary
Analytics Analytics
Results
Analytics
Results
#INFA16
Federation requires THREE things
The cross-reference • If you can’t link data sets its impossible
Rationalization • The lesson of Google & Amazon is less choice, not
more
Business Meta-Data • Technical Meta-Data is no-longer enough
#INFA16
Building a Lake and the challenge of federation
Client specific Client specific
Geo specific Geo specific Geo specific
Corporate Data Lake To do this you need to standardize technology
#INFA16
Business Meta-Data and the x-ref – the two things that must be shared
Client specific Client specific
Geo specific Geo specific Geo specific
Corporate Data Lake
Business Meta-Data & x-ref
#INFA16
The new philosophy
Business Data Lake
Store everything
Encourage local
Govern only the common
Assume Federation
It’s all about insight at the point of action