Page 1: TECHNICAL METADATA INTEGRATION & TRUE DATA LINEAGE

SID BANERJEE, VP – WW PRODUCT SALES

DAWID DUDA, VP – PRODUCTS

© Copyright 2009 - 2017 Compact Solutions LLC. All Rights Reserved. The material in this document is for the consumption of the recipient only. It may not be forwarded/shared with anyone else without express written permission of Compact Solutions LLC.

Page 2: INTRODUCTION

Page 3: WHO ARE WE – ORGANIZATION AND CAPABILITIES

Founded in 2002, privately held

Presence in:

• Chicago (US) – Worldwide HQ

• London (Western Europe)

• Krakow (Eastern Europe/Poland) – Innovation labs

• Ahmedabad (APAC/India) – Dual shore services

Solutions:

• MetaDex™ – Metadata Integration

• TestDrive™ – Testing solution for Data Warehouse/ETL

System Integration Capabilities/Alliances

Page 4: Top Companies trust Compact Software

BFSI

Life Sciences

Technology

Retail

Media

Logistics

All logos are trademarks and owned by their respective companies and affiliates

Page 5: The AS-IS State?

Roles shown: Business Analyst, ETL Developer, Business Partner, Data Steward, Project Manager, Information Architect, BI Developer, DBA/DDA/DA

“In reviewing this report I have a question about…”

“Let me look into it, I will get back to you”

“I’ll need to take resources from another project to get those answers…”

“Let’s look at the table…”

“That is calculated by…”

“The data comes in from… and then… finally…”

“The mapping rules tell us that this should…”

“The most reliable source is…”

Today a single question often requires talking to several different resources because the answers are only found by looking across disparate locations.

Page 6: Data Governance – Business Challenges

Problems Governing & Managing Data

• Lack of standards – no global codes, definitions or data formats exist

• Application-specific definitions – term definitions differ across divisions and LOBs

• No single source of truth – unless vetted, it's not trusted

• No ownership / governance for the problem – system and process “work-arounds” are created

• Difficult to find and understand data – reliance on key knowledge workers

• Root cause analysis – data quality issues are time-consuming to understand and verify

Cost of Misunderstanding

• Expensive missteps – action is taken, only to find out later that information was wrong or incomplete

• Higher costs – unclear change impact and creation of redundant processes and information

• Slow response – lack of information clarity slows the decision process and agility for mergers and regulatory initiatives (DFAST, CCAR, Basel III)

• Productivity loss – those who don’t understand data burden the few that do

Page 7: Data Governance – Technical Metadata Challenges

[Diagram: technology layers that data passes through]

• SOURCE SYSTEMS: Mainframes, ERP, External

• Data Extraction (e.g. SQL Scripts, COBOL, JCL)

• Landing Zone, STG, HDS, EDW

• DATA INTEGRATION / DATA QUALITY / ETL: ETL Tools (e.g. DataStage, Informatica, etc.)

• Data Warehouse Appliance (e.g. IBM PS, Teradata, Netezza)

• FRONT LINE APPLICATIONS, REGULATORY, OLAP: Analytics (e.g. SAS, Cognos, Business Objects, etc.)

Challenges:

• End to end data lineage

• Business context and meaning for IT assets

• Catalog of information assets

• Risk data analysis and dependency management

• Shared metadata repository for Business and Technical users

Page 8: KEY DATA GOVERNANCE ELEMENTS

Page 9: WHAT ARE THE KEY ELEMENTS OF DATA GOVERNANCE

• What data do I have – catalog of my data assets

• What language do I use to speak about it – my business glossary

• What does my data mean – the assets-glossary relationship

• How is my data sourced and transformed – the data lineage

Page 10: THERE ARE TWO TYPES OF METADATA OUT THERE...

• Technical Metadata

• Business Metadata

Successful Data Governance

Page 11: TECHNICAL METADATA ELEMENTS

• Assets catalog

• Data lineage

• Operational metadata

• Data profiling and quality results

• Testing reports

• And more...

Page 12: THE HOLY GRAIL OF LINEAGE

Page 13: WHAT IS DATA LINEAGE

• Connections between data assets

• Shows how data moves from one asset to another

• Exposes transformations used to derive data elements
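
The three points on this slide can be pictured as a small directed graph: assets are nodes, data movement is an edge, and each edge carries the transformation that derives the target. The sketch below is only an illustration under that assumption; the asset names and transformation strings are hypothetical and do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source asset, target asset, transformation).
# All names are illustrative only.
EDGES = [
    ("crm.customers.first_name", "stg.customer.full_name", "first_name || ' ' || last_name"),
    ("crm.customers.last_name", "stg.customer.full_name", "first_name || ' ' || last_name"),
    ("stg.customer.full_name", "edw.dim_customer.customer_name", "TRIM(full_name)"),
]


def build_upstream_index(edges):
    """Map each target asset to the (source, transformation) pairs feeding it."""
    index = defaultdict(list)
    for source, target, transformation in edges:
        index[target].append((source, transformation))
    return index


def trace_upstream(asset, index):
    """Return every asset that directly or indirectly feeds `asset`."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for source, _ in index.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


index = build_upstream_index(EDGES)
upstream = trace_upstream("edw.dim_customer.customer_name", index)
```

Here `upstream` holds the staging column and both CRM columns, which is the kind of answer a "where does this value come from" question expects.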

Page 14: THE KEY REQUIREMENTS FOR DATA LINEAGE

• Correct

• Up to date

• Usable

Page 15: WHERE TO TAKE THE LINEAGE FROM

• Prepare manually
  • Read the code
  • Write source-to-target mappings
  • Maintain over time!

• Extract automatically
  • Identify the right extractor
  • Configure for your environment
  • Set up automatic refresh
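
For the manual route, a source-to-target mapping is often kept as a simple sheet. The sketch below assumes a hypothetical CSV layout (the column names and rows are invented for illustration) and shows how such a sheet could be loaded and queried; it is not the format of any specific product.

```python
import csv
import io

# Hypothetical mapping sheet; column names and rows are illustrative only.
MAPPING_SHEET = """source_table,source_column,target_table,target_column,rule
crm.customers,first_name,edw.dim_customer,customer_name,concatenate with last_name
crm.customers,last_name,edw.dim_customer,customer_name,concatenate with first_name
crm.orders,amount,edw.fact_sales,sales_amount,direct move
"""


def load_mappings(text):
    """Parse the mapping sheet into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def sources_of(mappings, target_table, target_column):
    """List the 'table.column' sources mapped to the given target column."""
    return [
        f"{row['source_table']}.{row['source_column']}"
        for row in mappings
        if row["target_table"] == target_table
        and row["target_column"] == target_column
    ]


mappings = load_mappings(MAPPING_SHEET)
name_sources = sources_of(mappings, "edw.dim_customer", "customer_name")
```

Every change to the underlying code means editing this sheet by hand, which is the maintenance cost the slide warns about.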

Page 16: REAL LIFE ENVIRONMENTS ARE COMPLEX...

[Diagram: a sample environment connecting MSSQL, Oracle, SSIS, DataStage, Informatica, PLSQL, TD with FastLoad and BTEQ, TD(EDW), TD(Views), Hadoop Hive, Netezza, Cognos, MicroStrategy and QlikView]

Page 17: WHERE TO TAKE THE LINEAGE FROM – LOOK AT TECHNOLOGIES

• When to prepare manually
  • Technology is used in limited scope
  • Processes do not change over time
  • Detailed low level lineage is (not yet) required

• When to extract automatically
  • Technology is used at large scale
  • Processes change on a regular basis
  • Detailed low level lineage (with transformations) is required

Page 18: TWO CLASSES OF APPLICATIONS

• Static code/processes
  • Code of different types (SQL, SAS, Cobol, DataStage, Informatica, SSIS, ...) that is available directly is relatively easy to derive lineage from

• Dynamic code calculated based on parameters
  • Code generated at runtime from the parameters of procedures and programs can also be analyzed once the parameter values are identified
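
To make the static case concrete, here is a deliberately naive sketch that derives table-level lineage from a single INSERT ... SELECT statement with regular expressions. It is an assumption-laden illustration, not how a production extractor works: real extractors parse the SQL properly, and the table names below are hypothetical.

```python
import re

# Deliberately naive patterns: they handle only simple INSERT ... SELECT
# statements and ignore subqueries, CTEs, aliases, quoting and comments.
TARGET_RE = re.compile(r"insert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql):
    """Return (source_table, target_table) pairs found in one SQL statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    sources = SOURCE_RE.findall(sql)
    return [(source, target.group(1)) for source in sources]


# Hypothetical statement used only for illustration.
SQL = """
INSERT INTO edw.fact_sales (order_id, customer_id, amount)
SELECT o.order_id, c.customer_id, o.amount
FROM stg.orders o
JOIN stg.customers c ON c.customer_id = o.customer_id
"""
pairs = table_lineage(SQL)
```

Because the statement text is fully known up front, nothing has to be resolved before analysis; that is what separates this class from dynamic code.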

Page 19: DYNAMIC CODE EXAMPLES

• Fragments of SQL extracted from database tables

• Actual query construction happening only at runtime

• Highly parameterized ETL processes

Page 20: THE NOT-THAT-EASY PARTS – APPROACHES

• Using operational metadata to identify parameter values for ETL processes

• Using logs to capture the actually executed transformations

• Automatic analysis of metadata-driven code generation

• If you are interested in the low-level details of how such challenges are solved, with real-life examples, we will be happy to discuss offline
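
The first approach can be sketched as follows: take the parameterized statement, substitute the parameter values recorded for each run, and analyze the resolved statements as if they were static code. The template, the run log structure and all values below are hypothetical; real operational metadata looks different in every ETL tool.

```python
from string import Template

# Hypothetical parameterized statement, as it might be stored in an ETL job.
JOB_TEMPLATE = Template(
    "INSERT INTO ${target_schema}.sales SELECT * FROM ${source_schema}.orders_${region}"
)

# Hypothetical operational metadata: parameter values recorded for past runs.
RUN_LOG = [
    {"run_id": 1, "params": {"target_schema": "edw", "source_schema": "stg", "region": "emea"}},
    {"run_id": 2, "params": {"target_schema": "edw", "source_schema": "stg", "region": "apac"}},
    {"run_id": 3, "params": {"target_schema": "edw", "source_schema": "stg", "region": "emea"}},
]


def resolve_statements(template, run_log):
    """Substitute each run's parameters; return distinct statements in first-seen order."""
    resolved = []
    for run in run_log:
        statement = template.substitute(run["params"])
        if statement not in resolved:
            resolved.append(statement)
    return resolved


statements = resolve_statements(JOB_TEMPLATE, RUN_LOG)
```

Three runs collapse into two distinct statements here, so the lineage for this job covers two source tables feeding one target.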

Page 21: SAMPLE LINEAGE DIAGRAM

IT IS COMPLEX ITSELF, BUT THERE ARE STILL MORE LEVELS...

Page 22: A ZOOM INTO A SINGLE STEP OF THE PROCESS

Page 23: LESSONS LEARNED

Page 24: DO NOT TRY TO BOIL THE OCEAN

• Data governance does not happen overnight across all your assets...

• Clearly identify your priorities, based on the regulatory requirements and/or internal drivers

• Work on particular applications/areas one after another

Page 25: BRING PEOPLE ON-BOARD BEFORE THE ASSETS

• Any large company has policies to protect the assets

• What you need to do to get your governance project off the ground often does not follow the typical patterns

• Introducing governance/quality tools/lineage involves working with
  • Particular LoB
  • Appropriate administration teams

Page 26: SOME PEOPLE NEED TO SEE IT ALL... WHILE OTHERS DON’T

• Different types of users
  • Business users
  • Power users
  • Engineers or analysts

• Ask yourself a few questions:
  • How many users of each type do you have?
  • How important is it (in the short and long term) to satisfy their needs? How do you prioritize?

• Keep in mind – there are some regulations you may need to follow!

Page 27: EXPECT THE UNEXPECTED

• These are highly technical projects, with a relatively high number of “surprises”

• Gather as much intel as you can before you start, but do not assume you know it all – the life of your data will surprise you

• You need a solid data governance platform/tooling to work on, but also a tiny Swiss Army knife to solve some smaller problems that may be specific to your environment

Page 28: IT TAKES TWO TO TANGO

• Data governance projects usually fail for a reason

• More often than not that reason is
  • Either lack of sufficient IT engagement
  • Or lack of sufficient Business engagement

Page 29: WHERE ARE WE GOING WITH DATA LINEAGE

Page 30: BIG DATA IS COMING INTO THE PICTURE

• Big data is just one more technology – the same rules apply

• What is so special about that? The sandbox approach...

• If you want your big data to be governed, use it responsibly

Page 31: MORE AUTOMATION

• Even when initial lineage (especially for CCAR or BCBS 239) is prepared manually, sooner or later keeping it up to date becomes unmanageable

• More technologies now have ready-made lineage extractors available (Cobol/JCL for example)

• Lineage becomes more complete, but at the same time more complex

Page 32: MORE RICH METADATA

• Last year about 20% of our projects were related to custom solutions aimed at enriching and enhancing standard technical metadata (to the extent that we have launched a set of generic tools to automate that)

• Demand for operational metadata from various technologies is growing

• Again – metadata becomes more complete and richer, but at the same time more complex to consume

Page 33: CONSUMPTION AND USABILITY BECOME THE MAIN CHALLENGE

• Creating data lineage, when using proper extractors, is not that difficult these days

• What is more challenging is consumption – pick a random table and your lineage will have 500+ objects

• Make sure you use a repository that will allow you to work with this complexity (filtering and reporting are the two key features)

• In some environments we are at the point where lineage must be pre-aggregated before populating the repository (user demand is driving the technology)
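
One way to picture pre-aggregation and filtering is to collapse column-level edges into table-level edges before loading them, then narrow the result to the area a user cares about. This is a minimal sketch under that assumption; the `schema.table.column` naming and the edges themselves are hypothetical.

```python
# Hypothetical column-level lineage edges in "schema.table.column" form.
COLUMN_EDGES = [
    ("stg.orders.amount", "edw.fact_sales.sales_amount"),
    ("stg.orders.currency", "edw.fact_sales.sales_amount"),
    ("stg.orders.order_id", "edw.fact_sales.order_id"),
    ("stg.customers.customer_id", "edw.fact_sales.customer_id"),
    ("edw.fact_sales.sales_amount", "rpt.revenue.total"),
]


def table_of(column_path):
    """Drop the trailing column name from a schema.table.column path."""
    return column_path.rsplit(".", 1)[0]


def aggregate_to_tables(column_edges):
    """Collapse column-level edges into a sorted list of distinct table-level edges."""
    return sorted({(table_of(s), table_of(t)) for s, t in column_edges})


def filter_by_schema(table_edges, schema):
    """Keep only edges whose target table lives in the given schema."""
    return [(s, t) for s, t in table_edges if t.startswith(schema + ".")]


table_edges = aggregate_to_tables(COLUMN_EDGES)
edw_edges = filter_by_schema(table_edges, "edw")
```

Five column-level edges become three table-level edges, and the schema filter leaves two. The same idea, applied to hundreds of objects, is what keeps a lineage view readable.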

Page 34: Q&A / DISCUSSION

Let us discuss how Compact can assist your organization's information management objectives

▐ For more information please visit www.compactbi.com or

▐ Contact us at [email protected]