Building A Data Quality Program From Scratch DAMA Chicago October 19, 2011 John Grage – Sr. Mgr. Discover Financial Services

Page 1: Building A Data Quality Program From Scratch

Building A Data Quality Program From Scratch

DAMA Chicago

October 19, 2011

John Grage – Sr. Mgr. Discover Financial Services

Page 2: Building A Data Quality Program From Scratch

Agenda

• Company Introduction
• Card Acceptance
• Data Quality Defined
• The Six Factors of Data Quality
• Best Practices for Improving Data Quality
• Origins of Poor Data Quality
• Benefits of High Data Quality
• Who is Responsible for Data Quality?
• Let’s Get Started
• Celebrate the Wins
• Recommendations
• Core Functional Requirements of a Data Quality Tool
• Q&A

Page 3: Building A Data Quality Program From Scratch

Company Introduction

• Discover Financial Services (NYSE: DFS)
  – Direct Banking and Payment Services Company
  – Founded in 1986
  – We Offer Many Consumer Products
    • Credit Card (One of the Largest Credit Card Issuers in the U.S.)
    • ATM/Debit Card
    • Loans (Student, Credit Card, and Personal)
    • Banking (Online Savings Accts, CDs, and Money Market Accts)
  – We Own Three Payments Networks
    • Discover Network: has millions of merchants and cash access locations
    • PULSE: one of the nation’s leading ATM/debit networks
    • Diners Club International: a global payments network with acceptance in 185 countries and territories
  – Riverwoods, IL Headquarters
  – Approximately 10,500 Employees
  – Approximately 50 Million Card Holders
  – Sites Include: www.discovercard.com and www.discoverbank.com

Page 4: Building A Data Quality Program From Scratch

Card Acceptance

• Discover Card
  – North America – U.S. / Canada / Mexico
  – Central America – Costa Rica / El Salvador / Panama and others
  – South America – Brazil / Ecuador
  – Caribbean – Bahamas / BVI / Jamaica / Puerto Rico and others
  – Europe – Austria / Finland / Poland / Turkey and others
  – Asia – Mainland China / Japan / South Korea
  – Africa – South Africa
  – Many Other Countries Coming Soon
  – See http://www.discovercard.com and select ‘International Acceptance’ under ‘Help and Support’ for an up-to-date list

Page 5: Building A Data Quality Program From Scratch

Data Quality Defined

• Many Definitions
  – The degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.
  – The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.
  – The people, processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.
  – People (must), Process (must), Technology (tools needed at some point)
• Myths and Misconceptions
  – More than defect correction
  – Not a one-time action
  – Seldom about perfection

Page 6: Building A Data Quality Program From Scratch

The Six Factors of Data Quality

• Context
  – The purpose for which it is used
• Storage
  – Where the data resides
• Data Flow
  – How the data enters and moves through the organization
• Work Flow
  – How work activities interact with and use the data
• Stewardship
  – People responsible for managing the data
• Continuous Monitoring
  – Processes for regularly validating the data

Page 7: Building A Data Quality Program From Scratch

Best Practices for Improving Data Quality

• Every Data Quality Effort Starts with Data Profiling
• Tool-Based Data Profiling is More Effective Than Manual Methods
• Data Profiling is Not a One-Time Task
• Data Profiling, Integration and Quality are Closely Related
• Proactive Order Can Reduce Reactive Chaos
• Improving Data When It’s Created or Changed is Easier Than Fixing It Later
  – Garbage in, garbage out
  – An ounce of prevention is worth a pound of cure
  – Data quality needs to move upstream

Page 8: Building A Data Quality Program From Scratch

Origins of Poor Data Quality

• Inconsistent Definitions for Common Terms
• Any Manual Intervention in the Data Flow Process (employees/customers)
• Data Migration or Conversion Projects
• External Data
• Customer, Product and Financial Data are More Prone to Data Quality Problems Compared to Other Types of Data

Page 9: Building A Data Quality Program From Scratch

Benefits of High Data Quality

• Greater Confidence in Analytic Systems
• Less Time Spent Reconciling Data and/or Fixing Problems
• Single Version of the Truth
• Increased Customer Satisfaction
• Reduced Costs
• Increased Revenues
• Compliance
  – Compliance can drive your DQ program if you can’t sell the other benefits
  – Make friends with your audit staff
  – HIPAA, GLBA, SOX, Basel II, FDIC, Federal Reserve and others

Page 10: Building A Data Quality Program From Scratch

Who is Responsible for Data Quality?

• Information Technology
• Business Analysts
• Business
• Front-Line Workers
• DQ Analysts
• Data Steward
• Corporate Executives
• Board of Directors
• No One
• All of Us Are – We just play different roles

Page 11: Building A Data Quality Program From Scratch

Let’s Get Started (Metadata)

• Information = Data (content) + Metadata (context)
• Your DQ Program Needs to Address Both Data and Metadata
• “Don’t Boil the Ocean”
• Start with a Focus on Structured Data (get this right before tackling others)
• Start With Selecting a Handful of Business Attributes From:
  – Customer
  – Product
  – Vendor / Supplier
  – Employee
  – Financial
  – Master Reference Data
  – or an attribute(s) someone brings to you. Don’t turn away this opportunity
• Find Data Steward / SME / or Someone with Business Knowledge About Attribute Who is Willing to Work With You
• Find Published Metadata About Those Attributes
  – Verify Metadata is current and accurate with your SME
  – If Metadata does not exist then that is your first step

Page 12: Building A Data Quality Program From Scratch

Let’s Keep Going (Discovery)

• Update and/or Publish Your Metadata on These Attributes
  – Great if you already have a single metadata repository tool
  – If not, that should be one goal of your data governance program
  – Document and train individuals on how to find and use this metadata
  – Enterprise LDM should be in your repository
    • Business subject areas, critical entities, attributes and relationships
    • Metadata about these attributes is your golden record
• Discovery (where do these attributes reside?)
  – Almost impossible to get 100% coverage without a tool
  – Could write lots of SQL and interrogate lots of programs and copybooks
  – Either way you will have something to work with – just how complete is it?
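The "write lots of SQL" route to discovery can be sketched as a catalog scan. This is a minimal illustration, not a method from the presentation: it assumes a SQLite database, and the function name `find_attribute` plus the `customer` and `vendor` tables are hypothetical. It lists every column whose name contains a keyword.

```python
import sqlite3

def find_attribute(conn, keyword):
    """Return (table, column) pairs whose column name contains keyword."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if keyword.lower() in col[1].lower():
                hits.append((table, col[1]))
    return hits

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER, postal_code TEXT)")
conn.execute("CREATE TABLE vendor (vendor_id INTEGER, vendor_postal_cd TEXT)")
print(find_attribute(conn, "postal"))
```

A name-based scan like this misses columns that hold the same attribute under an unrelated name (for example `zip`), which is one reason hand-written discovery rarely reaches full coverage.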

Page 13: Building A Data Quality Program From Scratch

Let’s Keep Going (POC)

• Start With a POC Within One LOB
  – 1-2 week effort
  – Examine a small number of attributes
  – Gather a small set of business rules
  – Profile the data
  – Share findings with SME
  – This is your chance to show value within a LOB that a DQ program can bring

Page 14: Building A Data Quality Program From Scratch

Let’s Keep Going (Project)

• Expand to Data Quality Project for That LOB
  – 1-6 month effort
  – Expand to full set of attributes
  – Expand to full set of business rules
  – Profile the data
  – Share findings with SME and LOB
  – Build action plan to address DQ issues
  – Fix DQ issues
  – Build in monitoring and reporting activities
  – Start looking upstream
  – Publish results – gain corporate awareness of what you have accomplished
  – May need to do more than one LOB before proceeding to the next step

Page 15: Building A Data Quality Program From Scratch

Let’s Keep Going (Enterprise)

• Expand to Data Quality Project Across the Enterprise
  – 6-12+ month effort
  – This is where you start to enter into MDM
  – Look at critical business entities / attributes that span the enterprise
    • May be some of the same attributes that you looked at individually within their LOB
  – Look at full set of business rules across the enterprise
  – Profile the data across multiple LOBs
  – Share findings with enterprise SME and Data Governance Council
  – Work with DGC to prioritize next steps
  – Build action plan to address DQ issues
  – Fix DQ issues
  – Build in monitoring and reporting activities
  – Focus upstream - need to address DQ issues in operational systems
  – Publish results – gain corporate awareness of what you have accomplished

Page 16: Building A Data Quality Program From Scratch

Let’s Keep Going (6 Key DQ Dimensions)

• Completeness
  – Are data values missing or in an unusable state?
  – Nullability
• Conformity
  – Should data conform to specified formats?
• Consistency
  – Do distinct data instances provide conflicting information?
  – Are values consistent across data sets?
• Accuracy
  – Does the data accurately represent the “real-world” values it is expected to model? e.g. incorrect spellings or out-of-date data
• Duplication
  – Are there multiple, unnecessary representations of the same data?
• Integrity
  – What data is missing important relationship links? The inability to link related records together may introduce duplication across your enterprise
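Three of these dimensions (completeness, conformity, duplication) lend themselves to simple measurements. The sketch below is illustrative only: the sample records, the five-digit postal code rule, and the function names are assumptions, not material from the presentation.

```python
import re
from collections import Counter

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "name": "Ann Lee", "postal_code": "60015"},
    {"id": 2, "name": "Bob Ray", "postal_code": None},
    {"id": 3, "name": "Ann Lee", "postal_code": "60015"},
    {"id": 4, "name": "Cy Moss", "postal_code": "6001"},
]

ZIP5 = re.compile(r"\d{5}")  # assumed format rule: exactly five digits

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def conformity(rows, field, pattern):
    """Share of non-missing values that fully match the pattern."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)

def duplicates(rows, fields):
    """Value combinations that occur more than once across the given fields."""
    counts = Counter(tuple(r.get(f) for f in fields) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

print(completeness(records, "postal_code"))
print(conformity(records, "postal_code", ZIP5))
print(duplicates(records, ["name", "postal_code"]))
```

Accuracy, consistency and integrity are harder to score this way because they need a trusted reference, a second data set, or relationship keys to compare against.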

Page 17: Building A Data Quality Program From Scratch

Let’s Keep Going (Profile)

• Run Data Profiling Against Your Attribute(s)
  – A DQ tool makes your life much simpler
  – Report on
    • Source system
    • Entity name
    • Attribute name
    • Data type and length
    • Nullability
    • Identify if attribute is a PK or FK
    • Total number of rows (or %) examined (may not want/need to look at all rows)
    • Cardinality
    • Min and max values for the attribute
    • Classification (SS#, postal code, name, address, etc.) (DQ tools are good at this)
    • Number of data quality issues (attributes not in line with business rules)
    • Provide explanations and examples for each exception
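A few of the report items above (rows examined, nullability, cardinality, data type, min and max) can be computed without a dedicated tool. The following is a minimal sketch under stated assumptions: records are plain dictionaries, non-null values of the attribute are mutually comparable, and the `profile` function and `balance` sample data are hypothetical.

```python
def profile(rows, attribute):
    """Summarize one attribute across a list of dict records."""
    values = [r.get(attribute) for r in rows]
    present = [v for v in values if v is not None]
    return {
        "attribute": attribute,
        "rows_examined": len(values),
        "null_count": len(values) - len(present),
        "cardinality": len(set(present)),  # number of distinct non-null values
        "data_types": sorted({type(v).__name__ for v in present}),
        # min/max assume the non-null values can be compared with each other.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }

# Hypothetical sample data, for illustration only.
rows = [{"balance": 120.5}, {"balance": None}, {"balance": 80.0}, {"balance": 120.5}]
print(profile(rows, "balance"))
```

Classification and rule-based exception counts are left out here; those are the areas where the slide suggests a DQ tool earns its keep.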

Page 18: Building A Data Quality Program From Scratch

Let’s Keep Going (Analyze / Fix)

• Analyze Your Results
  – Look at results from your analysis regarding DQ dimensions looked at
  – Identify data quality issues
  – Determine with SME the impact to LOB or company these exceptions bring
  – $ is the best message to bring
  – Compliance is equally effective
  – Build action plan to fix
  – Determine cost to fix
  – Take action to fix if cost effective (remember it’s not about perfection)
  – Save results

Page 19: Building A Data Quality Program From Scratch

Let’s Keep Going (Swim Upstream)

• Trace Data Flow in Reverse from Data Quality Issue
• Data was Corrupted Somewhere Along Data Flow
  – Right off the bat – as data entered the company
    • Bad vendor file
    • Bad data entry from customer service rep (telephone call)
    • Bad data entry from customer (online application)
  – Programming error in operational system
  – Data Transformation processes as data moves along
  – ???
• Find Where Corruption is Occurring and Fix It
• Beware: Corruption May be Occurring in Multiple Places
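Tracing in reverse can be pictured as walking a value backwards through the stages it passed through until it looks correct again. The sketch below is a simplified illustration, not the presenter's procedure: the stage names, the postal code rule, and the function `first_failing_stage` are all hypothetical, and it assumes a snapshot of the value is available at every stage.

```python
def first_failing_stage(stages, rule):
    """Return the earliest stage in the unbroken run of bad values that ends
    at the downstream system, or None if the downstream value is good.

    stages: list of (stage_name, value) ordered from entry point to the
    downstream system where the issue was noticed.
    """
    failing = None
    # Start at the downstream end and keep stepping upstream while the value is bad.
    for name, value in reversed(stages):
        if rule(value):
            break  # value is good here, so the corruption happened after this stage
        failing = name
    return failing

def is_zip5(value):
    """Assumed business rule: a postal code is exactly five digits."""
    return len(value) == 5 and value.isdigit()

# Hypothetical data flow, for illustration only.
stages = [
    ("online application", "60015"),
    ("operational system", "60015"),
    ("ETL transform", "6001"),
    ("warehouse", "6001"),
]
print(first_failing_stage(stages, is_zip5))
```

This finds only the corruption point nearest the reported issue. Because corruption may occur in more than one place, the same check would need repeating after each fix.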

Page 20: Building A Data Quality Program From Scratch

Let’s Keep Going (Monitoring)

• Build Monitoring Process to Audit Your Fix
• Monitoring Process Should be a Scheduled Automated Process
• Need to Review Results to Determine if Data is No Longer Being Corrupted
• Take Action if Data Quality is Being Compromised
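One way to picture such a monitoring check is a function that re-applies a business rule and raises a flag when the exception rate passes an agreed threshold. This is a sketch under assumptions: the `check_rule` function, the e-mail rule and the 10% threshold are hypothetical, and the scheduling itself (cron, a job scheduler, or a DQ tool) is left outside the example.

```python
def check_rule(rows, rule, max_exception_rate):
    """Evaluate one business rule and flag when exceptions exceed the threshold."""
    exceptions = [r for r in rows if not rule(r)]
    rate = len(exceptions) / len(rows) if rows else 0.0
    return {
        "rows_checked": len(rows),
        "exception_rate": rate,
        "alert": rate > max_exception_rate,
        "examples": exceptions[:3],  # a few samples for the reviewer
    }

def has_email(record):
    """Assumed business rule: the e-mail field must contain an @ sign."""
    return "@" in record["email"]

# Hypothetical sample data, for illustration only.
rows = [
    {"email": "a@example.com"},
    {"email": ""},
    {"email": "b@example.com"},
    {"email": "c@example.com"},
]
result = check_rule(rows, has_email, max_exception_rate=0.10)
print(result["exception_rate"], result["alert"])
```

Saving each run's result makes it possible to review the trend and confirm that the data is no longer being corrupted.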

Page 21: Building A Data Quality Program From Scratch

Let’s Keep Going (Non-Compliance)

• Use Pie Charts, Bar Graphs, etc. to Pictorially Illustrate Effect of Not Addressing Discovered DQ Issues
• Tie to Regulatory Compliance if Helpful. Refer to HIPAA, Basel II, SOX, FDIC, Federal Reserve.
• Tie to $
  – Increased cost
  – Decreased revenue
• Present to Data Governance Council

Page 22: Building A Data Quality Program From Scratch

Celebrate The Wins

• Celebrate
• Publish Wins on Scorecard
• Show $ Saved or Revenue Increased
• Constantly Remind Enterprise of What You are Doing and Value You are Providing

Page 23: Building A Data Quality Program From Scratch

Recommendations

• Start Small (POCs)
• Show Some Quick Wins - $
• Grow From There
• Focus on What You Have to Work With, Not What You Don’t Have to Work With
• Profile Data More Deeply and More Often
• Find Solutions in Tools
• Establish Both Proactive and Reactive Processes
• Take Data Quality Upstream
• Use Regulatory Compliance to Drive Data Quality
• Use Metadata to Drive Quality
• Address Enterprise Data Quality
• Derive EDQ Org Structure and Support Through Data Governance or other Executive Support

Page 24: Building A Data Quality Program From Scratch

Core Functional Requirements of a DQ Tool

• Profiling
  – Capture statistics (metadata) that provide insight into the quality of the data and help to identify data quality issues
• Parsing and Standardization
  – Decomposition of text fields into component parts and the formatting of values into consistent layouts based on industry standards, local standards, user defined business rules and knowledge bases of values and patterns
• Generalized “Cleansing”
  – The modification of data values to meet domain restrictions, integrity constraints or other business rules that define when the quality of data is sufficient for the organization
• Matching
  – Identifying, linking or merging related entries within or across sets of data
• Monitoring
  – Deploying controls ensuring data continues to conform to business rules that define data quality for the organization
• Enrichment
  – Enhancing the value of internally held data by appending related attributes from external sources (e.g. consumer demographic attributes or geographic descriptors)
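To make parsing, standardization and matching concrete, here is a small sketch. It is not taken from any particular DQ tool: the US phone format, the function names and the sample records are assumptions chosen for illustration, and the matching shown is exact matching on normalized values rather than the fuzzy matching commercial tools typically offer.

```python
import re

def standardize_phone(raw):
    """Reduce a US phone number to NNN-NNN-NNNN, or return None if it cannot be parsed."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def match_key(record):
    """Build a simple exact-match key from normalized name and phone."""
    name = " ".join(record["name"].lower().split())  # lowercase, collapse whitespace
    return (name, standardize_phone(record["phone"]))

# Hypothetical records that describe the same person in different layouts.
a = {"name": "Ann  Lee", "phone": "(847) 555-0100"}
b = {"name": "ann lee", "phone": "1-847-555-0100"}
print(match_key(a) == match_key(b))
```

Standardizing before matching is the design point: once both records share one layout, a plain equality test is enough to link them.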

Page 25: Building A Data Quality Program From Scratch

The End

Thank You!

Questions?