data quality dashboards


Upload: billy-sharp

Post on 13-Nov-2014

DESCRIPTION

A short deck on how to build a data quality dashboard

TRANSCRIPT

Page 1: Data Quality Dashboards

DATA QUALITY DASHBOARDS

HOW TO BUILD ONE AND HOW TO MAINTAIN IT

BILL SHARP

Page 2: Data Quality Dashboards

FIRST THING, WELL, FIRST

1. QUALITY IS AN AMBIGUOUS TERM, SO YOU NEED TO DRIVE TO DEFINE IT IN YOUR CUSTOMERS' EYES

• TO DO THIS, CONFIRM WHAT THEY CARE ABOUT AND, MORE IMPORTANTLY, WHY THEY CARE ABOUT IT

2. I HAVE SEEN CLIENTS SPEND A LOT OF TIME ON DASHBOARD DESIGN AND WIDGETS, AND NOT ENOUGH TIME ON WHAT THE DASHBOARDS ARE DRIVING

Page 3: Data Quality Dashboards

DATA QUALITY DASHBOARD COMPONENTS

• DIMENSIONS & METRICS

• THESE FORM THE BASIC FRAMEWORK FOR A DASHBOARD

• SHOULD BE PURPOSE FIT

• SOME METRICS ARE MORE APPLICABLE FOR CERTAIN ACTIVITIES

• DUPLICATION IS A PURPOSE FIT DIMENSION FOR MDM

• CONFORMITY IS A PURPOSE FIT DIMENSION FOR A MIGRATION EFFORT

• TARGETS & TRENDS

• THESE GIVE STAKEHOLDERS THE ABILITY TO CUSTOMIZE DASHBOARDS

• SHOULD ALSO BE PURPOSE FIT

• TARGETS ARE RELATIVE TO THE METRICS THEY ARE ASSOCIATED WITH

• TRENDING IS A VERY INSIGHTFUL AND QUICK WAY TO GAUGE PROGRESS

Page 4: Data Quality Dashboards

DATA QUALITY DIMENSIONS & METRICS

Page 5: Data Quality Dashboards

COMMONLY ACCEPTED DIMENSIONS OF DATA QUALITY

1. COMPLETENESS

IS REQUIRED DATA PRESENT?

2. CONFORMITY

IS DATA ADHERING TO DEFINED RULES?

3. CONSISTENCY

IS DATA REPRESENTED THE SAME ACROSS THE ENTERPRISE?

4. DUPLICATION

IS DATA REPRESENTED ONCE AND ONLY ONCE?

5. INTEGRITY

ARE DATA RELATIONSHIPS DEFINED AND ENFORCED?

6. ACCURACY

IS DATA CORRECT? (TYPICALLY REFERENCE DATA LIKE CODES / ADDRESSES / ETC)

Page 6: Data Quality Dashboards

DATA QUALITY DIMENSIONS: COMPLETENESS

• IS ALL THE REQUIRED INFORMATION PRESENT?

• IMPLIES THAT THE REQUIRED INFORMATION IS A KNOWN AND THAT IT CAN BE PACKAGED INTO A RULE

• SOME EXAMPLES FROM MY PAST:

• EVERY CUSTOMER MUST HAVE A LAST NAME, ADDRESS LINE ONE AND ZIP CODE PRESENT BECAUSE THIS IS THE ESSENTIAL INFORMATION REQUIRED TO MAIL AN INVOICE

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
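
The invoice-mailing rule above can be sketched as a small check. This is a minimal illustration, not the author's implementation: the field names (`last_name`, `address_line_1`, `zip_code`) and the sample records are assumptions made up for the example.

```python
# Hypothetical field names; adjust to the real customer schema.
REQUIRED_FIELDS = ("last_name", "address_line_1", "zip_code")


def is_complete(record, required=REQUIRED_FIELDS):
    """Return True when every required field is present and not blank."""
    return all(str(record.get(field) or "").strip() for field in required)


def violation_rate(records, rule):
    """Percentage of records that fail the rule (0.0 for an empty list)."""
    if not records:
        return 0.0
    failures = sum(1 for record in records if not rule(record))
    return 100.0 * failures / len(records)


# Made-up sample data for illustration only.
customers = [
    {"last_name": "Lee", "address_line_1": "1 Main St", "zip_code": "02134"},
    {"last_name": "", "address_line_1": "2 Oak Ave", "zip_code": "10001"},
    {"last_name": "Chan", "address_line_1": "3 Elm Rd", "zip_code": None},
    {"last_name": "Diaz", "address_line_1": "4 Pine Ln", "zip_code": "60601"},
]

completeness_violations = violation_rate(customers, is_complete)
print(f"Completeness violations: {completeness_violations:.1f}%")
```

Reporting the share of records that violate the rule keeps the number directly comparable with a traffic-light target.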

Page 7: Data Quality Dashboards

DATA QUALITY DIMENSIONS: CONFORMITY

• DOES THE DATA MATCH THE REQUIRED DATA TYPE?

• IMPLIES THAT THE REQUIRED DATA TYPE IS A KNOWN AND THAT IT CAN BE PACKAGED INTO A RULE

• SOME EXAMPLES FROM MY PAST:

• ALL INVOICE AMOUNTS ARE TO BE STORED IN US DOLLARS BECAUSE THERE ARE CALCULATIONS DOWNSTREAM THAT CONVERT THESE AMOUNTS TO OTHER CURRENCIES WHEN REQUIRED

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
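
A conformity rule like the US dollar one above might be checked as follows. This is a sketch under assumptions: the `amount` and `currency` fields and the sample invoices are hypothetical, and the check treats "conforms" as "currency code is USD and the amount parses as a finite decimal".

```python
from decimal import Decimal, InvalidOperation


def conforms(invoice):
    """True when the invoice is in USD and its amount is a finite decimal."""
    if invoice.get("currency") != "USD":
        return False
    try:
        amount = Decimal(str(invoice.get("amount")))
    except InvalidOperation:
        # Covers missing amounts and free text such as "n/a".
        return False
    return amount.is_finite()


# Made-up sample data for illustration only.
invoices = [
    {"id": 1, "amount": "100.00", "currency": "USD"},
    {"id": 2, "amount": "85.10", "currency": "EUR"},
    {"id": 3, "amount": "n/a", "currency": "USD"},
    {"id": 4, "amount": 42, "currency": "USD"},
]

nonconforming_ids = [inv["id"] for inv in invoices if not conforms(inv)]
print("Nonconforming invoices:", nonconforming_ids)
```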

Page 8: Data Quality Dashboards

DATA QUALITY DIMENSIONS: CONSISTENCY

• IS DATA REPRESENTED THE SAME WAY IN MULTIPLE SYSTEMS?

• IMPLIES THAT THERE IS ONE WAY TO REPRESENT THE DATA IN ALL SYSTEMS, THAT THIS IS A KNOWN AND THAT IT CAN BE PACKAGED INTO A RULE

• SOME EXAMPLES FROM MY PAST:

• ARE ASSETS ASSIGNED TO THE SAME CUSTOMER IN INVENTORY, BILLING AND CRM SYSTEMS?

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
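
The asset-ownership question above can be sketched as a cross-system comparison. The system names, the asset and customer identifiers, and the choice to count an asset missing from one system as an inconsistency are all assumptions for this example.

```python
def inconsistent_assets(systems):
    """Find assets whose owning customer differs between systems.

    `systems` maps a system name to a {asset_id: customer_id} dict.
    An asset missing from a system shows up as None there, which this
    sketch also counts as an inconsistency.
    """
    all_assets = set().union(*(mapping.keys() for mapping in systems.values()))
    flagged = {}
    for asset in sorted(all_assets):
        owners = {name: mapping.get(asset) for name, mapping in systems.items()}
        if len(set(owners.values())) > 1:
            flagged[asset] = owners
    return flagged


# Made-up extracts from three hypothetical systems.
systems = {
    "inventory": {"A1": "C1", "A2": "C2", "A3": "C3"},
    "billing": {"A1": "C1", "A2": "C9", "A3": "C3"},
    "crm": {"A1": "C1", "A2": "C2"},
}

flagged = inconsistent_assets(systems)
for asset, owners in flagged.items():
    print(asset, owners)
```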

Page 9: Data Quality Dashboards

DATA QUALITY DIMENSIONS: DUPLICATION

• IS INFORMATION REPRESENTED ONCE AND ONLY ONCE?

• IMPLIES THAT HOW TO BREAK DOWN INFORMATION INTO COMPONENTS THAT NEED TO ONLY BE REPRESENTED ONCE IS A KNOWN

• SOME EXAMPLES FROM MY PAST:

• A CUSTOMER, DEFINED BY NAME AND ADDRESS, SHOULD ONLY HAVE ONE ACTIVE RECORD ACROSS THE ENTERPRISE DATA LANDSCAPE

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
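
The one-active-record rule above can be sketched by grouping active records on a normalized name and address key. The record layout and sample data are hypothetical, and exact matching after normalization is a simplification; real customer matching usually needs fuzzier comparison.

```python
from collections import defaultdict


def match_key(record):
    """Build a comparison key from name and address (case and spacing ignored)."""
    def normalize(value):
        return " ".join(str(value or "").lower().split())

    return (normalize(record.get("name")), normalize(record.get("address")))


def duplicate_groups(records):
    """Return {match_key: [record ids]} for keys with more than one active record."""
    groups = defaultdict(list)
    for record in records:
        if record.get("active"):
            groups[match_key(record)].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample data for illustration only.
records = [
    {"id": 1, "name": "Ann Lee", "address": "1 Main St", "active": True},
    {"id": 2, "name": "ann  lee", "address": "1 MAIN ST", "active": True},
    {"id": 3, "name": "Ann Lee", "address": "1 Main St", "active": False},
    {"id": 4, "name": "Bo Chan", "address": "9 Oak Ave", "active": True},
]

duplicates = duplicate_groups(records)
print(duplicates)
```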

Page 10: Data Quality Dashboards

DATA QUALITY DIMENSIONS: INTEGRITY

• ARE THERE TRANSACTIONAL ORPHANS PRESENT IN THE SYSTEM?

• IMPLIES THAT THE REQUIRED INFORMATION IS A KNOWN AND THAT IT CAN BE PACKAGED INTO A RULE

• SOME EXAMPLES FROM MY PAST:

• EVERY UNIQUE CUSTOMER MUST BE ASSOCIATED WITH AT LEAST ONE ADDRESS

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
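
The customer-to-address rule above, together with the orphan question at the top of the slide, can be sketched with two set comparisons. The identifiers and field names (`address_id`, `customer_id`) are assumptions for this example.

```python
def integrity_report(customer_ids, addresses):
    """Check the customer/address relationship in both directions.

    Returns customers that have no address, and addresses whose
    customer_id does not match any known customer (orphans).
    """
    known = set(customer_ids)
    with_address = {address["customer_id"] for address in addresses}
    return {
        "customers_without_address": sorted(known - with_address),
        "orphan_addresses": sorted(
            address["address_id"]
            for address in addresses
            if address["customer_id"] not in known
        ),
    }


# Made-up sample data for illustration only.
customer_ids = [1, 2, 3]
addresses = [
    {"address_id": 10, "customer_id": 1},
    {"address_id": 11, "customer_id": 1},
    {"address_id": 12, "customer_id": 3},
    {"address_id": 13, "customer_id": 7},
]

report = integrity_report(customer_ids, addresses)
print(report)
```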

Page 11: Data Quality Dashboards

DATA QUALITY DIMENSIONS: ACCURACY

• ACCURACY

• IS THE DATA VALID/TRUE?

• IMPLIES THAT THE REQUIRED INFORMATION IS A KNOWN AND THAT IT CAN BE PACKAGED INTO A RULE

• SOME EXAMPLES FROM MY PAST:

• EVERY CUSTOMER MUST HAVE A DELIVERABLE ADDRESS

• THIS RULE IS ROOTED IN DATA ELEMENTS AND TIED TO A MEANINGFUL AND VALUE ADDED BUSINESS OBJECTIVE

• THAT’S A GOOD METRIC!
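
Accuracy checks need something to compare against. The sketch below assumes a reference set of deliverable addresses is available (for instance from a postal validation source); the set contents, field names, and sample customers are all hypothetical.

```python
# Hypothetical reference data: (normalized address line, zip code) pairs.
DELIVERABLE = {
    ("1 MAIN ST", "02134"),
    ("9 OAK AVE", "10001"),
}


def is_deliverable(customer, reference=DELIVERABLE):
    """True when the customer's address and zip appear in the reference set."""
    line = " ".join(str(customer.get("address_line_1") or "").upper().split())
    zip_code = str(customer.get("zip_code") or "").strip()
    return (line, zip_code) in reference


# Made-up sample data for illustration only.
customers = [
    {"id": 1, "address_line_1": "1 main st ", "zip_code": "02134"},
    {"id": 2, "address_line_1": "500 Nowhere Blvd", "zip_code": "99999"},
    {"id": 3, "address_line_1": "9 Oak Ave", "zip_code": "10001"},
]

undeliverable_ids = [c["id"] for c in customers if not is_deliverable(c)]
print("Undeliverable customers:", undeliverable_ids)
```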

Page 12: Data Quality Dashboards

TARGETS & TRENDS

Page 13: Data Quality Dashboards

TRAFFIC LIGHT TARGET SETTING

• PERCENTAGES REPRESENT THE PERCENTAGE OF RECORDS THAT VIOLATE THE RULE

• HELPS QUICKLY HIGHLIGHT WHAT NEEDS TO BE PRIORITIZED (REDS) AND WHAT IS GOING WELL (GREEN)

• PROBABLY ONLY CARE ABOUT THE RED CATEGORY METRICS

• HIGHLY DEPENDENT ON A GOOD DEFINITION OF WHAT PERCENTAGES ARE GREEN, YELLOW AND RED

• TAKES SOME TWEAKING TO GET IT RIGHT
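
Traffic-light targets can be sketched as two thresholds per metric applied to the violation percentage. The metric names, threshold values, and percentages below are made up for illustration; the right cut-offs depend on the metric and usually need tuning.

```python
def traffic_light(violation_pct, yellow_at, red_at):
    """Classify a violation percentage as GREEN, YELLOW, or RED.

    A value at or above `red_at` is RED, at or above `yellow_at` is
    YELLOW, and anything lower is GREEN.
    """
    if not 0 <= yellow_at <= red_at:
        raise ValueError("thresholds must satisfy 0 <= yellow_at <= red_at")
    if violation_pct >= red_at:
        return "RED"
    if violation_pct >= yellow_at:
        return "YELLOW"
    return "GREEN"


# Hypothetical metrics: (violation %, yellow threshold, red threshold).
metrics = {
    "completeness": (0.4, 1.0, 5.0),
    "duplication": (3.2, 2.0, 10.0),
    "accuracy": (12.5, 2.0, 8.0),
}

statuses = {name: traffic_light(*values) for name, values in metrics.items()}
reds = [name for name, status in statuses.items() if status == "RED"]
print(statuses)
print("Prioritize:", reds)
```

Keeping thresholds per metric reflects the point that targets are relative to the metrics they are associated with.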

Page 14: Data Quality Dashboards

TRENDING: PROGRESS INDICATOR

• PROBABLY CARE ABOUT TRENDS MORE THAN ANYTHING ELSE

• THIS IS THE MEASURE OF REMEDIATION PROGRAM EFFECTIVENESS

• PROBABLY ONLY CARE ABOUT WHAT’S DECLINING OR REMAINING THE SAME (QUALITY IS SUPPOSED TO GET BETTER)
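
A trend indicator can be sketched by comparing the latest violation percentage with the previous measurement. The `tolerance` parameter, the metric names, and the history values are assumptions for this example; lower violation percentages mean better quality.

```python
def trend(history, tolerance=0.1):
    """Classify the latest movement in a series of violation percentages.

    `history` is ordered oldest to newest. A drop larger than
    `tolerance` is IMPROVING, a rise larger than it is DECLINING,
    and anything in between is FLAT.
    """
    if len(history) < 2:
        raise ValueError("need at least two measurements to compute a trend")
    change = history[-1] - history[-2]
    if change < -tolerance:
        return "IMPROVING"
    if change > tolerance:
        return "DECLINING"
    return "FLAT"


# Made-up violation percentages per reporting period.
histories = {
    "completeness": [6.0, 4.5, 3.1],
    "duplication": [3.2, 3.2, 3.25],
    "accuracy": [9.0, 10.5, 12.5],
}

trends = {name: trend(values) for name, values in histories.items()}
needs_attention = [name for name, t in trends.items() if t != "IMPROVING"]
print(trends)
print("Needs attention:", needs_attention)
```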