datalab dataquality dimensions

7
Data Quality Dimensions #BecauseDataMatters

Upload: carlos-guerreiro

Post on 28-Jul-2015

60 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: DataLab DataQuality Dimensions

Data Quality Dimensions#BecauseDataMatters

Page 2: DataLab DataQuality Dimensions

2015 Copyright DataLab/ Confidential Information / Distribution Prohibited

What data gives conflicting information?What data gives conflicting information?

What data is incorrect or out of date?What data is incorrect or out of date?

What data records are duplicated?What data records are duplicated?

What data is missing important relationship linkages?What data is missing important relationship linkages?

What data is stored in a non-standard format?What data is stored in a non-standard format?

CompletenessCompleteness

ConformityConformity

ConsistencyConsistency

AccuracyAccuracy

DuplicationDuplication

IntegrityIntegrity

What data is missing or unusable?What data is missing or unusable?

What scores, values, calculations are outside of range?What scores, values, calculations are outside of range?RangeRange

RedundancyRedundancy What data is redundant? Orphan Analysis?What data is redundant? Orphan Analysis?

RelationshipRelationship What relationships exist in the data set? Across multiple tables?What relationships exist in the data set? Across multiple tables?DataProfile

DataQuality

Column ProfilingColumn Profiling What is the data’s physical characteristics ? Across multiple tables?What is the data’s physical characteristics ? Across multiple tables?

Data Quality Dimensions

Page 3: DataLab DataQuality Dimensions

2015 Copyright DataLab/ Confidential Information / Distribution Prohibited

COMPLETENESS CONFORMITY CONSISTENCY DUPLICATION INTEGRITY ACCURACY RANGE

Completeness:Missing Key Values

Conformity:Incorrect Format

Consistency:Incorrect Format

Consistency:Data is in correct format and complete,

but breaks a business ruleDuplication:Fuzzy matching

Duplication:Fuzzy matching

Integrity:Relationship Identification

Integrity:Relationship Identification

Accuracy:Using reference data to validate

Range:Identify outliers

Customer Data - Examples

Page 4: DataLab DataQuality Dimensions

2015 Copyright DataLab/ Confidential Information / Distribution Prohibited

COMPLETENESS CONFORMITY CONSISTENCY ACCURACY DUPLICATION INTEGRITY

Product Data - Examples

Page 5: DataLab DataQuality Dimensions

2015 Copyright DataLab/ Confidential Information / Distribution Prohibited

COMPLETENESS

CONFORMITY

CONSISTENCY

DUPLICATION

INTEGRITY

ACCURACY

Finance Data - Examples

Page 6: DataLab DataQuality Dimensions

BECAUSE DATA MATTERSObrigado

Page 7: DataLab DataQuality Dimensions

Carlos Guerreiro | Sales & Marketing Manager

e: [email protected]: +351 91 765 11 99