Data, how to get it clean and keep it clean? The best way to make money is to stop wasting it!

Upload: dq-global

Posted on 07-Dec-2014


DESCRIPTION

Presentation on "Data, how to get it clean and keep it clean?" We can help with your data quality issues.

TRANSCRIPT

Page 1: Get it Clean and Keep it Clean

Data, how to get it clean and keep it clean?

The best way to make money is to stop wasting it!

Page 2

Agenda:

Who are DQ

Setting the scene

Acceptable Quality

Data Defects

Get it Clean

Keep it Clean

Q&A via web chat

Close

Page 3

Setting the scene…

Who are we?

What do we do?

How do we do it?

What’s in it for our clients?

Page 4

UK B2C Data – annual rates of change…

UK Population is 63.23 M

• Over 3.25 M (5.1%) people move house
• 0.584 M (0.9%) people pass away
• 0.813 M (1.3%) births
• 0.290 M (0.5%) marry
• 0.130 M (0.2%) divorce
• 0.500 M (1.9%) changes by Royal Mail
• 0.250 M (1.4%) people sign up to the Mailing Preference Service (MPS)

UK Households 26.4 M

½ life of B2C data: 1 to 1.2 years

Page 5

UK B2B Data – annual rates of change…

4.934 M trading businesses in the UK:
• 3.10 M (62.8%) sole proprietorships
• 0.43 M (8.8%) partnerships
• 1.40 M (28.4%) limited companies
• 0.60 M (12.2%) dormant businesses

5.7 M changes to company or individual details:
• 1 moves every 6 minutes
• 1 fails every 4 minutes

On average a person changes jobs 11 times during their career

Over 1.1 M (22.3%) businesses are registered with the Corporate Telephone Preference Service (CTPS)

• 99.9% of businesses employ fewer than 250 staff
• 99.2% of businesses employ fewer than 50 people; together they employ 59% of total staff

2.43 M employees of UK businesses:

• At 24% attrition p.a., ½ life = 3 years
• At 35% attrition p.a., ½ life = 2 years
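A minimal sketch of how a half-life follows from an annual attrition rate, assuming constant compound (exponential) decay: if a fraction r of records goes out of date each year, (1 - r)^t of them are still valid after t years, so the half-life is ln 0.5 / ln(1 - r). The slide does not say how its figures were derived; under this assumption 24% p.a. gives about 2.5 years and 35% p.a. about 1.6 years, which round to the 3 and 2 years quoted. The function name is illustrative.

```python
import math


def half_life_years(annual_rate: float) -> float:
    """Years until half of a dataset is out of date, ASSUMING a constant
    compound annual decay rate (not a method stated in the slides)."""
    if not 0 < annual_rate < 1:
        raise ValueError("annual_rate must be between 0 and 1")
    # Surviving fraction after t years is (1 - r) ** t; solve (1 - r) ** t = 0.5.
    return math.log(0.5) / math.log(1 - annual_rate)


for rate in (0.24, 0.35):
    t = half_life_years(rate)
    print(f"{rate:.0%} p.a. -> half-life {t:.2f} years (~{round(t)} years)")
```

A straight-line reading (50% divided by the annual rate) would give roughly 2.1 and 1.4 years instead, so it matters which decay model is assumed when quoting a half-life.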

Page 6

Data decay – the impacts…

Financial:
• £220 M per annum wasted on inaccurate mailings
• £95 M per annum wasted by companies mailing people who have moved address
• It costs more to mail a moved or deceased individual than to suppress them
• Increase response rates – the same return with less mail

Brand:
• Duplicates and incorrect details cause a negative perception
• Mailing deceased individuals or bereaved families causes significant distress
• Mailing someone who no longer lives at an address does not impress

Compliance:
• Best practice – comply with Direct Marketing Association guidelines
• Calling a consumer who has registered their objection to receiving direct marketing phone calls is illegal
• Mailing a consumer who has registered their objection to receiving direct mail is bad management, contravenes the DMA Code of Practice and could be illegal

Environment:
• Protect the environment – help cut down on wasteful mailing

Page 7

The human factors

Acknowledging there is a problem

Page 8

The Data Quality Delusion

Everyone understands the importance of data quality

Everyone agrees data quality is important

Everyone cares about data quality

Everyone knows what actions to take to improve data quality

Page 9

Opening the Johari Window

Seeing what you don’t currently see!

Page 10

Johari Window - You don’t know what you don’t know...

[Figure: Johari Window with axes "Self" and "Others". Unknown Area: unknown to others and unknown to self. Aims: expand the Open Area, reduce the Blind Area, reduce the Hidden Area.]

Page 11

Acceptable levels of data quality?

All data has some level of quality; the question is at what level it becomes unacceptable.

How does anyone know?

Who’s responsible?

How much is low-quality data actually costing?

[Figure: scale running from Acceptable to Unacceptable]

Page 12

All data has some level of quality; the question is at what level it becomes unacceptable.

Temp < 37°C: Hypothermia

Temp = 37°C: Normal

Temp > 37°C: Abnormal

Temp > 37.8°C: Get help

Page 13

How can we end up with bad data?

A boy's name beginning with the letter J: "Gerald.."

A word beginning with Z: "Xylophone.."

A part of the body beginning with N: "Knee.."

A mode of transport that you can walk in: "Your shoes.."

Page 14

Getting your data clean and keeping it clean

Identify, correct, prevent

Page 15

Get it Clean: the basics

About “CURING” data defects:
• Mastering & merging
• Manual review

Batch process automation

Mass defect identification

Time consuming

More costly than prevention

Page 16

Keep it Clean: the basics

Prevention is better than cure
• People
• Process
• Technology

Ongoing process

The cost of prevention is many times lower than the cost of cure!

Page 17

Waging war on error…

Finding defects

Defining standards

Correcting data

Preventing error

Monitoring defects

Reference data

Internal data

Page 18

Boolean Logic & Dates

DD/MM/YY v MM/DD/YY
• 10/10/09 = 10/10/09
• 99/99/99 was accepted as a valid date structure, yet it’s clearly wrong

Is it European format DD/MM/YYYY or US format MM/DD/YYYY?

Precision
• DD/MM/YY or DD/MM/YYYY
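A minimal sketch of stricter date checking, assuming dates arrive as DD/MM/YY or MM/DD/YY strings. It uses Python's `datetime.strptime`, which rejects impossible dates such as 99/99/99, and flags values that are valid in both formats but mean different days. The names `interpretations` and `classify` are illustrative, not part of any product.

```python
from datetime import datetime

# The two readings discussed on this slide, with two-digit years.
FORMATS = {"European": "%d/%m/%y", "US": "%m/%d/%y"}


def interpretations(value: str) -> dict:
    """Return every valid reading of value; an empty dict means no format fits."""
    readings = {}
    for label, fmt in FORMATS.items():
        try:
            readings[label] = datetime.strptime(value, fmt).date()
        except ValueError:
            pass  # e.g. 99/99/99 is not a real calendar date in either format
    return readings


def classify(value: str) -> str:
    """'invalid', 'ambiguous' (valid both ways but different days) or 'ok'."""
    readings = interpretations(value)
    if not readings:
        return "invalid"
    if len(set(readings.values())) > 1:
        return "ambiguous"
    return "ok"


for sample in ("10/10/09", "03/04/09", "25/12/09", "99/99/99"):
    print(sample, classify(sample))
```

Here 10/10/09 reads the same either way, 03/04/09 is ambiguous (3 April or 4 March), 25/12/09 only fits the European format, and 99/99/99 fits neither. On precision, `%y` resolves two-digit years with a fixed pivot (00-68 become 2000-2068, 69-99 become 1969-1999), one reason to prefer four-digit years.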

OK to Mail = Y
Not OK to Mail = Y

OK to Mail = N
Not OK to Mail = N
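A positive flag and a negative flag for the same permission can contradict each other. A minimal sketch, assuming two hypothetical fields `ok_to_mail` and `not_ok_to_mail` holding Y/N style values, that resolves them into one consent state and reports contradictions instead of guessing:

```python
from typing import Optional


def to_bool(flag: Optional[str]) -> Optional[bool]:
    """Map Y/N style values to True/False; anything else is unknown (None)."""
    if flag is None:
        return None
    flag = flag.strip().upper()
    if flag in ("Y", "YES", "TRUE", "1"):
        return True
    if flag in ("N", "NO", "FALSE", "0"):
        return False
    return None


def mail_consent(ok_to_mail: Optional[str], not_ok_to_mail: Optional[str]) -> str:
    """Combine the positive and the negative flag into a single state."""
    ok, not_ok = to_bool(ok_to_mail), to_bool(not_ok_to_mail)
    if ok is not None and ok == not_ok:
        return "conflict"  # Y/Y or N/N: the two flags disagree in meaning
    if ok is True or not_ok is False:
        return "mail"
    if ok is False or not_ok is True:
        return "suppress"
    return "unknown"


print(mail_consent("Y", "N"), mail_consent("Y", "Y"), mail_consent(None, "Y"))
```

In this sketch only "mail" should be read as permission; "conflict" and "unknown" are left for review, which is the safer choice given the compliance points on Page 6.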

Page 19

Numbers in Text and Shared Numbers

Systems contain:
• 0’s and/or O’s
• 1’s and/or I’s
• Tel numbers with 9 x 000 000 000
• Same product – different numbers in 2 systems
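The first three problems can be detected mechanically. A minimal sketch, assuming it is applied only to fields that should hold digits; the look-alike map (O to 0, I to 1) and the "every digit identical" test for placeholder numbers are illustrative heuristics, not a standard.

```python
import re

# Letters commonly keyed in place of look-alike digits (illustrative list).
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1"})


def repair_numeric(value: str) -> str:
    """Replace look-alike letters in a field that should contain only digits."""
    return value.translate(LOOKALIKES)


def needs_repair(value: str) -> bool:
    """True when a supposedly numeric field contains look-alike letters."""
    return repair_numeric(value) != value


def is_placeholder_phone(value: str) -> bool:
    """Flag numbers whose digits are all the same, e.g. 000 000 000."""
    digits = re.sub(r"\D", "", value)
    return len(digits) > 0 and len(set(digits)) == 1


print(repair_numeric("O2392 988303"), is_placeholder_phone("000 000 000"))
```

Running the repair over free text would corrupt words (an "O" in a street name, for example), which is why the check belongs on numeric fields only.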

Page 20

Misinterpretation & Standards

M = Male in one system and Married in another

S = Single in one system and Separated in another

Gender
• 9 variants in the gender field of a hotel project

Padhraic, Pádraig or Páraic
Lane, LN, Ln, Road, Rd, Rd. etc.
MI or Michigan
US or USA or United States
GB or UK or United Kingdom
Mr. or Mister
Hants or Hampshire
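One common remedy is to decode each source system's codes through that system's own lookup table into a shared vocabulary, and to map known variants to a single standard form. A minimal sketch; the system names, field names and tables below are hypothetical, built from the examples on this slide.

```python
# Per-system code tables: the same code means different things (hypothetical systems).
CODE_TABLES = {
    "system_a": {"gender": {"M": "Male", "F": "Female"}},
    "system_b": {"marital_status": {"M": "Married", "S": "Single"}},
    "system_c": {"marital_status": {"S": "Separated"}},
}

# Variant -> standard form, taken from the examples on this slide.
SYNONYMS = {
    "LN": "Lane", "LANE": "Lane",
    "RD": "Road", "ROAD": "Road",
    "HANTS": "Hampshire", "HAMPSHIRE": "Hampshire",
    "MISTER": "Mr", "MR": "Mr",
    "UK": "United Kingdom", "GB": "United Kingdom",
    "US": "United States", "USA": "United States",
}


def decode(system: str, field: str, code: str) -> str:
    """Translate a source code into the shared vocabulary for that system."""
    try:
        return CODE_TABLES[system][field][code.strip().upper()]
    except KeyError:
        return "Unknown"  # unmapped values go to review rather than a guess


def standardize(text: str) -> str:
    """Map each whitespace-separated token to its standard form, if known."""
    out = []
    for token in text.split():
        key = token.upper().rstrip(".")  # treat "Rd." and "Rd" alike
        out.append(SYNONYMS.get(key, token))
    return " ".join(out)


print(decode("system_a", "gender", "M"), decode("system_b", "marital_status", "M"))
print(standardize("123 Arcasia Rd. Fareham Hants"))
```

A single flat synonym table is a simplification: applied to free text it would turn the word "us" into "United States", so a fuller version would key the synonym tables by field as well.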

Page 21

Dislocation, misfielding

Address A: 123 Arcasia Avenue / Fareham / Hampshire / PO16 8XT
Address B: 123 Arcasia Ave / Fareham / Hants / PO16 8XT

Person A: Martin P Doyle / 02392 988303 / +1 312-253-7873
Person B: Martin P Doyle / +1 312-253-7873 / 02392 988303
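Comparing field by field misses records whose values sit in different fields, such as the swapped telephone numbers above. A minimal sketch, assuming two hypothetical phone fields `tel_1` and `tel_2`, and assuming a leading 0 marks a UK national number: numbers are normalised to digits with a country code, then compared as a set regardless of which field holds them.

```python
import re

FIELDS = ("tel_1", "tel_2")  # hypothetical field names


def normalise_phone(raw: str, default_country: str = "44") -> str:
    """Digits only, with a country code. ASSUMES a leading 0 marks a
    national number in default_country (UK by default)."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return digits
    if digits.startswith("0"):
        return default_country + digits[1:]
    return digits


def misfielded(a: dict, b: dict, fields=FIELDS) -> bool:
    """True when both records hold the same numbers, but not in the same fields."""
    values_a = [normalise_phone(a.get(f, "")) for f in fields]
    values_b = [normalise_phone(b.get(f, "")) for f in fields]
    same_set = set(filter(None, values_a)) == set(filter(None, values_b)) and any(values_a)
    return same_set and values_a != values_b


person_a = {"tel_1": "02392 988303", "tel_2": "+1 312-253-7873"}
person_b = {"tel_1": "+1 312-253-7873", "tel_2": "02392 988303"}
print(misfielded(person_a, person_b))
```

The same idea (compare the bag of values, then check positions) extends to address lines such as "Hampshire" versus "Hants", once the values have been standardised.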

Page 22

Anomalies & Congruence

• Email does not tally with name parts
• Currency does not tally with location
• Goods shipped before order
• Values not in application pick lists (metadata)
• Default values used
• Notes (memo) fields used without validation rules
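Most of these anomalies can be expressed as simple cross-field rules. A minimal sketch, assuming a hypothetical record layout (`first_name`, `last_name`, `email`, `country`, `currency`, `order_date`, `ship_date`, `status`) and small illustrative reference tables; the free-text notes check is left out because it depends on the field's intended content.

```python
from datetime import date

# Hypothetical reference data for the checks below.
CURRENCY_BY_COUNTRY = {"United Kingdom": "GBP", "United States": "USD"}
STATUS_PICKLIST = {"Open", "Won", "Lost"}
SYSTEM_DEFAULTS = {"order_date": date(1900, 1, 1)}  # e.g. a date field's default


def congruence_issues(rec: dict) -> list:
    """Return human-readable congruence issues for one record."""
    issues = []
    local = rec.get("email", "").split("@")[0].lower()
    names = [rec.get("first_name", "").lower(), rec.get("last_name", "").lower()]
    if local and not any(n and n in local for n in names):
        issues.append("email does not tally with name parts")
    expected = CURRENCY_BY_COUNTRY.get(rec.get("country"))
    if expected and rec.get("currency") != expected:
        issues.append("currency does not tally with location")
    if rec.get("ship_date") and rec.get("order_date") and rec["ship_date"] < rec["order_date"]:
        issues.append("goods shipped before order")
    if rec.get("status") not in STATUS_PICKLIST:
        issues.append("value not in pick list")
    for field, default in SYSTEM_DEFAULTS.items():
        if rec.get(field) == default:
            issues.append(f"default value used in {field}")
    return issues


record = {
    "first_name": "Martin", "last_name": "Doyle", "email": "jsmith@example.com",
    "country": "United Kingdom", "currency": "USD",
    "order_date": date(2014, 12, 7), "ship_date": date(2014, 12, 1),
    "status": "Pending",
}
clean = dict(record, email="martin.doyle@example.com", currency="GBP",
             ship_date=date(2014, 12, 9), status="Won")
print(congruence_issues(record))
```

A mismatch between email and name is a prompt for review rather than proof of an error (shared mailboxes are legitimate), which is why the sketch reports issues instead of correcting them.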

Page 23

DQ Studio – identifying and fixing

• Product demonstration by Martin Kerr
• How to connect, identify and correct defects…

Page 24

DQ Studio

Classify
• Is the data in your database what you think it is?

Compare
• How similar is value A to value B, as a % similarity?

Format
• Email
• I.P.
• Postcode
• Telephone
• URL

Generate
• Phonetic tokens
• Pattern tokens

Transform data
• 13 categories
• 5 spoken languages

Validate
• Email
• I.P. address
• Postal code
• Telephone
• URL
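Two of the operations listed, pattern tokens and percentage similarity, can be illustrated generically. A minimal sketch; the letters-to-A, digits-to-9 convention is a common profiling scheme and `difflib.SequenceMatcher.ratio` is just one of many similarity measures. Neither is necessarily what DQ Studio uses.

```python
from collections import Counter
from difflib import SequenceMatcher


def pattern_token(value: str) -> str:
    """Letters become A, digits become 9, everything else is kept."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )


def similarity_percent(a: str, b: str) -> float:
    """Percentage similarity of two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Profiling a column by pattern shows values that are not what you think they are.
postcodes = ["PO16 8XT", "P0I6 8XT", "PO16 8XT"]
print(Counter(pattern_token(p) for p in postcodes))
print(round(similarity_percent("Thomson", "Thompson"), 1))
```

Profiling the three postcodes yields two AA99 9AA values and one A9A9 9AA value; the odd pattern out points straight at the mistyped postcode.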

Page 25

DQ Studio

Derive:
• Job title: role, level
• Gender: male, female, unknown
• Telephone: country, location, number type

Parse:
• Email
• I.P. address
• Telephone

Verify:
• Locations (240 countries)
• Phones
• Businesses
• Contacts
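A minimal sketch of what parsing and deriving can look like, using tiny illustrative lookup tables; a real deriver relies on far richer reference data, and the table contents, function names and fallback value "unknown" are assumptions for the example.

```python
from typing import Optional, Tuple

# Illustrative title table for deriving gender (male, female, unknown).
GENDER_BY_TITLE = {"MR": "male", "MRS": "female", "MISS": "female", "MS": "female"}

# A few international dialling codes (illustrative subset).
COUNTRY_BY_PREFIX = {"44": "United Kingdom", "1": "NANP (US, Canada and others)"}


def parse_email(address: str) -> Optional[Tuple[str, str]]:
    """Split an address into (local part, domain); None if it cannot be split."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or "." not in domain:
        return None
    return local, domain.lower()


def derive_gender(title: str) -> str:
    return GENDER_BY_TITLE.get(title.strip().upper().rstrip("."), "unknown")


def derive_phone_country(number: str) -> str:
    """Derive a country from an international number written with a leading +."""
    if not number.strip().startswith("+"):
        return "unknown"  # national format: the country must come from elsewhere
    digits = "".join(ch for ch in number if ch.isdigit())
    # Try longer prefixes first so that, e.g., "44" is tested before "4".
    for prefix in sorted(COUNTRY_BY_PREFIX, key=len, reverse=True):
        if digits.startswith(prefix):
            return COUNTRY_BY_PREFIX[prefix]
    return "unknown"


print(parse_email("Martin.Doyle@Example.com"), derive_gender("Mrs."))
print(derive_phone_country("+44 2392 988303"))
```

Returning "unknown" rather than a best guess (for titles such as Dr, or numbers without an international prefix) keeps derived fields honest.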

Page 26

Record matching

Identifying matches

Linking

Mastering

Merging

Updating

Page 27

Matching – What is it?

Identification and management of records which:
• Are the same
• Might be the same
• Are not the same

• Table v Itself
• Table v Table
• PAF Batch, PAF Lookup
• No Way, Gone Away, Passed Away, Append

Dedupe, X-Match, X-Ref API, X-Ref Data

Page 28

How is it done?

Manually
• Internally
• External bureau service

Automatically
• Software

Using black and white magic...
• Black = Matches
• White = Non-matches
• Grey = Ambiguous

Carefully, to avoid:
• Too many matches
• Too few matches
• Errors in matches
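The black, white and grey buckets are commonly implemented as two thresholds on a match score. A minimal sketch; the 0-100 scale matches the Score column of the Page 33 table, but the default thresholds (70 and 95) are hypothetical and would need tuning: lowering the black threshold risks too many matches, raising it risks too few.

```python
from collections import Counter


def classify_match(score: float, white_below: float = 70, black_from: float = 95) -> str:
    """Bucket a 0-100 match score: black = match, white = non-match,
    grey = ambiguous (needs more rules or human review)."""
    if white_below > black_from:
        raise ValueError("white_below must not exceed black_from")
    if score >= black_from:
        return "black"
    if score < white_below:
        return "white"
    return "grey"


# Scores from the first ten rows of the Page 33 table.
scores = [100, 86, 100, 94, 100, 92, 99, 100, 100, 82]
print(Counter(classify_match(s) for s in scores))
```

With these thresholds the ten scores split into six black and four grey pairs; the grey ones are the candidates for the techniques on the following pages.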

Page 29

The grey areas - When is a match a match?

Bob = Bobby = Rob = Robert = Robby = Roberto?

Thomson = Thompson = Tomson = Thomson?

Xerox = Zerocks?

PO16 8XT = P0I6 8XT?

Page 30

Grey to Black or Grey to White

• Transformations (synonyms)
• Phonetics
• String comparisons
• Intelligence

• Rules
• Spelling
• Typos

• Logic
• Experience
• Lookups
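A minimal sketch of three of these techniques applied to the grey-area examples: a synonym (nickname) transformation, a phonetic code, and folding of easily confused characters. The nickname table is illustrative; American Soundex is used here as one well-known phonetic algorithm, not as the method DQ Global uses.

```python
# Illustrative nickname table; production systems use large curated lists.
NICKNAMES = {"BOB": "ROBERT", "BOBBY": "ROBERT", "ROB": "ROBERT", "ROBBY": "ROBERT"}

# American Soundex digit groups; vowels, H, W and Y carry no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def canonical_first_name(name: str) -> str:
    """Replace a known nickname with its formal form (a synonym transformation)."""
    key = name.strip().upper()
    return NICKNAMES.get(key, key)


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":
            prev = digit  # a vowel resets prev; H and W do not
    return (result + "000")[:4]


def confusable_key(postcode: str) -> str:
    """Fold characters that are easily confused (0/O, 1/I) into one form."""
    return postcode.upper().replace(" ", "").translate(str.maketrans("01", "OI"))


print(canonical_first_name("Bobby"), canonical_first_name("Roberto"))
print(soundex("Thomson"), soundex("Thompson"), soundex("Tomson"))
print(soundex("Xerox"), soundex("Zerocks"))
print(confusable_key("PO16 8XT") == confusable_key("P0I6 8XT"))
```

No single technique settles the grey area: Soundex groups Thomson with Tomson (T525) but not with Thompson (T512), and keeps the first letter, so Xerox (X620) and Zerocks (Z620) still differ. Roberto falls outside the nickname table. Combining techniques with rules and lookups is what moves a pair to black or white.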

Page 31

Mastering Perfection & merging?

Problems:
• Which data survives?
• Which data gets re-assigned?
• Which data gets stored?
• Which data gets thrown away?

Solutions:
• Define the record master
• Define the field merge rules
• Use technology to automate processes
• Humanise exceptions
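A minimal sketch of a record master rule plus field merge rules. The rules themselves are hypothetical examples: the master is the most complete record (ties broken by most recent update), email prefers the master's value, phone takes the most recent non-empty value, and notes are concatenated.

```python
from datetime import date

# Hypothetical field merge rules.
MERGE_RULES = {
    "email": "master_first",  # keep the master's value unless it is empty
    "phone": "most_recent",   # take the most recently updated non-empty value
    "notes": "concatenate",   # keep every distinct non-empty note
}


def first_non_empty(values):
    return next((v for v in values if v), None)


def pick_master(records: list) -> dict:
    """Master = most complete record; ties broken by most recent update."""
    def completeness(r):
        return sum(1 for k, v in r.items() if k != "updated" and v)
    return max(records, key=lambda r: (completeness(r), r["updated"]))


def merge(records: list) -> dict:
    """Merge duplicate records into one survivor using MERGE_RULES."""
    master = pick_master(records)
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged = dict(master)
    for field, rule in MERGE_RULES.items():
        if rule == "master_first":
            merged[field] = first_non_empty([master.get(field)] + [r.get(field) for r in newest_first])
        elif rule == "most_recent":
            merged[field] = first_non_empty(r.get(field) for r in newest_first)
        elif rule == "concatenate":
            notes = [r.get(field) for r in newest_first if r.get(field)]
            merged[field] = "; ".join(dict.fromkeys(notes))  # de-duplicate, keep order
    return merged


a = {"id": "A", "name": "Martin Doyle", "email": "", "phone": "02392 988303",
     "notes": "met at show", "updated": date(2013, 5, 1)}
b = {"id": "B", "name": "Martin P Doyle", "email": "martin.doyle@example.com",
     "phone": "+44 2392 988303", "notes": "", "updated": date(2014, 6, 1)}
print(merge([a, b]))
```

Writing the rules down as data makes them reviewable; pairs the rules cannot resolve (for example two different non-empty names) are the exceptions to route to a person.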

Page 32

Perfect & Merge for

Identify Perfect Merge

Page 33

Process flow

CRM Database

PrimaryID SecondaryID Score
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {07F86F28-E8EE-E211-9968-0015F298503A} 100
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {062080ED-53F0-E211-BBCE-0015F298503A} 94
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1DF86F28-E8EE-E211-9968-0015F298503A} 100
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {81F86F28-E8EE-E211-9968-0015F298503A} 92
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1C2080ED-53F0-E211-BBCE-0015F298503A} 99
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {802080ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EBF76F28-E8EE-E211-9968-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4FF86F28-E8EE-E211-9968-0015F298503A} 82
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EA1F80ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4E2080ED-53F0-E211-BBCE-0015F298503A} 82
{71F86F28-E8EE-E211-9968-0015F298503A} {702080ED-53F0-E211-BBCE-0015F298503A} 100
{6BF86F28-E8EE-E211-9968-0015F298503A} {6A2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1FF86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {83F86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1E2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {822080ED-53F0-E211-BBCE-0015F298503A} 100
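A pair table like the one above is usually turned into duplicate groups before mastering and merging. A minimal sketch using union-find, assuming pairs at or above a chosen score threshold are accepted; the IDs are the first eight characters of the GUIDs in the table, shortened for readability, and only the first eight rows are used.

```python
# (primary, secondary, score): first eight rows of the table, IDs shortened.
PAIRS = [
    ("D1C12E3A", "EFF76F28", 100), ("D1C12E3A", "EE1F80ED", 86),
    ("E9C12E3A", "07F86F28", 100), ("E9C12E3A", "062080ED", 94),
    ("FFC12E3A", "1DF86F28", 100), ("FFC12E3A", "81F86F28", 92),
    ("FFC12E3A", "1C2080ED", 99), ("FFC12E3A", "802080ED", 100),
]


def cluster_pairs(pairs, threshold):
    """Group record IDs into duplicate clusters from (primary, secondary, score)
    rows, keeping only pairs whose score meets the threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for primary, secondary, score in pairs:
        if score >= threshold:
            parent[find(secondary)] = find(primary)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


for cluster in cluster_pairs(PAIRS, threshold=90):
    print(sorted(cluster))
```

Union-find also links records transitively: if A matches B and B matches C, all three land in one group even when A and C were never scored against each other, which is worth reviewing for low-scoring chains.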

Page 34

Match demonstration

Connecting

Defining

Identifying

Reviewing

Processing

Page 35

Cleaning up your business systems:

Back up your data

Define pick lists

Ensure legacy data conforms to pick lists

Delete any temporary fields set up for testing that are still in the production system

Delete or archive old data

Identify contacts with no email and/or no telephone number

Identify and correct contacts with bogus phone numbers

Identify records whose email bounces

Identify businesses without contacts

Archive linked documents which are ‘n’ years old, but take care with legal documents, including invoices and contracts

User admin – delete any users who no longer access systems

Review any prospects, suspects or opportunities not properly closed, i.e. still open more than ‘n’ weeks after opening
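Several checklist items are simple queries once the data is accessible. A minimal sketch over in-memory records, assuming a hypothetical data model (contacts with `company_id`, `email`, `phone`, `email_bounced`; opportunities with `opened` and `closed`); in practice the same checks would run as queries against the CRM.

```python
from datetime import date, timedelta


def audit(contacts, companies, opportunities, today, max_open_weeks):
    """Run a few of the checklist items and return the IDs that need attention."""
    companies_with_contacts = {c["company_id"] for c in contacts}
    cutoff = today - timedelta(weeks=max_open_weeks)
    return {
        "no_email_or_phone": [c["id"] for c in contacts
                              if not c.get("email") or not c.get("phone")],
        "email_bounces": [c["id"] for c in contacts if c.get("email_bounced")],
        "businesses_without_contacts": [co["id"] for co in companies
                                        if co["id"] not in companies_with_contacts],
        "stale_opportunities": [o["id"] for o in opportunities
                                if not o["closed"] and o["opened"] < cutoff],
    }


contacts = [
    {"id": "C1", "company_id": "B1", "email": "a@example.com", "phone": "02392 988303", "email_bounced": False},
    {"id": "C2", "company_id": "B1", "email": "", "phone": "02392 988303", "email_bounced": False},
    {"id": "C3", "company_id": "B2", "email": "c@example.com", "phone": "", "email_bounced": True},
]
companies = [{"id": "B1"}, {"id": "B2"}, {"id": "B3"}]
opportunities = [
    {"id": "O1", "opened": date(2014, 1, 6), "closed": False},
    {"id": "O2", "opened": date(2014, 11, 24), "closed": False},
    {"id": "O3", "opened": date(2013, 6, 3), "closed": True},
]
report = audit(contacts, companies, opportunities, today=date(2014, 12, 7), max_open_weeks=12)
print(report)
```

Here `max_open_weeks` plays the role of the checklist's ‘n’ weeks; the value 12 is only an example.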

Page 36

Actions to consider…

Change attitudes to “ABC” thinking

Think prevention not cure

Apply DQ processes

Verify, Format & Validate

Suppress records

Merge duplicates

Append missing data for segmentation

Govern and Comply

Measure & Manage

Get a CXO sponsor

Prune & Consolidate & Remove competition

Common dictionary of terms

Define customer value and lifetime?

Page 37

In conclusion…

Identify
• Recognise there is a problem

Qualify
• Gather evidence: what, when, where and how large is the problem?

Quantify
• What specifically is doing the damage?

Accept
• Acknowledge the scale of the task

Define
• The goals and what will be measured

Perform
• Carry out the tasks agreed, in order of significance

Page 38

Questions…

• Build a better business based on trusted data…

• Contact DQ Global
• www.DQGlobal.com

• Talk to a consultant
• [email protected]
• +44 2392 988303 (Europe)
• +1 314-253-7873 (North America)