get it clean and keep it clean
DESCRIPTION
Presentation on "Data, how to get it clean and keep it clean?" We can help with your data quality issues.
TRANSCRIPT
![Page 1: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/1.jpg)
Data, how to get it clean and keep it clean?
The best way to make money is to stop wasting it!
![Page 2: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/2.jpg)
Agenda:
Who are DQ
Setting the scene
Acceptable Quality
Data Defects
Get it Clean
Keep it Clean
Q&A via web chat
Close
![Page 3: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/3.jpg)
Setting the scene…
Who are we?
What do we do?
How do we do it?
What’s in it for our clients?
![Page 4: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/4.jpg)
UK B2C Data – annual rates of change…
UK Population is 63.23 M
• Over 3.25 M (5.1%) people move house
• 0.584 M (0.9%) people pass away
• 0.813 M (1.3%) Births
• 0.290 M (0.5%) Marry
• 0.130 M (0.2%) Divorce
• 0.500 M (1.9%) Changes by Royal Mail
• 0.250 M (1.4%) people sign up to MPS
UK Households 26.4 M
½ life of B2C data 1 to 1.2 years
![Page 5: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/5.jpg)
UK B2B Data – annual rates of change…
4.934 M trading businesses in the UK
• 3.10 M (62.8%) sole proprietorships
• 0.43 M (8.8%) partnerships
• 1.40 M (28.4%) limited companies
• 0.60 M (12.2%) dormant businesses
5.7 M changes to company or individual details:
• 1 moves every 6 minutes
• 1 fails every 4 minutes
On average a person changes jobs 11 times during their career
Over 1.1 M (22.3%) businesses are registered with the CTPS
• 99.9% of businesses employ fewer than 250 staff
• 99.2% of businesses employ fewer than 50 people and employ 59% of total staff
2.43 M employees of UK businesses:
@ 24% p.a. ½ life attrition = 3 years
@ 35% p.a. ½ life attrition = 2 years
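A rough check of the half-life figures above, assuming the attrition compounds year on year (the slide does not say how its figures were derived, and `half_life_years` is a name chosen here for illustration). Under that assumption the results come out at about 2.5 and 1.6 years, so the 3 and 2 years quoted look like rounded-up values.

```python
import math

def half_life_years(annual_attrition: float) -> float:
    """Years until half the records are out of date, assuming compound annual decay."""
    if not 0 < annual_attrition < 1:
        raise ValueError("annual_attrition must be between 0 and 1")
    # Solve (1 - rate) ** years == 0.5 for years
    return math.log(0.5) / math.log(1 - annual_attrition)

for rate in (0.24, 0.35):
    print(f"{rate:.0%} p.a. gives a half-life of about {half_life_years(rate):.1f} years")
```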
![Page 6: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/6.jpg)
Data decay – the impacts…
Financial:
• £220 M per annum wasted on inaccurate mailings
• £95 M per annum wasted by companies mailing people who have moved addresses
• It costs more to mail a moved or deceased individual than to suppress them
• Increase response rates – the same return with less mail
Brand:
• Duplicates and incorrect details cause a negative perception
• Mailing deceased individuals or bereaved families causes significant distress
• Mailing someone who no longer lives at an address does not impress
Compliance:
• Best practice – comply with Direct Marketing Association guidelines
• Calling a consumer who has registered their objection to receiving direct marketing phone calls is illegal
• Mailing a consumer who has registered their objection to receiving direct mail is bad management, contravenes the DMA Code of Practice and could be illegal
Environment
• Protect the environment – help cut down on wasteful mailing
![Page 7: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/7.jpg)
The human factors
Acknowledging there is a problem
![Page 8: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/8.jpg)
The Data Quality Delusion
Everyone understands the importance of data quality
Everyone agrees data quality is important
Everyone cares about data quality
Everyone knows what actions to take to improve data quality
![Page 9: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/9.jpg)
Opening the Johari Window
Seeing what you don’t currently see!
![Page 10: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/10.jpg)
Johari Window - You don’t know what you don’t know...
Self
Others
Expand the Open Area
Reduce Blind Area
Reduce the Hidden Area
Unknown Area: unknown to others and unknown to self
![Page 11: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/11.jpg)
Acceptable levels of data quality?
All data has some level of quality, the question is at what level is it unacceptable?
How does anyone know?
Who’s responsible?
How much is low quality data actually costing?
Unacceptable
Acceptable
![Page 12: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/12.jpg)
All data has some level of quality, the question is at what level is it unacceptable.
Temp < 37°C: Hypothermia
Temp = 37°C: Normal
Temp > 37°C: Abnormal
Temp > 37.8°C: Get help
![Page 13: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/13.jpg)
How can we end up with bad data?
A boy's name beginning with the letter J: "Gerald.."
A word beginning with Z: "Xylophone.."
A part of the body beginning with N: "Knee.."
A mode of transport that you can walk in: "Your shoes.."
![Page 14: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/14.jpg)
Getting your data clean and keeping it clean
Identify, correct, prevent
![Page 15: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/15.jpg)
Get it Clean: the basics
About “CURING” data defects
• Mastering & Merging
• Manual review
Batch process automation
Mass defect identification
Time consuming
More costly than prevention
![Page 16: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/16.jpg)
Keep it Clean: the basics
Prevention better than cure
• People
• Process
• Technology
Ongoing process
Costs of prevention many times lower than cure!
![Page 17: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/17.jpg)
Waging war on error…
Finding defects
Defining standards
Correcting data
Preventing error
Monitoring defects
Reference data
Internal data
![Page 18: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/18.jpg)
Boolean Logic & Dates
DD/MM/YY v MM/DD/YY
• 10/10/09 = 10/10/09
• 99/99/99 was accepted as a valid date structure yet it’s clearly wrong
Is it European format DD/MM/YYYY or US format MM/DD/YYYY?
Precision
• DD/MM/YY or DD/MM/YYYY
OK to Mail = Y
Not OK to Mail = Y
OK to Mail = N
Not OK to Mail = N
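The date and flag pitfalls on this slide can be sketched as simple checks. This is an illustration only: the function names are hypothetical, and it assumes two-digit years and that the two mail flags are stored as single characters.

```python
from datetime import datetime

def parse_date(text: str, fmt: str = "%d/%m/%y"):
    """Return a date only if text is a real calendar date in the given format."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # Matches the NN/NN/NN shape but is not a real date, e.g. 99/99/99
        return None

def is_ambiguous(text: str) -> bool:
    """True when DD/MM/YY and MM/DD/YY are both valid and give different dates."""
    european = parse_date(text, "%d/%m/%y")
    american = parse_date(text, "%m/%d/%y")
    return european is not None and american is not None and european != american

def flags_conflict(ok_to_mail: str, not_ok_to_mail: str) -> bool:
    """Two opposite flags should never hold the same value."""
    return ok_to_mail == not_ok_to_mail

print(parse_date("99/99/99"))     # None
print(is_ambiguous("10/10/09"))   # False, both readings agree
print(is_ambiguous("03/04/09"))   # True, 3 April or 4 March
print(flags_conflict("Y", "Y"))   # True
```

A value such as 03/04/09 passes validation under either format, so the format has to be known from the source system rather than inferred from the value.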
![Page 19: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/19.jpg)
Numbers in Text and Shared Numbers
Systems contain:
• 0’s and/or O’s
• 1’s and/or I’s
• Tel numbers with 9 x 000 000 000
Same product – different numbers in 2 systems
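A minimal sketch of how the first three defects above might be flagged in a field that is supposed to be numeric. The function names and the rule that "a single repeated digit is a placeholder" are assumptions made for illustration.

```python
import re

def has_letter_lookalikes(value: str) -> bool:
    """True if a supposedly numeric field contains the letter O or I."""
    return bool(re.search(r"[OoI]", value))

def is_placeholder_phone(value: str) -> bool:
    """True if the number is one digit repeated, e.g. 000 000 000."""
    digits = re.sub(r"\D", "", value)  # keep digits only
    return len(digits) > 0 and len(set(digits)) == 1

print(has_letter_lookalikes("O2392 9883O3"))  # True
print(has_letter_lookalikes("02392 988303"))  # False
print(is_placeholder_phone("000 000 000"))    # True
```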
![Page 20: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/20.jpg)
Misinterpretation & Standards
M = Male in one system and Married in another
S = Single in one system and Separated in another
Gender
• 9 variants in the gender field of a hotel project
Padhraic, Pádraig or Páraic
Lane, LN, Ln, Road, Rd, Rd. etc.
MI or Michigan
US or USA or United States
GB or UK or United Kingdom
Mr. or Mister
Hants or Hampshire
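One common way to deal with variants like these is a lookup table that maps every known spelling to a single agreed form. The sketch below uses the examples from the slide; the `CANONICAL` table, the `standardise` function and the choice of which form is canonical are all assumptions for illustration.

```python
# Hypothetical lookup table built from the variants listed on the slide
CANONICAL = {
    "ln": "Lane", "lane": "Lane",
    "rd": "Road", "road": "Road",
    "mi": "Michigan", "michigan": "Michigan",
    "us": "United States", "usa": "United States", "united states": "United States",
    "gb": "United Kingdom", "uk": "United Kingdom", "united kingdom": "United Kingdom",
    "mr": "Mr", "mister": "Mr",
    "hants": "Hampshire", "hampshire": "Hampshire",
}

def standardise(value: str) -> str:
    """Map a known variant to its agreed form; leave unknown values unchanged."""
    key = value.strip().rstrip(".").lower()
    return CANONICAL.get(key, value.strip())

print(standardise("Rd."))      # Road
print(standardise("HANTS"))    # Hampshire
print(standardise("Fareham"))  # Fareham
```

A table like this only works once the business has agreed a common dictionary of terms; a code such as M still needs its source system to say whether it means Male or Married.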
![Page 21: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/21.jpg)
Dislocation, misfielding
Address A             Address B
123 Arcasia Avenue    123 Arcasia Ave
Fareham
Hampshire             Fareham
PO16 8XT              Hants
                      PO16 8XT
Person A              Person B
Martin
P                     Martin P
Doyle                 Doyle
02392 988303          +1 312-253-7873
+1 312-253-7873       02392 988303
![Page 22: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/22.jpg)
Anomalies & Congruence
Email does not tally with name parts
Currency does not tally with location
Goods shipped before order
Values not in application pick lists (metadata)
Default values used
Notes (memo) fields used without validation rules
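Three of the congruence checks above can be sketched as cross-field rules on a single record. The field names, the `EXPECTED_CURRENCY` lookup and the "surname appears in the email" heuristic are assumptions for illustration; a failed check is a reason to review the record, not proof that it is wrong.

```python
from datetime import date

# Hypothetical country-to-currency lookup, for illustration only
EXPECTED_CURRENCY = {"GB": "GBP", "US": "USD"}

def find_anomalies(record: dict) -> list[str]:
    """Return the cross-field checks that this record fails."""
    problems = []
    local_part = record["email"].split("@")[0].lower()
    if record["last_name"].lower() not in local_part:
        problems.append("email does not tally with name")
    expected = EXPECTED_CURRENCY.get(record["country"])
    if expected is not None and expected != record["currency"]:
        problems.append("currency does not tally with location")
    if record["shipped"] < record["ordered"]:
        problems.append("goods shipped before order")
    return problems

suspect = {
    "email": "jane.smith@example.com", "last_name": "Doyle",
    "country": "GB", "currency": "USD",
    "ordered": date(2013, 7, 20), "shipped": date(2013, 7, 18),
}
print(find_anomalies(suspect))
```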
![Page 23: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/23.jpg)
DQ Studio – identifying and fixing
• Product demonstration by: Martin Kerr
• How to connect, identify and correct defects…
![Page 24: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/24.jpg)
DQ Studio
Classify
• Is the data in your database what you think it is?
Compare
• How similar is value A to value B, as a percentage?
Format
• Email
• I.P.
• Postcode
• Telephone
• URL
Generate:
• phonetic tokens
• pattern tokens
Transform data
• 13 Categories
• 5 Spoken Languages
Validate
• Email
• I.P. Address
• Postal code
• Telephone
• URL
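To make "% similarity" and "pattern tokens" concrete, here is a generic sketch using Python's standard library. It is not DQ Studio's algorithm, which the slides do not describe; `similarity_percent` and `pattern_token` are hypothetical names, and the A/9 token alphabet is an assumption.

```python
from difflib import SequenceMatcher

def similarity_percent(a: str, b: str) -> float:
    """Case-insensitive similarity of two values, from 0 to 100."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio(), 1)

def pattern_token(value: str) -> str:
    """Replace letters with A and digits with 9, keeping all other characters."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)

print(similarity_percent("Thomson", "Thompson"))
print(pattern_token("PO16 8XT"))  # AA99 9AA
print(pattern_token("P0I6 8XT"))  # A9A9 9AA
```

Two values that look alike to the eye can produce different pattern tokens, which is one way to spot a letter typed where a digit belongs.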
![Page 25: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/25.jpg)
DQ Studio
Derive:
• Job Title: role, level
• Gender: male, female, unknown
• Telephone: country, location, number type
Parse:
• Email
• I.P. Address
• Telephone
Verify
• Locations (240 Countries)
• Phones
• Businesses
• Contacts
![Page 26: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/26.jpg)
Record matching
Identifying matches
Linking
Mastering
Merging
Updating
![Page 27: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/27.jpg)
Matching – What is it?
Identification and management of records which:
• Are the same
• Might be the same
• Are not the same
• Table v Itself
• Table v Table
• PAF Batch
• PAF Lookup
• No Way
• Gone Away
• Passed Away
• Append
Dedupe
X-Match
X-Ref API
X-Ref Data
![Page 28: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/28.jpg)
How is it done?
Manually
• Internally
• External Bureau service
Automatically
• Software
Using black and white magic...
• Black = Matches
• White = Non Matches
• Grey = Ambiguous
Carefully to avoid:
• Too many matches
• Too few matches
• Errors in matches
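The black, white and grey split can be sketched as two cut-offs on a match score. The thresholds of 95 and 80 below are hypothetical; in practice they would be tuned against reviewed samples.

```python
def classify(score: int, match_at: int = 95, non_match_below: int = 80) -> str:
    """Sort a 0-100 match score into black (match), white (non-match) or grey."""
    if score >= match_at:
        return "black"
    if score < non_match_below:
        return "white"
    return "grey"  # ambiguous, needs further rules or manual review

for score in (100, 86, 50):
    print(score, classify(score))
```

Lowering `match_at` risks too many matches and raising it risks too few, which is the trade-off the slide warns about.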
![Page 29: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/29.jpg)
The grey areas - When is a match a match?
Bob = Bobby = Rob = Robert = Robby = Roberto?
Thomson = Thompson = Tomson = Thomson?
Xerox = Zerocks?
PO16 8XT = P0I6 8XT?
![Page 30: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/30.jpg)
Grey to Black or Grey to White
• Transformations (Synonyms)
• Phonetics
• String comparisons
• Intelligence
• Rules
• Spelling
• Typos
• Logic
• Experience
• Lookups
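Two of the techniques listed above, synonym transformations and typo rules, can be sketched as comparison keys. The `NICKNAMES` table and the O/0 and I/1 folding are assumptions for illustration; the keys are only used to compare values and say nothing about which spelling is correct.

```python
# Hypothetical synonym table; a real one would be far larger
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "robby": "robert"}

# Fold characters that are commonly typed or read for one another
CONFUSABLES = str.maketrans({"O": "0", "I": "1"})

def name_key(name: str) -> str:
    """Map a known nickname to its formal name; otherwise return it lower-cased."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)

def code_key(value: str) -> str:
    """Comparison key for codes such as postcodes, ignoring O/0 and I/1 confusion."""
    return value.upper().translate(CONFUSABLES)

print(name_key("Bob") == name_key("Robert"))          # True, grey becomes black
print(name_key("Roberto") == name_key("Robert"))      # False with this table
print(code_key("PO16 8XT") == code_key("P0I6 8XT"))   # True
```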
![Page 31: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/31.jpg)
Mastering Perfection & merging?
Problems:
• Which data survives?
• Which data gets reassigned?
• Which data gets stored?
• Which data gets thrown away?
Solutions:
• Define the record master
• Define the field merge rules
• Use technology to automate processes
• Humanise exceptions
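A minimal sketch of one possible field merge rule, assuming the master record has already been chosen: the master's values survive, gaps are filled from the duplicate, and disagreements are returned for a person to review. The rule and the field names are assumptions for illustration, not the product's behaviour.

```python
def merge(master: dict, duplicate: dict) -> tuple[dict, dict]:
    """Merge duplicate into master; return (merged record, conflicts for review)."""
    merged = dict(master)
    conflicts = {}
    for field, value in duplicate.items():
        if not merged.get(field):
            merged[field] = value      # master has a gap, take the duplicate's value
        elif value and value != merged[field]:
            conflicts[field] = value   # both populated but different: humanise the exception
    return merged, conflicts

master = {"name": "Martin Doyle", "phone": "", "county": "Hampshire"}
duplicate = {"name": "Martin Doyle", "phone": "02392 988303", "county": "Hants"}
merged, conflicts = merge(master, duplicate)
print(merged)
print(conflicts)
```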
![Page 32: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/32.jpg)
Perfect & Merge for
Identify Perfect Merge
![Page 33: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/33.jpg)
Process flow
CRM Database
PrimaryID SecondaryID Score
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EFF76F28-E8EE-E211-9968-0015F298503A} 100
{D1C12E3A-B7F2-E211-95FC-0015F298503A} {EE1F80ED-53F0-E211-BBCE-0015F298503A} 86
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {07F86F28-E8EE-E211-9968-0015F298503A} 100
{E9C12E3A-B7F2-E211-95FC-0015F298503A} {062080ED-53F0-E211-BBCE-0015F298503A} 94
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1DF86F28-E8EE-E211-9968-0015F298503A} 100
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {81F86F28-E8EE-E211-9968-0015F298503A} 92
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {1C2080ED-53F0-E211-BBCE-0015F298503A} 99
{FFC12E3A-B7F2-E211-95FC-0015F298503A} {802080ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EBF76F28-E8EE-E211-9968-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4FF86F28-E8EE-E211-9968-0015F298503A} 82
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {EA1F80ED-53F0-E211-BBCE-0015F298503A} 100
{CDC12E3A-B7F2-E211-95FC-0015F298503A} {4E2080ED-53F0-E211-BBCE-0015F298503A} 82
{71F86F28-E8EE-E211-9968-0015F298503A} {702080ED-53F0-E211-BBCE-0015F298503A} 100
{6BF86F28-E8EE-E211-9968-0015F298503A} {6A2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1FF86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {83F86F28-E8EE-E211-9968-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {1E2080ED-53F0-E211-BBCE-0015F298503A} 100
{01C22E3A-B7F2-E211-95FC-0015F298503A} {822080ED-53F0-E211-BBCE-0015F298503A} 100
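A table of scored pairs like the one above usually has to be turned into groups of records before merging. One generic way to do that is a union-find over the pairs that clear a score threshold; the sketch below uses short made-up IDs in place of the GUIDs, and the threshold of 95 is an assumption.

```python
def cluster(pairs, min_score=95):
    """Group record IDs that are linked by pairs scoring at least min_score."""
    parent = {}

    def find(record):
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]  # path halving
            record = parent[record]
        return record

    for primary, secondary, score in pairs:
        if score >= min_score:
            parent[find(secondary)] = find(primary)

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())

# Made-up IDs for illustration: (PrimaryID, SecondaryID, Score)
pairs = [("A", "B", 100), ("A", "C", 86), ("D", "E", 100), ("E", "F", 99)]
print(cluster(pairs))
```

Records that only appear in pairs below the threshold, such as C here, are left out of every group and stay in the grey area for review.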
![Page 34: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/34.jpg)
Match demonstration
Connecting
Defining
Identifying
Reviewing
Processing
![Page 35: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/35.jpg)
Cleaning up your business systems:
Back up your data
Define pick lists
Ensure legacy data conforms to pick lists
Delete any temporary fields set up for test and still in the production system
Delete or archive old data
Identify contacts with no email and/or no telephone #
Identify and correct contacts with bogus phone numbers
Identify records whose email bounces
Identify businesses without contacts
Archive linked documents which are ‘n’ years old; however, take care with legal documents, including invoices and contracts
User admin – delete any users who no longer access systems
Review any prospects, suspects or opportunities not properly closed, i.e. > ‘n’ weeks from opening
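Two of the checks in this list can be sketched as simple filters over exported records. The field names (`email`, `phone`, `opened`, `closed`) and the function names are assumptions for illustration; the value of ‘n’ is left to the caller.

```python
from datetime import date, timedelta

def contacts_missing_channels(contacts):
    """Contacts with no email and/or no telephone number."""
    return [c for c in contacts if not c.get("email") or not c.get("phone")]

def stale_opportunities(opportunities, today, max_weeks):
    """Opportunities still open more than max_weeks after they were opened."""
    cutoff = today - timedelta(weeks=max_weeks)
    return [o for o in opportunities if o["closed"] is None and o["opened"] < cutoff]

contacts = [
    {"name": "A", "email": "a@example.com", "phone": "02392 988303"},
    {"name": "B", "email": "", "phone": "02392 988303"},
]
opportunities = [
    {"id": 1, "opened": date(2013, 1, 7), "closed": None},
    {"id": 2, "opened": date(2013, 6, 3), "closed": None},
]
print([c["name"] for c in contacts_missing_channels(contacts)])
print([o["id"] for o in stale_opportunities(opportunities, date(2013, 7, 1), 12)])
```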
![Page 36: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/36.jpg)
Actions to consider…
Change attitudes to “ABC” thinking
Think prevention not cure
Apply DQ processes
Verify, Format & Validate
Suppress records
Merge duplicates
Append missing data for segmentation
Govern and Comply
Measure & Manage
Get a CXO sponsor
Prune & Consolidate & Remove competition
Common dictionary of terms
Define customer value, and lifetime?
![Page 37: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/37.jpg)
In conclusion…
Identify
• recognise there is a problem?
Qualify
• gather evidence, what, when, where and how large is the problem?
Quantify
• what’s specifically doing the damage?
Accept
• acknowledge the scale of the task?
Define
• the goals and what will be measured?
Perform
• carry out the tasks agreed in order of significance
![Page 38: Get it Clean and Keep it Clean](https://reader033.vdocuments.site/reader033/viewer/2022052315/5483b46db4af9f38278b4676/html5/thumbnails/38.jpg)
Questions…
• Build a better business based on trusted data…
• Contact DQ Global
• www.DQGlobal.com
• Talk to a consultant
• [email protected]
• +44 2392 988303 (Europe)
• +1 314-253-7873 (North America)