Copyright 2009 bSolv. All rights reserved
Citizen360 Identity Resolution
Introduction to the Identity Resolution (IR) processes
Version 1.0
You should view the system overview presentation before this one.
Citizen360 – Introduction to Identity Resolution
Identity Resolution (IR)
This is the process by which computer records are analyzed to find those records which represent the same physical person and to subsequently merge or link those records.
Citizen360 IR Approach
IR Sweep - this batch program identifies possible citizen matches
– Built with a high degree of parallelism - up to 10 instances can run in parallel
– Can be configured to run against customized citizen models
– Runs a configurable IR Algorithm
– Can set the confidence-threshold level (e.g., 70%) below which match results are not reported
IR Algorithm – this is the algorithm, used by the IR Sweep program, that calculates the “match confidence level” between two citizens
– The algorithm results in a single confidence result expressed as a percentage, e.g., 83%
– The algorithm is made up of three major components:
  Identifier match, e.g., SSN
  Personal Demographic Data (PDD) match, e.g., names, ages, and gender
  Location match, e.g., phones, emails, and addresses
Record “Merge”
– This moves the different citizen detail-records under the same “citizen header”
– Although called a “merge”, it is really a “link”. The source systems are not forced to be the same
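As a rough illustration of the IR Sweep's threshold setting, the sketch below scores every pair of citizens and reports only the pairs at or above the threshold. Everything in it is an assumption made for illustration: the function names, the record layout, the toy scoring function that stands in for the IR Algorithm, and the choice to report a pair that lands exactly on the threshold. The thread pool only mirrors the idea of a capped number of parallel instances; it is not a description of how the batch program is built.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

MAX_INSTANCES = 10   # from the slide: up to 10 instances can run in parallel
THRESHOLD = 0.70     # from the slide's example threshold of 70%


def toy_confidence(a, b):
    """Hypothetical stand-in for the IR Algorithm: share of equal fields."""
    fields = [k for k in a if k != "id" and k in b]
    if not fields:
        return 0.0
    return sum(1 for k in fields if a[k] == b[k]) / len(fields)


def sweep(citizens, threshold=THRESHOLD, instances=MAX_INSTANCES):
    """Score every pair and keep those at or above the threshold (assumed inclusive)."""
    pairs = list(combinations(citizens, 2))
    workers = max(1, min(instances, MAX_INSTANCES))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda pair: toy_confidence(*pair), pairs))
    return [(a["id"], b["id"], s) for (a, b), s in zip(pairs, scores) if s >= threshold]


# Made-up records, for illustration only.
citizens = [
    {"id": 1, "ssn": "123456789", "last_name": "Smith", "dob": "07/14/1965", "gender": "F"},
    {"id": 2, "ssn": "123456789", "last_name": "Smith", "dob": "07/13/1965", "gender": "F"},
    {"id": 3, "ssn": "987654321", "last_name": "Jones", "dob": "01/02/1970", "gender": "F"},
]
matches = sweep(citizens)  # only the pair (1, 2) scores 3 of 4 fields = 0.75
```

Raising the threshold to 0.80 in this toy data would suppress the (1, 2) pair as well, which is the effect the configurable threshold is meant to have.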
[Diagram: record merge example]
Citizen headers:
– CitizenId: 111111, Master Index: 10001340057
– CitizenId: 222222, Master Index: 10001340065
– CitizenId: 333333, Master Index: 1000130073
Date of Birth detail records:
– Date: 07/14/1965, Source: DHS
– Date: 07/14/1965, Source: DSS
– Date: 07/13/1965, Source: DOH
Identity Resolution Process matches: 83% and 82%
Master Index History values: 10001340057, 10001340065, 1000130073
Merging does not change the data; it is still held by the source system.
Based on Identity Resolution processes we may decide to merge other records.
The data is still unique by source system, but we now know that it is for a common citizen.
We can continue to use any of the original/historic “master index” values to reference the citizen.
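The link-not-merge behaviour described here can be sketched as a small lookup structure: detail records are regrouped under one citizen header, nothing in them is rewritten, and every historic master index value still resolves to the citizen. The class and method names are hypothetical, and the pairing of each date of birth with a particular master index is an assumption, as is the choice of which master index survives.

```python
class CitizenLinks:
    """Hypothetical sketch: link detail records under one citizen header."""

    def __init__(self):
        self._points_to = {}  # master index value -> master index it was linked under
        self._records = {}    # surviving master index -> detail records

    def add(self, master_index, record):
        self._points_to.setdefault(master_index, master_index)
        self._records.setdefault(self.resolve(master_index), []).append(record)

    def resolve(self, master_index):
        """Follow the master index history to the surviving value."""
        while self._points_to[master_index] != master_index:
            master_index = self._points_to[master_index]
        return master_index

    def merge(self, keep, absorb):
        """Link the records of `absorb` under `keep`; record contents are untouched."""
        keep, absorb = self.resolve(keep), self.resolve(absorb)
        if keep == absorb:
            return
        self._records[keep].extend(self._records.pop(absorb))
        self._points_to[absorb] = keep

    def records(self, master_index):
        return self._records[self.resolve(master_index)]


links = CitizenLinks()
# Pairing of dates and sources to master index values is assumed for illustration.
links.add("10001340057", {"source": "DHS", "dob": "07/14/1965"})
links.add("10001340065", {"source": "DSS", "dob": "07/14/1965"})
links.add("1000130073", {"source": "DOH", "dob": "07/13/1965"})
links.merge("10001340057", "10001340065")
links.merge("10001340057", "1000130073")

# Any historic master index still reaches all three source records.
all_records = links.records("1000130073")
```

Note that the DOH record keeps its own date of birth after linking; the sketch never reconciles the differing values, which matches the slide's point that the data stays with the source system.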
IR Algorithm Configuration
Data elements that are compared are given “grades”:
– None: A confirmed non-match
– Approximate: A less exact match, or quite often one or more values are absent (null)
– Close: For example, SSNs that have some digits swapped, dates of birth that are 1 day apart, a name that “sounds like” another name
– Exact: A confirmed exact match
Each data element grade type is given a score between 0 and 1 (exact)
– The grade scoring is configurable through the user interface
Each data element grade is weighted and applied to the overall score
– The data element weighting is configurable through the user interface
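One plausible way to combine grade scores and weights into a single percentage is a weighted average, sketched below. The slides do not give the actual formula, so the normalisation is an assumption, and the specific scores, weights, and field names are invented placeholders for values that would be set through the user interface.

```python
# Hypothetical grade scores between 0 and 1 (exact); configurable in the product.
GRADE_SCORES = {"none": 0.0, "approximate": 0.4, "close": 0.8, "exact": 1.0}

# Hypothetical data element weights; configurable in the product.
WEIGHTS = {"ssn": 5.0, "last_name": 3.0, "dob": 2.0}


def match_confidence(grades):
    """Weighted average of grade scores, as a percentage (assumed formula)."""
    total_weight = sum(WEIGHTS[field] for field in grades)
    if total_weight == 0:
        return 0.0
    weighted = sum(WEIGHTS[field] * GRADE_SCORES[grade] for field, grade in grades.items())
    return 100.0 * weighted / total_weight


# Exact SSN, close name, close date of birth:
# (5*1.0 + 3*0.8 + 2*0.8) / 10 = 0.9, i.e. about 90%.
confidence = match_confidence({"ssn": "exact", "last_name": "close", "dob": "close"})
```

With this shape, raising the weight of a data element makes its grade count for more in the overall result, and changing a grade score shifts how generously that grade is treated for every element.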
IR Algorithm Sophistication – a few examples
Can select the preferred phonetic algorithms for different fields, e.g., Soundex, Metaphone, Double Metaphone, Phonex, NYSIIS
The Double Metaphone phonetics comparison is generally the best for names:
– Much more powerful than Soundex
– Can properly handle Eastern European names, e.g., Budjinski
– Considers correct and incorrect pronunciations of names such as “Juan”, e.g., “hwahn” and “jewann”
– Can handle silent B in Bomb and Dumb, etc.
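To make "sounds like" concrete, here is a minimal sketch of American Soundex, the simplest of the algorithms listed. It is shown only because it fits in a few lines; Double Metaphone, which the slide recommends for names, is considerably more involved and is not reproduced here. The function name is an assumption and this is not the product's implementation.

```python
# Standard American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped without resetting the previous code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is emitted again
    return (result + "000")[:4]


def sounds_like(a, b):
    """Two names 'sound alike' here if their Soundex codes are equal."""
    return soundex(a) != "" and soundex(a) == soundex(b)
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the two names would be graded as a phonetic match under this scheme.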
The address-comparison converts the addresses into the best-fit standardized post office address names
– Full and partial address matches
Dates are considered Close matches if they are within a range, only have a single digit difference, or the format is possibly different (US standard - mm/dd/yyyy, compared to US INS - dd/mm/yyyy)
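The three date rules can be sketched as a single grading function. The function name, the grade strings, and the default range of one day (taken from the "1 day apart" example given for the Close grade) are assumptions; the real range is presumably configurable.

```python
from datetime import date


def grade_dates(a: date, b: date, max_days: int = 1) -> str:
    """Grade two dates as 'exact', 'close', or 'none' (hypothetical sketch)."""
    if a == b:
        return "exact"
    # Rule 1: within a range (assumed default: 1 day apart).
    if abs((a - b).days) <= max_days:
        return "close"
    # Rule 2: exactly one digit differs in the yyyy-mm-dd form.
    if sum(x != y for x, y in zip(a.isoformat(), b.isoformat())) == 1:
        return "close"
    # Rule 3: day and month may have been swapped (mm/dd vs dd/mm).
    if a.year == b.year and a.month == b.day and a.day == b.month:
        return "close"
    return "none"
```

Under these rules 07/13/1965 and 07/14/1965 grade as Close (one day apart), as do 03/04/1965 and 04/03/1965 (day and month swapped).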
Emails with the same name but different domains are considered Close matches, e.g., [email protected] and [email protected]
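A minimal sketch of the email rule follows: the part before the "@" is compared separately from the domain. The function name and grade strings are assumptions, as is the decision to ignore letter case; the addresses in the usage note are invented.

```python
def grade_emails(a: str, b: str) -> str:
    """Grade two email addresses as 'exact', 'close', or 'none' (hypothetical sketch)."""
    a, b = a.strip().lower(), b.strip().lower()  # case-insensitive by assumption
    if a == b:
        return "exact"
    local_a, _, _domain_a = a.rpartition("@")
    local_b, _, _domain_b = b.rpartition("@")
    # Same name, different domain: a Close match per the slide.
    if local_a and local_a == local_b:
        return "close"
    return "none"
```

For instance, `grade_emails("jsmith@example.com", "jsmith@example.org")` returns `"close"`, while two different names at the same domain return `"none"`.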
Number fields (e.g., SSNs) are considered Close if they just have digits swapped, if they are the same except a digit is missing from one, etc.
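The number-field rules can be sketched in the same way. Here "digits swapped" is interpreted as one pair of adjacent digits transposed, which is an assumption; the slide's "etc." suggests further rules that are not covered. The function name and grade strings are also hypothetical, and an absent value is graded Approximate, following the grade definitions earlier in the deck.

```python
def grade_numbers(a: str, b: str) -> str:
    """Grade two number fields such as SSNs (hypothetical sketch)."""
    a = "".join(ch for ch in a if ch.isdigit())  # ignore dashes and spaces
    b = "".join(ch for ch in b if ch.isdigit())
    if not a or not b:
        return "approximate"  # one or more values absent
    if a == b:
        return "exact"
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        # Exactly two adjacent positions differ and their digits are swapped.
        if (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]]):
            return "close"
    elif abs(len(a) - len(b)) == 1:
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # Dropping one digit from the longer value yields the shorter one.
        if any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer))):
            return "close"
    return "none"
```

With made-up values, `grade_numbers("123-45-6789", "123-54-6789")` returns `"close"` (two adjacent digits swapped), and so does `grade_numbers("123456789", "12345789")` (one digit missing).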
Confidential and Proprietary
THANK YOU
www.bsolv.com
3330 Cumberland Boulevard, Suite 500, Atlanta, GA 30339
Office: +1 678.638.6692