Linking Electronic Patient Records and Death Records: Challenges and Opportunities
Mike Hogarth, MD, FACP, FACMI
http://hogarth.ucdavis.edu
March 21, 2017
Overview
• Problem statement
• Review of death certificate data
• Overview of CA-EDRS
• Sources for “death records”
• Entity matching
– Deterministic, probabilistic
• Matching modes – supervised vs. unsupervised
• Where to match – front office vs. back office
• Things to consider – the impact of “false” and where
• Matching tools
Problem
• The EHR records have a large number of falsely “alive” patients
– Most patients pass away outside of the healthcare organization’s hospitals/clinics
– There is no process to notify healthcare organizations that a patient previously seen has expired
– Healthcare organizations do not have a systematic and reliable source for death information about their patients
• Those who know of the expiration do not know the healthcare org cared for the patient
• Healthcare orgs are left to perform “matching”
California EDRS (CA-EDRS)
                      CA-EDRS       CA-FDRS
Deaths/Year           ~250,000      2,500
Go Live               Jan 1, 2005   May 1, 2013
Users                 4,438         1,871
  Funeral Directors   753           271
  FH Staff            1,860         748
  Certifiers          0             0
  MF Staff            566           251
  ME/Coroners         491           251
Organizations         1,512         1,512
  Funeral Homes       1,177         1,177
  Hospitals           208           208
  ME/Coroner          58            58
Two national death files: CDC NDI and SSA DMF
• National Death Index (CDC)
– Application form
– 2-3 months review
– Study subject matching against CDC’s national death records
– $$
• “Death Master File” (SSA)
– 1962-present (83 million)
– As of 2011, no longer includes ‘protected’ state records (4.2 million records removed)
– Since 2011, has 1M fewer records per year (about 40% of all annual deaths are no longer in the DMF)
– ~$60,000 license fee
Using the CDC NDI
• Requires you to submit your data to CDC
• Only approved for use in matching clinical trial/research data
• Can be delayed – up to 24 months before all deaths from a year are in the file
• Approval takes ~2 months
• Costs
Issues with Death Master File
• Has all deaths prior to 2011, but since then is missing significant numbers of deaths
• Updated annually
• Can be ~24 months behind
• No longer includes all deaths in the US annually
– Only about 50% of deaths per year are in the DMF today
California’s Fact of Death File
• As of 2016, California Dept. of Public Health (CDPH) has made a fact of death file available to healthcare organizations to match against their records
• Provided monthly
• Data elements for matching in the file
– First name
– Middle name
– Last name
– Gender
– Date of birth
• Does not include SSN or cause of death
California Law Regarding Preservation and Release of Vital Records Data (Health and Safety Code – 102230)
California Research Files
• CDPH has a process for applying for identified death files with data beyond the fact of death file
– Requires IRB review and Vital Statistics Advisory Committee (VSAC) approval
– Used to be a “one time” file, but they will consider ongoing distribution on a monthly basis
Matching records – entity matching and ‘record linkage’
• There are two ways to link/match records
– Deterministic matching
– Probabilistic matching
• Probabilistic matching allows one to assign weights to different data elements used in the matching and use a threshold rather than an “all or none” determination on matching
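For contrast with the weighted approach, here is a minimal sketch of deterministic ("all or none") matching in Python. The `Person` field names and the `normalize` helper are assumptions made for illustration; they are not taken from the CDPH file layout or from any tool named in these slides.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Hypothetical field names chosen for this sketch
    first: str
    middle: str
    last: str
    gender: str
    dob: str  # ISO date string, e.g. "1930-04-12"


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits, so "O'Brien " equals "obrien"."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_key(p: Person) -> tuple:
    """Deterministic key: every element must agree exactly after normalization."""
    return tuple(normalize(v) for v in p)


def deterministic_match(ehr: Person, death: Person) -> bool:
    return match_key(ehr) == match_key(death)
```

With this sketch, a record that is missing only the middle name fails to match, which is the "all or none" behavior that probabilistic matching relaxes.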
Why use Probabilistic Matching?
• Can handle missing data in a weighted fashion
• One can have “possible matches”, in addition to “matches” and “non-matches”
• Can adjust the thresholds for matches and possible matches
• Can be ‘trained’ to perform with less human “custom rule making”
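The points above can be sketched with Fellegi-Sunter style log weights. This is a minimal Python illustration, not the scoring used by any specific product: the m/u probabilities and both thresholds are made-up values for the example, and the field names are assumptions.

```python
import math

# Illustrative m/u probabilities (assumptions, not estimated from real data):
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "first":  (0.95, 0.01),
    "middle": (0.80, 0.05),
    "last":   (0.95, 0.005),
    "gender": (0.98, 0.50),
    "dob":    (0.97, 0.0001),
}

MATCH_THRESHOLD = 15.0     # hypothetical upper threshold
POSSIBLE_THRESHOLD = 5.0   # hypothetical lower threshold


def field_weight(field, a, b):
    """Log2 agreement/disagreement weight for one field; missing values add 0."""
    if not a or not b:
        return 0.0
    m, u = FIELD_PROBS[field]
    if a.strip().lower() == b.strip().lower():
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def score(rec_a, rec_b):
    """Sum the per-field weights for two records given as dicts."""
    return sum(field_weight(f, rec_a.get(f), rec_b.get(f)) for f in FIELD_PROBS)


def classify(total):
    """Map a total score to match / possible match / non-match."""
    if total >= MATCH_THRESHOLD:
        return "match"
    if total >= POSSIBLE_THRESHOLD:
        return "possible match"
    return "non-match"
```

In this sketch a missing middle name simply contributes nothing to the score, while a disagreeing date of birth pulls an otherwise agreeing pair down into the "possible match" band. Moving the two thresholds changes how many pairs land in each band.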
“WHERE” to match and its value/rationale
• Where to implement the match can vary dramatically in terms of tolerance to incorrect matching and its impact on the person and/or institution
• “Front office” (EHR)
– To avoid scheduling deceased patients
– To express condolences to the family
– To prevent fraud
• “Back office” or “Data Warehouse” (Population Analytics, Quality metrics)
– To improve accuracy of population/quality metrics
– Incorrect quality reporting could have a significant impact
• Quality metrics must be reported to CMS under MACRA and will be counted toward the composite performance score (CPS)
Matching modes
• Fully automated matching without confirmation
– A software matching system is employed and changes the vital status field automatically and without confirmation by a human
• Supervised matching
– Software is used for matching but results are confirmed before the system flag is set
– In other words, the software is used as a ‘screening’ to find record matches that should be further explored and confirmed
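The difference between the two modes can be sketched as a small routing function. This is an illustrative Python sketch only: `VitalStatusStore`, `apply_match`, and the mode names are hypothetical and do not correspond to any EHR or warehouse API.

```python
from dataclasses import dataclass, field


@dataclass
class VitalStatusStore:
    """Stand-in for the system holding the vital status flag (hypothetical)."""
    deceased_ids: set = field(default_factory=set)
    review_queue: list = field(default_factory=list)


def apply_match(store, patient_id, mode, confirmed_by_human=False):
    """Apply one software-identified match according to the matching mode."""
    if mode == "automated":
        # Fully automated: the flag changes without human confirmation.
        store.deceased_ids.add(patient_id)
    elif mode == "supervised":
        if confirmed_by_human:
            # A person has confirmed the match, so the flag can be set.
            store.deceased_ids.add(patient_id)
            if patient_id in store.review_queue:
                store.review_queue.remove(patient_id)
        elif patient_id not in store.review_queue:
            # Software only screens; the match waits for confirmation.
            store.review_queue.append(patient_id)
    else:
        raise ValueError(f"unknown mode: {mode}")
```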
Vital status - how to think about it
• If we consider the vital status flag as “truth” and “alive” as having the condition (of being alive), then:
– True positive (TP) – when your vital status flag is “correct” as indicating the patient is “alive”
– True negative (TN) – when your vital status flag is “correct” as indicating the patient is “deceased”
– False positive (FP) – when your vital status flag has the patient “alive” but they are actually deceased
– False negative (FN) – when your vital status flag has the patient “deceased” but they are actually alive
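The four definitions above translate directly into code. A minimal Python sketch, with "alive" as the positive condition; the function names are chosen for this example only.

```python
from collections import Counter


def classify_flag(flag_alive, actually_alive):
    """Label one record, treating 'alive' as the positive condition."""
    if flag_alive and actually_alive:
        return "TP"   # flagged alive, really alive
    if not flag_alive and not actually_alive:
        return "TN"   # flagged deceased, really deceased
    if flag_alive and not actually_alive:
        return "FP"   # flagged alive, actually deceased
    return "FN"       # flagged deceased, actually alive


def tally(pairs):
    """Count outcomes for an iterable of (flag_alive, actually_alive) pairs."""
    return Counter(classify_flag(f, a) for f, a in pairs)
```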
Vital Status and “False”
• It is not possible to have 100% correct status in your system because you are doing matching at a later date with a source data set and matching approach that cannot guarantee 100% TP and TN
– You will have to deal with some degree of incorrectness
– So, it is inevitable to have FPs and FNs!
• Two possibilities
– False Positive (FP): Patient is deceased, but your system shows them alive
– False Negative (FN): Patient is alive, but your system shows them deceased
What is done today?
• Today, few if any healthcare systems have access to a file for matching against the EHR
• Healthcare systems “learn” of patient deaths because they “hear” about them from family or their providers
– Similar to “supervised matching” in that the family notification invokes a process to confirm the status, if possible
• Some patients pass away in the hospital, so the vital status is set by staff – these are the minority of the deceased in your databases
What do you have today in your systems?
• You have a significant rate of False Positives in the EHR and the Clinical Data Warehouse, which receives its vital status from the EHR
– You have a low rate of False Negatives
• What is your FP rate (how incorrect are you)?
– Depends on the age group
• The older the patient age group, the higher the error (higher FP rate)
UC Health Patients Alive and >85
There were only 600,000 Californians over 85 in 2010!
1.8M non-deceased and over 85 across UC Health
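A rough way to read these two numbers together is as a lower bound on falsely "alive" records. The Python below is back-of-the-envelope arithmetic only. It assumes each of the 1.8M records is a distinct person, that all are California residents, and that the 2010 population figure still applies; none of these assumptions is stated on the slide.

```python
flagged_alive_over_85 = 1_800_000   # UC Health records, from the slide
californians_over_85 = 600_000      # 2010 figure, from the slide

# Records beyond the entire over-85 population cannot all be living people
# (under the assumptions stated in the lead-in).
excess = flagged_alive_over_85 - californians_over_85
share = excess / flagged_alive_over_85

print(f"{excess:,} records ({share:.0%}) exceed the over-85 population")
```

Under those assumptions, at least two thirds of the over-85 "alive" records would be false positives, and the true share would be higher since not every Californian over 85 is a UC Health patient.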
Things to consider
• You do NOT have to implement automated matching in both front office and back office
• You CAN start with automated unsupervised matching in the Clinical Data Warehouse where you have low effort, low risk, high value
– Your quality metrics will be more correct
– You can tolerate some “false negatives”, which would have no impact on the front office, or patient
• If you have enough staff, and a high fidelity matching process, you CAN consider implementing supervised matching in the front office (EHR)
– You will still be VERY unlikely to have False Negatives from a poorly performing matching system
Most likely errors of an entity matching system
• “False Positive” is by far the most common error by a matching system
– FP – it fails to detect a match that is there, so the record continues as “alive” when the person is deceased
• “False Negatives” are quite uncommon because of how rare it is to have two individuals with exactly the same name (first, middle, and last), gender, and date of birth
– It is possible but not common
– One can require ‘supervised confirmation’ if you have two records in your EHR/CDW that match an EDRS record
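The last point, requiring supervised confirmation when two EHR/CDW records match one EDRS record, can be sketched as follows. This is a minimal Python illustration that assumes records are dicts with the five matching elements plus a hypothetical `patient_id`; it uses exact matching for simplicity.

```python
from collections import defaultdict

FIELDS = ("first", "middle", "last", "gender", "dob")


def key(rec):
    """Exact-match key on the five matching elements (case-insensitive)."""
    return tuple((rec.get(f) or "").strip().lower() for f in FIELDS)


def split_matches(ehr_records, death_records):
    """Return (unique, ambiguous) patient-id matches for the death records.

    unique:    list of patient ids where exactly one EHR record matched
    ambiguous: list of id lists where several EHR records matched one death record
    """
    by_key = defaultdict(list)
    for rec in ehr_records:
        by_key[key(rec)].append(rec["patient_id"])

    unique, ambiguous = [], []
    for death in death_records:
        ids = by_key.get(key(death), [])
        if len(ids) == 1:
            unique.append(ids[0])
        elif len(ids) > 1:
            ambiguous.append(ids)   # route to supervised confirmation
    return unique, ambiguous
```

Only the `unique` list would be a candidate for automatic flagging; anything in `ambiguous` goes to a person for review.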
Where are we with the file today
• We have an existing agreement with CDPH for the “fact of death” file (2005 – present)
– Available to all UC Health sites
– The California fact of death file is available through a secure site hosted by UCSD, which requires 2-factor RSA authentication in addition to login/pw
– UCD requires an MOU to be signed with UCD before I can provide you the file (because you have to agree not to misuse the file; misuse is a misdemeanor per the CDPH agreement)
• We are applying for a file that includes SSN and cause of death – through the VSAC process
Getting Started
• You can start by performing automated unsupervised matching in the clinical data warehouse
– A “false negative”, even if it happened, would have no impact on the EHR and/or patient
– The “false positive” rate for “alive” is so high in the clinical data warehouse that even a poorly performing match (one that uses a small number of common data elements and no SSN) is likely to help you get “more correct” than you are today
• Remember – 100% perfection is not realistic or possible
The DecEnt Matching Tool
• We have a simple command-line Java tool we developed that uses Oyster, an open source implementation of probabilistic matching based on Fellegi-Sunter
• It loads EDRS data we furnish and performs matching on first name, middle name, last name, gender, and date of birth
IBM Initiate
• A sophisticated matching system designed for healthcare and identifying duplicate records in different clinical databases (matching)
• Used in many healthcare systems already (over 60% of the market)
• Requires SSN