9th May 2006
Data Quality and Ensuring Usability
…of routinely collected PC data
Presented to Integrating Clinical and Genetic Datasets: Nirvana or Pandora’s Box
Presented by Simon de Lusignan [email protected]
About me
• GP in Guildford
  • 11,500-patient practice
  • 6.5 whole-time-equivalent GPs
  • Computerised since 1988
• Senior Lecturer, St George's
• Primary Care Informatics (PCI) research group
  • Using routinely collected data for quality improvement + research
  • Electronic libraries
  • Computer in the consultation
  • Telemonitoring
• Chair, PCI WG of EFMI
• Developing a BSc in BMI
Overview
• Introduction
  • Benefits from linking clinical + genetic data
  • Growing volumes of accessible primary care data, increasingly used for quality improvement + research
• Objective
  • Is it possible to define the features of a routinely collected dataset which can be integrated with genetic data?
• Method
  • Literature review + 10 years of experiential learning working with data
• Features of “quality” data:
  1. What is data quality?
  2. Unique identifiers + denominators
  3. What needs to be defined about data processing + storage
• Discussion
Introduction
• “GIVEN” Benefits from linking clinical and genetic data
• Routinely collected clinical data is used increasingly for:
1. Quality improvement
2. Clinical Audit
3. Health Service Planning
4. Research
References:
1. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract. 2006 Apr;23(2):253-63.
2. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely collected general practice data are complex but with systematic processing can be used for quality improvement and research. Accepted for publication: Informatics in Primary Care.
Objective
• To define the features of clinical data which make them fit for integration with genetic data
Features of “quality” data
• Defining Data Quality
• Unique identifiers
• Defined process of data extraction + storage
Defining data quality
Evolving definitions:
•Completeness + accuracy (Pringle et al. BJGP 1995)
• Currency (Williams, Methods 2003)
• Sensitivity + positive predictive value (Thiru et al., BMJ 2003)
• Data Quality Probe (Brown + Warmington IPC 2003)
• “Fit for purpose” (PCI WG EFMI, 2005)
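Two of the definitions above, sensitivity and positive predictive value, reduce to simple ratios once recorded diagnoses have been checked against a reference standard. The sketch below shows the arithmetic only; the function names and the audit counts are illustrative assumptions, not figures from the cited papers.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of patients who truly have the condition and are coded as having it."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no patients with the condition in the sample")
    return true_pos / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Proportion of patients coded as having the condition who truly have it."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no patients coded with the condition in the sample")
    return true_pos / total


# Hypothetical audit counts (assumed for illustration only).
tp, fp, fn = 80, 10, 20
print(sensitivity(tp, fn))                           # 0.8
print(round(positive_predictive_value(tp, fp), 3))   # 0.889
```

A record can score well on one measure and poorly on the other, which is one reason the later “fit for purpose” definition asks what the data will be used for.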
Unique IDs
• Linkage of data
• Interoperability of systems
• Follow-up / traceability of individuals
• Population denominator + ghosts….
•England + Wales - NHS number
•Scotland - CHI number
Our system
• “MIQUEST” ID is unique within one practice only, so it is compounded with the study number + a unique ID for the practice
• Converted to a non-case-sensitive ASCII format
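One way to picture the compound ID is sketched below. The slide does not say how the conversion to a non-case-sensitive format is done; hex-encoding each ASCII character is an assumption made here purely for illustration, as are the function names, the "-" separator, and the sample values.

```python
def to_case_insensitive(identifier: str) -> str:
    """Hex-encode an ASCII identifier so that IDs differing only by letter
    case stay distinct even in tools that ignore case (assumed scheme)."""
    return identifier.encode("ascii").hex().upper()


def compound_id(study_number: str, practice_id: str, patient_id: str) -> str:
    """Join study number, practice ID and the encoded within-practice ID.
    The separator and field order are hypothetical."""
    return "-".join([study_number, practice_id, to_case_insensitive(patient_id)])


# Two within-practice IDs that differ only by case remain different after encoding.
print(to_case_insensitive("aB1"))          # 614231
print(to_case_insensitive("Ab1"))          # 416231
print(compound_id("S01", "P042", "aB1"))   # S01-P042-614231
```

The point of the encoding step is that a case-insensitive database or spreadsheet would otherwise treat "aB1" and "Ab1" as the same patient.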
Processing data
(1) Appreciation of data entry issues + contemporary perspective of system users;
(2) Defined stages of data processing + applications used at each stage, + quality controls;
(3) Archive coding systems and the look-up tables used to infer meaning or rubrics;
(4) The queries used to extract the data;
(5) A metadata system to ensure traceability of each cell of data;
(6) The ethical constraints that apply to the dataset.
(1) Data entry issues + contemporary perspective of users
• COPD and Bronchitis codes are easily confused
• Recoding half of the practice's asthmatics from a diagnosis code to a “history of” code
Ref: Faulconer ER, de Lusignan S. An eight-step method for assessing diagnostic data quality: COPD as an exemplar. Inform Prim Care. 2004;12(4):243-54.
(2) Defined stages of data processing
We have defined eight discrete steps in data processing:
(1) Design of queries + piloting,
(2) Data entry (already dealt with),
(3) Extraction,
(4) Migration (unique IDs essential),
(5) Integration,
(6) Cleaning,
(7) Processing, and
(8) Analysis
Ref: van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B. Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005;116:1010-5.
(3) Archive coding systems….
• Coding systems are constantly evolving
• In general coding systems are becoming larger + more complex
• You can go from many to few, but not from few to many…
• We archive:
  • Clinical codes look-up engine used, e.g. NHS Triset Browser
  • Each relevant version, e.g. 4- and 5-Byte Read Codes; Drug Dictionary; proprietary codes
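The “many to few, but not few to many” point can be sketched with a toy look-up table. The codes below are made-up placeholders, not real Read Codes, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical look-up table: detailed codes -> coarser codes (placeholder values).
DETAILED_TO_COARSE = {
    "X100": "X1",
    "X101": "X1",
    "X102": "X1",
    "Y200": "Y2",
}


def to_coarse(detailed_code: str) -> str:
    """Many to few: every detailed code has exactly one coarse code."""
    return DETAILED_TO_COARSE[detailed_code]


def candidates(coarse_code: str) -> list[str]:
    """Few to many: a coarse code only narrows down to a set of candidates."""
    reverse = defaultdict(list)
    for detailed, coarse in DETAILED_TO_COARSE.items():
        reverse[coarse].append(detailed)
    return sorted(reverse[coarse_code])


print(to_coarse("X101"))   # X1
print(candidates("X1"))    # ['X100', 'X101', 'X102']
```

Once data have been stored only as "X1", the original detail cannot be recovered, which is why the version of the coding system and look-up engine used at extraction is archived.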
Example of “look-up engine”
(4) The query library
• Re-issued by date
• Query set for each clinical programme
  • e.g. C1, C2, C3 – Cardiac programme
• Query set for each extraction type
  • e.g. E4, E5, G4, G5 (E for EMIS, G for Generic)
• Defined look-up tables + rubrics for queries
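The extraction-type labels lend themselves to a small parser. In the sketch below the letter mapping (E for EMIS, G for Generic) comes from the slide; reading the digit as the 4- or 5-Byte Read Code version is an inference from the “C2 EMIS 5-Byte set” slide title, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Letter prefixes as given on the slide.
SYSTEMS = {"E": "EMIS", "G": "Generic"}


@dataclass(frozen=True)
class ExtractionType:
    system: str
    read_code_bytes: int  # assumed meaning of the digit: 4- or 5-Byte Read Codes


def parse_extraction_type(label: str) -> ExtractionType:
    """Split a label such as 'E5' into its system and Read Code version."""
    if len(label) != 2 or label[0] not in SYSTEMS or label[1] not in "45":
        raise ValueError(f"unrecognised extraction type: {label!r}")
    return ExtractionType(SYSTEMS[label[0]], int(label[1]))


print(parse_extraction_type("E5"))
print(parse_extraction_type("G4"))
```

Rejecting unknown labels outright, instead of guessing, keeps a mistyped query-set name from silently selecting the wrong look-up table.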
The query library…
The “C2” queries
The “C2” EMIS 5-Byte set
(5) Metadata system
• Follows data from query set to analysis
• Preserves original data
• Derived variables clearly identified
• Associated dates + numerics labelled
• Rules for units used
• Look-up table used to define variable names
Ref: van Vlymen J, de Lusignan S. A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005;13(4):281-91.
Source data – metadata structure
Example variable name: C2_PDNPP1_G3_1_DI
C2      originating query set bigram
PDNPP1  query file
G3      Read code / CCC
1       repeat index
DI      type bigram

BIGRAM  MEANING
DI      Diagnosis
RX      Drugs / Prescription
OC      Occupation
HO      History / Symptoms
OE      Examination / Signs
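Read this way, a variable name such as C2_PDNPP1_G3_1_DI can be split back into its parts. The sketch below assumes the underscore separator and the field order as they appear on the slide; the class and function names are hypothetical and this is not the authors' implementation.

```python
from typing import NamedTuple

# Type bigrams as listed on the slide.
TYPE_BIGRAMS = {
    "DI": "Diagnosis",
    "RX": "Drugs / Prescription",
    "OC": "Occupation",
    "HO": "History / Symptoms",
    "OE": "Examination / Signs",
}


class VariableName(NamedTuple):
    query_set: str      # originating query set bigram, e.g. "C2"
    query_file: str     # query file, e.g. "PDNPP1"
    code: str           # Read code or core clinical concept, e.g. "G3"
    repeat_index: int   # repeat index, e.g. 1
    data_type: str      # meaning of the type bigram


def parse_variable_name(name: str) -> VariableName:
    """Split a metadata variable name into its five assumed components."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated parts: {name!r}")
    query_set, query_file, code, repeat, bigram = parts
    if bigram not in TYPE_BIGRAMS:
        raise ValueError(f"unknown type bigram: {bigram!r}")
    return VariableName(query_set, query_file, code, int(repeat), TYPE_BIGRAMS[bigram])


print(parse_variable_name("C2_PDNPP1_G3_1_DI"))
```

Because the name carries the query set and query file, any cell in the analysis dataset can be traced back to the query that produced it.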
Linking elements: Query library; Query & Core Clinical Concept Read code
Core clinical concept (CCC)
Automation
(6) Ethics
• The ethical constraints on any dataset are indexed in the query library
Summary
• Data quality is best defined in terms of “fitness for purpose” (what purpose, when?)
• Transparent methods of data processing allow audit of results
• Understanding data entry issues / context is essential
• Metadata can help control processing
• Careful curation of data may allow its use beyond the timescale of the original study
Thanks for listening
Simon de Lusignan
Tel: 020 8725 5661
Fax: 020 8767 7697
Email: [email protected]
www.gpinformatics.org
www.sgul.ac.uk/informatics/