9th May 2006: Data Quality and Ensuring Usability …of routinely collected PC data. Presented to Integrating Clinical and Genetic Datasets: Nirvana or Pandora’s Box. Presented by Simon de Lusignan [email protected]


Page 1

9th May 2006

Data Quality and Ensuring Usability

…of routinely collected PC data

Presented to Integrating Clinical and Genetic Datasets: Nirvana or Pandora’s Box

Presented by Simon de Lusignan [email protected]

Page 2

About me

• GP in Guildford
  • 11,500 patient practice
  • 6.5 whole-time equivalent GPs
  • Computerised since 1988
• Senior Lecturer, St George's
• Primary Care Informatics (PCI) research group
  • Using routinely collected data for quality improvement + research
  • Electronic libraries
  • Computer in the consultation
  • Telemonitoring
• Chair, PCI WG of EFMI
• Developing a BSc in BMI

Page 3

Overview

• Introduction
  • Benefits from linking clinical + genetic data
  • Growing volumes of accessible primary care data… increasingly used for quality improvement + research
• Objective
  • Is it possible to define the features of a routinely collected dataset which can be integrated with genetic data?
• Method
  • Literature review + 10 years of experiential learning working with data
• Features of “quality” data:
  1. What is data quality?
  2. Unique identifiers + denominators
  3. What needs to be defined about data processing + storage

• Discussion

Page 4

Introduction

• “GIVEN” Benefits from linking clinical and genetic data

• Routinely collected clinical data is used increasingly for:

1. Quality improvement

2. Clinical Audit

3. Health Service Planning

4. Research

References:

1. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract. 2006 Apr;23(2):253-63.

2. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely collected general practice data are complex but with systematic processing can be used for quality improvement and research. Accepted for publication: Informatics in Primary Care.

Page 5

Objective

• To define the features of clinical data which make them fit for integration with genetic data

Page 6

Features of “quality” data

• Defining Data Quality

• Unique identifiers

• Defined process of data extraction + storage

Page 7

Defining data quality

Evolving definitions:

• Completeness + accuracy (Pringle et al., BJGP 1995)

• Currency (Williams, Methods 2003)

• Sensitivity + positive predictive value (Thiru et al., BMJ 2003)

• Data Quality Probe (Brown + Warmington IPC 2003)

• “Fit for purpose” (PCI WG EFMI, 2005)
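
Two of the measures above, sensitivity and positive predictive value, are simple to compute once the patients coded in the record can be compared with a reference standard. The following is a minimal sketch, not the method of any cited paper; the function name and the use of sets of patient IDs are assumptions made for illustration.

```python
def sensitivity_and_ppv(coded: set, reference: set) -> tuple[float, float]:
    """Compare patients coded with a diagnosis against a reference standard.

    coded     -- IDs of patients who carry the diagnostic code in the record
    reference -- IDs of patients who truly have the condition (reference standard)
    """
    true_pos = len(coded & reference)    # coded and truly have the condition
    false_neg = len(reference - coded)   # have the condition but are not coded
    false_pos = len(coded - reference)   # coded but do not have the condition

    if not coded or not reference:
        raise ValueError("both sets must be non-empty")

    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv


# Illustrative IDs only: 4 patients are coded, 5 truly have the condition.
sens, ppv = sensitivity_and_ppv({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(sens, ppv)
```

Sensitivity falls when true cases are missing from the record; positive predictive value falls when coded cases turn out to be wrong.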

Page 8

Unique IDs

• Linkage of data

• Interoperability of systems

• Follow-up / traceability of individuals

• Population denominator + ghosts….

• England + Wales - NHS number

• Scotland - CHI number

Our system

• “MIQUEST” unique ID for one practice, compounded with study number + unique ID for practice

• Convert to a non-case-sensitive ASCII format
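
A compound ID of this kind might be built as sketched below. The slide does not say how the conversion to a non-case-sensitive format is done, so the hex encoding, the "-" separator, and the function names here are all assumptions. Hex is chosen only because it keeps IDs that differ by letter case distinct, which simple upper-casing would not.

```python
def to_case_insensitive_ascii(raw_id: str) -> str:
    """Encode an ASCII ID as hex digits so that case no longer matters.

    Assumption: this encoding is illustrative, not the presenter's actual method.
    Raises UnicodeEncodeError if raw_id contains non-ASCII characters.
    """
    return raw_id.encode("ascii").hex().upper()


def compound_id(study_number: str, practice_id: str, miquest_id: str) -> str:
    """Join study number, practice ID and the per-practice MIQUEST ID.

    Assumption: the field order and the '-' separator are hypothetical.
    """
    return f"{study_number}-{practice_id}-{to_case_insensitive_ascii(miquest_id)}"


# Two per-practice IDs that differ only by case stay distinct after encoding.
print(compound_id("S01", "P123", "aB1"))
print(compound_id("S01", "P123", "Ab1"))
```

Including the practice ID matters because the per-practice ID is only unique within one practice; the same value can occur in two practices.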

Page 9

Processing data

(1) Appreciation of data entry issues + contemporary perspective of system users;

(2) Defined stages of data processing + applications used at each stage, + quality controls;

(3) Archive coding systems and the look-up tables used to infer meaning or rubrics;

(4) The queries used to extract the data;

(5) A metadata system to ensure traceability of each cell of data;

(6) The ethical constraints that apply to the dataset.

Page 10

(1) Data entry issues + contemporary perspective of users

• COPD and Bronchitis codes are easily confused

• Recoding half of the practice's asthmatics from a diagnosis code to a “history of” code

Ref: Faulconer ER, de Lusignan S. An eight-step method for assessing diagnostic data quality: COPD as an exemplar. Inform Prim Care. 2004;12(4):243-54.

Page 11

(2) Defined stages of data processing

We have defined eight discrete steps in data processing:

(1) Design of queries, + piloting,

(2) Data entry (already dealt with),

(3) Extraction,

(4) Migration (unique IDs essential),

(5) Integration,

(6) Cleaning,

(7) Processing, and

(8) Analysis

Ref: van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B. Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005;116:1010-5.

Page 12

(3) Archive coding systems….

• Coding systems are constantly evolving

• In general coding systems are becoming larger + more complex

• You can go from many to few; but not from few to many…

• We archive:
  • The clinical codes look-up engine used, e.g. NHS Triset Browser
  • Each relevant version, e.g. 4- and 5-Byte Read Codes; Drug Dictionary; proprietary codes
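
The “many to few, but not few to many” point can be shown with a small look-up table. This is a sketch only: the codes below are invented placeholders, not real Read codes, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical look-up table from a detailed coding system to a broader one.
# These codes are made up for illustration; they are NOT real Read codes.
DETAILED_TO_BROAD = {
    "DETAIL_A1": "BROAD_A",
    "DETAIL_A2": "BROAD_A",
    "DETAIL_B1": "BROAD_B",
}


def to_broad(detailed_code: str) -> str:
    """Many to few: every detailed code has exactly one broad code."""
    return DETAILED_TO_BROAD[detailed_code]


def detailed_candidates(broad_code: str) -> list[str]:
    """Few to many: a broad code can only be narrowed to a list of candidates."""
    reverse = defaultdict(list)
    for detailed, broad in DETAILED_TO_BROAD.items():
        reverse[broad].append(detailed)
    return sorted(reverse[broad_code])


print(to_broad("DETAIL_A2"))           # one unambiguous answer
print(detailed_candidates("BROAD_A"))  # more than one candidate, so no safe reverse mapping
```

Because the reverse direction is ambiguous, data stored only in the broader system cannot be restored to the detailed one, which is one reason to archive each version of the coding system together with its look-up tables.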

Page 13

Example of “look-up engine”

Page 14

(4) The query library

• Re-issued by date

• Query set for each clinical programme
  • e.g. C1, C2, C3 – Cardiac programme

• Query set for each extraction type
  • e.g. E4, E5, G4, G5 (E for EMIS, G for Generic)

• Defined look-up tables + rubrics for queries

Page 15

The query library…

Page 16

The “C2” queries

Page 17

The “C2” EMIS 5-Byte set

Page 18

(5) Metadata system

• Follows data from query set to analysis

• Preserves original data

• Derived variables clearly identified

• Associated dates + numerics labelled

• Rules for units used

• Look-up table used to define variable names

Ref: van Vlymen J, de Lusignan S. A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005;13(4):281-91.

Page 19

Source data – metadata structure

Example variable name: C2_PDNPP1_G3_1_DI

COMPONENT                      EXAMPLE
originating query set bigram   C2
query file                     PDNPP1
Read code / CCC                G3
repeat index                   1
type bigram                    DI

BIGRAM   MEANING
DI       Diagnosis
RX       Drugs / Prescription
OC       Occupation
HO       History / Symptoms
OE       Examination / Signs
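
The variable name on this slide, read as C2_PDNPP1_G3_1_DI, can be split back into its five labelled components. The sketch below assumes that components are separated by underscores and never contain one themselves; the class and function names are hypothetical and this is not the PCDQ implementation.

```python
from dataclasses import dataclass

# Type bigrams and their meanings, as listed on the slide.
TYPE_BIGRAMS = {
    "DI": "Diagnosis",
    "RX": "Drugs / Prescription",
    "OC": "Occupation",
    "HO": "History / Symptoms",
    "OE": "Examination / Signs",
}


@dataclass(frozen=True)
class SourceVariable:
    query_set: str      # originating query set bigram, e.g. "C2"
    query_file: str     # query file, e.g. "PDNPP1"
    code: str           # Read code or core clinical concept (CCC), e.g. "G3"
    repeat_index: int   # repeat index, e.g. 1
    type_bigram: str    # type bigram, e.g. "DI"

    @property
    def type_meaning(self) -> str:
        return TYPE_BIGRAMS[self.type_bigram]


def parse_variable_name(name: str) -> SourceVariable:
    """Split a source-data variable name into its metadata components."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 components, got {len(parts)}: {name!r}")
    query_set, query_file, code, repeat, type_bigram = parts
    if type_bigram not in TYPE_BIGRAMS:
        raise ValueError(f"unknown type bigram: {type_bigram!r}")
    return SourceVariable(query_set, query_file, code, int(repeat), type_bigram)


variable = parse_variable_name("C2_PDNPP1_G3_1_DI")
print(variable.query_set, variable.code, variable.type_meaning)
```

Carrying the originating query set and query file in every variable name is what lets each cell of data be traced back to the query that produced it.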

Page 20

Linking elements: Query library; Query & Core Clinical Concept; Read code

Page 21

Core clinical concept (CCC)

Page 22

Automation

Page 23

(6) Ethics

• The ethical constraints on any dataset are indexed in the query library

Page 24

9th May 2006

Summary

Page 25

Summary

• Data quality is best defined in terms of “fitness for purpose”: what purpose, and when?

• Transparent methods of data processing allow audit of results

• Understanding data entry issues / context is essential

• Metadata can help control processing

• Careful curation of data may allow its use beyond the timescale of the original study

Page 26

9th May 2006

Thanks for listening

Simon de Lusignan

Tel: 020 8725 5661
Fax: 020 8767 7697
Email: [email protected]
www.gpinformatics.org

www.sgul.ac.uk/informatics/