
Selection of Data Sources for Observational Comparative Effectiveness Research

Prepared for: Agency for Healthcare Research and Quality (AHRQ)

www.ahrq.gov

Outline of Material

This presentation will:

Propose data source(s) that include data required to address primary and secondary research questions

Describe details of data source(s) selected for the study

Describe validation or other quality assessments that have been conducted on the data source that are relevant to the data elements required for the study

Describe what patient identifiers are necessary for the research purpose, how they will be protected, and the permissions/waivers required

Provide details on the data-linkage approach and the quality/accuracy of the linkage, if applicable

Introduction

The research question dictates the type of data required.

The data source must be chosen carefully to ensure it:

Can address the study question

Has a sufficient number of observations

Contains key variables, including potential confounders

Allows for a sufficient length of followup

Primary Data

Primary data are collected by the investigator directly from study participants to address a specific question or hypothesis.

Observational research designs that often require primary data collection include the following:

Prospective observational studies

Subjects are selected on the basis of specific characteristics, and their progress is monitored over time.

The investigator does not assign the exposure of interest.

Registries

Registries use an observational study design to collect data; they do not specify treatments or require therapies intended to change patient outcomes.

Registry data can be used for public health surveillance, to determine incidence rates, to perform risk assessment, and to monitor progress.
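A registry that records each patient's entry date, exit date, and outcome status can support the incidence-rate use mentioned above. The following is a minimal sketch that computes a crude rate of first events per 1,000 person-years. The record layout (`RegistryRecord`) and the function name are hypothetical. The sketch ignores recurrent events, competing risks, and stratification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class RegistryRecord:
    """Hypothetical registry row: one observation period per patient."""
    patient_id: str
    entry: date       # date the patient entered the registry
    exit: date        # date of first event, death, loss to followup, or data cutoff
    had_event: bool   # True if the outcome of interest ended the observation period


def incidence_rate_per_1000_py(records: Iterable[RegistryRecord]) -> float:
    """Crude incidence rate: first events per 1,000 person-years at risk."""
    records = list(records)
    events = sum(r.had_event for r in records)
    person_days = sum((r.exit - r.entry).days for r in records)
    if person_days <= 0:
        raise ValueError("no person-time at risk")
    person_years = person_days / 365.25
    return 1000.0 * events / person_years


if __name__ == "__main__":
    cohort = [
        RegistryRecord("p1", date(2019, 1, 1), date(2020, 1, 1), True),
        RegistryRecord("p2", date(2019, 1, 1), date(2021, 1, 1), False),
    ]
    # 1 event over 1,096 person-days (about 3.0 person-years)
    print(round(incidence_rate_per_1000_py(cohort), 1))
```

Person-time is used as the denominator rather than a head count. Patients observed for different lengths of time therefore contribute in proportion to their time at risk.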

Secondary Data (1 of 3)

Secondary data are data collected for other purposes that can be used to answer the research question.

Electronic medical record data

The method of data collection is not standardized nationally, and the intervals between visits vary from patient to patient.

Patients may receive care at other facilities whose records are not captured in the electronic medical record.

Paper-based records

Excluding sites without electronic records may bias results.

These records may be valuable as a source of primary data for validating data available elsewhere (e.g., administrative claims data).

Secondary Data (2 of 3)

Administrative data

Typically generated as part of the process of obtaining insurance reimbursement

International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes

Pharmacy data

Claims submitted to insurance companies for payment, as well as pharmacy dispensing records

National Drug Code (NDC): a unique 10-digit identifier that indicates the active ingredient(s), dosage form and strength, route of administration, and package size and type
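Pharmacy claims usually need harmonized NDCs before dispensings can be matched to a study drug list. A 10-digit NDC is split into labeler, product, and package segments in a 4-4-2, 5-3-2, or 5-4-1 layout. Many claims files instead store an 11-digit 5-4-2 form, created by zero-padding the short segment. The following is a minimal sketch of that conversion, assuming hyphenated input. An unhyphenated 10-digit string is ambiguous, so it is rejected rather than guessed. The function name `ndc10_to_ndc11` is hypothetical.

```python
# Segment lengths (labeler, product, package) allowed for 10-digit NDCs.
_VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC to the unhyphenated 11-digit 5-4-2 form."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        # Without hyphens, "1234567890" could be 4-4-2, 5-3-2, or 5-4-1.
        raise ValueError(f"expected a hyphenated NDC such as 1234-5678-90, got {ndc!r}")
    layout = tuple(len(p) for p in parts)
    if layout not in _VALID_LAYOUTS:
        raise ValueError(f"unsupported NDC segment layout {layout} for {ndc!r}")
    labeler, product, package = parts
    # Left-pad each segment to the 5-4-2 widths; only the short segment changes.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


if __name__ == "__main__":
    for code in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
        print(code, "->", ndc10_to_ndc11(code))
```

Storing the 11-digit form in the analytic dataset means a single string comparison against the drug list suffices. The original segment layout no longer has to be tracked downstream.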


Secondary Data (3 of 3)

Regulatory data

The U.S. Food and Drug Administration (FDA) has a vast store of data from regulatory approval submissions.

The FDA is converting its older datasets into research-appropriate data to be stored in its clinical trial repository.

These data, however, typically come from efficacy trials rather than effectiveness trials.

Repurposed trial data or data from completed observational studies

Investigators who conduct clinical research studies funded by the Federal Government are mandated to make their data publicly available.

The available data are limited to what was collected.


Considerations for Selecting Data

Required data elements

Identify the minimum data requirements a priori.

Supplement datasets by predicting absent data variables or by linking to available data.

Time period and duration of followup

It is essential that the data capture the time period of relevance.

Inadequate length of followup often makes data unusable.

Followup should take the natural history of the outcomes into account.
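Fixing the minimum data requirements a priori allows a candidate data source to be screened mechanically before any analysis begins. The following is a minimal sketch under assumed inputs. The variable list, the study period, and the function name `screen_data_source` are all hypothetical placeholders. The screen checks only two things: whether the required elements exist and whether the source's coverage spans the study period. It does not check whether the values themselves are valid.

```python
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical a priori specification; replace with the study's own definitions.
REQUIRED_ELEMENTS = frozenset({"patient_id", "index_date", "exposure", "outcome", "age", "sex"})
STUDY_PERIOD: Tuple[date, date] = (date(2015, 1, 1), date(2020, 12, 31))


def screen_data_source(available_columns: Iterable[str],
                       coverage_start: date,
                       coverage_end: date,
                       required: Iterable[str] = REQUIRED_ELEMENTS,
                       study_period: Tuple[date, date] = STUDY_PERIOD) -> List[str]:
    """Return a list of problems; an empty list means the source passes this screen."""
    problems = []
    missing = sorted(set(required) - set(available_columns))
    if missing:
        problems.append("missing required elements: " + ", ".join(missing))
    start, end = study_period
    if coverage_start > start or coverage_end < end:
        problems.append(
            f"data cover {coverage_start}..{coverage_end}; study needs {start}..{end}"
        )
    return problems


if __name__ == "__main__":
    cols = {"patient_id", "index_date", "exposure", "outcome", "age"}
    for problem in screen_data_source(cols, date(2016, 1, 1), date(2021, 6, 30)):
        print(problem)
```

A source that fails only the element check may still be usable if the missing variables can be predicted or obtained by linkage, as the slide notes.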

Ensuring Quality Data (1 of 2)

Missing data

Missing data can compromise the validity of the study, depending on whether there is a pattern in the type and extent of “missingness.”

Define a priori an acceptable percentage of missing data for key data elements required for analysis.

Use imputation techniques for missing data.

Changes that may alter data availability and consistency over time

A change may be either a singular event or a gradual shift over time.

Evaluate points in time where the database is discontinuous (trend breaks).
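Both checks on this slide can be run before analysis begins. The following is a minimal sketch. It computes the percent missing for each key variable and compares the result against a limit chosen a priori. It also flags candidate trend breaks in monthly record counts. The thresholds are illustrative assumptions. The trend-break rule is a simple moving-window comparison chosen for clarity; it is not a formal structural-break test. The function names are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def percent_missing(rows: Sequence[Mapping[str, object]],
                    variables: Sequence[str]) -> Dict[str, float]:
    """Percent of rows in which each variable is absent or None."""
    if not rows:
        raise ValueError("no rows to assess")
    n = len(rows)
    return {v: 100.0 * sum(r.get(v) is None for r in rows) / n for v in variables}


def over_missing_limit(pct: Mapping[str, float], max_pct: float) -> List[str]:
    """Key variables whose missingness exceeds the limit fixed a priori."""
    return sorted(v for v, p in pct.items() if p > max_pct)


def flag_trend_breaks(monthly_counts: Sequence[float],
                      window: int = 3,
                      rel_change: float = 0.5) -> List[int]:
    """Indices i where the mean of months [i, i+window) differs from the mean of
    months [i-window, i) by more than `rel_change` times the earlier mean."""
    flags = []
    for i in range(window, len(monthly_counts) - window + 1):
        before = sum(monthly_counts[i - window:i]) / window
        after = sum(monthly_counts[i:i + window]) / window
        if before > 0 and abs(after - before) / before > rel_change:
            flags.append(i)
    return flags


if __name__ == "__main__":
    rows = [{"age": 50, "bmi": None}, {"age": None, "bmi": None}, {"age": 60}]
    pct = percent_missing(rows, ["age", "bmi"])
    print(pct, over_missing_limit(pct, max_pct=50.0))
    print(flag_trend_breaks([100] * 6 + [20] * 6))
```

Consider a series where monthly volume drops from 100 to 20 records starting at index 6. The sliding window flags indices 5 through 8, so adjacent flags should be reviewed together as one candidate break. Any flag still needs an explanation, such as a coding change or a shift in plan enrollment, before the data are used.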

Ensuring Quality Data (2 of 2)

Validity of key data definitions

Assess the general definition of key variables and their reliability and validity in the chosen database.

Commonly used data resources may have validated definitions of health outcomes (e.g., heart attack, mortality).

Creating the best definitions of key variables often requires the involvement of clinicians.

Compare study data to a “gold standard,” such as paper-based medical records, if available.
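Comparing a database definition against a gold standard such as chart review reduces to a 2x2 table. The following is a minimal sketch that derives sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) from paired classifications. The input format and the function name are hypothetical. Many validation studies review charts only for database-positive cases. In that design, only PPV can be estimated, and the other metrics would come out as NaN or be biased.

```python
from typing import Dict, Iterable, Tuple


def validation_metrics(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Each pair is (flagged by the database algorithm, confirmed by chart review).
    Returns sensitivity, specificity, PPV, and NPV; NaN when a denominator is zero."""
    tp = fp = fn = tn = 0
    for in_db, in_chart in pairs:
        if in_db and in_chart:
            tp += 1
        elif in_db:
            fp += 1
        elif in_chart:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases the database captures
        "specificity": ratio(tn, tn + fp),  # non-cases the database correctly excludes
        "ppv": ratio(tp, tp + fp),          # database-flagged cases that are real
        "npv": ratio(tn, tn + fn),          # database negatives that are truly negative
    }


if __name__ == "__main__":
    # Hypothetical review: 80 TP, 20 FP, 10 FN, 890 TN
    pairs = ([(True, True)] * 80 + [(True, False)] * 20
             + [(False, True)] * 10 + [(False, False)] * 890)
    print(validation_metrics(pairs))
```

With these illustrative counts, the definition has a PPV of 0.80 and a sensitivity of about 0.89. Clinicians reviewing the false positives can help refine the definition.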

Emerging Issues and Opportunities (1 of 2)

Data outside the United States

It may be easier to obtain complete, long-term data from countries with single-payer systems that have regional or national electronic medical record systems.

Consider possible regional variations in exposure, availability, cost, and health care systems.

Point-of-care data collection and interactive voice response/other technologies

The advantages of a new and timely data stream include more detailed data that can be used to spot trends or patterns.

Data streams are often specialized.

Emerging Issues and Opportunities (2 of 2)

Data pooling and networking

Data pooling: combining data, at the level of the unit of analysis, from several sources into a single cohort

Networking: common protocols, data definitions, and programming developed for use across several data resources

Personal health records

Electronically stored health records initiated by the patient

Patient-reported outcomes (PROs)

The increased availability of PROs in administrative data is of value to comparative effectiveness research.
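Pooling depends on mapping each source onto a common data definition before records are combined at the unit of analysis. The following is a minimal sketch that assumes one row per patient. The source names, column mappings, and function name are hypothetical. Harmonizing the values themselves (for example, recoding sex or diagnosis codes) would be a further step, not shown. Under networking, the same mapping and analysis program would instead be run at each site, and only the results would be shared.

```python
from typing import Dict, List, Mapping, Sequence

# Hypothetical common data definition and per-source column mappings.
COMMON_FIELDS = ("patient_id", "sex", "birth_year")
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "plan_a": {"member_id": "patient_id", "gender": "sex", "yob": "birth_year"},
    "registry_b": {"pid": "patient_id", "sex_code": "sex", "birth_yr": "birth_year"},
}


def pool_sources(sources: Mapping[str, Sequence[Mapping[str, object]]]) -> List[dict]:
    """Rename each source's columns to the common definition and stack the rows
    (one row per patient, the assumed unit of analysis) into a single cohort."""
    pooled = []
    for name, rows in sources.items():
        mapping = SOURCE_MAPPINGS[name]
        for row in rows:
            # Keep only fields that have a common definition; drop source-specific extras.
            record = {mapping[k]: v for k, v in row.items() if k in mapping}
            absent = [f for f in COMMON_FIELDS if f not in record]
            if absent:
                raise ValueError(f"{name}: record lacks {absent}")
            # Namespace IDs so the same local ID in two sources is not treated as one person.
            record["patient_id"] = f"{name}:{record['patient_id']}"
            record["source"] = name  # keep provenance for source-stratified analyses
            pooled.append(record)
    return pooled


if __name__ == "__main__":
    cohort = pool_sources({
        "plan_a": [{"member_id": "1", "gender": "F", "yob": 1950, "plan_type": "HMO"}],
        "registry_b": [{"pid": "1", "sex_code": "M", "birth_yr": 1960}],
    })
    for rec in cohort:
        print(rec)
```

The function keeps a `source` column so that analyses can still adjust for, or stratify by, data source after pooling.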

Summary Checklist (1 of 2)

Guidance: Propose data source(s) that include data required to address primary and secondary research questions

Key considerations:

• Ensure that the data resource is appropriate for addressing the study question

• Ensure that key variables needed to conduct the study are available in the data source

Guidance: Describe details of data source(s) selected for the study

Key considerations:

• The nature of the data (claims, paper, or electronic medical records; if prospective, how the information is/was collected and from whom)

• Coding system(s) that may be used (e.g., ICD-9 or ICD-10; HCPCS; etc.)

• Population included in the data source (ages, geography, etc.)

• Other features (e.g., health plan membership, retention rate [i.e., average duration of followup for members in the database, proportion of patients with followup sufficiently long for the study purpose])

• Time period covered by the data source(s)

• If non-U.S. data, describe relevant differences in health care and how these will affect results
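The two retention-rate measures in the checklist can be computed directly from enrollment records. The following is a minimal sketch. It assumes each patient has a single continuous enrollment span; real membership files often contain gaps that must first be bridged or split, which is not shown. The 180-day requirement and the function name are illustrative assumptions.

```python
from datetime import date
from statistics import mean
from typing import Dict, Sequence, Tuple


def retention_summary(spans: Sequence[Tuple[date, date]],
                      required_days: int) -> Dict[str, float]:
    """Summarize followup from (start, end) dates of continuous enrollment.

    Returns the average followup in days and the proportion of patients whose
    followup is at least `required_days` long."""
    if not spans:
        raise ValueError("no enrollment spans")
    durations = [(end - start).days for start, end in spans]
    return {
        "mean_followup_days": float(mean(durations)),
        "prop_sufficient": sum(d >= required_days for d in durations) / len(durations),
    }


if __name__ == "__main__":
    spans = [
        (date(2020, 1, 1), date(2020, 12, 31)),  # 365 days
        (date(2020, 1, 1), date(2020, 4, 10)),   # 100 days
        (date(2021, 1, 1), date(2022, 1, 1)),    # 365 days
    ]
    print(retention_summary(spans, required_days=180))
```

Reporting the proportion with sufficient followup alongside the mean matters. A high average can hide a large share of members who leave the plan too early to contribute outcomes.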

Summary Checklist (2 of 2)

Guidance: Describe validation or other quality assessments that have been conducted on the data source(s) that are relevant to the data elements required for the study

Key considerations:

• If validation/quality assessments have not previously been performed, propose a method to assess data quality

Guidance: Describe what patient identifiers are necessary for the research purpose, how they will be protected, and permissions/waivers required

Guidance: Provide details on the data-linkage approach and the quality/accuracy of linkage, if applicable

Key considerations:

• Provide enough detail to clarify the quality of the linkage approach
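One common way to protect identifiers during linkage works as follows. Each data holder replaces the raw identifiers with a keyed hash computed using a shared secret, and only those tokens are exchanged. Without the key, low-entropy fields such as birth dates cannot simply be hashed and looked up. The following is a minimal sketch using Python's standard `hmac` and `hashlib` modules. It assumes exact (deterministic) matching on normalized fields; the choice of fields and the function names are hypothetical. It reports two basic linkage-quality figures: the share of records uniquely matched and the number of ambiguous matches. Probabilistic linkage, which tolerates typographical errors, is not shown.

```python
import hashlib
import hmac
from typing import Dict, List, Sequence, Tuple


def link_token(identifiers: Sequence[str], key: bytes) -> str:
    """HMAC-SHA256 token over normalized identifier fields (trimmed, upper-cased).
    Each data holder computes tokens locally with the shared secret `key`."""
    normalized = "|".join(field.strip().upper() for field in identifiers)
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deterministic_link(tokens_a: Dict[str, str],
                       tokens_b: Dict[str, str]) -> Tuple[List[Tuple[str, str]], float, int]:
    """Exact-match linkage of record IDs in A to record IDs in B via shared tokens.

    Returns (unique matches, share of A records uniquely matched, count of A
    records whose token matched more than one B record)."""
    index_b: Dict[str, List[str]] = {}
    for rec_id, token in tokens_b.items():
        index_b.setdefault(token, []).append(rec_id)

    matches, ambiguous = [], 0
    for rec_id, token in tokens_a.items():
        candidates = index_b.get(token, [])
        if len(candidates) == 1:
            matches.append((rec_id, candidates[0]))
        elif len(candidates) > 1:
            ambiguous += 1  # report rather than guess; review separately
    match_rate = len(matches) / len(tokens_a) if tokens_a else 0.0
    return matches, match_rate, ambiguous


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical; in practice exchanged securely, never with the data
    a = {"a1": link_token(["1950-02-03", "F", "12345"], key),
         "a2": link_token(["1960-01-01", "M", "99999"], key)}
    b = {"b7": link_token([" 1950-02-03", "f", "12345 "], key)}
    print(deterministic_link(a, b))
```

A match rate by itself does not reveal false matches. Validating a sample of linked pairs against source records gives a fuller picture of linkage accuracy.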
