Selection of Data Sources for Observational Comparative Effectiveness Research
Prepared for: Agency for Healthcare Research and Quality (AHRQ)
www.ahrq.gov
Outline of Material
This presentation will:
Propose data source(s) that include the data required to address primary and secondary research questions
Describe details of the data source(s) selected for the study
Describe validation or other quality assessments that have been conducted on the data source that are relevant to the data elements required for the study
Describe which patient identifiers are necessary for the research purpose, how they will be protected, and the permissions/waivers required
Provide details on the data-linkage approach and the quality/accuracy of the linkage, if applicable
Introduction
The research question dictates the type of data required.
The data source must be chosen carefully to ensure that it:
Can address the study question
Has a sufficient number of observations
Contains key variables, including potential confounders
Allows for a sufficient length of followup
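What counts as "a sufficient number of observations" depends on the comparison. For a study comparing a binary outcome between two treatment groups, one common starting point is the normal-approximation sample-size formula for two independent proportions. The sketch below is an illustration only: the function name `n_per_group` and the example risks are hypothetical, and observational designs usually need more subjects than this figure to allow for confounder adjustment, exclusions, and missing data.

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed per group to detect p1 vs p2 with a two-sided test
    (normal approximation for two independent proportions, equal group sizes)."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("need two distinct proportions strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # standard normal quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null hypothesis
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical example: 10% vs 15% outcome risk, alpha = 0.05, 80% power
print(n_per_group(0.10, 0.15))  # prints 686
```

If the candidate data source cannot supply roughly this many eligible patients per treatment group, it is unlikely to support the planned comparison.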
Primary Data
Primary data are collected by the investigator directly from study participants to address a specific question or hypothesis.
Observational research designs that often require primary data collection include the following:
Prospective observational studies
Subjects are selected on the basis of specific characteristics, and their progress is monitored over time.
The investigator does not assign the exposure of interest.
Registries
Registries use an observational study design to collect data; they do not specify treatments or require therapies intended to change patient outcomes.
Registry data can be used for public health surveillance, to determine incidence rates, to perform risk assessment, and to monitor progress.
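As one example of the registry uses listed above, an incidence rate is the number of new cases divided by the total person-time at risk. The sketch below assumes a hypothetical record layout (`RegistryRecord` with `years_at_risk` and `new_case`) in which person-time stops accruing when the event occurs; the records are toy values, not registry data.

```python
from dataclasses import dataclass

@dataclass
class RegistryRecord:
    """One registry participant (hypothetical fields for illustration)."""
    years_at_risk: float  # person-time observed while still at risk of the event
    new_case: bool        # True if the event first occurred during observation

def incidence_rate(records: list[RegistryRecord], per: float = 1000.0) -> float:
    """New cases divided by total person-time at risk, expressed per `per` person-years."""
    person_years = sum(r.years_at_risk for r in records)
    if person_years <= 0:
        raise ValueError("total person-time at risk must be positive")
    cases = sum(r.new_case for r in records)  # True counts as 1
    return cases / person_years * per

# Toy records: 2 new cases over 7.0 person-years
records = [RegistryRecord(2.0, False), RegistryRecord(1.5, True),
           RegistryRecord(3.0, False), RegistryRecord(0.5, True)]
print(round(incidence_rate(records), 1))  # prints 285.7
```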
Secondary Data (1 of 3)
Secondary data are data originally collected for other purposes that can be used to answer the research question.
Electronic medical record data
Data collection methods are not standardized nationally, and the intervals between visits vary from patient to patient.
Patients may receive care at other facilities that is not captured in the electronic medical record.
Paper-based records
Excluding sites without electronic records may bias results.
These records may be valuable as a source of primary data for validating data available elsewhere (e.g., administrative claims data).
Secondary Data (2 of 3)
Administrative data
Typically generated as part of the process of obtaining insurance reimbursement
Diagnoses and procedures are coded using the International Classification of Diseases (ICD) and Current Procedural Terminology (CPT)
Pharmacy data
Claims submitted to insurance companies for payment, as well as pharmacy dispensing records
National Drug Code (NDC): a unique 10-digit identifier indicating the active ingredient(s), dosage and strength, route of administration, and package size and type
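Because the 10-digit NDC comes in three segment configurations (labeler-product-package lengths of 4-4-2, 5-3-2, or 5-4-1), pharmacy claims commonly store it as an 11-digit 5-4-2 code by adding a leading zero to the short segment. Matching drug lists across data sources usually requires normalizing to one form first. The sketch below assumes hyphenated input; the function name `ndc10_to_ndc11` and the example codes are made up for illustration and do not correspond to real products.

```python
import re

# Valid 10-digit NDC configurations (labeler-product-package segment lengths).
_CONFIGS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}
_WIDTHS_11 = (5, 4, 2)  # segment widths of the 11-digit form

def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC to the unhyphenated 11-digit 5-4-2 form.

    Hyphens are required: without them a 10-digit string could fit more than
    one configuration, so the segment to pad would be ambiguous.
    """
    if not re.fullmatch(r"\d+-\d+-\d+", ndc):
        raise ValueError(f"expected a hyphenated NDC, got {ndc!r}")
    segments = ndc.split("-")
    if tuple(len(s) for s in segments) not in _CONFIGS:
        raise ValueError(f"not a 10-digit NDC configuration: {ndc!r}")
    # Left-pad each segment with zeros to its 5-4-2 width; exactly one segment gains a zero.
    return "".join(s.zfill(w) for s, w in zip(segments, _WIDTHS_11))

# Made-up codes (not real products), one per configuration
for code in ["1234-5678-90", "12345-678-90", "12345-6789-0"]:
    print(code, "->", ndc10_to_ndc11(code))
```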
Secondary Data (3 of 3)
Regulatory data
The U.S. Food and Drug Administration (FDA) has a vast store of data from regulatory approval submissions.
The FDA is converting its older datasets into research-appropriate data to be stored in its clinical trial repository.
These data, however, typically come from efficacy trials rather than effectiveness trials.
Repurposed trial data or data from completed observational studies
Investigators who conduct clinical research studies funded by the Federal Government are mandated to make their data publicly available.
The available data are limited to what was collected.
Considerations for Selecting Data
Required data elements
Identify the minimum data requirements a priori.
Supplement datasets by predicting absent data variables or by linking to other available data.
Time period and duration of followup
It is essential that the data capture the relevant time period.
Inadequate length of followup often makes data unusable.
Followup should take the natural history of the outcomes into account.
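A quick way to judge whether a candidate database offers enough followup is to compute, from each patient's observation period, the average followup and the share of patients observed for at least the period the outcome's natural history requires. The sketch below assumes a hypothetical input of (start, end) dates per patient and measures followup as end minus start in days; the required period of 365 days and the dates are illustrative assumptions.

```python
from datetime import date

def followup_summary(periods: dict[str, tuple[date, date]], required_days: int) -> tuple[float, float]:
    """Return (mean followup in days, share of patients followed >= required_days).

    `periods` maps a patient ID to that patient's (start, end) observation dates.
    """
    if not periods:
        raise ValueError("no observation periods supplied")
    durations = [(end - start).days for start, end in periods.values()]
    mean_days = sum(durations) / len(durations)
    share_long_enough = sum(d >= required_days for d in durations) / len(durations)
    return mean_days, share_long_enough

# Hypothetical enrollment periods; the study is assumed to need 1 year of followup
periods = {
    "A": (date(2020, 1, 1), date(2021, 1, 1)),
    "B": (date(2020, 1, 1), date(2020, 4, 1)),
    "C": (date(2019, 6, 1), date(2021, 6, 1)),
}
print(followup_summary(periods, required_days=365))
```

A low share of patients with sufficient followup is an early warning that the data may be unusable for outcomes that develop slowly.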
Ensuring Quality Data (1 of 2)
Missing data
Missing data can compromise the validity of the study, depending on whether there is a pattern in the type and extent of “missingness.”
Define a priori an acceptable percentage of missing data for key data elements required for analysis.
Use imputation techniques for missing data.
Changes that may alter data availability and consistency over time
A change may be either a singular event or a gradual shift over time.
Evaluate points in time where the database is discontinuous (trend breaks).
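Two of these checks can be screened mechanically before analysis: the share of missing values in each key field, compared with the limit fixed a priori, and abrupt month-to-month jumps in record counts that may mark a trend break. The sketch below is a crude screen under stated assumptions: the function names are hypothetical, a 2-fold change is an arbitrary illustrative cut-off, it does not characterize the pattern of missingness or perform imputation, and gradual shifts need other methods (e.g., formal structural-break tests).

```python
def missing_share(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of rows in which each field is absent or None."""
    if not rows:
        raise ValueError("no rows supplied")
    return {f: sum(row.get(f) is None for row in rows) / len(rows) for f in fields}

def fields_over_threshold(rows: list[dict], fields: list[str], max_share: float) -> list[str]:
    """Key fields whose missing share exceeds the limit fixed a priori."""
    return [f for f, share in missing_share(rows, fields).items() if share > max_share]

def abrupt_changes(monthly_counts: list[int], max_fold: float = 2.0) -> list[int]:
    """Indices of months whose record count differs more than max_fold-fold
    from the previous month; a crude screen for singular trend breaks."""
    flagged = []
    for i in range(1, len(monthly_counts)):
        prev, cur = monthly_counts[i - 1], monthly_counts[i]
        if prev == 0 or cur == 0:
            if prev != cur:  # data appearing or disappearing entirely
                flagged.append(i)
        elif not (1 / max_fold <= cur / prev <= max_fold):
            flagged.append(i)
    return flagged

# Hypothetical extract: two key fields, and monthly record counts for one database
rows = [{"age": 54, "smoking": None}, {"age": None, "smoking": "never"},
        {"age": 61, "smoking": None}, {"age": 47, "smoking": None}]
print(missing_share(rows, ["age", "smoking"]))
print(fields_over_threshold(rows, ["age", "smoking"], max_share=0.20))
print(abrupt_changes([100, 105, 98, 30, 32, 31]))
```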
Ensuring Quality Data (2 of 2)
Validity of key data definitions
Assess the general definition of key variables and their reliability and validity in the chosen database.
Commonly used data resources may have validated definitions for health outcomes (e.g., heart attack, mortality).
Creating the best definitions of key variables often requires the involvement of clinicians.
Compare study data to a “gold standard,” such as paper-based medical records, if available.
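Comparing a database definition with a gold standard is usually summarized as a 2×2 table of agreement, from which sensitivity, specificity, and positive and negative predictive values follow. The sketch below assumes the two classifications are already paired per patient as booleans; the name `validate` and the toy inputs are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidationResult:
    sensitivity: float  # true positives / all gold-standard positives
    specificity: float  # true negatives / all gold-standard negatives
    ppv: float          # true positives / all database-defined positives
    npv: float          # true negatives / all database-defined negatives

def validate(database_flag: list[bool], gold_standard: list[bool]) -> ValidationResult:
    """Compare a database-derived definition with a gold standard, patient by patient."""
    if len(database_flag) != len(gold_standard):
        raise ValueError("inputs must be paired per patient")
    pairs = list(zip(database_flag, gold_standard))
    tp = sum(d and g for d, g in pairs)
    fp = sum(d and not g for d, g in pairs)
    fn = sum(not d and g for d, g in pairs)
    tn = sum(not d and not g for d, g in pairs)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # undefined when a margin is empty

    return ValidationResult(ratio(tp, tp + fn), ratio(tn, tn + fp),
                            ratio(tp, tp + fp), ratio(tn, tn + fn))

# Toy example: database outcome flag vs chart review for five patients
print(validate([True, True, False, False, True], [True, False, False, True, True]))
```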
Emerging Issues and Opportunities (1 of 2)
Data outside the United States
It may be easier to obtain complete, long-term data from countries with single-payer systems that have regional or national electronic medical record systems.
Consider possible regional variations in exposure, availability, cost, and health care systems.
Point-of-care data collection and interactive voice response/other technologies
The advantages of a new and timely data stream include more detailed data that can be used to spot trends or patterns.
Data streams are often specialized.
Emerging Issues and Opportunities (2 of 2)
Data pooling and networking
Data pooling: combining data, at the level of the unit of analysis, from several sources into a single cohort
Networking: common protocols, data definitions, and programming developed for several data resources
Personal health records
Electronically stored health records initiated by the patient
Patient-reported outcomes (PROs)
The increasing availability of PROs in administrative data is of value to comparative effectiveness research.
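Pooling at the level of the unit of analysis means mapping each source's local variable names onto common data definitions and stacking the patient-level rows into one cohort. The sketch below is a minimal illustration under assumed inputs: the function `pool_sources`, the source names, and the field names are all hypothetical.

```python
def pool_sources(sources: dict[str, list[dict]],
                 field_maps: dict[str, dict[str, str]]) -> list[dict]:
    """Combine patient-level rows from several sources into one cohort.

    field_maps[source] maps that source's local field names to common names;
    each pooled row is tagged with its source so that identical patient IDs
    from different sources are not confused.
    """
    pooled = []
    for source, rows in sources.items():
        field_map = field_maps[source]
        for row in rows:
            common_row = {common: row[local] for local, common in field_map.items()}
            common_row["source"] = source
            pooled.append(common_row)
    return pooled

# Hypothetical sources that use different field names for the same concepts
sources = {
    "plan_a": [{"pid": "001", "dx_date": "2021-03-02"}],
    "plan_b": [{"member_id": "001", "diagnosis_dt": "2021-07-15"}],
}
field_maps = {
    "plan_a": {"pid": "patient_id", "dx_date": "index_date"},
    "plan_b": {"member_id": "patient_id", "diagnosis_dt": "index_date"},
}
for row in pool_sources(sources, field_maps):
    print(row)
```

Tagging rows by source avoids confusing two different patients who share a local ID, but it cannot detect the same patient appearing in two sources; that requires record linkage.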
Summary Checklist (1 of 2)
Guidance: Propose data source(s) that include the data required to address primary and secondary research questions
Key considerations:
• Ensure that the data resource is appropriate for addressing the study question
• Ensure that key variables needed to conduct the study are available in the data source
Guidance: Describe details of the data source(s) selected for the study
Key considerations:
• The nature of the data (claims, paper, or electronic medical records; if prospective, how the information is/was collected and from whom)
• Coding system(s) that may be used (e.g., ICD-9 or ICD-10; HCPCS; etc.)
• Population included in the data source (ages, geography, etc.)
• Other features (e.g., health plan membership, retention rate [i.e., average duration of followup for members in the database, proportion of patients with followup sufficiently long for the study purpose])
• Time period covered by the data source(s)
• If non-U.S. data, describe relevant differences in health care and how these will affect results
Summary Checklist (2 of 2)
Guidance: Describe validation or other quality assessments that have been conducted on the data source(s) that are relevant to the data elements required for the study
Key considerations:
• If validation/quality assessments have not previously been performed, propose a method to assess data quality
Guidance: Describe which patient identifiers are necessary for the research purpose, how they will be protected, and the permissions/waivers required
Guidance: Provide details on the data-linkage approach and the quality/accuracy of the linkage, if applicable
Key considerations:
• Provide enough detail to clarify the quality of the linkage approach
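One way to address both identifier protection and linkage quality is deterministic linkage on keyed hashes: each data holder normalizes the identifiers and replaces them with an HMAC-SHA256 token computed under a shared secret key, records are matched on identical tokens, and the share of records linked is reported. The sketch below assumes this approach; the function names, the key, and the identifiers are hypothetical. The match rate shows how many records linked, not whether the links are correct; accuracy still needs checking (e.g., against a sample with known true matches).

```python
import hashlib
import hmac

def link_token(secret_key: bytes, *identifiers: str) -> str:
    """HMAC-SHA256 token of normalized identifiers (trimmed, upper-cased, joined with '|')."""
    normalized = "|".join(part.strip().upper() for part in identifiers)
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def deterministic_link(left: dict[str, str], right: dict[str, str]) -> tuple[dict[str, str], float]:
    """Exact-match tokens between two files (each maps record ID -> token).

    Returns the left-to-right ID mapping and the share of left records linked.
    Tokens shared by more than one right-hand record are treated as ambiguous and skipped.
    """
    right_by_token: dict[str, list[str]] = {}
    for rid, token in right.items():
        right_by_token.setdefault(token, []).append(rid)
    links = {}
    for lid, token in left.items():
        candidates = right_by_token.get(token, [])
        if len(candidates) == 1:
            links[lid] = candidates[0]
    match_rate = len(links) / len(left) if left else 0.0
    return links, match_rate

# Hypothetical key and identifiers; a real key would be generated and held securely
key = b"example-shared-secret"
claims = {"C1": link_token(key, "Smith", "1950-01-02"),
          "C2": link_token(key, "Jones", "1960-03-04"),
          "C3": link_token(key, "Lee", "1970-05-06")}
registry = {"R1": link_token(key, " smith ", "1950-01-02"),
            "R2": link_token(key, "Jones", "1960-03-04")}
links, rate = deterministic_link(claims, registry)
print(links, round(rate, 2))
```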