EPI 218 Database Management for Clinical Research Tables, Relationships, Normalization, Data Types, and Data Dictionaries Michael A. Kohn, MD, MPP 1 August 2013


Page 1

EPI 218
Database Management for Clinical Research
Tables, Relationships, Normalization, Data Types, and Data Dictionaries

Michael A. Kohn, MD, MPP
1 August 2013

Page 2

Clinical Research*
• Choose the study design, and define the study population, predictor variables, and outcome variables;
• measure these variables and anticipate problems with measurement;
• analyze the results

In this course, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements.

*Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.

Page 3

Outline
• Housekeeping
• Data Tables
-- Rows = Records; Columns = Fields
• Normalization of Data Tables
• Start Lab 1

Page 4

Housekeeping

Epi 218

Page 5

• Course website: http://www.epibiostat.ucsf.edu/courses/schedule/data_management.html

• Lectures and Labs will be in China Basin Landing 6702 with overflow into 6704, 8:30 – 10:30

• “Learn MS Access 2000” video
  http://mkanders.com/learn_access_video.htm
  Username: ucsfdbclass
  Password: access2000
  (We can also loan you the video on CD.)

Page 6

Platforms
• Access (Labs 1, 2 and 3)
• REDCap (Lab 4)
• QuesGen (Lab 5)
• OnCore (Lab 6)

May use other data management platforms for final project:
-- SurveyMonkey
-- FileMaker Pro
-- OnCore
-- OpenClinica
-- Other

Page 7

Microsoft Access
• Integrated desktop database management platform
• Uses SQL (Structured Query Language)
• Has an outstanding graphical query design tool
• Incorporates an excellent report writer
• Based on the principles of the Relational Model
• Relationships diagram has integrated referential integrity
• Very flexible, infinitely customizable
• NOT browser based, desktop application
• Using advanced features usually requires hiring a developer

Page 8

Microsoft Access

The user interface changed between Access 2003 and Access 2007.

If you are running Access 2003 or an earlier version, use the lab instructions for Access 2003.

If you are running Access 2007 or 2010, use the lab instructions for Access 2007.

DEB Terminal Server 185-RDS1.epi-ucsf.org has Access 2010. The others have Access 2003.

Page 10

REDCap
• Web-based research data collection system developed at Vanderbilt
• Available free through UCSF Academic Research Systems
  http://tinyurl.com/yh5m6ka
• You are both the Principal Investigator and User 1.
• Model = “Do-it-yourself”

Page 11

QuesGen
• Web-enabled research data collection and management platform developed (with UCSF input) by a private company based in Burlingame
• More full-featured and customizable than REDCap, but primarily “pay-us-to-do-it” rather than “do-it-yourself”
• User accounts for Epi 218 students

Page 12

Learning Objectives
• Develop a multi-table, relational database for a research study using Microsoft Access
• Query a database for monitoring and analyzing research data
• Learn about REDCap: basic functions, advantages and limitations
• Understand the advantages and costs of other web-based platforms such as QuesGen
• Hear about data management for large-scale clinical trials in industry

Page 13

Requirements
• Turn in all 5 labs on time
-- Labs are due by midnight the following Thursday (Lab 1 due 8/8 at midnight)
• Complete Final Project
-- Due 9/18/2013

Page 14

Final Project: Part A
Send in or Demonstrate Your Study Database
Due 9/18/2013

Send in a copy of your research study database*.

We prefer a database that you are currently using or will use for a research study.

However, a demonstration or pilot database is acceptable.

*If you are unable to package your database in a file to email, you can send us a link or work out another way to review your database.

Page 15

Final Project: Part A
Send in or Demonstrate Your Study Database
Due 9/18/2013

If you are doing secondary analysis of data collected by someone else, obtain the data collection forms* used in the original data collection, and set up a new database that you would use for a follow-up study.

*Often easily obtained by doing a Google search or emailing the author of the original study.

Page 16

Final Project: Part B
Submit Your Data Management Plan
Due 9/18/2013

• General description of database
• Data collection and entry
• Error checking and data validation
• Analysis (e.g., export to Stata)
• Security/confidentiality
• Backup

Page 17

Final Project
Due 9/18/2013

• Start thinking about this now.
• Build your own study database as you work through the labs.
• Use extra time in lab to work on your study database.
• Set up appointments with course faculty early.

Page 18

TICR Professional Conduct Statement
Clarifications for this class

• I will maintain the highest standards of academic honesty
• I will neither give nor receive aid in examinations or assignments unless such cooperation is expressly permitted by the instructor
• I will conduct research in an unbiased manner, report results truthfully, and credit ideas developed and work done by others
• I will not use answer keys from prior years
• I will write answers in my own words, and, when collaboration is permitted, acknowledge collaborators when answers are jointly formulated

For Epi 218 – Just don’t turn in somebody else’s work as your own.

Page 19

Data Tables

Rows = Records = Entities

Columns = Fields = Attributes

Page 20

DCR Chapter 16 Exercise 2

The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio. The following are base-hospital physician data collection forms for 2 enrolled patients:

Lowenstein DH, Alldredge BK, Allen F, Neuhaus J, Corry M, Gottwald M, et al. The prehospital treatment of status epilepticus (PHTSE) study: design and methodology. Control Clin Trials 2001;22(3):290-309.

Alldredge BK, Gelb AM, Isaacs SM, Corry MD, Allen F, Ulrich S, et al. A comparison of lorazepam, diazepam, and placebo for the treatment of out-of-hospital status epilepticus. N Engl J Med 2001;345(9):631-7.

Page 21

Page 22

Page 23

Display the data from these 2 data collection forms in a 2-row data table.

SubjectID | KitNumber | AdminDate | AdminTime | SzStopPreHosp | SzStopPreHospTime | HospArrTime | HospArrSzAct | HospArrGCSV
189       | A322      | 3/12/1994 | 17:39     | FALSE         |                   | 17:48       | TRUE         |
410       | B536      | 12/1/1998 | 01:35     | TRUE          | 01:39             | 01:53       | FALSE        | 4

Page 24

Create a 9-field data dictionary for the data table

Field Name        | Data Type | Description                                                                  | Validation Rule
SubjectID         | Integer   | Unique Subject Identifier                                                    |
KitNumber         | Text(5)   | 5-character Investigational Pharmacy Code                                    |
AdminDate         | Date      | Date Study Drug Administered                                                 |
AdminTime         | Time      | Time Study Drug Administered                                                 |
SzStopPreHosp     | Yes/No    | Did seizure stop during pre-hospital course?                                 |
SzStopPreHospTime | Time      | Time seizures stopped during pre-hosp course (blank if seizure did not stop) |
HospArrTime       | Time      | Hospital Arrival Time                                                        |
HospArrSzAct      | Yes/No    | Was there continued Seizure Activity on Hospital Arrival?                    | Check against SzStopPreHosp
HospArrGCSV       | Integer   | Verbal GCS on Hospital Arrival (blank if seizure continued)                  | Between 1 and 5
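
A data dictionary like this maps fairly directly onto a table definition. The following is a minimal sketch using Python's built-in sqlite3 module rather than Access (which the labs use). The table name PHTSE, the ISO date strings, and the 0/1 coding of Yes/No fields are assumptions made for illustration, and the dates are read as month/day/year. SQLite has no dedicated Date, Time, or Yes/No types, so those fields are stored as TEXT or INTEGER. The "Check against SzStopPreHosp" rule is not encoded because the slide does not spell out the exact condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per data dictionary row; CHECK clauses stand in for validation rules.
conn.execute("""
    CREATE TABLE PHTSE (
        SubjectID         INTEGER PRIMARY KEY,                      -- unique subject identifier
        KitNumber         TEXT CHECK (length(KitNumber) <= 5),      -- Text(5)
        AdminDate         TEXT,                                     -- 'YYYY-MM-DD'
        AdminTime         TEXT,                                     -- 24-hour 'HH:MM'
        SzStopPreHosp     INTEGER CHECK (SzStopPreHosp IN (0, 1)),  -- Yes/No
        SzStopPreHospTime TEXT,                                     -- NULL if seizure did not stop
        HospArrTime       TEXT,
        HospArrSzAct      INTEGER CHECK (HospArrSzAct IN (0, 1)),   -- Yes/No
        HospArrGCSV       INTEGER CHECK (HospArrGCSV BETWEEN 1 AND 5)
    )
""")

# The two records from the 2-row data table (blank cells become NULL).
rows = [
    (189, "A322", "1994-03-12", "17:39", 0, None, "17:48", 1, None),
    (410, "B536", "1998-12-01", "01:35", 1, "01:39", "01:53", 0, 4),
]
conn.executemany("INSERT INTO PHTSE VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# A verbal GCS of 7 violates the "Between 1 and 5" rule and is rejected.
try:
    conn.execute(
        "INSERT INTO PHTSE (SubjectID, HospArrGCSV) VALUES (?, ?)", (999, 7)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM PHTSE").fetchone()[0]
print(rejected, row_count)
```

A blank (NULL) HospArrGCSV passes the range check, which matches the "blank if seizure continued" description.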

Page 25

JIFee
Jaundice and Infant Feeding Study

Methods:
Design - Nested double cohort study.
Setting - Kaiser
Subjects - Infants with neonatal jaundice and randomly selected non-jaundiced infants
Predictor Variable - Presence or absence of jaundice
Outcome Variable - Neuropsychological score (ranging from 55 to 145) at age 5
Analysis - ?

Newman, T. B., P. Liljestrand, et al. (2006). "Outcomes among newborns with total serum bilirubin levels of 25 mg per deciliter or more." N Engl J Med 354(18): 1889-900.

Page 26

Infant Jaundice Study Data

1. Approximately 400 children
2. 5 examiners (doctors)
3. Approximately 700 neuropsychological examinations, measuring weight, height, and “NPScore” (IQ)
4. Some children to be examined more than once
5. No examiner to see the same child twice
6. If child died before age 5, store age and circumstances of death

Page 27

Page 28

Demonstration: Creating a Data Table

Label columns and enter rows of data in datasheet view

Where is predictor on data collection form?

Page 29

Demonstration: Data Dictionary

Table design view:
• field (=column) names
• data types
• definitions
• validation rules

(More on data types, free-text vs. coded responses, later)

Page 30

Page 31

Acceptable table showing one set of exam results per participant. (BabyExamForFigure3)

Page 32

Demonstration

Disallowed values

Duplicate primary keys

This automatic error checking and data validation IS why you need to enter your data into a computer; it is NOT why you need a relational DBMS. Many single-table products (FileMaker Pro, SAS FSP, even Excel) can do error checking and data validation.
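
The two checks in this demonstration can be sketched on a single table, with no relationships involved. The example below uses Python's built-in sqlite3 module instead of Access. The column list follows the subject table described in these slides (ID, Name, DOB, Sex, Jaundice); the 'M'/'F' and 0/1 codings, the dates, and the try_insert helper are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,                 -- no two rows may share an ID
        Name      TEXT NOT NULL,
        DOB       TEXT,                                -- 'YYYY-MM-DD'
        Sex       TEXT CHECK (Sex IN ('M', 'F')),      -- disallow any other code
        Jaundice  INTEGER CHECK (Jaundice IN (0, 1))   -- Yes/No
    )
""")

def try_insert(row):
    """Insert one row; return None on success, or the error text if rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Baby VALUES (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert((1, "Helen", "2010-01-06", "F", 1))         # accepted
bad_value = try_insert((2, "Ryan", "2010-02-10", "X", 0))   # disallowed value for Sex
dup_key = try_insert((1, "Zachary", "2010-03-15", "M", 0))  # SubjectID 1 already used

print(ok, bad_value, dup_key, sep="\n")
```

Both rejections happen in a one-table database, which is the slide's point: validation alone does not require a relational design.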

Page 33

Demonstration: Same Table in Excel, Stata
• Excel
• Stata
• Etc.

Rows = Records = Entities
Columns = Fields = Attributes

Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.

Page 34

Normalization

Page 35

Table of Study Subjects

Row = Individual Infant

Columns = ID#, Name, DOB, Sex, Jaundice

If some infants have more than one exam, what do you do?

Page 36

Undesirable table showing multiple exam results per study participant. (BabyExamForFigure4)

Page 37

Demo
• Find highest IQ Score
• Find all exams done in April

Page 38

Common Error

If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, …

Or if your table is more than about 30 columns wide, it is time to restructure your table.
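
To see why the repeated-column layout is a problem, compare the two demo questions (highest score, exams in April) against a wide table and a one-row-per-exam table. This is a sketch in Python's sqlite3 module; the table names ExamWide and ExamLong, the dates, and the scores are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Wide layout: one row per subject, one column pair per exam.
    CREATE TABLE ExamWide (
        SubjectID INTEGER PRIMARY KEY,
        Date1 TEXT, Score1 INTEGER,
        Date2 TEXT, Score2 INTEGER,
        Date3 TEXT, Score3 INTEGER
    );
    INSERT INTO ExamWide VALUES
        (1, '2015-04-02', 98,  '2015-09-14', 104,  NULL, NULL),
        (2, '2015-03-20', 110, NULL,         NULL, NULL, NULL);

    -- Long layout: one row per exam.
    CREATE TABLE ExamLong (
        SubjectID INTEGER,
        ExamDate  TEXT,
        Score     INTEGER
    );
    INSERT INTO ExamLong VALUES
        (1, '2015-04-02', 98),
        (1, '2015-09-14', 104),
        (2, '2015-03-20', 110);
""")

# Wide: every score column must be named, and the query changes if Score4 is added.
wide_max = conn.execute("""
    SELECT MAX(s) FROM (
        SELECT Score1 AS s FROM ExamWide
        UNION ALL SELECT Score2 FROM ExamWide
        UNION ALL SELECT Score3 FROM ExamWide
    )
""").fetchone()[0]

# Long: one column holds every score, however many exams a subject has.
long_max = conn.execute("SELECT MAX(Score) FROM ExamLong").fetchone()[0]

# Long: exams done in April need a single condition on a single date column.
long_april = conn.execute("""
    SELECT SubjectID, ExamDate FROM ExamLong
    WHERE strftime('%m', ExamDate) = '04'
""").fetchall()

print(wide_max, long_max, long_april)
```

Both layouts give the same maximum, but only the long layout keeps the query unchanged as the number of exams per subject grows.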

Page 39

Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.) (ExamBabyForFigure5)

Page 40

Demo
• Find highest IQ Score
• Find all exams in a particular month
• What is Helen’s birth date?
• What happened to Alejandro, Ryan, Zachary, and Jackson?
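
The birth-date question can be made concrete with a small sketch in Python's sqlite3 module. When subject data is repeated on every exam row, nothing stops two rows for the same subject from disagreeing. The table name ExamBaby, the second subject, and every date and score here are made up for illustration; only the situation (Helen with two different DOBs) comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per exam, with the subject's name and DOB copied onto each row.
    CREATE TABLE ExamBaby (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER,
        Name      TEXT,
        DOB       TEXT,
        ExamDate  TEXT,
        Score     INTEGER
    );
    INSERT INTO ExamBaby VALUES
        (1, 1, 'Helen', '2010-01-06', '2015-02-03', 98),
        (2, 1, 'Helen', '2010-06-01', '2015-04-14', 104),  -- DOB disagrees with row 1
        (3, 2, 'Amy',   '2010-03-11', '2015-03-20', 110),
        (4, 2, 'Amy',   '2010-03-11', '2015-08-05', 112);
""")

# Subjects whose rows carry more than one distinct birth date.
conflicts = conn.execute("""
    SELECT SubjectID, Name, COUNT(DISTINCT DOB) AS n_dobs
    FROM ExamBaby
    GROUP BY SubjectID, Name
    HAVING COUNT(DISTINCT DOB) > 1
""").fetchall()

print(conflicts)
```

The query can flag the conflict but cannot say which date is right. A subject with no exams has no row at all in this layout, which is one reading of the last demo question.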

Page 41

Normalization

If some infants have multiple exams, “normalize” the records into two tables, one for subjects and one for examinations.

Page 42

Data normalized into two tables: one (“Baby”) with rows comprising subject-specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.
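
A minimal sketch of this two-table structure, written with Python's sqlite3 module rather than Access. The table names Baby and Exam and the SubjectID key come from the slide; the ExamID column, the other column names, and all dates and scores are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Subject-specific information: one row per infant, so one DOB per infant.
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        DOB       TEXT
    );

    -- Exam-specific information: one row per exam, linked back by SubjectID.
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),  -- foreign key
        ExamDate  TEXT,
        Score     INTEGER
    );

    INSERT INTO Baby VALUES
        (1, 'Helen',     '2010-01-06'),
        (2, 'Alejandro', '2010-02-20');
    INSERT INTO Exam VALUES
        (1, 1, '2015-02-03', 98),
        (2, 1, '2015-04-14', 104);
""")

# LEFT JOIN keeps every subject, including those with no exams.
exam_counts = conn.execute("""
    SELECT b.Name, COUNT(e.ExamID) AS n_exams
    FROM Baby AS b
    LEFT JOIN Exam AS e ON e.SubjectID = b.SubjectID
    GROUP BY b.SubjectID, b.Name
    ORDER BY b.SubjectID
""").fetchall()

print(exam_counts)  # [('Helen', 2), ('Alejandro', 0)]
```

Helen's birth date is stored once, and Alejandro remains in the database with zero exams.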

Page 43

Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).

Page 44

Demonstration

Inability to create integrity violations with normalized tables.

This IS why you need a multi-table relational DBMS.
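
Referential integrity can be sketched outside Access as well. The example below uses Python's sqlite3 module, where foreign key enforcement has to be switched on per connection with PRAGMA foreign_keys. The Baby and Exam names and the SubjectID key come from the slides; the attempt helper, the other column names, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),
        ExamDate  TEXT,
        Score     INTEGER
    );
    INSERT INTO Baby VALUES (1, 'Helen');
    INSERT INTO Exam VALUES (1, 1, '2015-02-03', 98);
""")

def attempt(sql, params=()):
    """Run one statement; return 'ok' or the integrity error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

# An exam for a subject that does not exist is rejected.
orphan_exam = attempt(
    "INSERT INTO Exam (SubjectID, ExamDate, Score) VALUES (?, ?, ?)",
    (99, "2015-05-01", 100),
)

# A subject who still has exams cannot be deleted.
delete_parent = attempt("DELETE FROM Baby WHERE SubjectID = ?", (1,))

print(orphan_exam)
print(delete_parent)
```

Neither statement changes the data, so every exam row continues to point at exactly one existing subject.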

Page 45

Outline
• Housekeeping
• Data Tables
-- Rows = Records; Columns = Fields
• Normalization of Data Tables
• Start Lab 1