EPI 218: Database Management for Clinical Research
Tables, Relationships, Normalization, Data Types, and Data Dictionaries
Michael A. Kohn, MD, MPP
1 August 2013
Clinical Research*
• Choose the study design, and define the study population, predictor variables, and outcome variables;
• measure these variables and anticipate problems with measurement;
• analyze the results
In this course, we discuss the “nitty gritty” of collecting, storing, updating, and monitoring the study measurements.
*Private companies that make data management systems for clinical research understand “clinical research” to include only RCTs preparatory to FDA drug or device approval, not observational studies.
Outline
• Housekeeping
• Data Tables
  Rows = Records; Columns = Fields
• Normalization of Data Tables
• Start Lab 1
Housekeeping
Epi 218
• Course website: http://www.epibiostat.ucsf.edu/courses/schedule/data_management.html
• Lectures and Labs will be in China Basin Landing 6702 with overflow into 6704, 8:30 – 10:30
• “Learn MS Access 2000” video: http://mkanders.com/learn_access_video.htm
  Username: ucsfdbclass
  Password: access2000
  (We can also loan you the video on CD.)
Platforms
• Access (Labs 1, 2 and 3)
• REDCap (Lab 4)
• QuesGen (Lab 5)
• OnCore (Lab 6)
May use other data management platforms for final project:
-- SurveyMonkey
-- FileMaker Pro
-- OnCore
-- OpenClinica
-- Other
Microsoft Access
• Integrated desktop database management platform
• Uses SQL (Structured Query Language)
• Has an outstanding graphical query design tool
• Incorporates an excellent report writer
• Based on the principles of the Relational Model
• Relationships diagram has integrated referential integrity
• Very flexible, infinitely customizable
• NOT browser based, desktop application
• Using advanced features usually requires hiring a developer
Microsoft Access
The user interface changed between Access 2003 and Access 2007.
If you are running Access 2003 or an earlier version, use the lab instructions for Access 2003.
If you are running Access 2007 or 2010, use the lab instructions for Access 2007.
DEB Terminal Server 185-RDS1.epi-ucsf.org has Access 2010. The others have Access 2003.
DEB Terminal Server
• Provides a remote Windows desktop with Microsoft Office Professional
• Remote Desktop client software freely available for the Mac and already part of Windows: http://www.microsoft.com/en-us/download/details.aspx?id=18140
• Obtain DEB Terminal Server username and password from [email protected]
• Instructions available on course syllabus page
REDCap
• Web-based research data collection system developed at Vanderbilt
• Available free through UCSF Academic Research Systems: http://tinyurl.com/yh5m6ka
• You are both the Principal Investigator and User 1.
• Model = “Do-it-yourself”
QuesGen
• Web-enabled research data collection and management platform developed (with UCSF input) by a private company based in Burlingame
• More full-featured and customizable than REDCap, but primarily “pay-us-to-do-it” rather than “do-it-yourself”
• User accounts for Epi 218 students
Learning Objectives
• develop a multi-table, relational database for a research study using Microsoft Access
• query a database for monitoring and analyzing research data
• learn about REDCap: basic functions, advantages and limitations
• understand the advantages and costs of other web-based platforms such as QuesGen
• hear about data management for large-scale clinical trials in industry
Requirements
• Turn in all 5 labs on time
  Labs are due by midnight the following Thursday (Lab 1 due 8/8 at midnight)
• Complete Final Project
  Due 9/18/2013
Final Project: Part A
Send in or Demonstrate Your Study Database
Due 9/18/2013
Send in a copy of your research study database*.
We prefer a database that you are currently using or will use for a research study.
However, a demonstration or pilot database is acceptable.
*If you are unable to package your database in a file to email, you can send us a link or work out another way to review your database.
If you are doing secondary analysis of data collected by someone else,
obtain the data collection forms* used in the original data collection, and
set up a new database that you would use for a follow-up study.
*Often easily obtained by doing a Google search or emailing the author of the original study.
Final Project: Part A
Send in or Demonstrate Your Study Database
Due 9/18/2013
Final Project: Part B
Submit Your Data Management Plan
Due 9/18/2013
• General description of database
• Data collection and entry
• Error checking and data validation
• Analysis (e.g., export to Stata)
• Security/confidentiality
• Back up
Final Project
Due 9/18/2013
• Start thinking about this now.
• Build your own study database as you work through the labs.
• Use extra time in lab to work on your study database.
• Set up appointments with course faculty early.
TICR Professional Conduct Statement
Clarifications for this class
I will maintain the highest standards of academic honesty
I will neither give nor receive aid in examinations or assignments unless such cooperation is expressly permitted by the instructor
I will conduct research in an unbiased manner, report results truthfully, and credit ideas developed and work done by others
I will not use answer keys from prior years
I will write answers in my own words, and, when collaboration is permitted, acknowledge collaborators when answers are jointly formulated
For Epi 218 – Just don’t turn in somebody else’s work as your own.
Rows = Records = Entities
Columns = Fields = Attributes
Data Tables
DCR Chapter 16 Exercise 2
The PHTSE (Pre-Hospital Treatment of Status Epilepticus) Study was a randomized blinded trial of lorazepam, diazepam, or placebo in the treatment of pre-hospital status epilepticus. The primary endpoint was termination of convulsions by hospital arrival. To enroll patients, paramedics contacted base hospital physicians by radio. The following are base-hospital physician data collection forms for 2 enrolled patients:
Lowenstein DH, Alldredge BK, Allen F, Neuhaus J, Corry M, Gottwald M, et al. The prehospital treatment of status epilepticus (PHTSE) study: design and methodology. Control Clin Trials 2001;22(3):290-309.
Alldredge BK, Gelb AM, Isaacs SM, Corry MD, Allen F, Ulrich S, et al. A comparison of lorazepam, diazepam, and placebo for the treatment of out-of-hospital status epilepticus. N Engl J Med 2001;345(9):631-7.
Display the data from these 2 data collection forms in a 2-row data table.
SubjectID  KitNumber  AdminDate  AdminTime  SzStopPreHosp  SzStopPreHospTime  HospArrTime  HospArrSzAct  HospArrGCSV
189        A322       3/12/1994  17:39      FALSE          (blank)            17:48        TRUE          (blank)
410        B536       12/1/1998  01:35      TRUE           01:39              01:53        FALSE         4
Create a 9-field data dictionary for the data table
Field Name | Data Type | Description | Validation Rule
SubjectID | Integer | Unique Subject Identifier |
KitNumber | Text(5) | 5-character Investigational Pharmacy Code |
AdminDate | Date | Date Study Drug Administered |
AdminTime | Time | Time Study Drug Administered |
SzStopPreHosp | Yes/No | Did seizure stop during pre-hospital course? |
SzStopPreHospTime | Time | Time seizures stopped during pre-hosp course (blank if seizure did not stop) |
HospArrTime | Time | Hospital Arrival Time |
HospArrSzAct | Yes/No | Was there continued Seizure Activity on Hospital Arrival? | Check against SzStopPreHosp
HospArrGCSV | Integer | Verbal GCS on Hospital Arrival (blank if seizure continued) | Between 1 and 5
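To make the validation rules in the data dictionary concrete, here is a minimal Python sketch that checks one record against them. The function name `validate_record` and the dictionary layout are hypothetical. The dictionary only says "Check against SzStopPreHosp", so the cross-check below (flag a record where the seizure both stopped pre-hospital and was active on arrival) is one assumed interpretation, not the study's actual rule.

```python
def validate_record(rec):
    """Return a list of problems found in one record (empty list = no problems)."""
    problems = []
    gcs = rec.get("HospArrGCSV")

    # Stated rule: verbal GCS, when present, is between 1 and 5.
    if gcs is not None and not (1 <= gcs <= 5):
        problems.append("HospArrGCSV must be between 1 and 5")

    # Stated in the description: GCS is blank if the seizure continued.
    if rec.get("HospArrSzAct") and gcs is not None:
        problems.append("HospArrGCSV should be blank if seizure continued")

    # Stated in the description: stop time is blank if the seizure did not stop.
    if not rec.get("SzStopPreHosp") and rec.get("SzStopPreHospTime") is not None:
        problems.append("SzStopPreHospTime should be blank if seizure did not stop")

    # ASSUMED interpretation of "Check against SzStopPreHosp".
    if rec.get("SzStopPreHosp") and rec.get("HospArrSzAct"):
        problems.append("review: seizure stopped pre-hospital but active on arrival")

    return problems


# The two rows from the 2-row data table (None = blank).
row_189 = {"SubjectID": 189, "KitNumber": "A322", "SzStopPreHosp": False,
           "SzStopPreHospTime": None, "HospArrSzAct": True, "HospArrGCSV": None}
row_410 = {"SubjectID": 410, "KitNumber": "B536", "SzStopPreHosp": True,
           "SzStopPreHospTime": "01:39", "HospArrSzAct": False, "HospArrGCSV": 4}

for row in (row_189, row_410):
    print(row["SubjectID"], validate_record(row))
```

In Access these checks would normally live in the table's validation rules rather than in separate code; the sketch only spells out the logic.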
Methods:
Design - Nested double cohort study.
Setting - Kaiser
Subjects - Infants with neonatal jaundice and randomly selected non-jaundiced infants
Predictor Variable - Presence or absence of jaundice
Outcome Variable - Neuropsychological score (ranging from 55 to 145) at age 5
Analysis - ?
JIFee: Jaundice and Infant Feeding Study
Newman, T. B., P. Liljestrand, et al. (2006). "Outcomes among newborns with total serum bilirubin levels of 25 mg per deciliter or more." N Engl J Med 354(18): 1889-900.
Infant Jaundice Study Data
1. Approximately 400 children
2. 5 examiners (doctors)
3. Approximately 700 neuropsychological examinations, measuring weight, height, and “NPScore” (IQ)
4. Some children to be examined more than once
5. No examiner to see the same child twice
6. If child died before age 5, store age and circumstances of death
Demonstration: Creating a Data Table
Label columns and enter rows of data in datasheet view
Where is predictor on data collection form?
Demonstration: Data Dictionary
Table design view:
• field (=column) names,
• data types,
• definitions,
• validation rules
(More on data types, free-text vs. coded responses, later)
Acceptable table showing one set of exam results per participant. (BabyExamForFigure3)
Demonstration
Disallowed values
Duplicate primary keys
This automatic error checking and data validation IS why you need to enter your data into a computer; it is NOT why you need a relational DBMS. Many single-table products (Filemaker Pro, SAS FSP, even Excel) can do error checking and data validation.
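The two checks demonstrated above, disallowed values and duplicate primary keys, can be sketched with Python's built-in `sqlite3` module standing in for Access. The table name `BabyExam` and its columns are assumptions loosely based on the figure name; the 55 to 145 range comes from the study's outcome variable, and the inserted rows are made-up illustrative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BabyExam (
        SubjectID INTEGER PRIMARY KEY,                        -- no duplicates allowed
        Name      TEXT NOT NULL,
        Jaundice  INTEGER NOT NULL CHECK (Jaundice IN (0, 1)),
        NPScore   INTEGER CHECK (NPScore BETWEEN 55 AND 145)  -- blank (NULL) is allowed
    )
""")


def try_insert(row):
    """Insert one row; report whether the database accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO BabyExam VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((1, "Helen", 1, 104)),      # valid row
    try_insert((1, "Alejandro", 0, 98)),   # duplicate primary key
    try_insert((2, "Alejandro", 0, 300)),  # NPScore outside 55-145
    try_insert((2, "Alejandro", 0, None)), # valid: score not yet measured
]
for r in results:
    print(r)
```

Note that this is still a single table: the error checking works without any relationships between tables.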
Demonstration: Same Table in Excel, Stata
Excel Stata Etc
Rows = Records = Entities
Columns = Fields = Attributes
Access and Stata have a special row at the top for column headings (=field names); Excel just uses the first row.
Normalization
Table of Study Subjects
Row = Individual Infant
Columns = ID#, Name, DOB, Sex, Jaundice
If some infants have more than one exam, what do you do?
Table of Study Subjects
Undesirable table showing multiple exam results per study participant. (BabyExamForFigure4)
Demo
• Find highest IQ Score
• Find all exams done in April
Common Error
If you find yourself creating multiple columns for the same measurement, e.g., Date1, Score1, Date2, Score2, Date3, Score3, …, or if your table is more than about 30 columns wide, it is time to restructure your table.
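A minimal sketch of that restructuring, assuming a wide table with the repeated Date1/Score1, Date2/Score2 columns named above. The helper `wide_to_long` and the sample values are hypothetical. Once each exam is its own row, "find the highest score" and "find all exams in April" become one-line operations instead of comparisons across several columns.

```python
# Wide layout: one row per subject, one pair of columns per exam (illustrative data).
wide_rows = [
    {"SubjectID": 1, "Date1": "2010-04-02", "Score1": 101,
     "Date2": "2010-09-15", "Score2": 108},
    {"SubjectID": 2, "Date1": "2010-04-20", "Score1": 96,
     "Date2": None, "Score2": None},
]


def wide_to_long(rows, max_exams=2):
    """Turn DateN/ScoreN column pairs into one row per exam."""
    long_rows = []
    for row in rows:
        for i in range(1, max_exams + 1):
            date, score = row.get(f"Date{i}"), row.get(f"Score{i}")
            if date is None and score is None:
                continue  # this subject had no exam number i
            long_rows.append(
                {"SubjectID": row["SubjectID"], "ExamDate": date, "Score": score}
            )
    return long_rows


exams = wide_to_long(wide_rows)
highest = max(exams, key=lambda e: e["Score"])
april = [e for e in exams if e["ExamDate"][5:7] == "04"]  # ISO dates: chars 5-6 = month

print(len(exams), "exam rows")
print("highest score:", highest)
print("April exams:", april)
```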
Undesirable table with participant-specific data duplicated for each exam. (Note problem with Helen’s DOB.) (ExamBabyForFigure5)
Demo
• Find highest IQ Score
• Find all exams in a particular month
• What is Helen’s birth date?
• What happened to Alejandro, Ryan, Zachary, and Jackson?
If some infants have multiple exams,
“normalize” the records into two tables, one for subjects and one for examinations.
Normalization
Data normalized into two tables: one (“Baby”) with rows comprising subject-specific information; the other (“Exam”) with rows comprising exam-specific information. Note that Helen can only have one birth date. Subjects with no exams, e.g. Alejandro, still appear in the database. “SubjectID” functions as the primary key in the “Baby” table and as the foreign key in the “Exam” table.
Figure 7. Relationships diagram showing the one-to-many relationship between the table of subjects (“Baby”) and the table of measurements (“Exam”).
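The two-table design can be sketched in Python with the built-in `sqlite3` module in place of Access. The table names “Baby” and “Exam” and the key “SubjectID” come from the slides; the other column names (`ExamID`, `ExamDate`, `DOB`) and all data values are assumptions for illustration. The queries answer the earlier demo questions: highest score, exams in April, and subjects with no exams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,   -- primary key
        Name      TEXT NOT NULL,
        DOB       TEXT                   -- stored once per subject
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),  -- foreign key
        ExamDate  TEXT NOT NULL,
        NPScore   INTEGER
    );
""")
conn.executemany("INSERT INTO Baby VALUES (?, ?, ?)", [
    (1, "Helen", "2005-01-10"),
    (2, "Alejandro", "2005-02-03"),      # no exams yet, but still in the database
])
conn.executemany(
    "INSERT INTO Exam (SubjectID, ExamDate, NPScore) VALUES (?, ?, ?)", [
        (1, "2010-04-12", 104),
        (1, "2010-08-30", 110),
    ])

highest = conn.execute("SELECT MAX(NPScore) FROM Exam").fetchone()[0]
april = conn.execute(
    "SELECT COUNT(*) FROM Exam WHERE strftime('%m', ExamDate) = '04'"
).fetchone()[0]
no_exams = [r[0] for r in conn.execute("""
    SELECT b.Name FROM Baby b
    LEFT JOIN Exam e ON e.SubjectID = b.SubjectID
    WHERE e.ExamID IS NULL
""")]

print("highest NPScore:", highest)
print("exams in April:", april)
print("subjects with no exams:", no_exams)
```

Because the birth date lives only in the “Baby” table, there is no second copy that could disagree with the first.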
Demonstration
Inability to create integrity violations with normalized tables.
This IS why you need a multi-table relational DBMS.
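Referential integrity can also be illustrated outside Access. The sketch below uses Python's `sqlite3` module with hypothetical columns and data. One SQLite-specific detail: foreign keys are not enforced unless `PRAGMA foreign_keys = ON` is set on the connection, whereas Access enforces integrity once it is enabled in the relationships diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Baby (
        SubjectID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    CREATE TABLE Exam (
        ExamID    INTEGER PRIMARY KEY,
        SubjectID INTEGER NOT NULL REFERENCES Baby(SubjectID),
        NPScore   INTEGER
    );
""")


def attempt(sql, params=()):
    """Run one statement; return True if accepted, False if integrity was violated."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


baby_ok = attempt("INSERT INTO Baby VALUES (?, ?)", (1, "Helen"))
valid_ok = attempt("INSERT INTO Exam (SubjectID, NPScore) VALUES (?, ?)", (1, 104))
# An exam for a subject who is not in the Baby table (an "orphan" record).
orphan_ok = attempt("INSERT INTO Exam (SubjectID, NPScore) VALUES (?, ?)", (999, 90))
# Deleting a subject who still has exams would orphan those exams.
delete_ok = attempt("DELETE FROM Baby WHERE SubjectID = ?", (1,))

print(baby_ok, valid_ok, orphan_ok, delete_ok)
```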
Outline
• Housekeeping
• Data Tables
  Rows = Records; Columns = Fields
• Normalization of Data Tables
• Start Lab 1