Improving efficiency and accuracy in data management for naturalistic driving studies
Rusan Chen
Georgetown University
Overview
• Naturalistic driving studies involve complicated, dynamic datasets
• Efficient data management is essential for replicable analysis results
• Based on my experience working on the 40-car Naturalistic Driving Study
Sound familiar?
You have multiple versions of the same file and don’t know which is which.
You cannot find an important file and think you may have deleted it.
There are two versions of the ‘latest’ draft of a paper, both named ‘final.doc’.
An efficient workflow requires proper:
• Organizing
• Documenting
• Automating
• Archiving
Organizing
• \Work and \Post directories are critical
• Once a file is posted, it is never changed!
Example:
C:\40Car
\ADS
\Work
\Post
40carAnalysis.doc
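The "never change a posted file" rule can be enforced by the computer instead of by memory. The following is a minimal sketch, not part of the original slides: the function name `post_file` and the file name `speeding_v1.txt` are hypothetical, and the demo uses a temporary directory in place of C:\40Car.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def post_file(work_file: Path, post_dir: Path) -> Path:
    """Copy a finished file from Work into Post and mark the copy read-only.

    Refuses to overwrite, because a posted file is never changed.
    """
    post_dir.mkdir(parents=True, exist_ok=True)
    target = post_dir / work_file.name
    if target.exists():
        raise FileExistsError(f"{target} is already posted; post under a new name")
    shutil.copy2(work_file, target)
    # Drop write permission so accidental edits fail loudly.
    target.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target


if __name__ == "__main__":
    # Demo: a temporary directory stands in for the project root.
    with tempfile.TemporaryDirectory() as root:
        project = Path(root) / "40Car"
        work, post = project / "Work", project / "Post"
        work.mkdir(parents=True)
        draft = work / "speeding_v1.txt"  # hypothetical file name
        draft.write_text("draft results\n")
        print(post_file(draft, post))
```

Posting the same file name a second time raises an error, so a revision has to be posted under a new name and the earlier version stays intact.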
Organizing folders
• \Post \2009
\012710 survey questionnaire analysis
\013110 personality related to risky driving
\031110 predicting C/NC from g-force
\032710 SAS Glimmix
\033010 risky friends interaction
\052110 speeding analysis
\052410 perception of risk as mediators
\060610 high vs low risky drivers
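The folder names above appear to start with a six-digit date. Assuming that prefix is MMDDYY (an inference from the names, not stated on the slide), a short sketch can read the date back out and list the folders in chronological order. The helper name `parse_folder` is hypothetical.

```python
from datetime import date, datetime


def parse_folder(name: str) -> tuple[date, str]:
    """Split 'MMDDYY description' into a date and the description."""
    prefix, _, description = name.partition(" ")
    # Assumption: the prefix is month, day, two-digit year.
    return datetime.strptime(prefix, "%m%d%y").date(), description


folders = [
    "052110 speeding analysis",
    "012710 survey questionnaire analysis",
    "033010 risky friends interaction",
]

# Sort by the parsed date rather than by the raw folder name.
for day, description in sorted(parse_folder(f) for f in folders):
    print(day.isoformat(), description)
```

Sorting on the parsed date matters once folders span more than one year: names that begin with the month sort alphabetically by month first, not by year.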
Documenting
It is always better to document today than tomorrow
What to document?
• Date
• Purpose
• Data sources
• How to form new composite scores
• Steps of analysis
• Where to save the results
• To whom you sent the results
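One way to make the checklist above hard to skip is to give each item a slot in a small record that is filled in with every analysis. This is only a sketch: the class name `AnalysisLogEntry`, its field names, and all the example values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AnalysisLogEntry:
    """One entry per analysis, with one field per item to document."""
    when: date
    purpose: str
    data_sources: list[str]
    composite_scores: dict[str, str] = field(default_factory=dict)
    steps: list[str] = field(default_factory=list)
    results_location: str = ""
    sent_to: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the entry as plain text to save next to the results."""
        lines = [
            f"Date: {self.when.isoformat()}",
            f"Purpose: {self.purpose}",
            "Data sources: " + ", ".join(self.data_sources),
        ]
        for name, rule in self.composite_scores.items():
            lines.append(f"Composite score {name}: {rule}")
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"Step {number}: {step}")
        lines.append(f"Results saved in: {self.results_location}")
        lines.append("Sent to: " + ", ".join(self.sent_to))
        return "\n".join(lines)


# Hypothetical example values, for illustration only.
entry = AnalysisLogEntry(
    when=date(2010, 5, 21),
    purpose="speeding analysis",
    data_sources=["trips.csv"],
    composite_scores={"risk": "mean of items r1 to r5"},
    steps=["merge trips with survey", "fit model"],
    results_location="Post/052110 speeding analysis",
    sent_to=["project lead"],
)
print(entry.to_text())
```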
Automating
• Data management involves doing the same task multiple times.
• Automating these tasks can save time and prevent errors
What to automate? (using macros and loops)
• To update, merge, and subset datasets
• To create and label new variables
• To check outliers
• To define and report missing values
• To fit a sequence of similar models
• To save analysis results
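As an illustration of the loop idea, the sketch below runs the same missing-value and outlier check over every variable instead of repeating it by hand. It is not the author's macro code: the missing-value code -99, the z-score cutoff of 2.0, the function name `check_variable`, and the data values are all assumptions made for the example.

```python
from statistics import mean, stdev

# Assumption: -99 and None both mean "missing" in this toy dataset.
MISSING_CODES = {-99, None}


def check_variable(values, z_cutoff=2.0):
    """Count missing values and flag observations far from the mean.

    Needs at least two observed values, because stdev() requires them.
    """
    observed = [v for v in values if v not in MISSING_CODES]
    n_missing = len(values) - len(observed)
    center, spread = mean(observed), stdev(observed)
    outliers = [
        v for v in observed
        if spread > 0 and abs(v - center) / spread > z_cutoff
    ]
    return {"n": len(observed), "missing": n_missing, "outliers": outliers}


# Hypothetical data, for illustration only.
data = {
    "speed": [60, 62, 58, 61, 59, 63, 60, 140, -99],
    "g_force": [0.31, 0.28, None, 0.30, 0.29, 0.33, 0.27, 0.30, 0.32],
}

# The same check runs on every variable, so no variable is skipped or
# checked with slightly different rules.
for name, values in data.items():
    print(name, check_variable(values))
```

Adding a variable to `data` is enough to have it checked the same way, which is where the time saving and the error prevention come from.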
Archiving: to protect your files
• Short-term: mirror
• Mid-term: backup
• Long-term: archive
Thank you!
Reference
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press.