![Page 1: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/1.jpg)
Best Practices: Creating Research Data
Sherry Lake
July 31, 2012
University of Florida Data Management Workshop
![Page 2: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/2.jpg)
WHY?
Following these best practices will:
• Improve the usability of the data, by you or by others
• Make your data “computer ready”
• Make your data ready to share with others
![Page 3: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/3.jpg)
Spreadsheet Examples
![Page 4: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/4.jpg)
Spreadsheet Problems?
![Page 5: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/5.jpg)
Problems
• Dates are not stored consistently
• Values are labeled inconsistently
• Data coding is inconsistent
• The order of values differs
![Page 6: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/6.jpg)
Problems
• Confusion between numbers and text
• Different types of data are stored in the same columns
• The spreadsheet loses interpretability if it is sorted
![Page 7: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/7.jpg)
Best Practices Data Organization
• Each line (row) of data should be complete
   – Designed to be machine readable, not human readable, so the rows can be sorted without losing meaning
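A common spreadsheet habit is to type a site name once and leave the cells below it blank. That layout is readable by people but loses its meaning when sorted. The minimal Python sketch below fills those blanks so that every row is complete on its own. It assumes that a blank cell means “same as the row above”, which must be confirmed for each dataset; the function name `fill_down` and the sample records are hypothetical.

```python
def fill_down(rows, columns):
    """Copy the last non-blank value into blank cells of the given columns,
    so that every row is complete on its own (assumes blank = same as above)."""
    last = {}
    complete = []
    for row in rows:
        new_row = dict(row)  # leave the input records untouched
        for col in columns:
            if new_row.get(col, "") == "":
                new_row[col] = last.get(col, "")
            else:
                last[col] = new_row[col]
        complete.append(new_row)
    return complete


# Hypothetical records: the site is typed only on the first row of each block.
rows = [
    {"site": "A", "species": "oak", "count": "4"},
    {"site": "", "species": "pine", "count": "7"},
    {"site": "B", "species": "oak", "count": "2"},
]

complete = fill_down(rows, ["site"])
# After filling, sorting by count keeps each count attached to its own site.
by_count = sorted(complete, key=lambda r: int(r["count"]))
for r in by_count:
    print(r["site"], r["species"], r["count"])
```

Sorting the original `rows` instead would leave the pine record with an empty site, which is exactly the loss of interpretability described above.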
![Page 8: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/8.jpg)
Best Practices Data Organization
• Include a header line as the first line (or record)
• Label each column with a short but descriptive name
   – Names should be unique
   – Use letters, numbers, or “_” (underscore)
   – Do not include blank spaces or symbols (+ - & ^ *)
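The header rules above can be checked automatically before a file is shared. This is a sketch under one added assumption not stated on the slide: that names should also start with a letter, which keeps them usable as variable names in most analysis languages. The function `check_header` is hypothetical.

```python
import re

# Letters, digits, or underscore only; no blanks or symbols such as + - & ^ *.
# Assumption (not from the slide): a name must start with a letter.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_header(header):
    """Return a list of problems found in a header row (a list of column names)."""
    problems = []
    seen = set()
    for name in header:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems


print(check_header(["site_id", "sample date", "temp_C", "site_id"]))
# ["invalid name: 'sample date'", "duplicate name: 'site_id'"]
```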
![Page 9: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/9.jpg)
Best Practices Data Organization
• Columns of data should be consistent
   – Use the same naming convention for text data
• Columns should include only a single kind of data
   – Text or “string” data
   – Integer numbers
   – Floating-point or real numbers
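A quick way to find columns that mix kinds of data is to classify every cell and report columns holding both text and numbers. This is a minimal sketch, assuming the data are in a CSV file read with Python's standard `csv` module; `kind_of` and `mixed_columns` are hypothetical helper names. It treats integers and floats in the same column as compatible, which is a design choice you may want to tighten.

```python
import csv
import io


def kind_of(value):
    """Classify one cell as 'integer', 'float', or 'text'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def mixed_columns(rows):
    """Return {column: kinds} for every column that mixes text with numbers.
    Integers and floats together are treated as one numeric kind."""
    kinds = {}
    for row in rows:
        for col, value in row.items():
            if not value:
                continue  # blank or absent cells are a missing-value issue, not a type issue
            kinds.setdefault(col, set()).add(kind_of(value))
    return {
        col: found
        for col, found in kinds.items()
        if len(found) > 1 and not found <= {"integer", "float"}
    }


# Hypothetical file: "four" was typed as a word in a numeric column.
data = "site,count,depth_m\nA,3,1.5\nB,four,2\n"
print(mixed_columns(csv.DictReader(io.StringIO(data))))
```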
![Page 10: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/10.jpg)
Use Standardized Formats
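• ISO 8601 standard for date and time
   – Basic format: YYYYMMDDThhmmss.sTZD, e.g. 20091013T091234.9Z or 20091013T091234.9+0500
   – Extended format: YYYY-MM-DDThh:mm:ss.sTZD, e.g. 2009-10-13T09:12:34.9Z or 2009-10-13T09:12:34.9+05:00
• Spatial coordinates for latitude/longitude
   – Decimal degrees, +/- DD.DDDDD: -78.476 (longitude), +38.029 (latitude)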
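Python's standard `datetime` module can write both ISO 8601 forms, which avoids hand-typed timestamps. The sketch below uses a hypothetical observation time with a +05:00 offset and a hypothetical `valid_coordinate` range check for decimal degrees. Note that Python writes UTC as `+00:00` rather than `Z`; both are valid ISO 8601.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical observation time: 13 Oct 2009, 09:12:34.9, local offset +05:00.
offset = timezone(timedelta(hours=5))
t = datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=offset)

extended = t.isoformat(timespec="milliseconds")  # ISO 8601 extended format
basic = t.strftime("%Y%m%dT%H%M%S%z")            # ISO 8601 basic format, whole seconds

# Parsing the extended string gives back the same instant.
assert datetime.fromisoformat(extended) == t

# The same instant expressed in UTC.
in_utc = t.astimezone(timezone.utc).isoformat(timespec="milliseconds")


def valid_coordinate(lat, lon):
    """Decimal degrees: south latitudes and west longitudes are negative."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(extended)  # 2009-10-13T09:12:34.900+05:00
print(basic)     # 20091013T091234+0500
print(in_utc)    # 2009-10-13T04:12:34.900+00:00
print(valid_coordinate(38.029, -78.476))  # True
```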
![Page 11: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/11.jpg)
File Names
![Page 12: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/12.jpg)
File Names
• Use descriptive names
• Not too long
• Don’t use spaces
• Try to include time, place & theme
• May use “-” or “_”
![Page 13: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/13.jpg)
File Names
• String words together with Caps (VegBiodiv_2007)
• Think about using version numbers
• Don’t change default extensions (txt, jpg, csv,…)
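Building file names with a small function keeps them consistent across a project. This is a sketch only: the name pattern (theme, place, year, zero-padded version, then the default extension) and the function `make_file_name` are assumptions that follow the suggestions above, not a fixed standard.

```python
import re


def make_file_name(theme, place, year, version, ext="csv"):
    """Build a name like 'VegBiodiv_Gainesville_2007_v02.csv' (hypothetical pattern).
    Words are joined with capitals, parts with underscores, and there are no spaces."""
    def camel(text):
        words = re.findall(r"[A-Za-z0-9]+", text)
        return "".join(w[:1].upper() + w[1:] for w in words)

    return f"{camel(theme)}_{camel(place)}_{year}_v{version:02d}.{ext}"


print(make_file_name("veg biodiv", "gainesville", 2007, 2))
# VegBiodiv_Gainesville_2007_v02.csv
```

Zero-padding the version number keeps files in the right order when a folder is sorted by name.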
![Page 14: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/14.jpg)
Quality Assurance/Control
Dataset Creation & Integrity Errors
• Use a data entry program
   – Program it to catch typing errors
   – Program pull-down menus of allowed values
• Perform double entry of the data
• Manually check 5 – 10% of data records
• Check for out-of-range values (plotting)
• Check for missing or impossible values
• Perform statistical summaries (random samples)
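Several of these checks are easy to script. The sketch below compares two independent entries of the same records, flags missing, non-numeric, and out-of-range values, and draws a reproducible random sample for manual checking. The missing-value code `-9999`, the temperature range, and all function names are hypothetical choices for illustration.

```python
import random


def double_entry_differences(first, second):
    """Compare two independent entries of the same records, cell by cell.
    Returns (row_index, column, first_value, second_value) for each mismatch."""
    diffs = []
    for i, (a, b) in enumerate(zip(first, second)):
        for col in a:
            if a[col] != b.get(col):
                diffs.append((i, col, a[col], b.get(col)))
    return diffs


def value_problems(rows, column, low, high, missing_code="-9999"):
    """Flag missing, non-numeric, and out-of-range values in one column."""
    problems = []
    for i, row in enumerate(rows):
        raw = row.get(column, "")
        if raw in ("", missing_code):
            problems.append((i, "missing"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((i, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((i, "out of range"))
    return problems


def manual_check_sample(rows, fraction=0.05, seed=42):
    """Pick a reproducible random sample (5% by default) for checking by hand."""
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)


# Hypothetical water temperatures (degrees C) typed in twice.
entry_1 = [{"temp": "21.5"}, {"temp": "19.0"}, {"temp": "-9999"}]
entry_2 = [{"temp": "21.5"}, {"temp": "91.0"}, {"temp": "-9999"}]

print(double_entry_differences(entry_1, entry_2))  # [(1, 'temp', '19.0', '91.0')]
print(value_problems(entry_2, "temp", low=-5, high=45))  # [(1, 'out of range'), (2, 'missing')]
```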
![Page 15: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/15.jpg)
Analyzing Data - Notes
• Keep the original file
   – Uncorrected copy
   – Make it “read-only”
• Make notes on transformations
• Save any changes as a new file
• Use scripted code to transform and correct data
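A minimal Python sketch of this workflow, assuming a CSV data file: the original is made read-only by clearing its write-permission bits, and corrections are applied by a script that writes a new, versioned file. The file names, the `-9999` missing-value code, and the helper `blank_missing_code` are hypothetical; a temporary folder is used so the example cleans up after itself.

```python
import csv
import os
import stat
import tempfile
from pathlib import Path


def make_read_only(path):
    """Clear the write-permission bits so the original cannot be edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def write_corrected_copy(original, corrected, fix_row):
    """Read `original`, apply `fix_row` to each record, and save the result
    under a new name. The original file is only ever opened for reading."""
    with open(original, newline="", encoding="utf-8") as src, \
            open(corrected, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow(fix_row(row))


def blank_missing_code(row):
    """Hypothetical correction step: turn the -9999 missing-value code into a blank cell."""
    return {col: ("" if value == "-9999" else value) for col, value in row.items()}


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "VegBiodiv_2007_v01.csv"
    original.write_text("site,count\nA,4\nB,-9999\n", encoding="utf-8")
    make_read_only(original)

    corrected = Path(folder) / "VegBiodiv_2007_v02.csv"
    write_corrected_copy(original, corrected, blank_missing_code)

    result = corrected.read_text(encoding="utf-8")
    original_is_writable = bool(original.stat().st_mode & stat.S_IWUSR)
    print(result)
```

Because every correction lives in code, the script itself doubles as the note on what was transformed.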
![Page 16: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/16.jpg)
Analyzing Data
• Use a scripted program (R, SAS, SPSS, Matlab)
   – Steps are recorded in textual format
   – Can be easily revised and re-executed
   – Helps sharing and repetition
   – Easy to document
• GUI-based analysis may be easier, but is harder to reproduce
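The slide names R, SAS, SPSS, and Matlab; the same idea is sketched here in Python instead. A short script that computes per-group summaries records every step as text, so it can be re-run unchanged after the data are corrected. The function `summarize` and the sample data are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(rows, group_col, value_col):
    """Per-group count, mean, minimum, and maximum of one numeric column.
    Blank cells are skipped as missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_col] != "":
            groups[row[group_col]].append(float(row[value_col]))
    return {
        group: {
            "n": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in sorted(groups.items())
    }


# Hypothetical data; site B has one missing count.
data = "site,count\nA,4\nA,6\nB,2\nB,\n"
summary = summarize(csv.DictReader(io.StringIO(data)), "site", "count")
for site, stats in summary.items():
    print(site, stats)
```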
![Page 17: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/17.jpg)
Document EVERYTHING!
• Create a project document file
   – More than a lab notebook
   – Data management plan
• Start at the beginning of the project and continue throughout data collection & analysis
   – Why you are collecting data
   – Exact details of methods of collecting & analyzing
![Page 18: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/18.jpg)
Document EVERYTHING!
• Details such as:
   – Names of data & analysis files associated with the study
   – Definitions for data and codes (include missing-value codes and names)
   – Units of measure (accuracy and precision)
   – Standards or instrument calibrations
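One way to keep these definitions next to the data is a small data dictionary file: one row per column, with its definition, units, and missing-value code. This is a minimal sketch under that assumption; the entries, field names, and helper functions are hypothetical.

```python
import csv
import io

FIELDS = ["column", "definition", "units", "missing_code"]

# Hypothetical data dictionary for one data file; every entry is illustrative.
DATA_DICTIONARY = [
    {"column": "site_id", "definition": "Study site code", "units": "", "missing_code": ""},
    {"column": "temp_C", "definition": "Water temperature",
     "units": "degrees Celsius, precision 0.1", "missing_code": "-9999"},
]


def write_dictionary(entries, stream):
    """Write the data dictionary as CSV so it can travel with the data file."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)


def apply_missing_codes(row, entries):
    """Replace documented missing-value codes with None."""
    codes = {e["column"]: e["missing_code"] for e in entries if e["missing_code"]}
    return {col: (None if codes.get(col) == value else value) for col, value in row.items()}


buffer = io.StringIO()
write_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
print(apply_missing_codes({"site_id": "A", "temp_C": "-9999"}, DATA_DICTIONARY))
```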
![Page 19: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/19.jpg)
Choosing File Formats
• Accessible data (in the future)
   – Non-proprietary (software formats)
   – Open, documented standard
   – Common, used by the research community
   – Standard representation (ASCII, Unicode)
   – Unencrypted & uncompressed
   – Media formats (hardware formats)
![Page 20: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/20.jpg)
Preferred Format Choices
• PDF, not Word
• ASCII, not Excel
• MPEG-4, not QuickTime
• TIFF or JPEG2000, not GIF or JPG
• XML or RDF, not RDBMS
A format is good if it is not tied to specific software
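Exporting a table as plain-text CSV instead of a spreadsheet file needs only Python's standard `csv` module. This sketch (the function `to_csv_text` and the sample rows are hypothetical) shows that fields containing commas or quotes are quoted correctly, and checks that the result is pure ASCII. To save it, write the string with an explicit encoding, for example `Path("VegBiodiv_2007.csv").write_text(text, encoding="utf-8")`.

```python
import csv
import io


def to_csv_text(header, rows):
    """Serialize a table as plain-text CSV; fields containing commas or quotes are quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv_text(["site", "note"], [["A", "wet, muddy"], ["B", 'said "dry"']])
print(text)
print(text.isascii())  # True: no characters outside 7-bit ASCII
```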
![Page 21: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/21.jpg)
Best Practices
1. Use Consistent Data Organization
2. Use Standardized Formats
3. Assign Descriptive File Names
4. Perform Basic Quality Assurance/Quality Control
5. Use a Scripted Program for Analysis and Keep Notes
6. Document EVERYTHING! (Define Contents of Data Files)
7. Use Consistent, Stable and Open File Formats
![Page 22: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/22.jpg)
Best Practices Bibliography
Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214.
Hook, L. A., Santhana Vannan, S.K., Beaty, T. W., Cook, R. B. and Wilson, B.E. (2010). Best Practices for Preparing Environmental Data Sets to Share and Archive. Available online (http://daac.ornl.gov/PI/BestPractices-2010.pdf) from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. doi:10.3334/ORNLDAAC/BestPractices-2010.
Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.
Data Observation Network for Earth (DataONE). (2012). DataONE Best Practices database. Retrieved 07/21/12, from http://www.dataone.org/best-practices.
![Page 23: Best practices data collection](https://reader038.vdocuments.site/reader038/viewer/2022102716/54be8e2f4a79594c308b456d/html5/thumbnails/23.jpg)
Questions? Discussion?
• Sherry Lake, Senior Scientific Data Consultant, UVA Library
• [email protected]
• Twitter: shlakeuva
• Slideshare: http://www.slideshare.net/shlake
• Web: http://www.lib.virginia.edu/brown/data