code and data management
![Page 1: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/1.jpg)
CODE AND DATA MANAGEMENT Toni Rosati Lynn Yarmey
![Page 2: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/2.jpg)
Data Management is Important! Because…
… Reproducibility is the foundation of science
… Journals are starting to require data deposit
… You want to get credit for producing data (data citations)
… Others can use and build on your work (data reuse)
… Recreating a figure from a 2006 paper shouldn’t be painful
… Funders tell us so (see NSF, NIH, NOAA, etc.)
![Page 3: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/3.jpg)
Outline • Back up often • Sharing code • File naming • Metadata • Sharing data • A data search tool
![Page 4: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/4.jpg)
![Page 5: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/5.jpg)
Back up
Tips:
- 1 working copy on your computer
- 1 copy on infrastructure near you
- 1 copy on infrastructure far away
But why would you only back up when you can do so much more?
SHARE!!
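As a rough illustration of the three-copy tip on this slide, the sketch below copies one working file into a "near" and a "far" backup directory and checks each copy against a SHA-256 checksum. The function names and the idea of verifying by checksum are assumptions added here, not part of the original talk. In practice the "far" location would usually be a remote or cloud service rather than a local folder.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(working_copy: Path, backup_dirs: Iterable[Path]) -> List[Path]:
    """Copy one file into each backup directory and verify every copy.

    Hypothetical helper: the directories stand in for "infrastructure near
    you" and "infrastructure far away".
    """
    expected = sha256_of(working_copy)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / working_copy.name
        shutil.copy2(working_copy, target)  # copy2 also keeps timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the working copy")
        copies.append(target)
    return copies
```

A call might look like `back_up(Path("results.csv"), [Path("/mnt/lab_server/backups"), Path("/mnt/offsite/backups")])`, where all three paths are placeholders.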
![Page 6: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/6.jpg)
Why Share Code?
• Good backup • Collaboration • People don’t have to contact you to get and understand the code
• Faster and easier than other options (emailing individuals or sharing on servers)
• ……
![Page 7: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/7.jpg)
Why Share Code? • Version control • Commenting gives a brief, public history • Work on multiple computers with the same code: flexibility in where you work (no USB drive necessary)
• Keep code with metadata/user instructions • No bureaucracy • FREE!
![Page 8: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/8.jpg)
What is Git?
• Git is a distributed revision control and source code management (SCM) system capable of dealing with non-linear workflows
• “As with most other distributed revision control systems, and unlike most client-server systems, every Git working directory is a full-fledged repository with complete history and full version tracking capabilities, independent of network access or a central server.” (Wikipedia)
![Page 9: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/9.jpg)
GitHub
![Page 10: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/10.jpg)
Sharing Code – GitHub.com
![Page 11: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/11.jpg)
Sharing Code – GitHub.com
GitHub serves as the location of record for VIC at: https://github.com/UW-Hydro/VIC
![Page 12: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/12.jpg)
![Page 13: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/13.jpg)
File Naming • Make names unique and meaningful! • Include (as appropriate):
- Project name or acronym - Study title - Location - Data type - Researcher initials - Date - Data stage - Version number - File type
Think “long-term”
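One way to apply the naming components listed on this slide is to assemble file names from them in code, so every file follows the same pattern. The sketch below is only an illustration. The component order, the `_` and `-` separators, the `YYYYMMDD` date form, and the function names are all assumptions, not a convention given in the talk.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase a component and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, location: str, data_type: str,
                    initials: str, collected: date, stage: str,
                    version: int, extension: str) -> str:
    """Join naming components into one unique, sortable file name.

    Hypothetical layout:
    project_location_datatype_initials_YYYYMMDD_stage_vNN.ext
    """
    parts = [
        slug(project),
        slug(location),
        slug(data_type),
        slug(initials),
        collected.strftime("%Y%m%d"),  # year-first dates sort chronologically
        slug(stage),
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


# All values below are made up for illustration.
name = build_file_name("ACADIS", "Barrow AK", "soil temp", "TR",
                       date(2014, 5, 17), "raw", 1, "csv")
print(name)  # acadis_barrow-ak_soil-temp_tr_20140517_raw_v01.csv
```

Avoiding spaces and putting the date year-first keeps names safe across operating systems and makes a plain directory listing sort in a useful order.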
![Page 14: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/14.jpg)
Metadata What would someone unfamiliar with your data need in order to evaluate, understand, and reuse them? How about someone:
- who works in your lab? - from a different lab in your field? - who is in a related interdisciplinary field? - who researches a completely different area? - who works for a newspaper? Congress?
![Page 15: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/15.jpg)
Metadata is the difference between:
![Page 16: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/16.jpg)
Metadata is Data about Data • Units? • Resolution? • What do the column names mean? • Caveats? Known data issues or missing values? • How were the data collected? • Where did the forcing data come from? • How many layers were used in this model?
“Information that describes the content, quality, condition, origin, and other characteristics of data or other pieces of information. Metadata for spatial data may describe and document its subject matter; how, when, where, and by whom the data was collected; availability and distribution information; its projection, scale, resolution, and accuracy; and its reliability with regard to some standard. Metadata consists of properties and documentation. Properties are derived from the data source (for example, the coordinate system and projection of the data), while documentation is entered by a person (for example, keywords used to describe the data).” Esri
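The questions on this slide can be answered in a small metadata file stored next to the data. The sketch below writes such a "sidecar" file as JSON and reports which fields are still empty. The field names, the required-field list, and every example value are hypothetical. They do not follow a formal metadata standard, which a repository or community may require instead.

```python
import json
from pathlib import Path

# Hypothetical minimum set of fields; not taken from any metadata standard.
REQUIRED_FIELDS = ("title", "creator", "collection_method",
                   "columns", "missing_value", "known_issues")


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or left empty."""
    return [name for name in REQUIRED_FIELDS
            if metadata.get(name) in (None, "", [], {})]


def write_metadata(data_file: Path, metadata: dict) -> Path:
    """Write metadata as JSON next to the data file; return the sidecar path."""
    gaps = missing_fields(metadata)
    if gaps:
        raise ValueError(f"Metadata is missing: {', '.join(gaps)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True),
                       encoding="utf-8")
    return sidecar


# Made-up example values, for illustration only.
example_metadata = {
    "title": "Daily soil temperature at an example site",
    "creator": "Example Researcher",
    "collection_method": "Thermistor probe at 10 cm depth, logged hourly",
    "columns": {
        "date": {"description": "Observation date", "units": "YYYY-MM-DD"},
        "soil_temp_c": {"description": "Daily mean soil temperature",
                        "units": "degrees Celsius"},
    },
    "missing_value": -9999,
    "known_issues": "Sensor offline for part of the record; gaps are flagged",
}
```

Calling `write_metadata(Path("soil_temp.csv"), example_metadata)` would create `soil_temp.csv.metadata.json` beside the data file. Keeping units and column meanings per column means a re-user does not have to guess them.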
![Page 17: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/17.jpg)
Metadata • What happens without good metadata?
• You have no idea what the data mean
• You think you understand the data, so you use them… • …but you use them totally wrong
• You waste hours (or days) trying to find out more about the data
![Page 18: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/18.jpg)
Sharing Data
These days, Dr. Hodes said, “the old model in which researchers jealously guarded their data is no longer applicable.” http://www.nytimes.com/2011/04/04/health/04alzheimer.html
![Page 19: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/19.jpg)
Sharing/Finding Data
www.nsidc.org/acadis/search
![Page 20: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/20.jpg)
Organize now…. or….
Thank you!
![Page 21: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/21.jpg)
Data Reuse
• Our team enables Arctic sciences by ensuring datasets are well documented and can be understood by re-users.
• The trick with data re-use is to find the dataset…
• then become familiar enough with a dataset…
• to be able to combine it with other data…
• and extract accurate results.
![Page 22: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/22.jpg)
Data Curation
• Metadata • Usability • Documentation • Training • Re-use • Tools • A little marketing • Partnering
• Consensus building • Data management plans for grant proposals • Integrating social and physical sciences • Data quality checks • Data analysis
![Page 23: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/23.jpg)
DOIs and Citations • Digital Object Identifiers (DOIs) officially name a resource. • A DOI works like a stable, permanent URL.
• Information about a digital object may change over time, including where to find it, but its DOI name will not change.
• “The DOI System provides a framework for persistent identification, managing intellectual content, managing metadata, linking customers with content suppliers, facilitating electronic commerce, and enabling automated management of media.” (DataCite.org)
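To make the link between DOIs and data citations concrete, the sketch below turns a DOI name into a link through the public `https://doi.org/` resolver and places it in a citation string. The citation layout and the function names are assumptions for illustration, since repositories and journals specify their own formats. The DOI `10.1234/example` is a made-up placeholder, not a real dataset.

```python
def doi_url(doi: str) -> str:
    """Turn a DOI name into a resolvable link via the doi.org resolver."""
    name = doi.strip()
    # Accept a few common ways of writing a DOI and reduce them to the bare name.
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if name.lower().startswith(prefix):
            name = name[len(prefix):].strip()
            break
    if not name.startswith("10."):
        raise ValueError(f"Not a DOI name: {doi!r}")
    return "https://doi.org/" + name


def data_citation(authors: str, year: int, title: str,
                  publisher: str, doi: str) -> str:
    """Format a simple data citation (illustrative layout only)."""
    return f"{authors} ({year}). {title}. {publisher}. {doi_url(doi)}"


# Made-up values for illustration.
print(data_citation("Researcher, E.", 2014, "Example soil temperature dataset",
                    "Example Data Center", "doi:10.1234/example"))
```

Because the citation carries the DOI rather than a storage location, it keeps working if the dataset later moves to a different server.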
![Page 24: CODE AND DATA MANAGEMENT](https://reader033.vdocuments.site/reader033/viewer/2022051711/5868e3821a28abb4408c2a25/html5/thumbnails/24.jpg)
Beyond ACADIS – Other Resources
General Info and help
- Earth Science Information Partners (ESIP): http://wiki.esipfed.org/
- UVA Libraries: http://www2.lib.virginia.edu/brown/data/
Data Management Plan and other tools
- DMP Tool: https://dmp.cdlib.org/
- DataOne: https://www.dataone.org/cattools/Data%20and%20Metadata%20Management
Metadata
- Excel Plug-in tool (in development): http://www.cdlib.org/cdlinfo/2011/09/01/facilitating-data-management-dcxl/
- Lists of Standards (not complete!) for bio, climate, ecology, oceanography: http://marinemetadata.org/conventions
- Stanford-based portal for medical/bio: http://bioportal.bioontology.org/resources