Research Information & Data Management (RIDM)
Introductions:
Ellie Ransom: Research Services Coordinator, @CU_SEL, [email protected]
Amy Nurnberger: Research Data Manager, @DataAtCU, [email protected]
The Plan for Research Information & Data (RID):
➔ Identify it
➔ Manage it
➔ Document it
➔ Secure it
➔ Deal with it
Identify it:
Anthony Elia at http://hdl.handle.net/10022/AC:P:19828 | Oscilloscope by Voltcraft by Hannes Grobe, https://commons.wikimedia.org/wiki/File:Oscilloscope-voltcraft_hg.jpg, CC BY 3.0
Material or information "on which an argument, theory, test or hypothesis, or another research output is based." (Queensland University of Technology, Manual of Procedures and Policies, Section 2.8.3, http://www.mopp.qut.edu.au/D/D_02_08.jsp)
Identify it - What is it?
➢ Non-digital text (lab books, field notebooks, archival texts)
➢ Digital texts or digital copies of text
➢ Spreadsheets
➢ Audio, video
➢ Computer-aided design (CAD)
➢ Statistics (SPSS, SAS)
➢ Databases
➢ Geographic Information Systems (GIS) and spatial data
➢ Digital copies of images
➢ Non-digital images
➢ MATLAB files & models
➢ Metadata & Paradata
➢ Data visualizations
➢ Computer code
➢ Standard operating procedures and protocols
➢ Protein or genetic sequences
➢ Artistic products
➢ Web files
➢ Curriculum materials
➢ Collection of digital objects acquired and generated during research
Adapted from: Georgia Tech, http://libguides.gatech.edu/content.php?pid=123776&sid=3067221
Identify it:
Who has it?
Identify it:
You have it!
So, what are you going to do with it?
Manage it!
What is Research Information & Data Management (RIDM)?
exists ➔ found ➔ understand ➔ trust ➔ can use
– Rex Sanders
http://www.slideshare.net/shlake/documentation-metadatadentonlake | http://dx.doi.org/10.1890/1051-0761(1997)007%5B0330:NMFTES%5D2.0.CO;2
[Figure: YOU (modified from http://openarchaeologydata.metajnl.com/about/)]
Manage it when?
Plan to manage:
1. What information/data are you producing?
2. How are you documenting / describing it?
3. Where are you storing it?
4. When are you sharing it?
5. Who’s responsible?
What are you producing?
Manage it: Volume
Manage it: Velocity
Manage it: Variety / Interoperability
Manage it: Sensitive data
IRB
Classified
Restricted
Intellectual property, e.g., patent or copyright
Ownership
HIPAA
FERPA
PII (personally identifiable information)
How are you documenting it?
Document it:
Take good notes!
???
00100100 00111111 01101010 10001000 10000101 10100011 00001000 11010011 00010011 00011001 10001010 00101110 00000011 01110000 01110011 01000100 10100100 00001001 00111000 00100010 00101001 10011111 00110001 11010000 00001000 00101110 11111010 10011000 11101100 01001110 01101100 10001001
Methods
• What was done
• How it was done
• Instrumentation/Equipment (RASCAL course)
• Limitations
Code
• All of the meanings
Description / Documentation
• Labels (w/ units!)
• Codebook
• Data dictionary
• Laboratory notebook
Cd π
There are standards for documentation: http://www.dcc.ac.uk/resources/metadata-standards
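Labels with units, a codebook, and a data dictionary can also be kept in machine-readable form, so the documentation can check the data. A minimal Python sketch; the variables, units, ranges and flag codes below are hypothetical examples, not a formal metadata standard.

```python
# Hypothetical data dictionary: every variable gets a label, a unit,
# an expected type, and (where relevant) a valid range or a codebook.
DATA_DICTIONARY = {
    "site_id": {"label": "Sampling site identifier", "unit": None, "type": str},
    "temp_c": {"label": "Water temperature", "unit": "degrees Celsius",
               "type": float, "range": (-5.0, 40.0)},
    "flag": {"label": "Quality flag", "unit": None, "type": int,
             "codes": {0: "good", 1: "suspect", 9: "missing"}},
}


def check_row(row: dict) -> list[str]:
    """Return a list of problems found when comparing one row to the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "range" in spec:
            lo, hi = spec["range"]
            if not lo <= value <= hi:
                problems.append(f"{name}: {value} outside {lo}..{hi} {spec['unit']}")
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{name}: undefined code {value}")
    # Columns nobody documented are a problem too.
    for name in row:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: not documented in data dictionary")
    return problems
```

A row such as `{"site_id": "A01", "temp_c": 55.0, "flag": 7}` would be reported for an out-of-range temperature and an undefined flag code.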
Document it:
Speaking of standards…
Standards of scholarship & academia:
Plagiarism
Cite stuff!
Data citation
Table of citation elements
1   A    Authors & Contributors
2   T    Title
3   Ei   Electronic ID, e.g., DOI
4   Pd   Publication date
5   P    Publisher / Distributor
Get Credit • Give Credit
● Track reuse
● Measure impact
● Support reproducibility
https://www.force11.org/group/joint-declaration-data-citation-principles-final | CU-RDM@columbia.edu
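The five citation elements can be assembled mechanically once they are recorded. A minimal Python sketch, assuming a pattern loosely modeled on the DataCite-style "Creator (PublicationYear): Title. Publisher. Identifier" form; the class name, the authors, the title, and the DOI in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    authors: list[str]  # 1: Authors & Contributors
    title: str          # 2: Title
    identifier: str     # 3: Electronic ID, here assumed to be a bare DOI
    year: int           # 4: Publication date (year only in this sketch)
    publisher: str      # 5: Publisher / Distributor

    def format(self) -> str:
        # Assumed pattern: Authors (Year): Title. Publisher. https://doi.org/<DOI>
        names = "; ".join(self.authors)
        return (f"{names} ({self.year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{self.identifier}")
```

For example, `DataCitation(["Doe, Jane", "Roe, Richard"], "Hypothetical Stream Temperature Readings, 2014", "10.0000/example.123", 2015, "Example University Repository").format()` yields one citation string containing all five elements. A citation manager or the repository's own "cite this" text should take precedence where available.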
Document it: Citation managers
http://library.columbia.edu/research/citation-management.html
Intellectual Property & Ownership:
Who owns it?
● You?
● Your PI?
● Columbia University?
● Publisher?
● Funding agency?
How do you store it?
Store it:
Security Storage
Secure it:
How will you protect your own or your participants':
● Security
● Privacy / confidentiality
● Intellectual property
● Other rights
Secure it: Backups
Where
● Here
● Near
● Far
When
● Regularly & frequently
● Schedule it
Test it
● File recovery
● Checksums
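Testing a backup means restoring a file and confirming it is identical to the original. A minimal Python sketch using SHA-256 checksums from the standard library; the function names are hypothetical, and the file is read in chunks so large data files need not fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_matches(original: Path, restored: Path) -> bool:
    """True if a file recovered from backup is bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Recording the digest at backup time also lets you check a restored file later, even if the original has since been lost.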
Secure it: Backups
What about cloud storage services, you ask?
Consider:
● Security
● Accessibility
● Cost
● Longevity
Read the fine print: what happens to your stuff when the service inevitably dies?
Secure it: Security
Who needs to (or should) see the data, and when?
IRB, PII, FERPA
HIPAA, Restricted, Classified
Copyright, Patent potential, Licenses & IP
Consider:
● Restricting physical access
● Encryption
● De-identification
● Strong passwords (password manager)
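One building block of de-identification is pseudonymization: dropping direct identifiers and replacing participant IDs with codes that cannot be reversed without a secret key. A minimal Python sketch using a keyed hash (HMAC-SHA-256) from the standard library; the column names are hypothetical, and this alone is not full de-identification, since indirect identifiers (age, location, dates) can still re-identify people. Your IRB protocol and any HIPAA/FERPA rules determine what is actually required.

```python
import hashlib
import hmac

# Hypothetical set of columns treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record: dict, key: bytes) -> dict:
    """Drop direct identifiers and replace participant_id with a keyed hash.

    The same key always maps the same participant to the same code, so
    records can still be linked; the key must be stored apart from the data.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(key, str(record["participant_id"]).encode("utf-8"), hashlib.sha256)
    out["participant_id"] = digest.hexdigest()[:16]  # shortened code for readability
    return out
```

A keyed hash is used rather than a plain hash because short IDs can be guessed by hashing every possible value; without the key that attack does not work.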
Storing it: Some practicalities
● File formats
● File naming and organization
● Version control
Storing it: File formats (for interoperability & storage)
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g., FITS, CIF)
• Unencrypted
• Uncompressed
Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]
✓ Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf | http://www.digitalpreservation.gov/formats/index.shtml
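As an example of the criteria above, tabular data kept in a proprietary spreadsheet can also be saved as plain-text CSV: open, documented, Unicode, unencrypted, and uncompressed. A minimal Python sketch with the standard `csv` module; the function names and the example columns (with the unit in the column name, e.g. `temp_c`) are hypothetical.

```python
import csv
import io


def to_csv_text(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize rows to CSV text with a header row.

    Fields containing commas or quotes are quoted automatically.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def save_as_csv(path: str, rows: list[dict], fieldnames: list[str]) -> None:
    """Write the CSV as UTF-8; newline='' keeps the '\\n' line endings untranslated."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_csv_text(rows, fieldnames))
```

For instance, one row `{"site_id": "A01", "date": "2014-07-06", "temp_c": 21.5}` becomes a header line plus `A01,2014-07-06,21.5`, readable by almost any tool.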
Storing it: File naming
● Consistency: Pick a system, write it down, stick with it
● Identify necessary elements & consider their order
● Create brief, understandable names
● Date: YYYY-MM-DD or YYYYMMDD
● Version: v01, v02,…FINAL
● Avoid spaces in filenames, as well as these characters: \ / : * ? " . < > | [ ] & $ (reserve . for file extensions)
● Recognize: at the file level, Firefly/browncoat/shiny.txt and Firefly/alliance/shiny.txt are the same name, shiny.txt; once a file leaves its folder, the folder context is lost
Make a system. Share the system. Follow the system
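Naming rules are easiest to follow when a script applies them. A minimal Python sketch of one possible system, assuming the element order project_description_YYYY-MM-DD_vNN.ext; the function name and that order are hypothetical choices, and the characters replaced are the ones listed above plus whitespace.

```python
import re
from datetime import date

# Runs of whitespace or characters to avoid; "." is reserved for the extension.
_AVOID = re.compile(r'[\s\\/:*?"<>|\[\]&$.]+')


def make_filename(project: str, description: str, when: date,
                  version: int, ext: str) -> str:
    """Build project_description_YYYY-MM-DD_vNN.ext with unsafe characters replaced."""
    parts = [project, description, when.isoformat(), f"v{version:02d}"]
    # Replace each run of unsafe characters with a hyphen, then trim stray hyphens.
    cleaned = [_AVOID.sub("-", p).strip("-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")
```

For example, `make_filename("Firefly", "soil samples: site A", date(2014, 7, 6), 1, "csv")` gives `Firefly_soil-samples-site-A_2014-07-06_v01.csv`. Because the ISO date sorts correctly as text, files list in chronological order.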
Storing it: File organization
● Consider organizing by logical chunks, e.g., project, class, grant
● What makes sense for the work you’re doing? How are you likely to look for related items?
● Identify important elements & how they should be nested
● Don’t make the system too deep
● Choose brief, understandable names
● Document it!
Make a system. Share the system. Follow the system
Storing it: Versioning: Did you change the file?
Change the name!
Indicate versions
● filename_v001
● report_draft_r045
● report_final_r176
● presentation_20140706
Indicate responsibility
● Initials: file_v05_gh
● ID designation: file_v05_iam37
Make a system. Share the system. Follow the system
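If your system uses a `_vNN` tag, bumping the version can be automated so the zero padding stays consistent. A minimal Python sketch; the function name is hypothetical, and it only handles the `_v` convention, not the `_r` or date-based styles shown above.

```python
import re

_VERSION = re.compile(r"_v(\d+)")


def next_version(filename: str) -> str:
    """Return filename with its last _vNN tag incremented, keeping zero padding."""
    matches = list(_VERSION.finditer(filename))
    if not matches:
        raise ValueError(f"no _vNN version tag in {filename!r}")
    m = matches[-1]
    digits = m.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    # Splice the new number in place of the old digits; initials etc. are kept.
    return filename[:m.start(1)] + bumped + filename[m.end(1):]
```

So `file_v05_gh.txt` becomes `file_v06_gh.txt`, and `filename_v001` becomes `filename_v002`.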
Storing it: Columbia resources
● Lionmail drive
● Academic Commons
● The Libraries
● Departmental IT
Sharing it:
But my PI told me to do it this way?
Sharing it: File naming & organization
Collaborating on a complex project?
Make sure to share and agree on your naming, organizational, and versioning systems!
Make a system. Share the system. Follow the system
Research Information & Data (RID) sharing:
● What: Unique, reusable, relevant data
● With whom: Your future self! Your collaborators. Your research community. The world. (mind restrictions, etc.)
● When: During the project with collaborators. At pre-determined project stages. At project completion.
● How: Data Publication
● Frequently required by funding organizations
RID sharing – What?
Not all data should be archived, or kept for the same length of time or in the same way. Appraise your data against the following criteria:
● Relevance to research mission
● Historical or scientific value
● Uniqueness
● Reliability / Integrity / Usability of data
● Replicability, or lack thereof
● Cost of management and preservation
● Adequate available documentation
● Satisfaction of requirements
RID sharing – With whom
RID sharing – When (depends on whom)
All of the time!
RID sharing – How: Data publishing
● Data publication in repositories
  ○ Institutional: http://academiccommons.columbia.edu/
  ○ Disciplinary (directory): http://www.re3data.org/
  ○ Requirements
    ■ Long-term storage of and access to data
    ■ Validation of data integrity [checksum]
    ■ A permanent resource locator (e.g., DOI, PURL, hdl) to make the data persistent, unique, and citable
● Data descriptors
● Data papers
● Supplementary material
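Repositories validate data integrity with checksums, and you can supply your own for a whole dataset before depositing it. A minimal Python sketch that writes one `<digest>  <relative path>` line per file (the same layout that `sha256sum` prints) and later reports any file whose digest no longer matches; the function names are hypothetical, and the manifest should be written outside the data folder so it does not list itself.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def write_manifest(root: Path, out: Path) -> None:
    """Write '<digest>  <relative path>' lines, one per file."""
    lines = [f"{digest}  {rel}" for rel, digest in build_manifest(root).items()]
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root: Path, manifest_file: Path) -> list[str]:
    """Return relative paths that changed or disappeared since the manifest was written."""
    current = build_manifest(root)
    problems = []
    for line in manifest_file.read_text(encoding="utf-8").splitlines():
        digest, rel = line.split("  ", 1)
        if current.get(rel) != digest:
            problems.append(rel)
    return problems
```

An empty list from `verify_manifest` means every listed file is unchanged.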
Using it: Columbia resources
● Open Source Software
● Licensed Software
● Specialized Software
● High Performance Computing
Responsibility
Questions?
Contact us:
Ellie | Research Services Coordinator | [email protected]
Amy | Research Data Manager | [email protected]