cal poly - data management and the dmptool
Post on 17-Oct-2014
387 views
DESCRIPTION
October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University. Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.TRANSCRIPT
Carly Strasser | @carlystrasser California Digital Library
Cal Poly Oct 2013
From Flickr by OZinOH
DMPTool The Data Management Planning Tool
Background
Current DMP tools
A walkthrough of DMPTool
DMPTool2
Roadmap
From
Flickr by (L
uciano
)
From
Flickr by Flickm
or
From
Flickr by US Arm
y En
vironm
ental C
omman
d
From
Flickr by DW08
25
C. Strasser
Courtesey of W
HOI
From
Flickr by deltaMike
From Flickr by ~Minnea~
Data management Documentation Reproducibility
From Flickr by ahockey
DMPs!
A document that describes what you will do with your
data both during your research
and after you complete your project
From Flickr by Barbies Land
What is a data management plan?
For funders: A short plan submitted alongside grant applications
But they all have different requirements and express them in
different ways
From Flickr by 401(K) 2013
An outline of – what will be collected – methods – Standards – Metadata – sharing/access – long-‐term storage
Includes how and why
• Saves time • Increases research efficiency
• Satisfies requirements
Why prepare a DMP? From Flickr by natalinha
When do you plan?
Yes.
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
Proposal writing
Research
Publication
Ideas
When do you plan?
Remember!
From Flickr by gmacfadyen
• Keep your plan current • Incorporate changes • Use as a guide for daily
activities
From
Flickr by celikins
Where to start?
Small & Simple • Document what you know now • Share the plan with your team • Avoid procrastination and immobilization
Where to start?
Unruly Data
Researcher
Research Support Office
Data Library / Repository
Computing Support
Faculty Ethics Committee Etc...
DMP?
Courtesy of Martin Donnelly
Who: Support Services & Collaborators
Plan section Support
Information about data PI, co-‐PIs, research staff
Metadata content & format Librarians, data repositories
Policies for access, sharing, reuse
Funder, institute, HIPPA, IRB, users
Long-‐term storage and data management
Librarians, IT staff, data repositories
Budget Sponsored programs office, funder
DMPs: A Short History
Liz Lyon: Dealing with Data 2008
UK funder expectations 2009
2009-‐10
Federal Funding Accountability and Transparency Act 2006
Across the Pond…
2010 2010 –present
DMPs: A Short History
DMP supplement may include: 1. the types of data, samples, physical collections, software, curriculum
materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-‐use, re-‐distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
NSF DMP Requirements From Grant Proposal Guidelines:
• Types of data produced
• Relationship to existing data
• How/when/where will the data be captured or created?
• How will the data be processed?
• Quality assurance & quality control measures
• Security: version control, backing up
• Who will be responsible for data management during/after project?
1. Types of data & other information
biology.kenyon.edu
C. Strasser
From Flickr by Lazurite
Wired.com
• What metadata are needed to make the data meaningful? • How will you create or capture these metadata? • Why have you chosen particular standards and approaches
for metadata?
2. Data & metadata standards
• Are you under any obligation to share data?
• How, when, & where will you make the data available?
• What is the process for gaining access to the data?
• Who owns the copyright and/or intellectual property?
• Will you retain rights before opening data to wider use? How long? • Are permission restrictions necessary? • Embargo periods for political/commercial/patent reasons? • Ethical and privacy issues? • Who are the foreseeable data users? • How should your data be cited?
3. Policies for access & sharing 4. Policies for re-‐use & re-‐distribution
• What data will be preserved for the long term? For how long?
• Where will data be preserved?
• What data transformations need to occur before preservation?
5. Plans for archiving & preservation
From Flickr by theManWhoSurfedTooMuch
• What metadata will be submitted alongside the datasets?
• Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?
Don’t forget: Budget
• Costs of data preparation & documentation Hardware, software Personnel Archive fees
• How costs will be paid Request funding!
dorrvs.com
NSF’s Vision*
DMPs and their evaluation will grow & change over time
Peer review will determine next steps
Community-‐driven guidelines
Evaluation will vary with directorate, division, & program officer
*Unofficially
• Project name: Effects of temperature and salinity on population growth of the estuarine copepod, Eurytemora affinis
• Project participants and affiliations: Carly Strasser (University of Alberta and Dalhousie University) Mark Lewis (University of Alberta) Claudio DiBacco (Dalhousie University and Bedford Institute of Oceanography)
• Funding agency: CAISN (Canadian Aquatic Invasive Species Network) • • Description of project aims and purpose: • We will rear populations of E. affinis in the laboratory at three temperatures and three
salinities (9 treatments total). We will document the population from hatching to death, noting the proportion of individuals in each stage over time. The data collected will be used to parameterize population models of E. affinis. We will build a model of population growth as a function of temperature and salinity. This will be useful for studies of invasive copepod populations in the Northeast Pacific.
• Video Source: Plankton Copepods. Video. Encyclopædia Britannica Online. Web. 13 Jun. 2011
A DMP Example (1)
• 1. Information about data • Every two days, we will subsample E. affinis populations growing at our
treatment conditions. We will use a microscope to identify the stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook, then copy the data into an Excel spreadsheet. For quality control, values will be entered separately by two different people to ensure accuracy. The Excel spreadsheet will be saved as a comma-‐separated value (.csv) file daily and backed up to a server. After all data are collected, the Excel spreadsheet will be saved as a .csv file and imported into the program R for statistical analysis. Strasser will be responsible for all data management during and after data collection.
• Our short-‐term data storage plan, which will be used during the experiment, will be to save copies of 1) the .txt metadata file and 2) the Excel spreadsheet as .csv files to an external drive, and to take the external drive off site nightly. We will use the Subversion version control system to update our data and metadata files daily on the University of Alberta Mathematics Department server. We will also have the laboratory notebook as a hard copy backup.
A DMP Example (2)
• 2. Metadata format & content • We will first document our metadata by taking careful notes in the laboratory
notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is on of the accepted formats used in Ecology, and works well for the type of data we will be producing. We will create these metadata using Morpho software, available through the Knowledge Network for Biocomplexity (KNB). The documentation and metadata will describe the data files and the context of the measurements.
A DMP Example (3)
3. Policies for access, sharing & reuse • We are required to share our data with the CAISN network after all data
have been collected and metadata have been generated. This should be no more than 6 months after the experiments are completed. In order to gain access to CAISN data, interested parties must contact the CAISN data manager ([email protected]) or the authors and explain their intended use. Data requests will be approved by the authors after review of the proposed use.
• The authors will retain rights to the data until the resulting publication is produced, within two years of data production. After publication (or after two years, whichever is first), the authors will open data to public use. After publication, we will submit our data to the KNB allowing discovery and use by the wider scientific community. Interested parties will be able to download the data directly from KNB without contacting the authors, but will still be encouraged to give credit to the authors for the data used by citing a KNB accession number either in the publication’s text or in the references list.
A DMP Example (4)
4. Long-‐term storage and data management The data set will be submitted to KNB for long-‐term preservation and storage. The authors will submit metadata in EML format along with the data to facilitate its reuse. Strasser will be responsible for updating metadata and data author contact information in the KNB.
• 5. Budget • A tablet computer will be used for data collection in the field, which
will cost approximately $500. Data documentation and preparation for reuse and storage will require approximately one month of salary for one technician. The technician will be responsible for data entry, quality control and assurance, and metadata generation. These costs are included in the budget in lines 12-‐16.
A DMP Example (5)
From
Flickr by thew
matt
Background
Current DMP tools
A walkthrough of DMPTool
DMPTool2
Roadmap
From
Flickr by (L
uciano
)
Step-‐by-‐step wizard for generating DMP
Create | edit | re-‐use | share | save | generate
Open to community
DMPonline: dmponline.dcc.ac.uk
From Flickr by Max Chu
dmptool.org
DMPTool Project
• Partners started working in January 2011 • Developed requirements, divided work • Self-‐funded / In-‐kind
• Free • Guides through creating a DMP • Helps meet funder requirements
• Supplies questions • Includes explanation/context provided by
the agency • Provides links to the agency website
dmptool.org
Wait!
Data management planning is complex & requires dialog
Range of support & understanding
Our focus: • simplify & scale the common parts • develop community • provide incremental improvement
in functionality
From Flickr by ChrisGoldNY
Background
Current DMP tools
A walkthrough of DMPTool
DMPTool2
Roadmap
From
Flickr by (L
uciano
)
From Flickr by Clonny
Researchers like it here
DMPTool can be added to campus single sign-‐on service
Researchers use campus login to access tool
Access
Customized Resources
Institution-‐specific… • Help text • Links to resources & services • Suggested answers …at different levels • All DMPs • All DMPs for a particular
funding agency • Question within a data
management plan
From Flickr by lumachrome
From Flickr by OZinOH
DMPTool2
DMPTool Uptake
0
100
200
300
400
500
600
700
800
900
1000
0
1000
2000
3000
4000
5000
6000
7000
Num
ber o
f Ins
titution
s
Num
ber o
f Plans
(solid) &
Uniqu
e Use
rs (d
ashe
d)
Unique Users Plans Institutions
DMPTool 2: Responding to the Community
Coming
Soon!
Plan Creators & Plan
Administrators
2
• Collaborative plan creation • Role-‐based user authorization & access • Better plan templates & resources
Improvements for Plan Creators
2
• Template creation: • Better plan template granularity discipline, funder, question • Better institution granularity
department, college, lab group, … • Enhanced search and browse of plans • Access to metrics for reporting & follow-‐up
New administrator Interface
2
What this means for plan creators: • Better plans • More granular help • Local input & assistance
2 Open RESTful API
?
Yelp
Google Maps
APIs
DMPTool
• Carefully thought out code • Invisible to user • Expose specific functionality and/or data
• Other functionality/data protected
Other stuff
API
Interactions Popularity
API Benefits
Improve functionality Add more functionality
Combine with their services
IMLS Grant
Improving Data Stewardship with the DMPTool
Provide librarians with the tools and resources
to claim the data management education space
blog.dmptool.org
facebook.com/dmptool
Email Tweet me My slides
[email protected] @carlystrasser slideshare.net/carlystrasser
Questions? Website Twitter
Blog
dmptool.org @TheDMPTool blog.dmptool.org