Using Dataverse Virtual Archive Technology for Research Data Management
DESCRIPTION
One of the most important components of research is access to quality data. Digital data archives must work to increase submission rates to ensure that quality data exist for future researchers. This is a challenge, given that recent studies show that vast amounts of data collected during publicly funded projects are not being archived. Even the best-planned methodology will not succeed when researchers use tainted data or fail to find adequate data. Social science data archivists play a key role in the effort to maintain quality sources of data for social science investigators to repurpose and reuse. The dynamic, circular movement of data between producers and archives is critical to the future of social science research. Data archives have historically provided for this data interchange using considerable human capital. Dedicated archivists and investigators have worked together to ensure that data were processed and placed into an archive best designed for their preservation. This manual process has become increasingly expensive and unwieldy because of the volume of data being produced and the advanced metadata required to give future researchers enough detail to reuse a study. Typically, researchers work with the archives to deposit the data long after the project has been completed and the papers published. The manual creation of metadata at this point takes far longer than if it were collected earlier in the research life cycle. Recent advances in archival repository software may be the key to streamlining this increasingly inefficient archival process by allowing archivists and researchers to create detailed metadata earlier in the research life cycle, at a point where it will take far less time. Such software gives researchers greater personal control over archival ingest processes, bridging the gap between researchers and archives and possibly increasing submission rates of valuable data to archives.
Archival technology provides tools that manage automated ingest, data cataloging, advanced search and indexing, and rights and access issues. Archival tools also provide proper citation, creation of persistent identifiers, automatic creation of preservation formats, format migration, and statistical analysis of data. Customized branding and citation management can provide investigators collecting these data with a tool that will ensure that they get the credit they deserve. The Dataverse Network Technology has the potential to aid many research groups at UNC in their data management processes and has the potential for use in many disciplines. This presentation will explain the technology and its applicability for managing research data.
TRANSCRIPT
Jonathan Crabtree
Cheryl Thompson
Using Dataverse Virtual Archive Technology for Research Data Management
Outline
• Overview of Odum and issues around data management
• Concepts around Dataverse and federated data systems
• A look into Dataverse Virtual Archives
• Features of the Dataverse Network
• Benefits to Researchers & IT providers
• Exploring new possibilities
H. W. Odum Institute Archive Services
• The Howard W. Odum Institute was founded in 1924.
• It is the oldest multidisciplinary social science university institute.
• Odum Archive Services is host to the third largest catalog of machine-readable social science data in the U.S.
• Founding member of Data-PASS
• Founding member of The Library of Congress NDSA
• The Odum Dataverse Network (DVN) catalog includes polling, census, and other social science and health-related data.
The Problem
Different needs for archives, data libraries, researchers, journals, funding agencies…
• "We should preserve the data"
• "I want credit for my data"
• "We need persistent links"
• "I need a Data Management Plan"
• "No publications without data"
Cross, M. Why the Dataverse Network? Available at: thedata.org
Odum’s Solution
Dataverse Network: centralized professional archiving with distributed control and recognition
• Persistent identifiers
• Fixity
• Backups & recovery
• Metadata standards
• Conversion standards
• Preservation standards
• Branding & visibility
• Data discovery
• Ease of use
• Scholarly citation
• Control over updates
• Terms of access & use
How does it work?
Supporting data
• Convert to a preservation format (data and metadata)
• Calculate Universal Numerical Fingerprint (UNF)
• Download in multiple formats
• Download a subset of the data
• Generate summary statistics
• Apply Zelig (R) statistical methods
• Visualize time series
• Define Terms of Use and Permission
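The idea behind a content fingerprint such as the UNF can be illustrated with a toy example. The sketch below is not the official UNF algorithm, whose specification defines its own normalization and hashing rules. It only shows, under simplified assumptions, how normalizing values before hashing can make a fingerprint independent of the file format the data came from. The function names and the choice of 7 significant digits are illustrative assumptions.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Render a number in exponential notation with a fixed number of significant digits."""
    return format(float(value), f".{digits - 1}e")


def toy_fingerprint(column, digits=7):
    """Hash the normalized values of one column. This is NOT the official UNF."""
    hasher = hashlib.sha256()
    for value in column:
        hasher.update(normalize(value, digits).encode("utf-8"))
        hasher.update(b"\n")  # separator so adjacent values cannot run together
    # Keep 128 bits and base64-encode them for a compact printable string.
    return base64.b64encode(hasher.digest()[:16]).decode("ascii")


# The same column as it might arrive from two different file formats
# (made-up values for illustration).
from_text_file = ["1.5", "2.25", "3"]
from_binary_file = [1.5, 2.250000000001, 3.0]

print(toy_fingerprint(from_text_file) == toy_fingerprint(from_binary_file))
```

Because both columns normalize to the same strings, the two fingerprints agree even though the stored representations differ. Changing a value beyond the chosen precision produces a different fingerprint.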
Tabular Data:
STATA
SPSS
CSV + control card
Tab delimited + DDI
Social Network Data:
GraphML
Other data or relevant files:
All formats are accepted BUT only tabular files have full data support
Creating data citations
• Author(s)
• Year
• Title
• Persistent URL and ID
• UNF
• Distributor
• Version
• Other optional fields
Louis Harris and Associates, Inc., 1992, "Harris 1984 Female Veterans Survey, study no. 843002", http://hdl.handle.net/1902.29/H-843002 UNF:3:4VngKZgBorG/7T6aZSaq1g== Odum Institute;Odum Institute for Research in Social Science [Distributor] V1 [Version]
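As a rough illustration of how the listed fields combine into the citation shown above, here is a minimal sketch. The `DataCitation` class and its field names are hypothetical and are not part of any Dataverse API. The layout simply mirrors the example citation.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Citation fields from the slide; names are illustrative, not a Dataverse API."""
    authors: str
    year: int
    title: str
    persistent_url: str
    unf: str
    distributor: str
    version: int

    def format(self) -> str:
        # Mirror the layout of the example citation in the slide.
        return (
            f'{self.authors}, {self.year}, "{self.title}", '
            f"{self.persistent_url} {self.unf} "
            f"{self.distributor} [Distributor] V{self.version} [Version]"
        )


citation = DataCitation(
    authors="Louis Harris and Associates, Inc.",
    year=1992,
    title="Harris 1984 Female Veterans Survey, study no. 843002",
    persistent_url="http://hdl.handle.net/1902.29/H-843002",
    unf="UNF:3:4VngKZgBorG/7T6aZSaq1g==",
    distributor="Odum Institute;Odum Institute for Research in Social Science",
    version=1,
)
print(citation.format())
```

Keeping the persistent URL and the UNF as separate fields means a reader can both locate the study and check that the data match the version that was cited.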
Managing data and versions
[Figure: two panels, "Contributor, curator, admin view" and "End user view", each showing Data File 1 and Data File 2; action label: "Edit study & add new file"]
Data never permanently deleted
A study is never permanently deleted after it is released. Curators or admins can deaccession the study.
Edit study
This study is deaccessioned. [Go to other study]
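The rule on this slide can be read as a small state model. The sketch below is an illustration only and does not reflect how the Dataverse Network is implemented. The `Study` class, the status names, and the assumption that unreleased drafts may still be deleted are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    RELEASED = "released"
    DEACCESSIONED = "deaccessioned"


class Study:
    """Hypothetical model of a study's lifecycle; not Dataverse code."""

    def __init__(self, title):
        self.title = title
        self.status = Status.DRAFT

    def release(self):
        self.status = Status.RELEASED

    def deaccession(self):
        # Deaccessioning withdraws a released study without erasing its record.
        if self.status is not Status.RELEASED:
            raise ValueError("only a released study can be deaccessioned")
        self.status = Status.DEACCESSIONED

    def can_delete(self):
        # Assumption: only unreleased drafts may be removed outright.
        return self.status is Status.DRAFT


study = Study("Example study")
study.release()
study.deaccession()
print(study.status.value, study.can_delete())
```

Once a study has been released, the only way to withdraw it in this model is to deaccession it, which leaves a record behind for anyone following an existing citation.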
Supporting standards
• Study and variable metadata are exported into XML (Dublin Core, Data Documentation Initiative – DDI, FGDC) and MARC
• OAI-PMH for harvesting metadata
• LOCKSS for data duplication in multiple locations
• Z39.50 for distributed search
• EZproxy to authenticate for data access
• Federations enabled via standards
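To make the OAI-PMH item concrete, the sketch below builds a `ListRecords` request and pulls Dublin Core titles out of a response. The `verb` and `metadataPrefix` arguments and the XML namespaces come from the OAI-PMH and Dublin Core standards. The base URL, the helper names, and the sample response are made up for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the base URL of a real OAI-PMH repository.
BASE_URL = "https://archive.example.org/oai"

DC_NAMESPACE = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up response in the shape of an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:study-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Study One</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL from standard OAI-PMH arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_xml):
    """Return every Dublin Core title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", DC_NAMESPACE)]


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

A harvester would fetch the generated URL, parse each page of records this way, and follow the resumption token the protocol provides for large result sets. That step is omitted here to keep the sketch short.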
Replicating data
Dataverse Virtual Archives
• Custom web skins
• Researchers retain control of data access
• Citations provide academic credit for data collection work
• Easy access to online research tools
Dataverse Features
• Federated search & discovery
• Online analysis
• Multi-format download
• Collection organization
• Automated metadata generation
• Custom metadata templates
• Controlled ingest workflows
Data archiving in 4 steps
1. Gather and convert study files to the appropriate format
2. Log into your virtual archive
3. Add a new study
4. Add the study files
Moving beyond social science
• Dataverse Network is cross-disciplinary.
• We are expanding the study metadata and building communities of interested groups: [email protected]
Benefits to…
Researchers:
• Gives recognition to authors/researchers
• Creates a permanent data citation with UNF
• Converts data and study files to a preservable format
• Allows researchers to set who can access the data (and modify this at a later point)
IT/Computer support:
• It’s free
• Do not need additional software for Dataverse
• Offload long-term data preservation concerns
Questions?
Jonathan Crabtree, Asst. Director for Archives & IT
Phone: (919) 962-0517
Email: [email protected]
Cheryl A. Thompson, Graduate Research Assistant
Email: [email protected]
Email: [email protected]