Managing provenance in the Social Sciences: The Data Documentation Initiative (DDI)
Dr. Steve McEachern, Director, Australian Data Archive
ADA in Brief
• The Social Science Data Archive (now the Australian Data Archive, ADA) was set up in 1981, housed in the Research School of Social Sciences, with a mission to collect and preserve Australian social science data on behalf of the social science research community.
• The Archive holds over 5000 datasets from around 1500 studies, including national election studies, public opinion polls, social attitudes surveys, censuses, aggregate statistics, administrative data and many other sources.
• Data holdings are sourced from academic, government and private sectors.
So what is a data archive?
• ‘A “trusted system” that provides... an accessible and comprehensive service empowering researchers to locate, request, retrieve and use data resources in a simple, seamless and cost effective way, while at the same time protecting the privacy, confidentiality and intellectual property rights of those involved.’
Social Sciences and Humanities Research Council of Canada. “National Data Archive Consultation Final Report: Building Infrastructure for Access to and Preservation of Research Data in Canada” URL: http://www.sshrc.ca/web/whatsnew/initiatives/da_finalreport_e.pdf [20 November 2003].
About DDI
• A structured metadata specification of and for the community
• Two major development lines, both XML Schemas:
– DDI Codebook
– DDI Lifecycle
• Additional specifications:
– Controlled vocabularies
– RDF vocabularies for use with Linked Data
• A model-based version is in development:
– with serialisations in XML and RDF
– includes support for provenance and process models
• Managed by the DDI Alliance: http://www.ddialliance.org
DDI-Codebook
• XML based, first published in 2000
• Four sections:
1. Document description: characteristics of the DDI XML document itself
2. Study description: characteristics of the study (project) that the DDI is describing, including related materials (documents associated with the project, such as questionnaires, codebooks, etc.)
3. File description: characteristics of the physical data files
4. Variable description: characteristics of the variables in the data file
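To make the four-part structure concrete, here is a minimal Python sketch that assembles a skeleton DDI-Codebook document with the standard library's `xml.etree.ElementTree`. The element names (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `var`, `catgry`) follow DDI-Codebook conventions. The function `build_codebook`, the study title, file name and variable are hypothetical. A real document would also declare the DDI-C namespace and be validated against the official schema.

```python
import xml.etree.ElementTree as ET


def build_codebook(title: str, file_name: str, variables: list[dict]) -> ET.Element:
    """Return a minimal DDI-Codebook-style tree with its four main sections.

    Sketch only: no namespace declaration and no schema validation.
    """
    root = ET.Element("codeBook")

    # 1. Document description: metadata about this XML document itself.
    doc_cit = ET.SubElement(ET.SubElement(root, "docDscr"), "citation")
    ET.SubElement(ET.SubElement(doc_cit, "titlStmt"), "titl").text = f"DDI documentation for {title}"

    # 2. Study description: the study (project) being documented.
    stdy_cit = ET.SubElement(ET.SubElement(root, "stdyDscr"), "citation")
    ET.SubElement(ET.SubElement(stdy_cit, "titlStmt"), "titl").text = title

    # 3. File description: the physical data file, with an ID that variables point to.
    file_dscr = ET.SubElement(root, "fileDscr", ID="F1")
    ET.SubElement(ET.SubElement(file_dscr, "fileTxt"), "fileName").text = file_name

    # 4. Variable description: one <var> per variable, with label, question text and categories.
    data_dscr = ET.SubElement(root, "dataDscr")
    for i, v in enumerate(variables, start=1):
        var = ET.SubElement(data_dscr, "var", ID=f"V{i}", name=v["name"], files="F1")
        ET.SubElement(var, "labl").text = v["label"]
        ET.SubElement(ET.SubElement(var, "qstn"), "qstnLit").text = v["question"]
        for value, label in v["categories"].items():
            catgry = ET.SubElement(var, "catgry")
            ET.SubElement(catgry, "catValu").text = str(value)
            ET.SubElement(catgry, "labl").text = label
    return root


book = build_codebook(
    "Example Social Attitudes Survey",  # hypothetical study
    "example_survey.sav",               # hypothetical data file
    [{"name": "q1", "label": "Voted at last election",
      "question": "Did you vote at the last federal election?",
      "categories": {1: "Yes", 2: "No"}}],
)
ET.indent(book)  # pretty-printing; requires Python 3.9+
print(ET.tostring(book, encoding="unicode"))
```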
DDI Lifecycle Model
(Figure: the DDI Lifecycle Model, annotated "Metadata Reuse")
Why can DDI Lifecycle do more?
• It is machine-actionable, not just documentary
• It is more complex, with a tighter structure
• It manages metadata objects through a structured identification and reference system that allows sharing between organizations
• It has greater support for related standards
• It supports reuse of metadata within the lifecycle of a study and between studies
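The identification and reference system can be sketched as follows. This is a hypothetical Python illustration, loosely modelled on the DDI Lifecycle 3.2 URN pattern `urn:ddi:agency:id:version`. The classes, the in-memory `Registry` and the agency `int.example` are assumptions for illustration, not part of the standard. The sketch shows the two properties the slide relies on: an object is identified by agency, ID and version, and a second organisation reuses it by resolving that identifier instead of copying its content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Agency, ID and version together identify a metadata object globally."""
    agency: str
    object_id: str
    version: str

    @property
    def urn(self) -> str:
        # Assumed layout, modelled on DDI Lifecycle 3.2 URNs: urn:ddi:agency:id:version
        return f"urn:ddi:{self.agency}:{self.object_id}:{self.version}"


@dataclass(frozen=True)
class Question:
    ident: Identifier
    text: str


class Registry:
    """Toy shared registry: organisations publish objects and resolve them by URN."""

    def __init__(self) -> None:
        self._objects: dict[str, Question] = {}

    def publish(self, obj: Question) -> str:
        urn = obj.ident.urn
        existing = self._objects.get(urn)
        if existing is not None and existing != obj:
            # A published version is immutable; changed content needs a new version.
            raise ValueError(f"{urn} is already published with different content")
        self._objects[urn] = obj
        return urn

    def resolve(self, urn: str) -> Question:
        return self._objects[urn]


registry = Registry()
# Agency A publishes a question (hypothetical agency and ID).
q_v1 = Question(Identifier("int.example", "q-vote", "1.0.0"),
                "Did you vote at the last election?")
ref = registry.publish(q_v1)
print(ref)  # urn:ddi:int.example:q-vote:1.0.0

# Agency B reuses the question by reference rather than copying its text.
reused = registry.resolve(ref)

# Revised wording gets a new version; version 1.0.0 stays resolvable.
q_v2 = Question(Identifier("int.example", "q-vote", "1.1.0"),
                "Did you vote in the most recent federal election?")
registry.publish(q_v2)
```

Treating published versions as immutable means a revised wording always gets a new version number while the old one remains resolvable, so a later study can record exactly which version it used.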
DDI Lifecycle Features
• Support for computer-assisted interviewing (CAI) instruments
• Support for longitudinal surveys
• Focus on comparison, both by design and after-the-fact (harmonization)
• Robust record and file linkages for complex data files
• Support for geographic content (shape and boundary files)
• Capability for registries and question banks
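After-the-fact harmonization usually means recoding each study's categories onto a common scheme. Below is a minimal Python sketch, assuming two hypothetical studies that coded education differently. The category labels and recode maps are invented for illustration. Keeping the recode map as documented metadata, rather than applying it ad hoc, is what preserves the provenance of each harmonized value.

```python
from collections import Counter

# Common target scheme for a comparable "highest education" variable.
HARMONIZED = {1: "Below secondary", 2: "Secondary", 3: "Tertiary"}

# Hypothetical recode maps (source code -> harmonized code), one per study.
# Documenting these maps is what makes each harmonized value traceable.
RECODES = {
    "study_a": {1: 1, 2: 1, 3: 2, 4: 3, 5: 3},
    "study_b": {10: 1, 20: 2, 30: 3, 31: 3},
}


def harmonize(study: str, values: list[int]) -> list[int]:
    """Recode one study's values onto the common scheme, refusing unmapped codes."""
    mapping = RECODES[study]
    unmapped = sorted(set(values) - set(mapping))
    if unmapped:
        raise ValueError(f"{study}: no harmonized code for {unmapped}")
    return [mapping[v] for v in values]


a = harmonize("study_a", [1, 3, 3, 4, 5])
b = harmonize("study_b", [10, 20, 30, 31])

# Compare the two studies on the shared categories.
count_a, count_b = Counter(a), Counter(b)
for code, label in HARMONIZED.items():
    print(f"{label:<16} A={count_a[code]} B={count_b[code]}")
```

Raising an error on unmapped codes, rather than silently dropping them, keeps gaps in the harmonization visible.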
Provenance in DDI
DDI Codebook
• Human-readable provenance
• Studies:
– Attribution
– Methodology
– Data processing, collection, etc.
– Related materials: questionnaires, technical reports, …
• Variables:
– Variable name, values, labels, type
– Question text
– Notes
DDI Lifecycle
• Machine-actionable provenance
• Studies:
– Attribution
– Methodology
– Data processing, collection, etc.
– Related materials: questionnaires, technical reports, …
• Variables:
– Questions
– Variables
– Code lists
– Universes (i.e. population)
– All are maintainable and re-usable
– Allows provenance of concepts across studies
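Because questions, code lists and similar objects are maintained separately and referenced by identifier, one can ask which studies reuse a given object. The Python sketch below assumes hypothetical `CodeList`, `Variable` and `Study` classes and an invented identifier string. It shows reuse by reference and a simple cross-study provenance query, not the DDI Lifecycle schema itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeList:
    """A maintainable, reusable code list, identified independently of any study."""
    ident: str
    codes: tuple[tuple[int, str], ...]


@dataclass
class Variable:
    name: str
    code_list: str  # identifier of a CodeList: a reference, not a copy


@dataclass
class Study:
    title: str
    variables: list[Variable] = field(default_factory=list)


def uses_of(code_list_id: str, studies: list[Study]) -> list[str]:
    """Cross-study provenance query: where is this code list reused?"""
    return [
        f"{study.title}/{var.name}"
        for study in studies
        for var in study.variables
        if var.code_list == code_list_id
    ]


# Hypothetical shared code list and studies.
yes_no = CodeList("urn:ddi:int.example:cl-yesno:1.0.0", ((1, "Yes"), (2, "No")))
code_lists = {yes_no.ident: yes_no}

wave1 = Study("Election Study, wave 1", [Variable("voted", yes_no.ident)])
wave2 = Study("Election Study, wave 2",
              [Variable("voted", yes_no.ident), Variable("party_member", yes_no.ident)])

print(uses_of(yes_no.ident, [wave1, wave2]))
# ['Election Study, wave 1/voted', 'Election Study, wave 2/voted', 'Election Study, wave 2/party_member']

# Any variable resolves its categories from the single shared definition.
print(code_lists[wave2.variables[1].code_list].codes)
```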
DDI 4 (a.k.a. Views)
• Machine-actionable provenance
• Variable hierarchy:
– Conceptual Variable
– Represented Variable
– Instance Variable
– Each inherits from the level above
• Management of codes and categories across the lifecycle
– E.g. management of a set of missing values
• Management and transformation of individual datum(s)
– Process model for data transformation and validation
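The variable hierarchy, a shared missing-value set and a simple validation step can be combined in one sketch. This is a hypothetical Python illustration, not the DDI 4 model: class and attribute names are simplified, and "inherits" is modelled as each level holding a reference to the level above and delegating to it. The employment categories, the missing codes (-1, -2) and the dataset name are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ConceptualVariable:
    name: str
    concept: str  # what is measured, independent of how it is coded


@dataclass
class RepresentedVariable:
    conceptual: ConceptualVariable
    valid_codes: dict[int, str]    # substantive categories
    missing_codes: dict[int, str]  # a shared, reusable set of missing values

    @property
    def concept(self) -> str:
        return self.conceptual.concept  # taken from the level above


@dataclass
class InstanceVariable:
    represented: RepresentedVariable
    dataset: str  # where this variable actually occurs
    column: str

    @property
    def concept(self) -> str:
        return self.represented.concept

    def validate(self, values: list[int]) -> dict[str, list[int]]:
        """One validation step: classify each datum against the inherited codes."""
        rep = self.represented
        report: dict[str, list[int]] = {"valid": [], "missing": [], "invalid": []}
        for value in values:
            if value in rep.valid_codes:
                report["valid"].append(value)
            elif value in rep.missing_codes:
                report["missing"].append(value)
            else:
                report["invalid"].append(value)
        return report


# Hypothetical shared missing-value set, reusable by many represented variables.
MISSING = {-1: "Don't know", -2: "Refused"}

employment = ConceptualVariable("employment", "Labour force status of the respondent")
employment_coded = RepresentedVariable(
    employment, {1: "Employed", 2: "Unemployed", 3: "Not in labour force"}, MISSING)
q12_2015 = InstanceVariable(employment_coded, dataset="survey_2015.sav", column="Q12")

print(q12_2015.concept)  # Labour force status of the respondent
print(q12_2015.validate([1, 2, -1, 3, 9, -2]))
# {'valid': [1, 2, 3], 'missing': [-1, -2], 'invalid': [9]}
```

Because the missing-value set is attached once at the represented level, every instance variable in every dataset that uses that representation applies the same rules.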
Managing and Depositing Data: ADA and DDI
Approach
• Core archive website: http://www.ada.edu.au
• Sub-archives focussed on specialised thematic or methodological areas, e.g. http://www.ada.edu.au/indigenous/home
• “Add-on” systems for complex analysis or visualisation tasks:
– Nesstar
– GIS: http://gis-test.ada.edu.au
– Longitudinal visualisation: Panemalia
– Historical census data: http://hccda.ada.edu.au
OAIS architecture
Data deposit: ADAPT
Archival processing
Manual system with some automation tools
1. Deposit:
– Review of ADAPT submission
– Storage via ADAPT to file store
2. Data processing:
– File format conversion (usually to SPSS for processing)
– Privacy/confidentiality review
– Data cleaning (in consultation with depositor)
3. Metadata processing:
– DDI-C metadata creation in Nesstar Publisher
4. Publishing:
– Archival storage and access format creation
– Data publication to Nesstar server
– Metadata publication to Nesstar and ADA CMS
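The processing chain above is largely manual. As an illustration of how its provenance could be captured in machine-readable form, here is a hypothetical Python sketch. It records each step with its agent and the SHA-256 checksums of its input and output, so the chain from deposited file to published file can be checked. The `ProvenanceLog` class and the two toy steps are assumptions; they are not part of ADAPT or Nesstar. A production system might express the same records with a standard such as W3C PROV.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical record of the processing steps applied to one deposited file."""
    entries: list[dict[str, str]] = field(default_factory=list)

    def run(self, step: str, func: Callable[[bytes], bytes], data: bytes, agent: str) -> bytes:
        """Apply one processing step and record who did what to which bytes."""
        result = func(data)
        self.entries.append({
            "step": step,
            "agent": agent,
            "input_sha256": sha256(data),
            "output_sha256": sha256(result),
        })
        return result

    def is_chained(self) -> bool:
        """True if every step consumed exactly the previous step's output."""
        return all(prev["output_sha256"] == cur["input_sha256"]
                   for prev, cur in zip(self.entries, self.entries[1:]))


def drop_first_column(data: bytes) -> bytes:
    # Toy confidentiality step: remove a direct identifier held in the first CSV column.
    rows = data.decode().splitlines()
    return "\n".join(",".join(row.split(",")[1:]) for row in rows).encode()


def strip_trailing_spaces(data: bytes) -> bytes:
    # Toy cleaning step: remove trailing whitespace from every row.
    return "\n".join(row.rstrip() for row in data.decode().splitlines()).encode()


log = ProvenanceLog()
deposited = b"name,age,voted\nA. Person,34,1 \nB. Person,51,2\n"
private = log.run("confidentiality review", drop_first_column, deposited, agent="archivist")
cleaned = log.run("data cleaning", strip_trailing_spaces, private, agent="archivist")

for entry in log.entries:
    print(entry["step"], entry["output_sha256"][:12])
print(log.is_chained())  # True
```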
The ADA study page
Study information is available through the tabs at the top of the study page:
• Study: information including the investigators, abstract, sample, data collection methods, and access requirements.
• Variables: a list of variables available in a quantitative dataset.
• Related Materials: additional documentation, links and other related studies (e.g. others in the series) that may interest you.
The study page is also the access point for the ADA Nesstar system, for:
• Analysis of quantitative data online
• Download of data to your own computer
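A Variables listing like the one on the study page can be derived from DDI-Codebook metadata. Here is a minimal Python sketch, assuming a small hypothetical DDI-C 2.5 fragment using the namespace `ddi:codebook:2_5`; documents conforming to other DDI-C versions use different namespace strings. It is not how Nesstar or the ADA CMS actually build the page.

```python
import xml.etree.ElementTree as ET

# A small hypothetical DDI-Codebook 2.5 fragment with two variables.
DDI_XML = """<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var ID="V1" name="q1">
      <labl>Voted at last election</labl>
      <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
      <catgry><catValu>2</catValu><labl>No</labl></catgry>
    </var>
    <var ID="V2" name="age">
      <labl>Age in years</labl>
    </var>
  </dataDscr>
</codeBook>"""

NS = {"ddi": "ddi:codebook:2_5"}


def variable_list(xml_text: str) -> list[tuple[str, str, int]]:
    """Return (name, label, number of categories) for each <var> in dataDscr."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.findall("ddi:dataDscr/ddi:var", NS):
        label = var.findtext("ddi:labl", default="", namespaces=NS)
        n_categories = len(var.findall("ddi:catgry", NS))
        rows.append((var.get("name"), label, n_categories))
    return rows


for name, label, n in variable_list(DDI_XML):
    print(f"{name:<6} {label} ({n} categories)")
```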
The ADA Study Page