An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR)
William C. Block and Jeremy Williams,¹ John Abowd and Lars Vilhuber,² and Carl Lagoze³
¹ Cornell Institute for Social and Economic Research, Cornell University
² Labor Dynamics Institute, Cornell University
³ School of Information, University of Michigan
Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway
3 December 2012
![Page 2: An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED 2 AR) William C. Block and Jeremy Williams, 1 John Abowd](https://reader035.vdocuments.site/reader035/viewer/2022062906/5a4d1b227f8b9ab059995a82/html5/thumbnails/2.jpg)
Outline
The Problem: Curation of Data Locked within a Secure Environment Is Difficult
NCRN Solution:
• CED2AR Prototype
• CED2AR Search API
• DDI bridging the boundary between confidential and public metadata
Questions and Discussion
Curating Data Locked within a Secure Environment is difficult
• By definition, access is restricted
• Lack of curation throws up a barrier to future discovery and access
• Replication of results becomes increasingly difficult
• Important! The scientific method depends on the ability to replicate the results of research
Research Opportunities at the Cornell Census Research Data Center
The RDC Network
We see this problem at Cornell: research with restricted data is increasing at CISER
Increasing Use of Restricted Data in Research
Source: Raj Chetty, http://conference.nber.org/confer/2012/SI2012/LS/ChettySlides.pdf
Use of Public Use Data Declining
Source: Raj Chetty, http://conference.nber.org/confer/2012/SI2012/LS/ChettySlides.pdf
Proposed Solution: Cornell’s NCRN Node
Improved documentation and discoverability of both public and restricted data from the federal statistical system
• CED2AR
• DDI solution to confidential metadata
CED2AR Overview and Goals
• Collect and standardize disparate metadata into a single DDI repository
• Provide a web interface through which researchers can access the repository
• Build an API for developers to use
• Use open standards
• Provide thorough documentation
Acknowledging CS 5150 Contributions
Jeremy Williams*, Benjamin Perry, Justin Burden, Chantelle Farmer, Shudan Zheng, Jessica Kane
*CISER and NCRN staff member, EDDI co-author, and coordinator of the CS 5150 team
http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Web/
CED2AR Search API
The API supports the following query functions:
• Return
  o a chosen set of fields within the DDI schema
• Where
  o a chosen set of supported DDI search fields
  o and, or, and not
  o contains, starts-with, ends-with
• Sort
  o descending, ascending
• Limit
  o e.g., give me results 10-50 from each codebook
The API makes interacting with the repository easier because it abstracts away the underlying XQuery needed to perform each query.
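To make the four query functions concrete, here is a minimal local sketch, assuming variable metadata has already been extracted from the DDI codebooks into flat Python dictionaries. It is not the CED2AR implementation (which runs XQuery against the repository). The record fields, the `query` and `matches` functions, the `*` wildcard convention for contains/starts-with/ends-with (mirroring the style of the example search URL), and the half-open `(start, stop)` limit window are all assumptions for illustration; OR conditions are left out for brevity.

```python
from typing import Optional

Record = dict[str, str]


def matches(value: str, pattern: str) -> bool:
    """Case-insensitive match: '*x*' contains, 'x*' starts-with, '*x' ends-with."""
    v, p = value.lower(), pattern.lower()
    core = p.strip("*")
    if len(p) > 1 and p.startswith("*") and p.endswith("*"):
        return core in v
    if p.endswith("*"):
        return v.startswith(core)
    if p.startswith("*"):
        return v.endswith(core)
    return v == core


def query(records: list[Record],
          where: list[tuple[str, bool, str]],
          sort: Optional[str] = None,
          limit: Optional[tuple[int, int]] = None,
          fields: Optional[list[str]] = None) -> list[Record]:
    """Where, then Sort, then Limit (per codebook), then Return."""
    # Where: every (field, negate, pattern) condition must hold (AND semantics).
    hits = [r for r in records
            if all(matches(r[f], p) != negate for f, negate, p in where)]
    # Sort: a leading '-' means descending (hypothetical convention).
    if sort:
        key = sort.lstrip("-")
        hits.sort(key=lambda r: r[key], reverse=sort.startswith("-"))
    # Limit: apply the same (start, stop) slice within each codebook.
    if limit:
        start, stop = limit
        by_codebook: dict[str, list[Record]] = {}
        for r in hits:
            by_codebook.setdefault(r["codebook"], []).append(r)
        hits = [r for group in by_codebook.values() for r in group[start:stop]]
    # Return: project each hit onto the requested fields.
    if fields:
        hits = [{f: r[f] for f in fields} for r in hits]
    return hits


# Hypothetical toy records; names and labels are invented for illustration.
RECORDS = [
    {"codebook": "SSB", "variablename": "hh_size",
     "variablelabel": "Persons in dwelling", "variabletext": "Size of household"},
    {"codebook": "SSB", "variablename": "num_rooms",
     "variablelabel": "Number of rooms in dwelling", "variabletext": "Rooms in the house"},
    {"codebook": "SSB", "variablename": "n_kids",
     "variablelabel": "Children in family", "variabletext": "Number of children"},
    {"codebook": "DEMO", "variablename": "dwell_type",
     "variablelabel": "Type of dwelling", "variabletext": "Type of house"},
]

print(query(RECORDS,
            where=[("variabletext", False, "*house*"),
                   ("variablelabel", False, "*dwelling*"),
                   ("variablelabel", True, "number*")],
            sort="-variablename",
            fields=["variablename"]))
# [{'variablename': 'hh_size'}, {'variablename': 'dwell_type'}]
```

Applying the limit after sorting, and per codebook, is one reading of "results 10-50 from each codebook"; a real service might page differently.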
CED2AR Search API
Some example DDI resources:
• Codebooks:
  o http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/codebooks
• Codebook named SSB:
  o http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/codebooks/SSB
• Variables of the codebook named SSB:
  o http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/codebooks/SSB/variables
• A particular variable in the SSB codebook, named totfam_kids:
  o http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/codebooks/SSB/variables/totfam_kids
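These resources form a simple hierarchy: all codebooks, one codebook, its variables, one variable. A small Python sketch of building those resource URLs, assuming the path pattern above generalizes to any codebook and variable name. The helper names are hypothetical, and `fetch` simply returns the raw response body because the slides do not say which format the API returns. The 2012 prototype host may no longer be reachable.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the 2012 prototype, taken from the slides.
BASE = "http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query"


def codebooks_url() -> str:
    """All codebooks in the repository."""
    return f"{BASE}/codebooks"


def codebook_url(codebook: str) -> str:
    """One codebook, e.g. SSB; the name is percent-encoded as a path segment."""
    return f"{codebooks_url()}/{quote(codebook, safe='')}"


def variables_url(codebook: str) -> str:
    """All variables of one codebook."""
    return f"{codebook_url(codebook)}/variables"


def variable_url(codebook: str, variable: str) -> str:
    """A single variable within one codebook."""
    return f"{variables_url(codebook)}/{quote(variable, safe='')}"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """HTTP GET a resource and return the raw body bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


print(variable_url("SSB", "totfam_kids"))
# http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/codebooks/SSB/variables/totfam_kids
```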
CED2AR Search API
Ability to create complex queries across codebooks
• Give me all variables across all codebooks where the variable text contains the word 'house' and the variable label contains the word 'dwelling' but does not start with the word 'number' (sorted in descending order by variable name)
  o http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query/search?return=variables&where=variabletext=*house*,variablelabel=*dwelling*,variablelabel!=number*&sort=-variablename
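The example URL follows a visible pattern: `return` names the resource type, `where` joins conditions with commas (AND), `=` and `!=` select a match or its negation, `*` marks contains/starts-with/ends-with, and a leading `-` on `sort` means descending. The sketch below rebuilds that URL from those pieces. The pattern is inferred from this single example, the helper functions are hypothetical rather than part of CED2AR, and OR conditions are omitted because their encoding is not shown.

```python
# Base URL of the 2012 prototype, as shown on the slides.
BASE = "http://rschweb.ciserrsch.cornell.edu:8080/CED2AR_Query"


def contains(text: str) -> str:
    return f"*{text}*"


def starts_with(text: str) -> str:
    return f"{text}*"


def ends_with(text: str) -> str:
    return f"*{text}"


def descending(field: str) -> str:
    return f"-{field}"


def search_url(returns: str, where: list[tuple], sort: str = "") -> str:
    """Each where entry is (field, pattern) or (field, pattern, negate); entries are ANDed."""
    conditions = []
    for field, pattern, *rest in where:
        op = "!=" if rest and rest[0] else "="
        conditions.append(f"{field}{op}{pattern}")
    params = [f"return={returns}", "where=" + ",".join(conditions)]
    if sort:
        params.append(f"sort={sort}")
    # Left unencoded to match the slide; '*', '!' and ',' are legal in a query string.
    return f"{BASE}/search?" + "&".join(params)


url = search_url(
    "variables",
    [("variabletext", contains("house")),
     ("variablelabel", contains("dwelling")),
     ("variablelabel", starts_with("number"), True)],
    sort=descending("variablename"),
)
print(url)
```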
NCRN DDI Solution at the Variable Level: <dataAccs>
Variable Level Solution (continued)
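The details of the variable-level mechanism appear only in the slide images. As a hedged sketch of the general idea: in DDI Codebook a `var` element can point to `dataAccs` sections through its `access` attribute (an IDREFS list), so a public-facing codebook can be derived from a confidential one by dropping variables whose access references a restricted `dataAccs`. The fragment below is simplified (no DDI namespace, minimal content). The IDs, the variables, the rule that a `dataAccs` with a `useStmt/restrctn` child counts as restricted, and keeping variables that have no `access` attribute are all assumptions for illustration, not the NCRN encoding.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment in the spirit of DDI Codebook.
# IDs, variable names and labels are hypothetical.
CODEBOOK = """
<codeBook>
  <stdyDscr>
    <dataAccs ID="public_access"/>
    <dataAccs ID="restricted_access">
      <useStmt><restrctn>Available only inside the secure environment.</restrctn></useStmt>
    </dataAccs>
  </stdyDscr>
  <dataDscr>
    <var name="age" access="public_access"><labl>Age in years</labl></var>
    <var name="earnings" access="restricted_access"><labl>Annual earnings</labl></var>
    <var name="state"><labl>State of residence</labl></var>
  </dataDscr>
</codeBook>
"""


def restricted_ids(root: ET.Element) -> set:
    """IDs of dataAccs sections that carry a use restriction (assumed rule)."""
    return {da.get("ID") for da in root.iter("dataAccs")
            if da.find("useStmt/restrctn") is not None}


def public_view(xml_text: str) -> ET.Element:
    """Drop every var whose access attribute references a restricted dataAccs."""
    root = ET.fromstring(xml_text.strip())
    blocked = restricted_ids(root)
    for dscr in root.findall("dataDscr"):
        for var in dscr.findall("var"):  # findall returns a list, so removal is safe
            # access is an IDREFS list (whitespace-separated IDs); a var without
            # the attribute is kept here (assumption).
            if set(var.get("access", "").split()) & blocked:
                dscr.remove(var)
    return root


print([v.get("name") for v in public_view(CODEBOOK).iter("var")])
# ['age', 'state']
```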
No DDI Solution at the level of a Value Label
A small tweak to the DDI Codebook schema would fix this.