Treasure Trove of Data: Conducting Research Using Federal Statistical Surveys
So many unanswered research questions…
Census Publications
The World of Printed Reports: Statistical Abstract, 1902, 580 pages
Cost of Living Measurement
… but seriously folks… There is a Hierarchy of Federal Data
Published aggregates
o Dating back over a century, but also (mostly) available electronically
o Some predetermined geography and categories
o The thinner the data "slice," the more confidentiality protection, i.e. the data's not there anymore
Public use file
o A sub-sample of the data, only feasible for large samples
o …but also with confidentiality protection (see above)
o Synthetic data (new approach)
Restricted use micro data
o Proposals for research required
o Special access arrangements, terms of use, etc.
Public use data
Census Research Data Centers
Demographic Data
1970, 1980, 1990 and 2000 Decennial Long Form (back to 1940 soon)
American Community Survey (effectively replacing the long form)
March CPS Earnings Supplements
Survey of Income and Program Participation
American Housing Survey
Economic Data Sets
Annual Survey of Manufactures
Census of Construction
Census of Finance and Insurance
Census of Manufactures
Census of Mining
Census of Real Estate
Census of Retail
Census of Services
Census of Transportation
Census of Wholesale
Characteristics of Business Owners Survey
Commodity Flow Survey
Auxiliary Establishment Survey
Longitudinal Business Database
Longitudinal Research Database
Manufacturing Energy Consumption Survey
Medical Expenditure Panel Survey, Insurance Component
National Employer Survey
Pollution Abatement Costs and Expenditures
Quarterly Financial Reports
Research and Development Survey
Survey of Manufacturing Technology
Worker Establishment Characteristics Database
R&D and Innovation Survey
Read the Forms!
Linked Household / Business data
Longitudinal Employer-Household Dynamics (LEHD)
o Links households to place of employment
o Based on unemployment insurance administrative records
o Covers most states
o Quarterly starting in 1990
o "Tracks" a person based on their place of employment
o Establishment (i.e. the place of work) is exact for single-plant companies
o Establishment is assigned for all others (using geography and industry to improve matches)
Google “LEHD on the map”…
How to Apply
Preliminary Proposal
Must Meet Basic Requirements
o Need for Non-Public data
o Maintains Confidentiality
o Feasibility
o Describes Census Benefits (LEGAL REQUIREMENT)
o Scientific Merit
Work with Census Administrator to Craft Final Proposal
Restricted use Health data
Why is there health data at the Census RDCs?
This data is collected by:
o National Center for Health Statistics (NCHS)
o Agency for Healthcare Research and Quality (AHRQ)
Dual mission: to provide broad access to health data and statistics, while protecting the privacy of respondents
Most research uses the public use file
NCHS and AHRQ RDCs created to provide access to restricted use files
Now available at all Census RDCs
What type of data is it? NCHS Data
National Health Status Surveys
o National Health and Nutrition Examination Survey (NHANES) I, II, and III
o National Health Interview Survey (NHIS)
o Longitudinal Study of Aging I and II (LSOA)
o National Survey of Family Growth
o National Survey of Children's Health
o National Survey of Early Childhood Health
o National Survey of Children with Special Health Care Needs
o National Asthma Survey
National Health Care Surveys
o National Ambulatory Medical Care Survey
o National Hospital Ambulatory Medical Care Survey
o National Survey of Ambulatory Surgery
o National Hospital Discharge Survey
o National Nursing Home Survey (NNHS)
o National Home and Hospice Care Survey
o National Employer Health Insurance Survey
o National Health Provider Inventory
o National Immunization Survey
Vital Statistics
o Mortality and Multiple Mortality
o Birth
o Fetal Death
o National Death Index
o Marriage and Divorce
Linked Data Sets
o Linked mortality data: NHIS, NHANES, LSOA II, NNHS
o Linked Medicare Enrollment and Claims data: NHIS, NHANES, LSOA II
o Linked Social Security Administration Data: NHIS, NHANES, LSOA II, NNHS
o Linked EPA data
What is restricted in the public use files but available in the RDC?
Every survey has at least some data that is restricted for confidentiality
Data can be restricted in a number of ways:
o Individual variables:
  o Removed
  o Top-coded, bottom-coded, coarsened or masked
  o Artificial information is substituted
o Pieces of datasets are restricted
o Whole datasets are unavailable (particularly linked files)
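To make the terms above concrete, here is a minimal sketch of top-coding, bottom-coding, and coarsening. The cutoffs (age 85, income 0 to 150,000, bins of 10,000) and the sample values are invented for illustration; they are not the thresholds any agency actually uses.

```python
def top_code(value, ceiling):
    """Cap a value at `ceiling` so extreme high values are hidden."""
    return min(value, ceiling)


def bottom_code(value, floor):
    """Raise a value to `floor` so extreme low values are hidden."""
    return max(value, floor)


def coarsen(value, width):
    """Replace an exact value with the lower bound of its bin."""
    return (value // width) * width


# Hypothetical raw values, as they might appear in a restricted file.
ages = [3, 27, 45, 68, 91, 104]
incomes = [-5000, 12500, 48200, 99000, 1250000]

# Public-use style versions: ages top-coded at 85 (an assumed cutoff),
# incomes bottom-coded, top-coded, then coarsened into 10,000-wide bins.
masked_ages = [top_code(a, 85) for a in ages]
masked_incomes = [
    coarsen(top_code(bottom_code(x, 0), 150000), 10000) for x in incomes
]

print(masked_ages)
print(masked_incomes)
```

In this sketch the two oldest respondents become indistinguishable from each other, which is the point of the protection and also why some analyses need the restricted file.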
What’s restricted? Variables
Examples of restricted variables:
o Geographic variables (state, county, or metropolitan area)
o Most dates (date of interview, date of death, date of birth)
o Income and employment data (industry codes)
o Specific diagnoses (ICD-9 codes are generally coarsened)
o Details about facilities (accreditation, payments, number of employees)
o Some information about children and adolescents (e.g. height and weight, depression, behavior problems, and drug use)
o Some information about race, ethnicity, and country of origin
o Contextual data (nearest hospital, % of population with diploma)
o Sample design variables (necessary for estimating variances)
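As a rough illustration of why the sample design variables matter, the sketch below estimates a weighted total and its variance using the common with-replacement ("ultimate cluster") formula, which needs a stratum and a primary sampling unit (PSU) identifier for every record. The record layout and all values are hypothetical, and real surveys may call for a different estimator (for example replicate weights).

```python
from collections import defaultdict


def weighted_total_and_variance(records):
    """Estimate a weighted total and its design-based variance.

    `records` is an iterable of (stratum, psu, weight, y) tuples.
    Variance uses the with-replacement formula: within each stratum,
    n/(n-1) times the sum of squared deviations of the weighted PSU
    totals from their stratum mean.
    """
    psu_totals = defaultdict(float)
    for stratum, psu, weight, y in records:
        psu_totals[(stratum, psu)] += weight * y

    # Group the weighted PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        by_stratum[stratum].append(total)

    estimate = sum(psu_totals.values())
    variance = 0.0
    for totals in by_stratum.values():
        n = len(totals)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean = sum(totals) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals)
    return estimate, variance


# Hypothetical records: (stratum, psu, weight, outcome indicator).
records = [
    ("A", 1, 10.0, 1), ("A", 1, 10.0, 0),
    ("A", 2, 10.0, 1), ("A", 2, 10.0, 1),
    ("B", 1, 20.0, 0), ("B", 1, 20.0, 1),
    ("B", 2, 20.0, 0), ("B", 2, 20.0, 0),
]
estimate, variance = weighted_total_and_variance(records)
print(estimate, variance)
```

Without the stratum and PSU columns the PSU totals cannot be formed, so this calculation is not possible from a file that omits them.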
What’s restricted? Pieces of datasets
Examples
o Contextual data: data can be linked to information about area (e.g., number of hospitals, education in county, MEPS Area Resource File)
o Medical Expenditure Panel Survey: Provider, Insurance, and Nursing Home Component
o NHANES III: Youth Conduct Disorder Datasets, Los Angeles Demographic Dataset, Diagnostic Interview Schedule for Children
o National Survey of Family Growth: self-report data and interviewer comments
What’s restricted? Datasets
Linked data sets:
o Mortality files linked to NHANES, NHIS, LSOA
o EPA emissions data linked to NHDS, NHIS, NHANES
o Social Security linked to NHANES, NHIS, LSOA
o Medicare files linked to NHANES, NHIS, LSOA
Other datasets unavailable:
o National Employer Health Insurance Survey
o National Death Index
How can I access it?
Submit a proposal to NCHS or AHRQ
NCHS/AHRQ evaluates for feasibility, availability of computing resources, and likelihood of disclosure of confidential info (NOT for scientific merit)
If approved, researcher sends public use data and code
NCHS/AHRQ staff merges public use data with restricted data to create a file for use by researcher
Files are only created by NCHS/AHRQ staff
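The merge step itself is carried out by NCHS/AHRQ staff, not by the researcher, but conceptually it is a join on a respondent identifier. The sketch below shows the idea. The field names (`resp_id`, `age_group`, `bmi`, `county_code`) and all values are hypothetical and do not reflect any actual file layout.

```python
# Hypothetical public-use extract supplied by the researcher.
public_use = [
    {"resp_id": 101, "age_group": "40-49", "bmi": 27.1},
    {"resp_id": 102, "age_group": "50-59", "bmi": 31.4},
    {"resp_id": 103, "age_group": "30-39", "bmi": 22.8},
]

# Hypothetical restricted variables held by the agency, keyed by respondent.
restricted = {
    101: {"county_code": "C01"},
    103: {"county_code": "C07"},
}


def merge_restricted(public_rows, restricted_by_id):
    """Left-join restricted variables onto public-use rows by resp_id.

    Returns the merged rows plus the ids that had no restricted match,
    so unmatched records can be reviewed rather than silently dropped.
    """
    merged, unmatched = [], []
    for row in public_rows:
        extra = restricted_by_id.get(row["resp_id"])
        if extra is None:
            unmatched.append(row["resp_id"])
            continue
        merged.append({**row, **extra})
    return merged, unmatched


merged, unmatched = merge_restricted(public_use, restricted)
print(merged)
print(unmatched)
```

Knowing that the file is built this way helps when preparing the extract and code to send: the public-use extract needs to carry the identifier that the restricted variables will be joined on.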
How can I access it?
Proposal must include
o Full research proposal
o Explanation of why public-use files are insufficient
o Data dictionary, which must identify files and years, target sample, and variables
o Sample code, examples of desired output, and software requirements
o Resumes of researchers, sources of funding, and proposed dates when analysis will take place
How can I access it? (Working through NCHS/AHRQ)
Working at NCHS or AHRQ RDCs (both in Hyattsville, MD)
o RDC analyst prepares data prior to researcher's arrival
o Researchers cannot merge own data sets or work with more than one data set at a time
o All output and notes must be reviewed before removal; data files cannot be removed
o Support is available from RDC staff
Working with NCHS remotely
o Researchers send code via email and receive output back via email
o Only certain SAS/SUDAAN procedures permitted; no access to micro data
Working with AHRQ remotely
o AHRQ has no remote server
o Possibility of writing task order for AHRQ