Treasure Trove of Data: Conducting Research Using Federal Statistical Surveys

Upload: xarles

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Treasure Trove of Data:  Conducting Research Using Federal Statistical Surveys

Page 2: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

So many unanswered research questions…

Page 3: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Census Publications

Page 4: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

The World of Printed Reports: Statistical Abstract, 1902, 580 pages

Page 5: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys


Page 6: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Cost of Living Measurement


Page 7: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

… but seriously folks… There is a Hierarchy of Federal Data

Published aggregates: dating back over a century, but also mostly available electronically
o Some predetermined geography and categories
o The thinner the data "slice", the more confidentiality protection, i.e. the data's not there anymore
Public Use file
o A sub-sample of the data, only feasible for large samples
o …but also with confidentiality protection (see above)
o Synthetic Data (new approach)
Restricted Use Micro Data
o Proposals for research required
o Special access arrangements, terms of use, etc.

Page 8: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Public use data

Page 9: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Census Research Data Centers

Page 10: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Demographic Data

1970, 1980, 1990 and 2000 Decennial Long Form (back to 1940 soon)

American Community Survey (effectively replacing the long form)

March CPS Earnings Supplements

Survey of Income and Program Participation

American Housing Survey

Page 11: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Economic Data Sets

Annual Survey of Manufactures
Census of Construction
Census of Finance and Insurance
Census of Manufactures
Census of Mining
Census of Real Estate
Census of Retail
Census of Services
Census of Transportation
Census of Wholesale
Characteristics of Business Owners Survey
Commodity Flow Survey
Auxiliary Establishment Survey
Longitudinal Business Database
Longitudinal Research Database
Manufacturing Energy Consumption Survey
Medical Expenditure Panel Survey, Insurance Component
National Employer Survey
Pollution Abatement Costs and Expenditures
Quarterly Financial Reports
Research and Development Survey
Survey of Manufacturing Technology
Worker Establishment Characteristics Database
R&D and Innovation Survey

Page 12: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Read the Forms!

Page 13: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Linked Household / Business data

Longitudinal Employer Household Dynamics (LEHD)
o Links households to place of employment
o Based on unemployment insurance administrative records
o Covers most states
o Quarterly starting in 1990
o "Tracks" a person based on their place of employment
o Establishment (i.e. the place of work) is exact for single plant companies
o Establishment is assigned for all others (using geography and industry to improve matches)

Google “LEHD on the map”…

Page 14: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

How to Apply

Preliminary Proposal
Must Meet Basic Requirements
o Need for Non-Public data
o Maintains Confidentiality
o Feasibility
o Describes Census Benefits (LEGAL REQUIREMENT)
o Scientific Merit

Work with Census Administrator to Craft Final Proposal

Page 15: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Restricted use Health data

Page 16: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

Why is there health data at the Census RDCs?

This data is collected by:
o National Center for Health Statistics (NCHS)
o Agency for Healthcare Research and Quality (AHRQ)

Dual mission: to provide broad access to health data and statistics, while protecting the privacy of respondents

Most Research uses the Public Use file
NCHS and AHRQ RDCs created to provide access to restricted use files
Now available at all Census RDCs

Page 17: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

What type of data is it? NCHS Data

National Health Status Surveys
o National Health and Nutrition Examination Survey (NHANES) I, II, and III
o National Health Interview Survey (NHIS)
o Longitudinal Study on Aging I and II (LSOA)
o National Survey of Family Growth
o National Survey of Children's Health
o National Survey of Early Childhood Health
o National Survey of Children with Special Health Care Needs
o National Asthma Survey

National Health Care Surveys
o National Ambulatory Medical Care Survey
o National Hospital Ambulatory Medical Care Survey
o National Survey of Ambulatory Surgery
o National Hospital Discharge Survey
o National Nursing Home Survey (NNHS)
o National Home and Hospice Care Survey
o National Employer Health Insurance Survey
o National Health Provider Inventory
o National Immunization Survey

Vital Statistics
o Mortality and Multiple Mortality
o Birth
o Fetal Death
o National Death Index
o Marriage and Divorce

Linked Data Sets
o Linked mortality data: NHIS, NHANES, LSOA II, NNHS
o Linked Medicare Enrollment and Claims data: NHIS, NHANES, LSOA II
o Linked Social Security Administration Data: NHIS, NHANES, LSOA II, NNHS
o Linked EPA data

Page 18: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

What is restricted in the public use files but available in the RDC?

Every survey has at least some data that is restricted for confidentiality

Data can be restricted in a number of ways:
o Individual variables:
    o Removed
    o Top-coded, bottom-coded, coarsened or masked
    o Artificial information is substituted
o Pieces of datasets are restricted
o Whole datasets are unavailable (particularly linked files)
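To make the terms concrete, here is a minimal Python sketch of what top-coding, bottom-coding, and coarsening typically mean for a single variable. The function names, caps, and band widths are illustrative assumptions, not the rules any agency actually applies.

```python
# Illustrative only: thresholds and band widths are hypothetical,
# not the values used by NCHS, AHRQ, or the Census Bureau.

def top_code(value, cap):
    """Report any value above `cap` as `cap` itself."""
    return min(value, cap)

def bottom_code(value, floor):
    """Report any value below `floor` as `floor` itself."""
    return max(value, floor)

def coarsen_age(age, width=5):
    """Collapse an exact age into a band such as '40-44'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def coarsen_diagnosis(code):
    """Keep only the category part of a dotted diagnosis code."""
    return code.split(".")[0]

record = {"income": 250000, "age": 43, "diagnosis": "250.01"}
public_record = {
    "income": top_code(record["income"], cap=100000),
    "age_band": coarsen_age(record["age"]),
    "diagnosis": coarsen_diagnosis(record["diagnosis"]),
}
print(public_record)
```

In each case the public file still carries the variable, but the exact value can only be recovered from the restricted file.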

Page 19: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

What’s restricted? Variables

Examples of restricted variables:
o Geographic variables (state, county, or metropolitan area)
o Most dates (date of interview, date of death, date of birth)
o Income and employment data (industry codes)
o Specific diagnoses (ICD-9 codes are generally coarsened)
o Details about facilities (accreditation, payments, number of employees)
o Some information about children and adolescents (e.g. height and weight, depression, behavior problems, and drug use)
o Some information about race, ethnicity, and country of origin
o Contextual data (nearest hospital, % of population with diploma)
o Sample design variables (necessary for estimating variances)

Page 20: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

What’s restricted? Pieces of datasets

Examples
o Contextual data: data can be linked to information about area (e.g., number of hospitals, education in county, MEPS Area Resource File)
o Medical Expenditure Panel Survey: Provider, Insurance, and Nursing Home Component
o NHANES III: Youth Conduct Disorder Datasets, Los Angeles Demographic Dataset, Diagnostic Interview Schedule for Children
o National Survey of Family Growth: self-report data and interviewer comments

Page 21: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

What’s restricted? Datasets

Linked data sets:
o Mortality files linked to NHANES, NHIS, LSOA
o EPA emissions data linked to NHDS, NHIS, NHANES
o Social Security linked to NHANES, NHIS, LSOA
o Medicare files linked to NHANES, NHIS, LSOA

Other datasets unavailable:
o National Employer Health Insurance Survey
o National Death Index

Page 22: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

How can I access it?

Submit a proposal to NCHS or AHRQ

NCHS/AHRQ evaluates for feasibility, availability of computing resources, and likelihood of disclosure of confidential info (NOT for scientific merit)

If approved, researcher sends public use data and code

NCHS/AHRQ staff merges public use data with restricted data to create a file for use by researcher

Files are only created by NCHS/AHRQ staff
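The merge step can be pictured with a small Python sketch. This is only an illustration of joining a researcher's public use extract to restricted variables on a shared respondent identifier; the field names (`resp_id`, `county_fips`) and the inner-join behavior are assumptions, and the actual merge is done by NCHS/AHRQ staff with their own tools.

```python
# Hypothetical sketch: field names and join rule are assumptions,
# not the procedure NCHS/AHRQ staff actually follow.

def merge_restricted(public_rows, restricted_rows, key="resp_id"):
    """Attach restricted variables to public use rows that share `key`."""
    restricted_by_id = {row[key]: row for row in restricted_rows}
    merged = []
    for row in public_rows:
        extra = restricted_by_id.get(row[key])
        if extra is None:
            continue  # no restricted record for this respondent: drop the row
        merged.append({**row, **extra})
    return merged

public_rows = [
    {"resp_id": 1, "age_band": "40-44"},
    {"resp_id": 2, "age_band": "65-69"},
]
restricted_rows = [
    {"resp_id": 1, "county_fips": "24033"},
]

analysis_file = merge_restricted(public_rows, restricted_rows)
print(analysis_file)
```

The resulting analysis file stays inside the RDC; only reviewed output leaves.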

Page 23: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

How can I access it?

Proposal must include
o Full research proposal
o Explanation of why public-use files are insufficient
o Data dictionary, which must identify files and years, target sample, and variables
o Sample code, examples of desired output, and software requirements
o Resumes of researchers, sources of funding, and proposed dates when analysis will take place

Page 24: Treasure Trove of Data:   Conducting Research Using Federal Statistical Surveys

How can I access it? (Working through NCHS/AHRQ)

Working at NCHS or AHRQ RDCs (both in Hyattsville, MD)
o RDC analyst prepares data prior to researcher's arrival
o Researchers cannot merge own data sets or work with more than one data set at a time
o All output and notes must be reviewed before removal; data files cannot be removed
o Support is available from RDC staff

Working with NCHS remotely
o Researchers send code via email and receive output back via email
o Only certain SAS/SUDAAN procedures permitted; no access to micro data

Working with AHRQ remotely
o AHRQ has no remote server
o Possibility of writing task order for AHRQ