1

Access to Confidential Data for Statistical Analysis

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics

Kenneth Harris, Director of Research Data Center

2

National Center for Health Statistics (NCHS)

Despite the wide dissemination of its data through publications, CD-ROMs, etc., the inability to release files with, for instance, lower levels of geography severely limits the utility of some data for research, policy, and programmatic purposes and sets a boundary on one of the Center’s goals to increase its capacity to provide state and local area estimates.

3

NCHS (cont.)

In pursuit of this goal and in response to the research community’s interest in restricted data, NCHS established the Research Data Center (RDC), a mechanism whereby researchers can access detailed data files in a secure environment, without jeopardizing the confidentiality of the respondents.

4

Research Data Center

The NCHS Research Data Center, established in 1998, is a facility at the NCHS headquarters in Hyattsville, Maryland, where researchers are granted access to restricted data files needed to complete approved projects. Restricted data files may contain information, such as lower levels of geography, but do not contain direct identifiers (e.g., name or social security number).

5

Data Restrictions

Section 308(d) of the Public Health Service Act and the NCHS Staff Confidentiality Manual do not permit the release of data that are either identified or identifiable to persons outside of NCHS.

6

Data Restrictions (cont.)

Identifiable data include not only direct identifiers such as name, social security number, etc., but also data that can serve to allow inferential identification of either individual or institutional respondents by a number of means.

7

Data Restrictions (cont.)

Research indicates that identifiability is greatly enhanced if geographic identifiers for state, county, census tract, block-group or block are released on public use files.

8

Key Issues for Research Data Availability

CONFIDENTIALITY
The dissemination of data in a manner that would allow public identification of the respondent or would in any way be harmful to him/her is prohibited and the data are immune from legal process.

9

Key Issues for Research Data Availability (cont.)

DISCLOSURE
Disclosure relates to inappropriate attribution of information to a data subject, whether an individual or an organization. Disclosure occurs when a data subject is identified from a released file (identity disclosure), sensitive information about a data subject is revealed through the released file (attribute disclosure), or the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible (inferential disclosure).

10

Appendix I – Rules for the Release of Micro Data Files

A. The data file must not contain any detailed information about the subject that could facilitate identification and that is not essential for research purposes (e.g., exact date of the subject’s birth).

B. Geographic places that have fewer than 100,000 people are not to be identified on the data file.

C. Characteristics of an area are not to appear on the data file if they would uniquely identify an area of less than 100,000 people.

11

Appendix I – Rules for the Release of Micro Data Files (cont.)

D. Information on the drawing of the sample which might assist in identifying a data subject must not be released outside the Center. Thus, the identities of primary sampling units are not to be made available outside the Center.

E. Before any new or revised micro data files are published, they, together with their full documentation, must be approved for publication by the NCHS Director or Deputy Director.

F. A micro data file containing confidential data on unidentified individuals or facilities may not be released to any person or organization outside NCHS until that person, or a responsible representative of that organization, has first signed the statement on the Order Form, whereby he gives assurance that the data provided will be used only for statistical reporting or research purposes.
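
Rules B and C both turn on a single population threshold. As a minimal sketch, a file's geographic identifiers could be screened against that threshold as shown here. The function name, the place names, and the input mapping are hypothetical and chosen only for illustration; this is not a description of the actual NCHS review procedure.

```python
# Hypothetical screen for Rule B: places with fewer than 100,000 people
# must not be identified on a released micro data file.
MIN_POPULATION = 100_000  # threshold stated in Rules B and C


def places_to_mask(populations):
    """Return the sorted names of places too small to identify on a file.

    `populations` maps a place name to its population count. This input
    format is an assumption made for the example.
    """
    return sorted(
        name for name, count in populations.items() if count < MIN_POPULATION
    )


example = {"Place A": 25_000, "Place B": 100_000, "Place C": 640_000}
print(places_to_mask(example))  # ['Place A']
```

A place with exactly 100,000 people passes this check because Rule B refers to places with "fewer than" 100,000 people.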

12

Why NCHS Does Not Release Files With Lower Levels of Geography

Research suggests that, in the case of personal surveys, nine commonly collected variables result in the identifiability rates shown in the table below.

Population of Geopolitical Area    Percent of Sample Identifiable
25,000                             24
50,000                             20
100,000                            14
200,000                            8
300,000                            5
400,000                            4
500,000                            3

13

Why NCHS Does Not Release Files With Lower Levels of Geography (cont.)

Notes: A geopolitical area may be a county, city, town, or other place with well-defined boundaries.

In this case, identification refers to certainty identification.
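
To make the percentages concrete, the sketch below transcribes the table into a lookup and converts a percentage into an expected count of identifiable respondents. The function name and the sample size of 1,000 are illustrative assumptions. Only the tabulated population sizes are supported, since the slides give no basis for interpolating between rows.

```python
# Percent of sample identifiable, by population of the geopolitical area,
# transcribed from the table on the slide.
PERCENT_IDENTIFIABLE = {
    25_000: 24,
    50_000: 20,
    100_000: 14,
    200_000: 8,
    300_000: 5,
    400_000: 4,
    500_000: 3,
}


def expected_identifiable(sample_size, area_population):
    """Expected number of identifiable respondents for a tabulated area size."""
    if area_population not in PERCENT_IDENTIFIABLE:
        # No interpolation is attempted between tabulated rows.
        raise KeyError(f"population {area_population} is not in the table")
    return sample_size * PERCENT_IDENTIFIABLE[area_population] / 100


# Hypothetical sample of 1,000 respondents in an area of 100,000 people.
print(expected_identifiable(1_000, 100_000))  # 140.0
```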

14

How Does RDC Operate?

On-Site Access

Remote Access

Staff Assisted Analytical Session

15

User Procedures

To gain access to NCHS restricted data through either method, user must:

Submit a research proposal.
    An advisory and proposal review committee receives, reviews, and approves researcher proposals.
    Proposals are evaluated primarily on the confidentiality disclosure risk.
    Scientific merit is not an evaluation criterion.

Sign an affidavit of confidentiality and promise not to use any method to attempt to identify respondents.

16

User Procedures (cont.)

Not take any materials or equipment into RDC unless approved by RDC staff.

Submit data files to be merged onto NCHS data ahead of time – all merging is done by RDC staff.

Subject all output and/or materials removed from the RDC to a disclosure limitation review.

May not remove any NCHS restricted data files nor linked data files.

17

Researcher Affidavit of Confidentiality

I certify that no confidential data or information viewed or otherwise obtained while I am a researcher in the National Center for Health Statistics (NCHS), Research Data Center (RDC) will be removed from NCHS. Further, I understand that NCHS will perform a disclosure review and must provide approval to me before I remove any data from the RDC, whether it be in electronic or paper form. I acknowledge NCHS Confidentiality Statute, 308(d) of the Public Health Service Act stated below and fully understand my legal obligations to NCHS to protect all confidential data. Further I understand any violation I may perform is punishable under 18 United States Code (USC), 1001 which carries a fine of up to $10,000 or up to 5 years in prison.

18

Researcher Affidavit of Confidentiality (cont.)

NCHS 308(d) Confidentiality Statute - No information, if an establishment or person supplying the information or described in it is identified, obtained in the course of activities undertaken or supported under section 304, 305, 306, 307, or 309 may be used for any purpose other than the purpose for which it was supplied unless such establishment or person has consented to its use for such other purpose and in the case of information obtained in the course of health statistical or epidemiological activities under section 304 or 306, such information may not be published or released in other form if the particular establishment or person supplying the information or described in it is identifiable unless such establishment or person has consented to its publication or release in other form.

19

Researcher Affidavit of Confidentiality (cont.)

18 United States Code, 1001 - Deliberately making a false statement in any matter within the jurisdiction of any Department or Agency of the Federal Government violates 18 USC 1001 and is punishable by a fine of up to $10,000 or up to 5 years in prison.

____________________    _______________
Researcher’s Signature  Date

____________________    _______________
NCHS Witness            Date

20

Can Researcher Merge his/her Data with NCHS?

Must interact with RDC staff to ensure that their data can be merged with the NCHS data.

User-supplied data will be merged with NCHS data by RDC staff only.

The NCHS RDC policy states that merged and user-supplied data will not be made available for analysis to anyone without the written consent of the user.

21

The Cost per Project

On Site
    $200 per day (2 day minimum)

Remote Access
    NSFG-CDF = $500/year
    NHIS-polio = $500/year
    NHIS Linked Mort. File = $250/month
    NHANES Linked Mort. File = $250/month

22

The Cost per Project (cont.)

Files <= 130k records = $500 per month
Files > 130k records = $1000 per month

Staff Assisted Variable

File Construction and Setup

For Mortality Files = $250 per day

For all Other Files = $500 per day
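
As a rough illustration of how these fees add up, the sketch below computes an on-site charge and a per-month file charge. The rates come from the slides; how they combine is an assumption. In particular, it assumes that a visit shorter than the two-day minimum is billed as two days and that the 130k-record tiers are charged per file per month. The function names are hypothetical.

```python
# Rates transcribed from the slides; the billing logic is assumed.
ON_SITE_DAILY_RATE = 200     # dollars per day
ON_SITE_MINIMUM_DAYS = 2     # "2 day minimum"
RECORD_TIER_LIMIT = 130_000  # "130k records"


def on_site_cost(days):
    """On-site fee, assuming shorter visits are billed at the 2-day minimum."""
    if days <= 0:
        raise ValueError("days must be positive")
    return ON_SITE_DAILY_RATE * max(days, ON_SITE_MINIMUM_DAYS)


def monthly_file_cost(records, months):
    """File fee by record count, assuming the tiers apply per file per month."""
    if records < 0 or months <= 0:
        raise ValueError("records must be >= 0 and months must be positive")
    rate = 500 if records <= RECORD_TIER_LIMIT else 1000
    return rate * months


print(on_site_cost(1))                # 400
print(on_site_cost(5))                # 1000
print(monthly_file_cost(130_000, 3))  # 1500
print(monthly_file_cost(130_001, 3))  # 3000
```

Staff-assisted work is left out because the slides give no fixed rate for it.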

23

Do Doctors perform “defensive Cesareans”?

Overview: This topic re-examined the issues of “defensive medicine” and state reforms designed to limit malpractice risk on the use of cesarean section delivery.

NCHS Data Used: National Hospital Discharge Survey (NHDS)

Years of Data Used: 1980 through 1992, inclusive.

User’s Data Merged with NCHS? Yes

Method of Access to NCHS Data: Remote and On-site Access

Statistical Software Used: SAS

24

Economic Model to Explain the Incidence of Sexual Activity, Contraceptive Use, STD, and Pregnancy Among Teenage Girls.

Overview: National Survey of Family Growth Data provide extensive socio-demographic information and reports of the sexual histories of these women. Researcher focused on the effects of a number of policies measured at the state-level. These included:
    Parental notification of consent laws.
    Medicaid funding of abortions.
    Welfare generosity.

NCHS Data Used: National Survey of Family Growth (NSFG)

User’s Data Merged with NCHS? Yes

Method of Access to NCHS Data: Remote Access

Statistical Software Used: SAS

25

Nursing Home Admission and Payment Source?

Overview: This project tested if patients with Medicare were being discriminated against because their reimbursement rate was significantly below the private pay rate for nursing homes.

NCHS Data Used: National Nursing Home Survey (NNHS)

Years of Data Used: 1985, 1995, and 1997

User’s Data Merged with NCHS? No

Method of Access to NCHS Data: Remote Access

Statistical Software Used: SAS

26

Hardware and Software

All RDC hardware and software are standard.

Hardware
    Pentium IV computers with Windows 2000

Software
    SAS (only language on ANDRE)
    Sudaan
    Fortran
    HLM
    Stata
    Limdep
    text editors/viewers

Onsite workstations do NOT have email or internet access

Only access to printer is through RDC staff

27

Record Linkage for Epidemiologic Research: Accessing Linked Data at the NCHS Research Data Center

Christine S. Cox
NCHS Data Users Conference
July 12, 2006

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics

28

What is Record Linkage?

NCHS Surveys + Administrative records → Linked Data File
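
The idea of combining survey records with administrative records into one linked file can be sketched as a simple join. The slides do not describe the matching method NCHS uses, so the exact-key join below is a simplifying assumption, and the field names, the `link_id` key, and the sample records are all hypothetical.

```python
# Hypothetical records; the field names and the shared "link_id" key are
# illustrative and do not describe actual NCHS file layouts.
survey_records = [
    {"link_id": 1, "age": 67, "smoker": True},
    {"link_id": 2, "age": 54, "smoker": False},
    {"link_id": 3, "age": 71, "smoker": False},
]
admin_records = [
    {"link_id": 1, "vital_status": "deceased"},
    {"link_id": 3, "vital_status": "alive"},
]


def link_records(survey, admin, key="link_id"):
    """Left-join administrative fields onto survey records by a shared key."""
    admin_by_key = {row[key]: row for row in admin}
    linked = []
    for row in survey:
        match = admin_by_key.get(row[key])
        merged = dict(row)
        # Flag whether an administrative record was found for this respondent.
        merged["linked"] = match is not None
        if match is not None:
            merged.update({k: v for k, v in match.items() if k != key})
        linked.append(merged)
    return linked


for row in link_records(survey_records, admin_records):
    print(row)
```

Every survey respondent is kept in the output, and the `linked` flag records whether an administrative match was found.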

29

NCHS Linked Data: Major Activities

Mortality
    National Death Index

Health Care Utilization and Costs
    Medicare Data

Retirement and Disability
    Social Security Data

30

NCHS Linked Data: Mortality

Eligibility status
Assigned vital status
Date of death
Age at death
Underlying and multiple causes of death
Adjusted sample weights

31

Research Potential of Linked Mortality Data

Living and Dying in the USA: Behavioral, Health, and Social Differentials of Adult Mortality
RG Rogers, CB Nam, RA Hummer

A Semiparametric Analysis of the Body Mass Index’s Relationship to Mortality
JT Gronniger

The Income-Associated Burden of Disease in the United States
P Muennig, P Franks, H Jia, E Lubetkin and MR Gold

Excess Deaths Associated with Underweight, Overweight, and Obesity
KM Flegal, BI Graubard, DF Williamson, MH Gail
JAMA. 2005;293:1861-1867.

32

NCHS Linked Data: Medicare

Medicare entitlement and health care utilization and payment data for 1991-2000
    Denominator file
    MEDPAR Inpatient hospitalization
    MEDPAR Skilled nursing facility
    Hospital outpatient
    Home Health Care
    Hospice
    Carrier (physician/supplier Part B file)
    Durable Medical Equipment

33

Research Potential of Linked Medicare Data

Examine risk factors for health conditions
Examine reliability of survey data
    Examine survey report of disability with program participation eligibility criteria
    Compare survey reported health conditions to claims records
Examine disparities in Medicare service utilization

34

NCHS Linked Data: Retirement/Disability

Social Security data from Retirement, Survivors, and Disability Insurance (RSDI) and Supplemental Security Income (SSI) programs
    Master Beneficiary Record (MBR), 1962-2003
    Payment History Update System (PHUS), 1984-2003
    Supplemental Security Record (SSR), 1974-2003

35

Research Potential of Linked Social Security Data

Examine reliability of survey information for SSA program participation and benefits

Compare the health characteristics of those who take early (age 62) Social Security benefits to those who postpone benefits

Policy analysis using validated survey data
    Predicting the number of people who will become disabled based upon survey reported health conditions
    Determining whether current disability entitlement funding levels will be adequate as the population ages

36

Summary NCHS Data Linkage

Linkage sources: Retirement & Disability (SSA), Medicare (CMS), Mortality (NDI)

NNHS 1985          XX
NHANES III         XXX
NHANES II          XX
NHANES I           XXX
LSOA II            XXX
NHIS 1994-1998     XXX
NHIS 1986-2000     X

37

www.cdc.gov/nchs/r&d/nchs_datalinkage/data_linkage_activities.htm

38

Why can’t you just give me the data?

NCHS does not “own” the linked administrative data

NCHS data confidentiality rules prohibit the release of potentially identifiable data – special considerations concerning the protection of linked data

The RDC is the only option for access for now….

39

Overview: Data Access Procedures

Proposal Requirements

Access Methods

Helpful Tips

Where to get help?

40

Proposal Requirements

Proposal is evaluated by review committee
Review criteria
    Scientific and technical feasibility
    Availability of RDC resources
    Disclosure risk for restricted information
    The extent to which project is in accordance with the mission of NCHS

Special note: NCHS does not try to determine if proposals are duplicative

41

Proposal Requirements

Cover letter
Project title
Abstract (maximum 300 words summarizing project)
Full contact information
    Institutional affiliation
    Mail address, phone, email
Dates of proposed time at RDC (or indication of using remote access)
Source of funding for proposed research

42

Proposal Requirements

Study background
    Key study questions or hypotheses
    Public health benefits

Methods
    Analytic approach and statistical methods
    Statistical software requirements

Description of intended output for nondisclosure review, e.g.
    Table shells
    Model equations
    Test statistics that researcher plans to remove from RDC

43

Proposal Requirements

Explanation of why restricted data are needed, e.g. describe why publicly available data are insufficient

Summary of data requirements to be included in analytic file
    Identification of sample
    Identification of variables

Description of additional data to be supplied by researcher to be merged with NCHS or other data source (must clearly identify source of other data)

44

Proposal Requirements: Appendices

Current Curriculum Vitae or resume for each investigator

Data dictionary – complete listing of specific data requested and its source(s) and indicate if public use or restricted access variables
    specific files and years
    sample
    variables (dependent, independent, matching/linking)

45

Proposal Requirements: Appendices

For remote-access applicants
    Description of the computer and email system to be used to receive output
    Security provisions for the computer and email systems

For students
    Letter from department chair or academic advisor stating that student is working under the direction of the department

46

Overview: RDC Data Access Procedures

Proposal Requirements

Access Methods

Helpful Tips

Where to get help?

47

Access Methods

Once approved, three methods to access restricted data
    on-site – use local computing resources in the NCHS RDC, Hyattsville, MD
    remote – submit programs electronically to be executed in the RDC with output returned by email
    staff assisted – RDC staff provide on-site programming for off-site approved researchers

For all methods of access, restricted data files remain in RDC and output is inspected for disclosure violations

48

On-Site Access

RDC staff constructs necessary data files, including merged user data

Most statistical packages available with sufficient lead time

Output subject to disclosure review

Open only during normal working hours

49

Remote Access Method

RDC staff constructs necessary data files, including merged user data

SAS programs only (certain procedures and functions not allowed); additional software options expected

Both submitted programs and output undergo a programmed disclosure limitation review

50

RDC Staff-assisted Programming Method

Subcontract with the RDC staff to perform programming tasks

Useful for those planning to use statistical software not available for the remote system and who are not able to travel to the RDC facility

Cost is estimated for each research project

51

Overview: RDC Data Access Procedures

Proposal Requirements

Access Methods

Helpful Tips

Where to get help?

52

RDC Helpful Tips

Be clear about research and data requirements (helps to determine feasibility of project)
  Clearly identify the sample to be included in the analytic file
  Provide data dictionaries for both
    Public use data
    Restricted data

Provide examples of expected output

53

Overview: RDC Data Access Procedures

Proposal Requirements

Access Methods

Helpful Tips

Where to get help?

54

Visit the RDC at: www.cdc.gov/nchs/r&d/rdc.htm or email: [email protected]

55

LINKED DATA, CONTEXTUAL DATA, and GEO-CODING

ON-SITE and STAFF-ASSISTED DATA ACCESS

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics

Christopher Rogers
Research Data Center

[email protected]

56

Why Link Data Sets?

Improve modeling and make use of existing data.

Compensate for increased difficulties taking surveys.

Open your mind.

Common Example:

Economic variables versus Ethnic variables

57

Historical Trends

More linking of scientific data sets between government agencies.
Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA).

Confused political and social situation in US.

58

Quality NCHS Resources

Linked Birth and Infant Death Data with Fetal Death Data.

Geo-coded NHIS 1986-2003 (2004-2005).
Geo-coded NHANES III.
Cycles 4, 5, and 6 NSFG Contextual Data.
Linked Data Sets described earlier.

59

Linked Birth and Infant Death

Designed to study factors in infant death.
Links birth and death certificates for deaths under one year of age. Includes fetal deaths for 1995-1997.
Years: 1983-1991 and 1995-1997
Numerator File (for deceased children): Parental information and behavior, prenatal care, infant health variables, demographics, cause of death.
Denominator File (for control group): Parental information and behavior, prenatal health, infant health, demographics.
Fetal Death Data: 1995-1997
Restricted Data: County/City of mother’s residence or County of child’s birth or death when population is under 250,000 (100,000 starting 1989).

60

Data Example

From the Division of Vital Statistics. Proposals or questions can go either to the RDC or the DVS.

Fetal Death Data portion. Given 1989-1999.

Linked to county level contextual data.
Goal to model fetal death with emphasis on ground water quality. Estimates death rates for each county.

61

Geo-Coded NHIS

National Health Interview Survey. RDC has access to files from 1963 to present. Previously geo-coded households for 1986-1994. Recently geo-coded by RDC from 1995-2003. 2004-2005 coding in progress.

State (2 digits), County (3 digits), Tract (6 digits), Block Group (1 digit), and Block (3-4 digits) levels. Households coded to 1990 and 2000 Censuses.
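The geographic levels listed above nest inside one another, so a block-group identifier can be read as a sequence of fixed-width fields. The following is a minimal sketch, assuming the code is stored as a single 12-digit string (state, county, tract, block group). The names `GeoCode` and `split_geocode` and the sample value are hypothetical and are not taken from the RDC files.

```python
from typing import NamedTuple


class GeoCode(NamedTuple):
    state: str        # 2 digits
    county: str       # 3 digits
    tract: str        # 6 digits
    block_group: str  # 1 digit


def split_geocode(code: str) -> GeoCode:
    """Split a 12-digit state+county+tract+block-group string into its fields."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError("expected a 12-digit numeric geocode")
    return GeoCode(code[0:2], code[2:5], code[5:11], code[11])


# Made-up value for illustration only; it does not identify a survey household.
example = split_geocode("240338001021")
print(example)
```

Keeping the fields as strings preserves leading zeros, which matter when the parts are later used as merge keys against county-level or tract-level contextual files.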

62

Geo-Coded NHANES III

NHANES III is also linked to NDI Mortality data.
NHANES III has been geo-coded twice. The RDC has done it at the same level of detail as NHIS.

Continuous NHANES has not been geo-coded yet.

Example: Large project with neighborhood, economic, ethnic, and individual medical and behavioral variables. Multi-level models.

63

NSFG Contextual Data

Contextual variables available with Cycles 4, 5, and 6. Supplied for each individual in sample.

Cycle 6: 1054 contextual variables at the state, county, tract, and block group levels. For respondent addresses in 2000 and 2002.

Contextual data include both economic and demographic characteristics of locations. Easily merged by case ID to individual characteristics, behaviors, and histories.
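The merge by case ID mentioned above can be pictured as a left join of contextual variables onto respondent records. This is only a sketch in plain Python: the field names (`caseid`, `age`, `county_unemp_rate`) and all values are invented for illustration, and the RDC's own files and tools may be organized differently.

```python
# Hypothetical respondent records (individual characteristics).
respondents = [
    {"caseid": 101, "age": 24},
    {"caseid": 102, "age": 31},
    {"caseid": 103, "age": 19},
]

# Hypothetical contextual variables, keyed by the same case ID.
contextual = {
    101: {"county_unemp_rate": 4.2},
    102: {"county_unemp_rate": 6.8},
}


def merge_by_caseid(people, context):
    """Left-join contextual variables onto respondent records by case ID."""
    merged = []
    for person in people:
        row = dict(person)  # copy so the input is not modified
        row.update(context.get(person["caseid"], {}))
        merged.append(row)
    return merged


analytic_file = merge_by_caseid(respondents, contextual)
```

Respondents without a contextual record (case 103 here) are kept with their individual variables only, which makes unmatched cases easy to spot before modeling.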

64

Simple NSFG Example

A simple example relating economics on state level, ethnicity, and behavior, but not using contextual variables.

Treatment States given waiver to offer more family planning services (FPS).

Questions:
  FPS effects on behavior
  FPS effect on pregnancy rates
  Differential impacts across demographic subgroups?

65

Change of Topic: Accessing Data

On-site access to data at the RDC in Hyattsville.

Staff-assisted remote access to data via e-mail.

Researchers often use both types of access.
Potential Designated Agent status. (CIPSEA)
The RDC has put many resources into automated remote access.

66

On-Site Access

Rules in 24 page file GuidelinesRDC11-8-05.pdf available on-line.
The RDC and NCHS surveys have knowledgeable professional staffs that review proposals carefully.
Clients can only remove what has been approved. Checked by staff.

Exploratory Data Analysis. If needed, ask. Recent example: Checking general shapes of variables for model validity. OKed by survey.

Modeling needs. Recent example: Nested randomized geo-codes.

Estimation problems. Example: Single PSU in a Stratum.

67

Staff-Assisted Remote Access

Analysis done through a particular staff member. Usually efficient, but could be very busy.

Staff member determines costs based on time.
Staff usually not asked to do much programming.
Staff creates data, runs e-mailed programs, checks, and returns output to researcher.
Staff can do exploratory analysis, if needed.
Staff can help check modeling problems.
Commonly done after on-site visit.

68

Our Mission

The RDC has a professional staff dedicated to helping researchers uncover knowledge and advance understanding.

69

Remote Access System

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics

Vijay Gambhir

70

Remote Access System

Envisioned as an integral part of RDC
  Pre-onsite usage
  Post-onsite usage

Super store / Convenience store

71

Basics of Remote Access System

Object oriented, event driven system based upon the principles of distributed computing

About two years of development efforts
Set of applications called in service by resident component
Advanced pattern recognition techniques

72

Analytic Data Research by Email (ANDRE)

NCHS has been providing remote data access to researchers through ANDRE since April 1998.

In the past five years, ANDRE has served 45 different data analysts and executed over 9,500 SAS programs for their research programs.

73

Main Features of ANDRE

Completely automated system
  Operates round the clock without any human intervention
Registered subscribers only
  Proposals already reviewed and approved
  Have an agreement with NCHS/RDC

Unlimited Access during the subscription period

74

Data Requests

Registered user can submit data requests by email from anywhere and at any time.

Results of the data request released to a specified email address that has been certified as secure by the subscriber and approved by NCHS/RDC.

75

Authentication

Multi-levels of system security:
  Submission syntax
  User id
  Password
  Email/code word
  Package
  Path info

76

Data Request Analysis

Compliance with the disclosure limitation constraints of NCHS

Integrity of the system
  Resource constraints (CPU time & Storage requirements)
  Protection of ANDRE’s work environment

77

Prevention of Direct Disclosure

Cleaning up of the Log File
Categorization of SAS commands/words
  Forbidden Commands
  Modifications to the Commands
  Output suppression

78

Sample: Original Log

1    options nocenter;
2    Data one;
3    Infile 'd:\nchs\respnd95.dat' lrecl=13064;
4    Input
5    TODAYSPG 6847-6847
6    CONSTAT1 11934-11935
7    CONSTAT2 11936-11937
8    CONSTAT3 11938-11939
9    CONSTAT4 11940-11941
10   SEX1MTHD 11945-11946
11   POST_WT 12350-12359;
12   if constat1 = 'ab' then vjvar=1; else vjvar = 2;
13   WGT1000=POST_WT/1000;
14   title 'NSFG cycle 1995';
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      12:15
NOTE: The infile 'd:\nchs\respnd95.dat' is:
      File Name=d:\nchs\respnd95.dat,
      RECFM=V,LRECL=13064

NOTE: Invalid numeric data, 'ab' , at line 12 column 15.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 1000000111260837511521 1 1050 12 106921124112411189
101 2
201 19211059110611197

……

79

Sample: Original Log (cont.)

……

12901 11232521101 05267213103033921811931011103
01030000000321120000392702210611511200403 1344 1316
13001 622501001006034
TODAYSPG=1 CONSTAT1=5 CONSTAT2=88 CONSTAT3=88 CONSTAT4=88 SEX1MTHD=1
POST_WT=2545.7569 vjvar=2 WGT1000=2.5457569 _ERROR_=1 _N_=20
NOTE: 10847 records were read from the infile 'd:\nchs\respnd95.dat'.
      The minimum record length was 13064.
      The maximum record length was 13064.
NOTE: The data set WORK.ONE has 10847 observations and 9 variables.
NOTE: DATA statement used:
      real time 39.88 seconds
      cpu time 12.10 seconds
15   proc freq;
16   tables CONSTAT1 vjvar;
17   run;
NOTE: There were 10847 observations read from the data set WORK.ONE.
NOTE: PROCEDURE FREQ used:
      real time 0.49 seconds
      cpu time 0.04 seconds

80

Sample: Cleaned Log

1    options nocenter;
2    Data one;
3    Infile 'd:\nchs\respnd95.dat' lrecl=13064;
4    Input
5    TODAYSPG 6847-6847
6    CONSTAT1 11934-11935
7    CONSTAT2 11936-11937
8    CONSTAT3 11938-11939
9    CONSTAT4 11940-11941
10   SEX1MTHD 11945-11946
11   POST_WT 12350-12359;
12   if constat1 = 'ab' then vjvar=1; else vjvar = 2;
13   WGT1000=POST_WT/1000;
14   title 'NSFG cycle 1995';
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      12:15
NOTE: The infile 'd:\nchs\respnd95.dat' is:
      File Name=d:\nchs\respnd95.dat,
      RECFM=V,LRECL=13064

NOTE: Invalid numeric data, 'ab' , at line 12 column 15.

81

Sample: Cleaned Log (cont.)

NOTE: 10847 records were read from the infile 'd:\nchs\respnd95.dat'.
      The minimum record length was 13064.
      The maximum record length was 13064.
NOTE: The data set WORK.ONE has 10847 observations and 9 variables.
NOTE: DATA statement used:
      real time 39.88 seconds
      cpu time 12.10 seconds
15   proc freq;
16   tables CONSTAT1 vjvar;
17   run;
NOTE: There were 10847 observations read from the data set WORK.ONE.
NOTE: PROCEDURE FREQ used:
      real time 0.49 seconds
      cpu time 0.04 seconds
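Comparing the original and cleaned logs, the difference is that the RULE line, the echoed raw records, and the variable dump that follow an "Invalid numeric data" note are gone, while numbered source lines and NOTE blocks survive. The sketch below reproduces that difference in Python. It is inferred only from these two samples; the function name `clean_log` is hypothetical, and the actual rules ANDRE applies are not described in the slides.

```python
def clean_log(lines):
    """Drop record dumps that SAS writes after an 'Invalid ... data' note.

    Assumption (inferred from the sample logs only): a dump starts after a
    line beginning with 'NOTE: Invalid' or with 'RULE:' and runs until the
    next 'NOTE:' line. Everything else is kept unchanged.
    """
    cleaned = []
    in_dump = False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("NOTE:"):
            cleaned.append(line)  # the note itself carries no record values
            in_dump = stripped.startswith("NOTE: Invalid")
        elif stripped.startswith("RULE:"):
            in_dump = True        # column ruler that precedes echoed records
        elif not in_dump:
            cleaned.append(line)
    return cleaned


# Placeholder log lines; the record contents are stand-ins, not real data.
sample = [
    "12   if constat1 = 'ab' then vjvar=1; else vjvar = 2;",
    "NOTE: Invalid numeric data, 'ab' , at line 12 column 15.",
    "RULE: ----+----1----+----2",
    "1 <echoed raw record>",
    "VAR1=<value> _ERROR_=1 _N_=20",
    "NOTE: 10847 records were read from the infile.",
    "      The minimum record length was 13064.",
    "15   proc freq;",
]
result = clean_log(sample)
```

Filtering by position in the log rather than by content means the sketch never has to parse the echoed records themselves.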

82

Forbidden Commands

Commands That Pose Unacceptable Disclosure Risks
OR
Disallowed to Protect Integrity/Internal Environment of ANDRE

Add        firstobs   report     iml
Print      first.     Pctn       nofreq
Obs        last.      Pctsum     nocum
Firstobs   nocol      tabulate   editor
Browse     summary    list       put
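A screen for the keywords above can be sketched as a token scan over the submitted program. This is a minimal illustration in Python, not ANDRE's implementation: the function name `find_forbidden` and the tokenizing rule are assumptions, and the scan is deliberately conservative because it also flags keywords that appear inside comments or quoted strings.

```python
import re

# Keywords from the slide, lower-cased (SAS is case-insensitive).
FORBIDDEN = {
    "add", "print", "obs", "firstobs", "browse",
    "first.", "last.", "nocol", "summary",
    "report", "pctn", "pctsum", "tabulate", "list",
    "iml", "nofreq", "nocum", "editor", "put",
}


def find_forbidden(program):
    """Return the set of forbidden keywords found as tokens in a SAS program."""
    # A token is a name, optionally followed by a dot, so that BY-group
    # prefixes such as "first." in "first.id" are recognised.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*\.?", program.lower())
    hits = set()
    for tok in tokens:
        if tok in FORBIDDEN:
            hits.add(tok)
        elif tok.rstrip(".") in FORBIDDEN:
            hits.add(tok.rstrip("."))
    return hits


print(find_forbidden("proc print data=one; run;"))
```

A program would be rejected (or passed on for modification) whenever the returned set is non-empty.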

83

Commands Modification

Modify user’s program to enforce restrictions on options allowed with certain SAS procedures to prevent objectionable info appearing in the output

PROC MEANS n mean std;
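One way to picture this modification step is a rewrite that strips whatever statistics the user requested on a PROC MEANS statement and substitutes N, MEAN, and STD. The sketch below is an assumption about how such a rewrite could look, not the rule ANDRE applies; `restrict_proc_means` is a hypothetical name and `STAT_KEYWORDS` is an illustrative, incomplete list of statistic keywords.

```python
import re

ALLOWED_STATS = ["n", "mean", "std"]  # the statistics shown on the slide

# Statistic keywords a user might request; illustrative and incomplete.
STAT_KEYWORDS = {
    "n", "nmiss", "mean", "std", "min", "max", "range", "sum",
    "var", "median", "p1", "p5", "p95", "p99",
}


def restrict_proc_means(program):
    """Rewrite each PROC MEANS statement to request only N, MEAN and STD."""
    def rewrite(match):
        options = match.group(1).split()
        # Keep non-statistic options such as data=..., drop requested statistics.
        kept = [opt for opt in options if opt.lower() not in STAT_KEYWORDS]
        return "proc means " + " ".join(kept + ALLOWED_STATS) + ";"

    return re.sub(r"proc\s+means\b([^;]*);", rewrite, program, flags=re.IGNORECASE)


print(restrict_proc_means("proc means data=one min max mean; var x; run;"))
```

Only the PROC MEANS statement itself is touched; the following `var` statement and the rest of the program pass through unchanged.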

84

Output Suppression

Wiping out of extreme values from the output of Proc Univariate.

Suppressing complete output line (Procs Means, corr, Univariate, etc) where sample size less than the minimum acceptable value.

85

Proc Means Suppression

The MEANS Procedure

Variable  Label                                         N           Mean        Std Dev
---------------------------------------------------------------------------------------
EXPEND_R  Current expend/pupil in public schl/1000   5424      5.0830820      1.3958710
*** Values Suppressed ***
RPUB87    exp. for contr. serv. and supplies 1997$   5424    23472052.60    18806802.86
RPUB92    exp. for contr. serv. and supplies 1997$   5424    34800922.98    30481634.59
PRGPRO    Coordinated Pregnancy Prevention Program   1708      0.0679157      0.2516749
HIVED     HIV/AIDS Education                         1708      3.5146370      0.8044378
*** Values Suppressed ***
PRGPRO87  Coordinated Pregnancy Prevention Program   5424      0.0540192      0.2260764
HIVED87   HIV/AIDS Education                         5424      3.4968658      0.8008324
WT_PER15  % Wt females aged 15-19/total 15-19        5424      0.7279681      0.1265796
BK_PER15  % Bk females aged 15-19/total 15-19        5424      0.1409869      0.0932332
HS_PER15  % Hs females aged 15-19/total 15-19        5424      0.0962413      0.1055191
TEENMMC2  Teenmom by cohort (1,2,3r)                 1201      1.7119067      0.7715351
C18_2_1S  R in C2 (vs 1) at 18-19 endpt (1,2)        1770      1.5248588      0.4995228
TM2_1S18  R tnmm in Coh 2 (vs 1)-age 18 @ ext         358      1.4804469      0.5003168
AGE_12    Date R = 12 in century months              6450    979.5613953     69.3124265
STRTST    IA5 Date R started living in current sta   3870        1132.55    753.2066507
BDAYCENM  R date of birth                            6450    835.5613953     69.3124265
RAVPAY95  real av. an. pay 95 dollars                5424       26933.93        2826.80
PERCAFDC  percent of households receiving AFDC       5424      0.0422254      0.0127307
SALARY    teacher salaries real 96-97$$$             5424       35338.66        5729.11
---------------------------------------------------------------------------------------
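The line-level suppression shown in this listing can be sketched as a filter over summary rows: any row whose sample size falls below a minimum is replaced by a marker line. The slides do not state the minimum acceptable value, so `min_n` is a required parameter here and the value used in the example is purely hypothetical, as are the function name `suppress_small_rows` and the row data.

```python
def suppress_small_rows(rows, min_n):
    """Format (name, n, mean, std) rows, masking those with n below min_n."""
    out = []
    for name, n, mean, std in rows:
        if n < min_n:
            # The whole line is withheld, including the variable name.
            out.append("*** Values Suppressed ***")
        else:
            out.append(f"{name:<10}{n:>8}{mean:>14.7f}{std:>14.7f}")
    return out


# Made-up rows; min_n=5 is an illustrative threshold, not the NCHS value.
rows = [
    ("VAR_A", 5424, 5.083082, 1.395871),
    ("VAR_B", 3, 1.0, 0.5),
    ("VAR_C", 358, 1.4804469, 0.5003168),
]
for line in suppress_small_rows(rows, min_n=5):
    print(line)
```

Withholding the variable name along with the statistics matches the listing above, where a suppressed line shows only the marker.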

86

Proc Univariate Output: Unsuppressed

The SAS System    9
14:09 Sunday, October 24, 1999

Univariate Procedure
Variable=AVHRATET

Moments                                       Quantiles(Def=5)

N              2283   Sum Wgts       2283     100% Max   -0.25314     99%   -1.62008
Mean       -4.66219   Sum        -10643.8     75% Q3     -3.56179     95%   -2.37588
Std Dev    1.892017   Variance    3.57973     50% Med    -4.50491     90%   -2.79152
Skewness   -2.11919   Kurtosis   6.892929     25% Q1     -5.30374     10%   -6.07639
USS        57792.36   CSS        8168.944     0% Min     -13.5463     5%    -7.19645
CV         -40.5821   Std Mean   0.039598                             1%    -12.7402
T:Mean=0   -117.738   Pr>|T|       0.0001     Range      13.29321
Num ^= 0       2283   Num > 0           0     Q3-Q1      1.741949
M(Sign)     -1141.5   Pr>=|M|      0.0001     Mode       -13.5463
Sgn Rank   -1303593   Pr>=|S|      0.0001

Extremes

  Lowest     Obs      Highest     Obs
-13.5463(   1547)    -0.90519(    649)
-13.5397(   1836)    -0.81756(   1094)
-13.4637(   2084)    -0.76928(   1739)
-13.4413(   1127)     -0.5907(     21)
-13.4402(   1088)    -0.25314(    400)

87

Proc Univariate Output: Suppressed

The SAS System    9
14:09 Sunday, October 24, 1999

Univariate Procedure
Variable=AVHRATET

Moments                                       Quantiles(Def=5)

N              2283   Sum Wgts       2283     100% Max   -0.25314     99%   -1.62008
Mean       -4.66219   Sum        -10643.8     75% Q3     -3.56179     95%   -2.37588
Std Dev    1.892017   Variance    3.57973     50% Med    -4.50491     90%   -2.79152
Skewness   -2.11919   Kurtosis   6.892929     25% Q1     -5.30374     10%   -6.07639
USS        57792.36   CSS        8168.944     0% Min     -13.5463     5%    -7.19645
CV         -40.5821   Std Mean   0.039598                             1%    -12.7402
T:Mean=0   -117.738   Pr>|T|       0.0001     Range      13.29321
Num ^= 0       2283   Num > 0           0     Q3-Q1      1.741949
M(Sign)     -1141.5   Pr>=|M|      0.0001     Mode       -13.5463
Sgn Rank   -1303593   Pr>=|S|      0.0001

88

Proc Univariate Output: Suppressed (sample size = 1)

Univariate Procedure
Variable=FREQ (sum) freq

Moments                    Quantiles(Def=5)

Serious Disclosure limitation Violations
Values too low to release
Output of Proc Univariate withheld

89

Proc Freq Suppression (One-Way Tables)

Suppress at least two consecutive rows to prevent derivation of suppressed values from cumulative totals.

Disallow single row output.
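The two-consecutive-rows rule exists because cumulative frequencies on the visible rows bracket any hidden row: a single hidden row equals the difference between its neighbours' cumulative totals, whereas a hidden pair only reveals its sum. A minimal Python sketch of that rule follows. The function name `suppress_one_way`, the threshold parameter `min_count`, and the choice of which neighbour to pair with are assumptions; the slides do not give ANDRE's threshold or tie-breaking.

```python
def suppress_one_way(freqs, min_count):
    """Return one boolean per row: True where the row must be suppressed.

    Rows with a frequency below min_count are suppressed, and each such row
    is paired with an adjacent suppressed row so that differences of
    cumulative totals reveal only the sum of the pair.
    """
    n = len(freqs)
    if n < 2:
        raise ValueError("single-row tables are not released")
    hide = [f < min_count for f in freqs]
    for i in range(n):
        if freqs[i] < min_count:
            left_hidden = i > 0 and hide[i - 1]
            right_hidden = i < n - 1 and hide[i + 1]
            if not (left_hidden or right_hidden):
                # Pair with the next row, or the previous one at the table's end.
                j = i + 1 if i < n - 1 else i - 1
                hide[j] = True
    return hide


# Made-up frequencies; min_count=5 is illustrative only.
print(suppress_one_way([10, 3, 8, 12, 6], min_count=5))
```

In this example the row with frequency 3 triggers suppression of the following row as well, so the visible cumulative totals disclose only that the two hidden rows sum to 11.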

90

One-Way Freq Table: Suppressed

                                      Cumulative  Cumulative
  LOGRNTOPAT   Frequency     Percent   Frequency     Percent
------------------------------------------------------------
0.2277839309       ?????       ?????       ?????       ?????
0.2277839309       ?????       ?????       ?????       ?????
0.2305236586           5        0.08        6429       97.99
 0.231111721           5        0.08        6434       98.06
 0.232058915       ?????       ?????       ?????       ?????
 0.232058915       ?????       ?????       ?????       ?????
0.2436220827       ?????       ?????       ?????       ?????
0.2436220827       ?????       ?????       ?????       ?????
0.2498117984           6        0.09        6456       98.40
0.2504106777           6        0.09        6462       98.49
0.2513144283          18        0.27        6480       98.77
0.2595111955           6        0.09        6486       98.86
0.2670627852       ?????       ?????       ?????       ?????
0.2670627852       ?????       ?????       ?????       ?????
0.2736958305           5        0.08        6500       99.07
0.2814124594           5        0.08        6505       99.15
0.3022808719           6        0.09        6511       99.24
0.3364722366          10        0.15        6521       99.39

91

One-Way Freq Table: Suppressed (cont.)

                                      Cumulative  Cumulative
  LOGRNTOPAT   Frequency     Percent   Frequency     Percent
------------------------------------------------------------
0.3403258059       ?????       ?????       ?????       ?????
0.3403258059       ?????       ?????       ?????       ?????
0.3715635564           6        0.09        6537       99.63
0.3856624808       ?????       ?????       ?????       ?????
0.3856624808       ?????       ?????       ?????       ?????
0.6931471806           6        0.09        6550       99.83
1.2527629685       ?????       ?????       ?????       ?????
1.2527629685       ?????       ?????       ?????       ?????
1.2527629685       ?????       ?????       ?????       ?????

92

Proc Freq Suppression (Two-way Tables)

Row and column totals preserved
Cells with values less than the acceptable minimum are suppressed
Additional suppressions to ensure that no row and no column has single suppression.

Logical stitching of horizontal and vertical splits.
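Because row and column totals stay visible, a lone suppressed cell in any row or column can be recovered by subtraction, which is why the extra suppressions are needed. The sketch below shows one simple way to add them: repeat over rows and columns until none contains exactly one suppressed cell. It is an illustrative heuristic in Python, not ANDRE's algorithm. The function name `suppress_two_way`, the threshold `min_count`, and the rule for picking the extra cell are all assumptions, and the stitching of split tables is not covered.

```python
def suppress_two_way(table, min_count):
    """Return a grid of booleans marking the cells to suppress.

    Primary rule: cells below min_count are suppressed. Complementary rule:
    while any row or column holds exactly one suppressed cell, suppress one
    more cell in it, preferring a cell whose crossing line already has a
    suppression, then the smallest value.
    """
    n_rows, n_cols = len(table), len(table[0])
    hide = [[cell < min_count for cell in row] for row in table]

    def row_hidden(r):
        return sum(hide[r])

    def col_hidden(c):
        return sum(hide[r][c] for r in range(n_rows))

    changed = True
    while changed:
        changed = False
        for r in range(n_rows):
            if row_hidden(r) == 1:
                visible = [c for c in range(n_cols) if not hide[r][c]]
                if visible:
                    pick = min(visible, key=lambda j: (col_hidden(j) == 0, table[r][j]))
                    hide[r][pick] = True
                    changed = True
        for c in range(n_cols):
            if col_hidden(c) == 1:
                visible = [r for r in range(n_rows) if not hide[r][c]]
                if visible:
                    pick = min(visible, key=lambda i: (row_hidden(i) == 0, table[i][c]))
                    hide[pick][c] = True
                    changed = True
    return hide


# Made-up counts; min_count=5 is illustrative only.
counts = [
    [90, 40, 20],
    [1, 9, 11],
    [0, 6, 2],
]
mask = suppress_two_way(counts, min_count=5)
```

Here the second row starts with a single small cell, so one more cell in that row is hidden, chosen from the column that already contains a suppression. Having at least two suppressions per row and column is the condition stated on the slide; it does not by itself guarantee that hidden values cannot be narrowed down from the totals.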

93

Proc Freq: Two-way Tables Suppression

TABLE OF FAMREL BY FAMSIZER

FAMREL     FAMSIZER

Frequency|
Percent  |
Row Pct  |
Col Pct  |       2|       3|       4|       5|  Total
---------+--------+--------+--------+--------+
       3 |     94 |    388 |    792 |    533 |   2206
         |   3.97 |  16.40 |  33.47 |  22.53 |  93.24
         |   4.26 |  17.59 |  35.90 |  24.16 |
         |  98.95 |  96.28 |  96.12 |  94.34 |
---------+--------+--------+--------+--------+
       4 | ?????? |      9 |     22 |     27 |    104
         | ?????? |   0.38 |   0.93 |   1.14 |   4.40
         | ?????? |   8.65 |  21.15 |  25.96 |
         | ?????? |   2.23 |   2.67 |   4.78 |
---------+--------+--------+--------+--------+
       6 | ?????? |      6 |     10 |      5 |     56
         | ?????? |   0.25 |   0.42 |   0.21 |   2.37
         | ?????? |  10.71 |  17.86 |   8.93 |
         | ?????? |   1.49 |   1.21 |   0.88 |
---------+--------+--------+--------+--------+
Total          95      403      824      565     2366
             4.02    17.03    34.83    23.88   100.00
(Continued)

94

Proc Freq: Two-way Tables Suppression (Cont.)

checking frequencies 4                12:01 Thursday, May 6, 1999

TABLE OF FAMREL BY FAMSIZER

FAMREL     FAMSIZER

Frequency|
Percent  |
Row Pct  |
Col Pct  |       6|       7|       8|       9|  Total
---------+--------+--------+--------+--------+
       3 |    209 |     98 |     19 |     73 |   2206
         |   8.83 |   4.14 |   0.80 |   3.09 |  93.24
         |   9.47 |   4.44 |   0.86 |   3.31 |
         |  90.48 |  83.05 |  59.38 |  74.49 |
---------+--------+--------+--------+--------+
       4 |     13 |     10 | ?????? |     12 |    104
         |   0.55 |   0.42 | ?????? |   0.51 |   4.40
         |  12.50 |   9.62 | ?????? |  11.54 |
         |   5.63 |   8.47 | ?????? |  12.24 |
---------+--------+--------+--------+--------+
       6 |      9 |     10 | ?????? |     13 |     56
         |   0.38 |   0.42 | ?????? |   0.55 |   2.37
         |  16.07 |  17.86 | ?????? |  23.21 |
         |   3.90 |   8.47 | ?????? |  13.27 |
---------+--------+--------+--------+--------+
Total         231      118       32       98     2366
             9.76     4.99     1.35     4.14   100.00
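Because the row and column totals are preserved, a reader can difference the visible counts against the margins. The short Python check below does that for the four suppressed cells in the FAMREL by FAMSIZER table. It is only an illustrative sketch: the variable names are made up, and the one assumption is that every hidden cell is a non-negative integer count.

```python
# Totals and visible counts are read from the two panels of the
# FAMREL by FAMSIZER table; no new data is introduced here.
row_total = {4: 104, 6: 56}
row_visible = {4: [9, 22, 27, 13, 10, 12], 6: [6, 10, 5, 9, 10, 13]}
col_total = {2: 95, 8: 32}
col_visible = {2: [94], 8: [19]}

# What the hidden cells in each row and column must add up to.
row_gap = {r: row_total[r] - sum(row_visible[r]) for r in row_total}
col_gap = {c: col_total[c] - sum(col_visible[c]) for c in col_total}

# Enumerate every non-negative integer assignment that fits all margins.
solutions = []
for a in range(col_gap[2] + 1):   # a = cell (FAMREL 4, FAMSIZER 2)
    b = col_gap[2] - a            # b = cell (FAMREL 6, FAMSIZER 2)
    c = row_gap[4] - a            # c = cell (FAMREL 4, FAMSIZER 8)
    d = row_gap[6] - b            # d = cell (FAMREL 6, FAMSIZER 8)
    if c >= 0 and d >= 0 and c + d == col_gap[8]:
        solutions.append((a, b, c, d))

print(row_gap, col_gap)
print(solutions)
```

Two assignments fit every margin, so the totals narrow each hidden cell to one of two values but do not determine it exactly. Had only one cell in a row or column been hidden, simple subtraction would have recovered it, which is the reason for the additional suppressions.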

95

Fully Automated and Expert System?

Fully automated? Requires a reboot to deal with memory leakage.

Confidentiality expert? How reliable? As good as the underlying algorithms. Needs constant monitoring.

96

97

98

99

100

101

What is new?

Improved and expanded hardware platform

Two machines dedicated to heavy remote access usage

Three additional machines dedicated to general remote access usage

102

What is new? (cont.)

SUDAAN now available to remote access users

Proc Crosstab

Proc Rlogist

Proc Regress

Proc Multilog

Proc Survival

103

What is new? (cont.)

Proc Descript

Other new SUDAAN procedures will be made available shortly

Plans to make Stata available through remote access

104

What is new? (cont.)

Web component of ANDRE under construction.

On-line scanning of users’ code

Valuable research tools and information readily available to the users.

105

Contact Information

For general questions/comments: Email: [email protected]; Phone: (301) 458-4732

For on-site info: Email: [email protected]; Phone: (301) 458-4097

For remote access info: Email: [email protected]; Phone: (301) 458-4226