From Health Research to Social Research: Privacy, Methods, Approaches
Leslie L Roos
Distinguished Professor
Department of Community Health Sciences
Manitoba Centre for Health Policy
For Comparison: Panel Study of Income Dynamics (PSID)
• Compare administrative data with the only social science study noted on “the NSF’s list of its fifty most significant projects in its fifty year history”
• Highlights issues in extending population-based (health) administrative data to facilitate social research
Information-Rich Environment: Using Administrative Data
[Figure: data sources available for linkage]
• Research Registry
• Medical
• Vital Statistics
• Home Care
• Nursing Home
• Hospital
• Provider
• Pharmaceutical
• Immunization Monitoring
• Inflammatory Bowel Disease Database
• Sleep Lab Clinical Data
• Alcoholism Panel Surveys
• Cancer Registry
• Educational and Social Data
• Heart Health Survey
• Aging in Manitoba Study
• National Population Health Survey
Table 1. Comparing Longitudinal Primary Data and Population-Based Administrative Data
Each characteristic is followed by two entries. Primary: Longitudinal Primary Data (Panel Study of Income Dynamics). Administrative: Population-based Administrative Data (Manitoba and other sites).
Number of cases
– Primary: Several thousand or smaller (5,000 households in PSID)
– Administrative: Often over a million
Cost
– Primary: High on a per person basis
– Administrative: Very low on a per person basis
Representativeness
– Primary: Often national
– Administrative: Often from a province or state
Population Studied
– Primary: Subjects sampled and tracked
– Administrative: Built on a registry of an entire population
Research Design
– Primary: Complex designs often needed to increase power and control costs
– Administrative: Given a population, complex designs can be imposed retrospectively as needed
Record Linkage
– Primary: Useful in some contexts
– Administrative: Critical to check data quality and expand scope of information sources
Individual Follow-up
– Primary: Before and after an event
– Administrative: Before and after an event
Coverage and Loss to Follow-up
– Primary: Nonresponse and differential attrition possible
– Administrative: Differential attrition possible
Updating
– Primary: New data must be collected and merged with existing data
– Administrative: Multifile information must be cleaned and merged with existing data. Cleaning relies on record linkage
Time
– Primary: Information must be collected (typically annually or at longer intervals)
– Administrative: Information provided at relatively short intervals (from daily to annually)
Place
– Primary: Information at time of study (historical reconstruction possible)
– Administrative: Detailed information usually provided close to date of move
Longitudinal
– Primary: Goes back many years (corrections for immigration in PSID)
– Administrative: Goes back many years
Neighborhoods
– Primary: Flexible construction from postal code or census area
– Administrative: Flexible construction from postal code or census area; large N may permit flexible assignment to generate nearest 'neighbors'
Life Events
– Primary: Collected as part of design
– Administrative: Possibly available from registry or other sources
Family and Intergenerational Data
– Primary: Collected as part of design (PSID); sibling and intergenerational studies facilitated
– Administrative: Assessing family composition at any point in time possible; sibling and twin studies facilitated
Limits
– Primary: Important information likely to be collectable for entire sample
– Administrative: Important information may be missing or available only for a subpopulation
Measures
– Primary: Defined by researchers; scaling possible
– Administrative: Defined by others for administrative purposes; creating meaningful variables may be very time-consuming; scaling possible
Intellectual History
– Primary: Scope of data collection often expanded to provide a rich data set
– Administrative: Scope expanded beyond initial health care data by receiving files from other agencies
Expanding Capabilities
Moving from health research to social research involves preparatory work with databases, which must be organized to:
a) Link files to incorporate new data sets while preserving privacy and confidentiality
b) Measure such outcomes as educational achievement at population level
Expanding Capabilities (cont’d)
c) Use place of residence data (for any point in time) to calculate the number of moves, number of years in relatively poor neighborhoods, etc.
d) Estimate family composition at any point in time (tracking marital status, family size, ages of siblings and twins). Social variables are used for more powerful research designs.
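The place-of-residence calculations in item c) can be sketched roughly as follows. This is a minimal illustration, not the Manitoba implementation: the record layout (start date, end date, neighbourhood income quintile), the function name `summarize_residence`, and the convention that quintile 1 is the poorest are all assumptions made for the example.

```python
from datetime import date

# Hypothetical residence spells for one person:
# (start date, end date, neighbourhood income quintile), quintile 1 assumed poorest.
def summarize_residence(spells, poor_quintiles=(1,)):
    """Return (number of moves, years lived in relatively poor neighbourhoods)."""
    ordered = sorted(spells, key=lambda s: s[0])
    # Each spell after the first is assumed to begin with a move.
    moves = max(len(ordered) - 1, 0)
    poor_days = sum((end - start).days
                    for start, end, quintile in ordered
                    if quintile in poor_quintiles)
    return moves, round(poor_days / 365.25, 1)

spells = [
    (date(1990, 1, 1), date(1994, 1, 1), 1),
    (date(1994, 1, 1), date(1999, 1, 1), 3),
    (date(1999, 1, 1), date(2002, 1, 1), 1),
]
moves, poor_years = summarize_residence(spells)
print(moves, poor_years)
```

Because the spells carry dates, the same records could answer the question for any point in time by truncating them at that date.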
Respect for Privacy in Building Manitoba Database
• Canada Foundation for Innovation and provincially funded Data Laboratory
• Highest standards of security, privacy & confidentiality of data
• No names, no addresses
• Probabilistic record linkages across files as needed
• Data for research, not for administrative use
• Provincial privacy offices kept fully informed
Linking while Preserving Privacy
• Multi-stage de-identification process:
– Trustee preparation
– Manitoba Health links Trustee’s file to the encrypted PHIN
– Crosswalk file provided to MCHP
– Trustee provides data file
– MCHP stores data separately and unlinked
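One way to picture the flow above is the toy sketch below. It is an illustration under stated assumptions only: the record fields, the reference numbers, and the use of HMAC-SHA256 as a stand-in for PHIN encryption are all hypothetical, and the slides do not describe the actual mechanism.

```python
import hashlib
import hmac

# All names, keys, and records are hypothetical. HMAC-SHA256 merely stands in
# for whatever encryption is actually applied to the PHIN.
SECRET_KEY = b"held-by-health-ministry-only"

def encrypt_phin(phin: str) -> str:
    return hmac.new(SECRET_KEY, phin.encode(), hashlib.sha256).hexdigest()[:12]

# Trustee preparation: identifiers and program data are split into two files
# that share only an arbitrary reference number.
trustee_records = [
    {"ref": "T1", "phin": "100000001", "grade12_mark": 71},
    {"ref": "T2", "phin": "100000002", "grade12_mark": 58},
]
identifier_file = [{"ref": r["ref"], "phin": r["phin"]} for r in trustee_records]
data_file = [{"ref": r["ref"], "grade12_mark": r["grade12_mark"]} for r in trustee_records]

# The ministry sees identifiers only and returns a crosswalk without them.
crosswalk = {row["ref"]: encrypt_phin(row["phin"]) for row in identifier_file}

# The research centre holds the crosswalk and the data file separately and
# joins them only when a project requires it.
def link(data_rows, crosswalk):
    return [{"enc_phin": crosswalk[r["ref"]], "grade12_mark": r["grade12_mark"]}
            for r in data_rows]

linked = link(data_file, crosswalk)
```

In this sketch no single party holds both the real identifier and the program data at the same time, which is the property the staged process is meant to protect.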
Currently Available: Measures to Understand Well Being
Available for population for one or more years:
• Age, grade level, school attendance, marks, achievement tests for Grade 3, Grade 9, Grade 12
• Well-identified health conditions such as asthma and diabetes plus a measure of general health status can be studied through childhood and adolescence
• Healthcare utilization and costs
• Receipt of social assistance
Not Available
• Survey-based measures of household income and education are not available for the entire population
• Manitoba comparisons of income measures found that risk estimates for health status measures derived from neighborhood income were not attenuated relative to those from household income
• May not be true everywhere
Small Area Measures from Statistics Canada
• Publicly available
• Use dissemination/enumeration areas
• Often encompass several six-digit postal code areas
• Indicators such as mean household income and education, unemployment rates, etc.
• Canadian studies of the socioeconomic gradient in utilization often use small area markers
Steps for comparing educational achievement across geographical areas parallel those developed for health research
1) Obtain indicators of socioeconomic risk for each small area
2) Develop individual data on educational achievement
3) Calculate rates of achievement based on small areas
4) Track the entire population
5) Combine census enumeration areas as appropriate (for quartiles or quintiles based on socioeconomic status of area of residence)
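The steps above can be sketched in a few lines. The area incomes, person records, and function names here are invented for illustration. The grouping simply ranks areas and splits them evenly, whereas a real analysis would more likely form groups holding equal shares of the population.

```python
from collections import defaultdict

# Hypothetical inputs: mean household income per small area, and one record per
# person in the tracked population (area of residence, whether the test was passed).
area_income = {"A": 31000, "B": 42000, "C": 55000, "D": 67000, "E": 90000}
people = [("A", False), ("A", True), ("B", False), ("B", True), ("C", True),
          ("C", True), ("D", True), ("D", False), ("E", True), ("E", True)]

def assign_groups(area_income, n_groups):
    """Rank areas by income and split them into n_groups groups (1 = lowest income)."""
    ranked = sorted(area_income, key=area_income.get)
    return {area: i * n_groups // len(ranked) + 1 for i, area in enumerate(ranked)}

def pass_rates(people, groups):
    """Pass rate per group, with the whole tracked population as the denominator."""
    passed, total = defaultdict(int), defaultdict(int)
    for area, ok in people:
        group = groups[area]
        total[group] += 1
        passed[group] += ok
    return {group: passed[group] / total[group] for group in sorted(total)}

groups = assign_groups(area_income, n_groups=5)
rates = pass_rates(people, groups)
print(rates)
```

Keeping everyone in the denominator, including those who never wrote the test, is what step 4 adds relative to a rate computed among test writers only.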
Data Used
• Up to nine years of Manitoba birth cohorts (1978-1987, excluding the 1983 cohort)
• Takes advantage of available provincial Grade 12 test scores
• About 13,000 in each cohort surviving and remaining in the province for the first 18 years of life
• Large sample for same-sex pairs (33,000) and multiple births (2,000)
Population-Based Data Provide a Different Perspective
[Figure: Grade 12 (S4) Performance, by Winnipeg SES Group, Language Arts Standards Test, 2001/02. Two bar charts, each with SES group (Low, Low-Mid, Middle, High) on the horizontal axis and percentage (0% to 100%) on the vertical axis.]
• Pass/Fail rates of test writers: pass rates of 75% (Low), 83% (Low-Mid), 87% (Middle), 92% (High)
• 18 year olds who should have written: pass rates of 27% (Low), 52% (Low-Mid), 65% (Middle), 77% (High)
• Categories shown for those who should have written: Withdrawn; In Grade 11 (S3) or lower; In Grade 12 (S4), but no LA Test Mark; Drop Course, Absent, Exempt, Incomplete; Fail; Pass
Estimating Family Circumstances
• Number of children in the family
• Birth order
• Mother’s marital status at birth of first child
• Age of mother at birth of first child
• Whether or not family was receiving income assistance
• Number of changes of residence
Other Social Variables
• Number of years in different types of families
• Number of “family structure” changes (parental separations, remarriages)
• Number of years living with disabled parent
• NOTE: Almost all variables can be measured at different points in time
Importance of Social Variables
• The first six social variables (i.e. number of children, birth order, etc.) were better predictors of educational achievement than a survey model from the PSID that includes household income and parental education
• Predictive power varies with outcome selected (much lower for health outcomes)
• Used as control variables in sibling and sibling/neighbourhood research
Emergence of the Socioeconomic Gradient
Association between socioeconomic status and health may vary at different life stages
Questions include:
1) How do socioeconomic gradients evolve for both males and females as children grow older?
2) What can we learn about gradient development by looking at the trajectories of individuals over time?
3) Are socioeconomic gradients in child health changing over time?
Problems with Non-Experimental Design
• Analyzing causal relationships is difficult
• Omitted variables and measurement error are likely to bias the coefficients attached to measured variables
• Standard statistical analyses suffer from this problem (a growing literature in economics on this issue)
• Whole population, sibling and twin studies each have different strengths and weaknesses
• Sibling and twin designs ‘control’ differently, often using ‘family fixed effects’ statistical models
Problems with Non-Experimental Design
Literature Review
• Relationship between birth weight and infant mortality weakens when differences between twins are examined
• Twin samples help eliminate unobserved heterogeneity across families BUT twins are generally of lower birth weight than singletons
• Gestational length not examined in twin studies
• Canadian data provide uniform access to health insurance whereas coverage in U.S. may vary even among siblings
Approach One: Favourite Economist Tactic
• Treating each family as a separate condition and summarizing across families provides an estimate of the effects of presumably causal individual variables (such as birth weight) by differencing out family-specific characteristics affecting all children.
• Comparisons of siblings within families help make up for lack of control over variables measuring household income and parental education
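The differencing idea can be illustrated with simulated data. Everything below is an assumption made for the example (the effect size, the strength of the shared family factor, two siblings per family); it is a sketch of family fixed effects in general, not the models estimated in the studies described here.

```python
import random

def slope(xs, ys):
    """Simple OLS slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_family(families):
    """Subtract each family's mean so that anything shared by siblings drops out."""
    xs, ys = [], []
    for sibs in families:
        mx = sum(x for x, _ in sibs) / len(sibs)
        my = sum(y for _, y in sibs) / len(sibs)
        xs += [x - mx for x, _ in sibs]
        ys += [y - my for _, y in sibs]
    return xs, ys

# Simulated data (purely illustrative): an unobserved family factor raises both
# birth weight and the later outcome, so pooled OLS overstates the true effect.
random.seed(0)
TRUE_EFFECT = 0.5
families = []
for _ in range(2000):
    family = random.gauss(0, 1)
    sibs = []
    for _ in range(2):
        birth_weight = family + random.gauss(0, 1)
        outcome = TRUE_EFFECT * birth_weight + 2.0 * family + random.gauss(0, 1)
        sibs.append((birth_weight, outcome))
    families.append(sibs)

pooled = slope([x for f in families for x, _ in f],
               [y for f in families for _, y in f])
fixed = slope(*within_family(families))
print(round(pooled, 2), round(fixed, 2))
```

In this setup the pooled estimate absorbs the unmeasured family factor, while the within-family estimate should land close to `TRUE_EFFECT`. The price is that only families with at least two children contribute information.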
Comparing Twin and Sibling Estimates
• Exploits benefits of both identification strategies
• Siblings provide a more representative sample but strategy risks biased estimates:
A) Potential change in parental investment following the birth of first child
B) Potential change in socioeconomic status between births
• Using twins eliminates these potential biases but sample is limited and unrepresentative
• Patterns of postnatal development (as a function of infant health) may differ for siblings and for twins, which could affect results
Results
• Infant health strongly predicted both high school completion and income assistance take-up and length, controlling for a number of possible confounders
• Long-term consequences of infant health were found across families, within siblings, and within twin pairs.
Measuring Infant Health: Five Models*
1) Ordinary Least Squares (OLS) using entire sample
2) OLS using sample of children with siblings
3) Sibling sample including family fixed effects
4) OLS using sample of twins
5) Twin sample including family fixed effects
*Looked at longer-term measures of child health and social outcomes
Approach Two: Sibling Neighbourhood Design
A) Compare correlations for siblings and unrelated neighbours of similar ages
B) Use to identify an “upper bound on the influence of neighbourhoods, because neighbour correlations reflect similar family backgrounds as well as shared community backgrounds”
C) Extend this approach to consider schoolmates and peers
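A small sketch of the comparison in A) follows. The children, outcome values, and field names are made up, and entering each pair in both orders is one common convention for pair correlations rather than something stated in the slides.

```python
from itertools import combinations

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def pair_correlation(children, same_family):
    """Correlate outcomes across sibling pairs (same_family=True) or across
    unrelated pairs living in the same neighbourhood (same_family=False).
    Each pair is entered in both orders so the result ignores pair ordering."""
    xs, ys = [], []
    for a, b in combinations(children, 2):
        if a["hood"] != b["hood"]:
            continue
        if (a["family"] == b["family"]) != same_family:
            continue
        xs += [a["y"], b["y"]]
        ys += [b["y"], a["y"]]
    return pearson(xs, ys)

# Hypothetical outcome scores for eight children in two neighbourhoods.
children = [
    {"family": 1, "hood": "A", "y": 10}, {"family": 1, "hood": "A", "y": 12},
    {"family": 2, "hood": "A", "y": 7},  {"family": 2, "hood": "A", "y": 8},
    {"family": 3, "hood": "B", "y": 3},  {"family": 3, "hood": "B", "y": 5},
    {"family": 4, "hood": "B", "y": 1},  {"family": 4, "hood": "B", "y": 2},
]
sibling_r = pair_correlation(children, same_family=True)
neighbour_r = pair_correlation(children, same_family=False)
print(round(sibling_r, 3), round(neighbour_r, 3))
```

The gap between the two values is what the design reads as the part of sibling resemblance not shared with neighbours. The neighbour value itself is only an upper bound on neighbourhood influence, for the reason quoted in B).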
Study Opportunity Structures
• Urban-rural comparisons
• Socio-economic gradient (comparing means and correlations by income)
• Comparisons across dimensions of well being
Sibling Neighbourhood Design
• Uses location of each family in a 6 digit postal code area
• Matches each child a) with a neighbour in the same postal code area, then b) those ‘leftover’ with the closest ‘leftover’ in the census enumeration area (neat linear program)
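The two-stage matching can be approximated as below. Note the substitution: the slide mentions a linear program, while this sketch uses a simple greedy pass, which is easier to read but not guaranteed to be optimal. The field names, the fake postal codes, and the one-dimensional position `x` used to judge closeness are all hypothetical.

```python
from collections import defaultdict

def greedy_pairs(children, key, used):
    """Pair not-yet-used children from different families sharing the same `key` value."""
    buckets = defaultdict(list)
    for child in children:
        if child["id"] not in used:
            buckets[child[key]].append(child)
    pairs = []
    for group in buckets.values():
        group = sorted(group, key=lambda c: c["x"])  # adjacent in position = closest
        i = 0
        while i + 1 < len(group):
            a, b = group[i], group[i + 1]
            if a["family"] != b["family"]:
                pairs.append((a["id"], b["id"]))
                used.update((a["id"], b["id"]))
                i += 2
            else:
                i += 1  # skip a sibling; this child stays available for the next stage
    return pairs

def match_neighbours(children):
    used = set()
    stage_a = greedy_pairs(children, "postal", used)  # a) same postal code area
    stage_b = greedy_pairs(children, "area", used)    # b) leftovers, same enumeration area
    return stage_a, stage_b

children = [
    {"id": "c1", "family": 1, "postal": "PC-001", "area": "EA1", "x": 0.0},
    {"id": "c2", "family": 2, "postal": "PC-001", "area": "EA1", "x": 0.1},
    {"id": "c3", "family": 3, "postal": "PC-002", "area": "EA1", "x": 0.5},
    {"id": "c4", "family": 4, "postal": "PC-003", "area": "EA1", "x": 0.6},
    {"id": "c5", "family": 5, "postal": "PC-004", "area": "EA2", "x": 2.0},
]
stage_a, stage_b = match_neighbours(children)
print(stage_a, stage_b)
```

A child with no unrelated neighbour in either stage (here `c5`) simply remains unmatched.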
Figure 1. Sibling - Neighbourhood Designs
Neighbourhood A
• Family 1: Sibling 1a, Sibling 1b
• Family 2: Sibling 2a, Sibling 2b
Neighbourhood B
• Family 3: Sibling 3a, Sibling 3b
• Family 4: Sibling 4a, Sibling 4b
Sibling and Neighbour Correlations, Girls Outside Winnipeg: 1978-1987 Cohorts (excluding 1983)
Pregnancy Before Age 19
– Sibling (Unadjusted): .576*
– Neighbour (Unadjusted): .368*
– Neighbour (Adjusted): .120*
Income Assistance After Age 18 (1978-1982)
– Sibling (Unadjusted): .612*
– Neighbour (Unadjusted): .174*
– Neighbour (Adjusted): .000*
Held back by Grade 12
– Sibling (Unadjusted): .675*
– Neighbour (Unadjusted): .369*
– Neighbour (Adjusted): .111*
A Guaranteed Annual Income Experiment
• A population-based guaranteed annual income experiment in Dauphin, MB from 1974 to 1978 substantially reduced poverty.
To study long-term effects:
• Propensity matching created a 3-to-1 control for all Dauphin and rural municipality residents.
• A second set of controls: age- and sex-matched residents living in the test site before the experiment
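As a rough picture of 3-to-1 propensity matching, the sketch below assumes the propensity scores have already been estimated and picks, for each resident of the test site, the three unused controls with the closest scores. The identifiers, scores, and greedy without-replacement rule are assumptions for illustration. The slides do not say which matching algorithm was used.

```python
def match_controls(treated, pool, k=3):
    """For each treated unit, pick the k unused controls with the closest propensity score."""
    available = dict(pool)
    matches = {}
    for unit, score in sorted(treated.items(), key=lambda item: item[1]):
        nearest = sorted(available, key=lambda c: abs(available[c] - score))[:k]
        matches[unit] = nearest
        for control in nearest:
            del available[control]  # matching without replacement
    return matches

# Hypothetical, pre-estimated propensity scores.
treated = {"d1": 0.30, "d2": 0.70}
pool = {"c1": 0.28, "c2": 0.31, "c3": 0.35, "c4": 0.66,
        "c5": 0.69, "c6": 0.72, "c7": 0.90, "c8": 0.10}
matches = match_controls(treated, pool)
print(matches)
```

Greedy matching depends on the order in which treated units are processed. Sorting by score here only makes the result reproducible.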
Research Questions
1) Did the elimination of income insecurity when children were particularly vulnerable affect their lives after the experiment ended?
2) Did young children in families receiving assistance experience better health and social outcomes as adolescents?
Provincial Centres:
1) Anchored in a ‘place’
2) Feedback to ministries/data suppliers essential
3) Deliverables negotiated
4) “Smiling persistence” to obtain data sets
From Health Research to Social Research
1) Have reviewed issues of measurement, data organization, and analytical strategy
2) Expanded ability to examine outcomes through different stages in the life course