Using the Social Network Data From Add Health
Sunbelt Social Networks Conference, February 13, 2001
New Orleans
James Moody
•Introduction: What and Why
  •Background to Add Health
  •Levels of Network Data
  •Composition & Pattern
  •Networks on both sides of the equation
•Network Data Structures
  •Adjacency Matrices
  •Adjacency Lists
•Network Data in Add Health
  •In School Friendship Nominations
  •In Home Friendship Nominations
•Constructing Networks
  •Total Networks
  •Local Networks
  •Peer Groups
•Analyses Using Networks
  •Networks as dependent variables
  •Networks as independent variables
History of the National Longitudinal Study of Adolescent Health*, better known as Add Health.
* a program project designed by J. Richard Udry and Peter S. Bearman, and funded by a grant HD31921 from the National Institute of Child Health and Human Development to the Carolina Population Center, University of North Carolina at Chapel Hill, with cooperative funding participation by the following agencies: The National Cancer Institute; The National Institute of Alcohol Abuse and Alcoholism; the National Institute on Deafness and other Communication Disorders; the National Institute on Drug Abuse; the National Institute of General Medical Sciences; the National Institute of Mental Health; the Office of AIDS Research, NIH; the Office of Director, NIH; The National Center for Health Statistics, Centers for Disease Control and Prevention, HHS; Office of Minority Health, Centers for Disease Control and Prevention, HHS, Office of the Assistant Secretary for Planning and Evaluation, HHS; and the National Science Foundation.
•Initially proposed as an adolescent version of the National Health and Social Life Survey (Laumann et al.); the proposal became known as the “Teen Sex” study.
•Jesse Helms’ crew decided that asking teens about sexual behavior was inappropriate, and the study had the dubious distinction of being the only study ever explicitly outlawed.
•Fortunately, the same legislation stipulated that NIH fund a national health survey, and from the ashes of Teen Sex, Add Health was born.
•Funded at $24M for the first 4 years, Add Health was designed to provide a comprehensive image of the state of adolescent health and the behaviors that affect adolescent health.
The Add Health Design: Adolescents in Social Context
[Diagram: the adolescent placed within successive social contexts, with the data source that covers each]
Contexts: Individual (Attributes, Attitudes, Behavior, Capacities, Health Status), Family, Dyadic Relations, Peers and Networks, School, Community/Neighborhood
Data sources: Parent In-Home, Saturated Sample, In-School, Contextual Database
•Contextual Data Base: Neighborhood, Community Characteristics, Health Service, School Context
•In-Home Parent: Parenting, Family Data, Relations Between Family and Adolescent, Ideal Sequence
•Saturation: Partners, Relational Data, Behavior Characteristics of Peers and Peer Group
•In-School: Social Networks, Peer Groups, Sample Information, Genetic, High SES Black, Contextual Variables
Substantive Domains Covered in the Add Health Design
Individual:
•Demographics
•Detailed, multiple race/ethnic categories
•Immigrant status
•Socio-Economic Status
•Health Status
•Nutrition
•STD & Sexual Behavior
•Exposure
•Emotional
•Physical
•Insurance/Access
•Daily Activities
•Exercise
•TV/Hobbies
•Academic Exposure
•Subjects taught
•Sexual knowledge
•Future Expectations
•Risk taking activity
•Delinquency
•Drugs
•Fighting and Violence
•Motivation
•Personality
•Religion
•Neighborhood assessments
Family:
•Detailed Household Roster
•Family Structure
•Parental Interview
•Sibling relations
•Parental behaviors
•Multiple observations in the same family
•Parent’s knowledge of adolescent:
  •Activities
  •Friends
•Adolescent assessment of parents’ expectations and rule behavior
•Twin Design
Relations, Peers & Nets:
•Population sample in schools provides complete network images
•Constructed network data
•Friendship nomination files
•Romantic relation characteristics
•Real and Ideal
•Relationship timing and duration
•Information from both sides of the relation in many cases
•Peer assessment of peer activity: not just respondent assessment
Community:
•GIS links for spatial analysis
•Contextual data at the Block Group, City, County and State level
•Topics include: Population, Vital Statistics, Group Quarters, Households, Income, Poverty, Education, Labor Force, Housing, At Risk Children, Health Care, STD Levels, Crime, Religion, Elections, Social Welfare, Gov’t Expenditure, Abortion Access, Tobacco, Health Policy
Sampling Structure for Add Health
Sampling Frame of Adolescents and Parents N = 100,000+ (100 to 4,000 per pair of schools)
[Diagram: sampling structure, with these labeled components]
•School Sampling Frame = QED
•Sampled schools: HS (high school), each shown with a Feeder school
•Main Sample: 200/Community
•Saturation Samples from 16 Schools
•Disabled Sample
•Ethnic Samples: Cuban, Puerto Rican, Chinese, High Educ Black
•Genetic Samples: Identical Twins, Fraternal Twins, Full Sibs, Half Sibs, Unrelated Pairs in Same HH
Add Health School Sampling Strategy
The National Longitudinal Study of Adolescent Health: Demographic Sub-Sample Sizes
Core Sample: 12,104
White: 8,467 (Male: 4,075; Female: 4,392)
Black: 2,384 (Male: 1,092; Female: 1,292)
Hispanic: 1,456 (Male: 708; Female: 748)

          White Male    White Female  Black Male    Black Female  Hispanic Male Hispanic Female
Parents   Two    One    Two    One    Two    One    Two    One    Two    One    Two    One
All       3150   760    3325   842    496    466    569    593    477    184    505    189
7th       484    108    552    126    92     75     88     98     62     28     76     26
8th       522    115    526    137    93     93     93     102    82     18     78     33
9th       543    135    574    156    73     84     95     97     79     34     78     38
10th      536    141    540    151    80     80     99     99     94     37     92     23
11th      581    132    551    125    79     64     91     96     87     29     89     31
12th      445    108    524    117    67     50     87     84     58     30     80     30
Deductive Disclosure Risks:
Start with: 536 White, Male, 10th Graders in Two-Parent Households
Who are Jewish: 10
And have no siblings: 1
Start with: 484 White, Male, 7th Graders in Two-Parent Households
Who have ever been held back a grade in school: 87
And play basketball: 5
And smoke: 1
Start with: 87 Black, Female, 12th Graders in Two-Parent Households
Who have never been held back: 77
And smoke regularly: 5
Deductive Disclosure Risks:
And have 2 siblings: 1
And are Catholic: 1
Deductive Disclosure Risks:
Start with: 98 Black, Female, 7th Graders in One-Parent Households
Who are Baptist: 41
And have no siblings: 9
And have one sibling: 13
And have > one sibling: 19
And play basketball: 1
And smoke: 1
And are born in April: 1
Levels of Network Data
[Diagram: ego shown at three network levels: Best Friends, Local Network, Peer Group]
Measuring Network Context: Patterns
Pattern measures capture some feature of the distribution of relations across nodes in the network. These include:
•Density: % of all possible ties actually made
•Reciprocity: likelihood that given a tie from i to j there will also be a tie from j to i
•Transitivity: extent to which friends of friends are also friends
•Hierarchy: Is there a status order to nominations? How is it patterned?
•Clustering: Are there significant groups? How so?
•Segregation: Do attributes (such as race) and nominations correspond?
•Distance: How many steps separate the average pair of persons in the school? Is this larger or smaller than expected?
•Block models: What is the implied role structure underlying patterns of relations?
These features (usually) require having nomination data from each person in the network.
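As a rough illustration of the first three pattern measures, the sketch below computes density, reciprocity, and transitivity on a small directed network. The toy arcs and the function names are hypothetical, and the transitivity definition used here (share of two-step paths that are closed) is only one of several in use.

```python
from itertools import permutations

def density(arcs, n):
    """Share of the n*(n-1) possible directed ties that are present."""
    return len(arcs) / (n * (n - 1))

def reciprocity(arcs):
    """Share of ties i->j for which the tie j->i is also present."""
    return sum((j, i) in arcs for (i, j) in arcs) / len(arcs)

def transitivity(arcs, nodes):
    """Of all two-step paths i->j->k (i, j, k distinct), the share closed by i->k."""
    paths = closed = 0
    for i, j, k in permutations(nodes, 3):
        if (i, j) in arcs and (j, k) in arcs:
            paths += 1
            closed += (i, k) in arcs
    return closed / paths if paths else 0.0

# Hypothetical toy network: four students and their directed nominations.
nodes = [1, 2, 3, 4]
arcs = {(1, 2), (2, 1), (2, 3), (1, 3), (3, 4)}

print(density(arcs, len(nodes)), reciprocity(arcs), transitivity(arcs, nodes))
```

All three functions need the full set of arcs, which is why these measures generally require nominations from everyone in the network.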
Measuring Network Context: Composition
Composition measures capture characteristics of the population of people within a given network level. These include:
•Heterogeneity: How dispersed are actors with respect to a given attribute?
•Means: What is the mean GPA of ego’s friends? How likely is it that most of ego’s friends will go to college?
•Dispersion: What is the age-range of people ego hangs out with?
These features can often be measured from the simple ego network.
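A minimal sketch of the three composition measures for a single ego network follows. The friend list and attribute values are invented, and the heterogeneity index used (one minus the sum of squared category shares) is an assumed choice, not necessarily the one used for Add Health.

```python
from collections import Counter
from statistics import mean

# Hypothetical ego network and attribute data.
friends = {"ego": ["a", "b", "c"]}
gpa = {"a": 3.0, "b": 3.5, "c": 2.5}
age = {"a": 15, "b": 16, "c": 18}
race = {"a": "White", "b": "Black", "c": "White"}

def friend_values(ego, attr):
    """Attribute values for the friends of ego that have data."""
    return [attr[f] for f in friends[ego] if f in attr]

def friend_mean(ego, attr):
    """Mean of a numeric attribute among ego's friends."""
    return mean(friend_values(ego, attr))

def friend_range(ego, attr):
    """Dispersion as the range (max minus min) among ego's friends."""
    vals = friend_values(ego, attr)
    return max(vals) - min(vals)

def friend_heterogeneity(ego, attr):
    """Assumed index: 1 - sum of squared category shares among ego's friends."""
    vals = friend_values(ego, attr)
    counts = Counter(vals)
    return 1 - sum((c / len(vals)) ** 2 for c in counts.values())

print(friend_mean("ego", gpa), friend_range("ego", age), friend_heterogeneity("ego", race))
```

Only ego's own nomination list and the friends' attributes are used, which is why these measures can often be built from the simple ego network.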
Analysis with Social Network Data
Networks as Dependent Variables
•Interest is in explaining the observed patterns of relations.
•Examples:
  •Why are some schools segregated and others not?
  •What accounts for differences in hierarchy across schools?
  •What accounts for homophily in friendship choice?
•Tools:
  •Descriptive tools to capture properties
  •Standard analysis tools at the level of networks to explain the measures
  •p* and other specialized network statistical and simulation models
Networks as Independent Variables
•Interest is in explaining behavior with network context (peer influence / context models)
•Examples:
  •Is ego’s probability of smoking related to the smoking levels of those he/she hangs out with? (compositional context)
  •Is the transition to first intercourse affected by the peer context?
  •Are isolated students more likely to carry weapons to school than those in dense peer groups? (positional context)
•Tools:
  •Depends on dependent variable
  •Peer influence models
  •Dyad models
  •Contextual models, with network level as nested context (students within peer groups)
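Network Data Structures
[Figure: a five-node directed graph (nodes 1 to 5) shown as a Graph, an Adjacency Matrix, an Arc List, and a Node List]
Arc List:
Send  Recv
1     2
1     3
2     4
3     2
4     1
4     2
4     3
4     5
5     1
5     3
5     4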
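The Send and Recv columns of the arc list on this slide are enough to rebuild the other two representations. The sketch below assumes each digit in the Send and Recv strings is one node ID (nodes 1 to 5) and derives the adjacency matrix and node list from the arcs.

```python
# Arc list read from the slide, assuming one node ID per digit.
send = [1, 1, 2, 3, 4, 4, 4, 4, 5, 5, 5]
recv = [2, 3, 4, 2, 1, 2, 3, 5, 1, 3, 4]
n = 5

# Adjacency matrix: row = sender, column = receiver, 1 = tie present.
adj = [[0] * n for _ in range(n)]
for s, r in zip(send, recv):
    adj[s - 1][r - 1] = 1

# Node list (adjacency list): each sender followed by the nodes it names.
node_list = {i + 1: [j + 1 for j in range(n) if adj[i][j]] for i in range(n)}

for row in adj:
    print(row)
print(node_list)
```

The matrix costs n*n cells no matter how few ties exist, while the arc and node lists grow only with the number of ties, which matters for large schools.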
Network Analysis Programs
1) UCINET
•General network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-output of data is a little chunky, but workable
•Not optimal for large networks
•Available from:
Analytic [email protected]
2) STRUCTURE
•“A General Purpose Network Analysis Program providing Sociometric Indices, Cliques, Structural and Role Equivalence, Density Tables, Contagion, Autonomy, Power and Equilibria In Multiple Network Systems.”
•DOS interface with somewhat awkward syntax
•Great for role and structural equivalence models
•Manual is a very nice, substantive introduction to network methods
•Available from a link at the INSNA web site: http://www.heinz.cmu.edu/project/INSNA/soft_inf.html
Network Analysis Programs
3) NEGOPY
•Program designed to identify cohesive sub-groups in a network, based on the relative density of ties
•DOS-based program; data must be in arc-list format
•Moving the results back into an analysis program is difficult
•Available from: William D. Richards, http://www.sfu.ca/~richards/Pages/negopy.htm
4) PAJEK
•Program for analyzing and plotting very large networks
•Intuitive Windows interface
•Used for all of the real data plots in this presentation
•Mainly a graphics program, but is expanding the analytic capabilities
•Free
•Available from:
Network Analysis Programs
5) Cyram Netminer for Windows: A new exploratory tool for networks
6) SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•A collection of IML and Macro programs that allow one to:
  a) create network data structures from the Add Health nominations
  b) import/export data to/from the other network programs
  c) calculate measures of network pattern and composition
  d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•All of the Add Health data are already in SAS
•Available by sending an email to:
Network Data Collected in Add Health: In-School Network Data
•Complete network data collected in every school
•Each student was asked to name up to 5 male and 5 female friends
•These data provide the basic information needed to construct network context measures.
•Due to response rates, we computed data on 129 of the 144 total schools.
•Variables are named MF<#>AID for male friends and FF<#>AID for female friends.
Slide here of the survey instrument
Network Data Collected in Add Health: In-School Network Data
Nomination Categories:
•Matchable people inside ego’s school or sister school
  •People who were present that day: ID starts with 9 and is in the sample
  •People who were absent that day: ID starts with 9, but is not in the school sample
•People in ego’s school, but not on the directory: nomination appears as 99999999
•People in ego’s sister school, but not on the directory: nomination appears as 88888888
•People not in ego’s school or the sister school: nomination appears as 77777777
•Other special codes: nomination appears as 99959995
Nominator Categories:
•Matchable nominator: person who was on the roster; ID starts with 9.
•Unmatchable nominator: person who was NOT on the roster; ID starts with 5 or 8.
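The nomination categories above can be turned into a small lookup, sketched here under the assumption that IDs are handled as 8-character strings; the function name, the example ID, and the label wording are hypothetical. The special codes are checked first because 99999999 and 99959995 also begin with 9.

```python
# Special codes as listed on the slide.
SPECIAL_CODES = {
    "99999999": "in ego's school, not on directory",
    "88888888": "in sister school, not on directory",
    "77777777": "not in ego's school or sister school",
    "99959995": "other special code",
}

def classify_nomination(alter_id, sample_ids):
    """Label one nomination; sample_ids is the set of IDs present in the sample."""
    code = str(alter_id)
    if code in SPECIAL_CODES:            # must come before the leading-9 test
        return SPECIAL_CODES[code]
    if code.startswith("9"):
        if code in sample_ids:
            return "matchable, in sample"
        return "matchable, not in sample"
    return "unrecognized"

# Hypothetical sample with a single surveyed student.
sample = {"90001234"}
for nom in ["90001234", "90005678", "99999999", "77777777"]:
    print(nom, "->", classify_nomination(nom, sample))
```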
Network Data Collected in Add Health: In-School Network Data
Example 1. Ego is a matchable person in the school
[Diagram: True Network and Observed Network for Ego, with alters labeled M, Out, and Un]
Network Data Collected in Add Health: In-School Network Data
Example 2. Ego is not on the school roster
[Diagram: True Network and Observed Network, with nodes labeled M and Un]
Network Data Collected in Add Health: In-School Network Data
Characteristics of the Add Health School Sample
                                  All Schools   Schools w. network data
Sample Characteristics
  Number of schools               144           129
  Number of students              90,118        75,871
School Type
  Public                          89.6%         89.9%
  Private                         10.4          10.1
Grade Range
  Junior High School              40.6%         40.3%
  High School                     43.4          43.4
  7 - 12                          16.1          16.3
Region***
  West                            19.4%         15.5%
  Midwest                         22.9          24.0
  South                           40.9          42.6
  North East                      16.7          17.8
Demographic Characteristics
  % of schools >70% single race   52.7%         55%
  Family SES                      6.03          6.02
Behavioral Characteristics
  Smoke Regularly                 14.4%         14.7%
  Sexually active                 32.3          32.9
  Expect to go to College         76.2          76.3
  Active in school activities
*p<.05, ** p<=.01, ***p<=.001.
Network Data Collected in Add Health: In-School Network Data
Local Network Characteristics (Std. Dev. in parentheses)
                                         Same Sex                   Cross Sex
                            Total        Male         Female        Male: Female  Female: Male
In-school nominations(a)    5.68 (3.45)  3.08 (1.98)  3.57 (1.74)   2.19 (2.08)   2.54 (1.95)
Out-of-school nominations   1.04 (1.87)  0.42 (0.98)  0.45 (0.93)   0.42 (1.09)   0.78 (1.28)
Local network density(a)    0.18 (0.19)  0.22 (0.24)  0.26 (0.26)   0.19 (0.25)   0.15 (0.23)
Reciprocity rate(b)         0.40 (0.30)  0.40 (0.35)  0.51 (0.34)   0.29 (0.35)   0.27 (0.34)
  7th - 8th grade           0.36 (0.29)  0.38 (0.35)  0.46 (0.33)   0.23 (0.33)   0.20 (0.30)
  9th - 10th grade          0.38 (0.30)  0.39 (0.35)  0.52 (0.34)   0.25 (0.33)   0.26 (0.34)
  11th - 12th grade         0.45 (0.31)  0.43 (0.36)  0.56 (0.34)   0.37 (0.37)   0.32 (0.36)
a) Includes nominations to people not sampled
b) Proportion of ego's nominations that are reciprocated
Network Data Collected in Add Health: In-Home Network Data
•Network data were collected in both the Wave 1 and Wave 2 surveys
•There were two procedures:
  •Saturated Settings
    •Attempted to survey every student from the In-School sample.
    •2 large schools, and 10 small schools.
    •Was supposed to replicate the in-school design exactly.
  •Unsaturated Settings
    •Each person was only asked to name one other person
•In both cases, the design was not always carried out. As such, some of the students in the saturated settings were allowed to name only one male and one female friend, while some students who were in the non-saturated settings were asked to nominate a full slate of 5 and 5.
Network Data Collected in Add Health: In-Home Network Data
Data Usage Notes:
•Romantic Relation Overlap: For the W1 and W2 friendship data, any friendship that was also a romantic relation was recoded to 55555555, to protect the romantic relation nominations.
•Bad Machine on Wave 2 Data: Data from one school in Wave 2 seem to be corrupted. We have no way to show this for certain, but data from machines 200065 or 200106 appear to be incorrect: almost everyone who used these two machines “nominated” the same person multiple times. This results in one person having an abnormally large in-degree.
•All nomination #s are now valid: Unlike the in-school data, IDs starting with something other than ‘9’ can be nominated.
•Same out-of-sample special codes: All other special codes for these data are the same as in the in-school data.
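One way to screen for the repeated-nomination problem described above is to flag respondents who name the same alter more than once, then compare in-degree before and after removing the repeats. This is only a sketch on invented records; the data layout (respondent ID mapped to a list of nominated IDs) and all names are assumptions.

```python
from collections import Counter

def repeated_nominations(noms):
    """Respondents who name the same alter more than once, with the counts."""
    flagged = {}
    for ego, alters in noms.items():
        dups = {a: c for a, c in Counter(alters).items() if c > 1}
        if dups:
            flagged[ego] = dups
    return flagged

def in_degree(noms, dedupe=False):
    """Nominations received per alter; dedupe counts each nominator once."""
    deg = Counter()
    for alters in noms.values():
        deg.update(set(alters) if dedupe else alters)
    return deg

# Hypothetical records: r1 "nominates" x three times.
noms = {"r1": ["x", "x", "x"], "r2": ["x", "y"], "r3": ["y", "z"]}

print(repeated_nominations(noms))
print(in_degree(noms)["x"], in_degree(noms, dedupe=True)["x"])
```

In the toy data the raw in-degree of x is 4 but only 2 after removing repeats, which is the kind of inflation the note warns about.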
Network Data Collected in Add Health: In-Home Network Data
Descriptive Statistics for Saturated Settings
Constructing Network Measures: Total Network
To construct the social network from the nomination data, we need to integrate each person’s nominations with every other nomination.
Methods:
1) Export the nomination data to construct the network in another program. Most of the other programs require you to pre-process the data a great deal before they can read them. As such, it is usually easier to create the files in SAS first, then bring them into UCINET or a similar program.
2) Construct the network in SAS. The best way to do this is to combine IML and the MACRO language. SAS IML lets you work with matrices in a (fairly) straightforward language; the SAS MACRO language makes it easy to work with all of the schools at once.
Programs already set up to do this are available in SPAN.
Constructing Network Measures: Adjacency Matrices
The key to analyzing / measuring the total network is constructing either an adjacency matrix or an adjacency list. These data structures allow you to directly identify both the people ego nominates and the people that nominate ego. Thus, the first step in any network analysis will be to construct the adjacency matrix.
To do this you need to:
1) Identify the universe of possible people in the network. This is usually the same as the set of people that you have sampled. However, if you want to include ties to non-sampled people you may make the universe include all people named by anyone.
2) Create a blank matrix with n rows and n columns.
3) Loop over all respondents, placing a value in the column that corresponds to each person they nominate. This can be binary (named or not) or valued (number of activities they do with alter).
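The three steps above can be sketched in a few lines. This is not the SPAN code: the input layout (each respondent mapped to a dictionary of alter and tie value) and the function name are assumptions made for illustration.

```python
def build_adjacency(nominations, include_unsampled=False):
    """Return (universe, matrix) built from respondent nomination records."""
    # Step 1: identify the universe (respondents, optionally plus anyone named).
    respondents = sorted(nominations)
    universe = list(respondents)
    if include_unsampled:
        named = {a for alters in nominations.values() for a in alters}
        universe += sorted(named - set(respondents))
    index = {pid: k for k, pid in enumerate(universe)}

    # Step 2: create a blank n x n matrix.
    n = len(universe)
    matrix = [[0] * n for _ in range(n)]

    # Step 3: loop over respondents and fill the nominated columns.
    for ego, alters in nominations.items():
        for alter, value in alters.items():
            if alter in index:           # ties outside the universe are dropped
                matrix[index[ego]][index[alter]] = value
    return universe, matrix

# Hypothetical valued nominations; D is named but was not sampled.
nominations = {"A": {"B": 1, "C": 2}, "B": {"A": 1}, "C": {"D": 3}}

print(build_adjacency(nominations))
print(build_adjacency(nominations, include_unsampled=True))
```

With the default universe the tie from C to D is dropped; with `include_unsampled=True` D gets a row and column, but its row stays empty because D never reported nominations.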
Constructing Network Measures: Local Networks
•To create and calculate measures based only on the people ego nominates, you can work directly from the nomination list (don’t need to construct the adjacency matrix).
•To create and calculate measures based on the received or reciprocated ties, you need to have a list of people who nominate ego, which is easiest to get given the adjacency matrix.
•To calculate positional measures (density, reciprocity, etc.) all you need is the nomination data.
•To calculate compositional data, you need both the nomination data and matching attribute data.
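To show why received and reciprocated ties are easiest to get from the adjacency matrix, the sketch below reads ego's row for sent ties and ego's column for received ties. The matrix is invented, and the reciprocity rate follows the definition used elsewhere in these slides (proportion of ego's nominations that are reciprocated).

```python
def ego_ties(adj, ego):
    """Return (sent, received, reciprocated) sets of node indices for ego."""
    n = len(adj)
    sent = {j for j in range(n) if adj[ego][j]}       # ego's row
    received = {i for i in range(n) if adj[i][ego]}   # ego's column
    return sent, received, sent & received

def reciprocity_rate(adj, ego):
    """Proportion of ego's nominations that are returned; None if ego names no one."""
    sent, _, mutual = ego_ties(adj, ego)
    return len(mutual) / len(sent) if sent else None

# Hypothetical 4-person adjacency matrix (row = sender, column = receiver).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

print(ego_ties(adj, 0), reciprocity_rate(adj, 0))
```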
Constructing Network Measures: Peer Groups
Identifying cohesive peer groups requires first specifying what a cohesive peer group is. Potential definitions could be:
a) all people within k steps of ego (extended ego-network)
b) a set of people who interact with each other often (relative density)
c) a set of people with a particular pattern of ties (a closed loop, for example)
UCINET, STRUCTURE, NEGOPY and SPAN all provide methods for identifying cohesive groups, but they differ on the underlying definition of what constitutes a group. The FACTIONS algorithm in UCINET and NEGOPY’s algorithm use relative density. The CROWD algorithm in SPAN uses a combination of relative density and pattern.
Once you have constructed the adjacency matrix, you can export to these other programs fairly easily. However, most of them take a good deal of time to run (FACTIONS, for example, is a bear), so be sure you have identified exactly what you want before you start processing.
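Definition (a), all people within k steps of ego, is the simplest of the three to compute. The sketch below uses a breadth-first search over a node list; it follows ties in the direction sent, which is an assumption (one could equally symmetrize the ties first), and the toy network and names are hypothetical. It is not the FACTIONS, NEGOPY, or CROWD algorithm.

```python
from collections import deque

def within_k_steps(node_list, ego, k):
    """Map each person reachable from ego in at most k steps to their distance."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == k:              # do not expand beyond k steps
            continue
        for alter in node_list.get(node, ()):
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    del dist[ego]                        # ego is not part of its own group
    return dist

# Hypothetical node list: each person mapped to the people they nominate.
node_list = {"ego": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}

print(within_k_steps(node_list, "ego", 1))
print(within_k_steps(node_list, "ego", 2))
```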
Constructing Network Measures: Peer Group Characteristics
Identifying Cohesive Sub-Groups
•Cohesion: The group is difficult to separate; the connection of the group does not depend on one relation or person.
•Groupness: Relative to the rest of the network, a cohesive sub - group has high relational volume.
• Inclusion: Some people are not in groups while others bridge groups.
Examples of Peer Groups within Add Health High Schools: Crowds Algorithm
Observed Clustering within Adolescent Social Networks
•On average, 65% of a school’s adolescents are in cohesive sub-groups.
•87% of all relations are within sub-groups.
•The average sub-group has 22 members.
•The average diameter for a sub-group is 3 steps.
•The mean segregation index is .96 (1=Complete, 0=Random).
Network Characteristics of Sub Groups
Observed Clustering within Adolescent Social Networks
Distribution of Characteristic within groups, relative to school distribution
•Grade: 34%
•Race: 65%
•College: 84%
•GPA: 86%
•Activities: 79%
•Smoking: 74%
[Figure panels: Groups 23 & 24, Group 1, Group 15, Group 18]
Constructing Network Data: School Level
Inter-Group Relations
[Figure: numbered peer groups within a school, linked by directed arrows. Legend: Mostly Seniors, Mostly Juniors, Mostly Sophomores, Mostly Freshmen, Mixed Grades, Directed Arrow]
Analysis Using Network Data: Nets as Dependent Variable: Racial Segregation
[Figure: Same race friendship preference by racial heterogeneity. Y-axis: Same Race Friendship Preference (b1). X-axis: Racial Heterogeneity. Labeled point: Countryside h.s.]
Analysis Using Network Data: Nets as Dependent Variable: Modeling the Network
[Figure: Network Model Coefficients, In School Networks. Coefficient axis from -0.6 to 0.8. Terms shown: Same Race, SES, GPA, Both Smoke, College, Drinking, Fight, Reciprocity, Same Sex, Same Clubs, Transitivity, Intransitivity, Same Grade]
Analysis Using Network Data: Nets as Independent Variable: Suicide
Relational Structures and Forms of Suicide
[Diagram: forms of suicide (Anomic, Altruistic, Egotistic, Fatalistic) arranged by Regulation (Low to High) and Integration (Low to High)]
Analysis Using Network Data: Nets as Independent Variable: Suicide
Measuring Isolation and Anomie
[Diagram: Ego, Alter, and Third within a School, illustrating Peer Anomie, Intransitivity, and Isolation]
Analysis Using Network Data: Nets as Independent Variable: Suicide
Effect of Friendship Structure on Suicidal Thoughts
Net of demographic, family, school, religion and personal characteristics.
                           Males                     Females
                           OR     95% CI             OR     95% CI
Network Isolation          0.665  (0.307 - 1.445)    2.010  (1.073 - 3.765)
Intransitivity Index       0.747  (0.358 - 1.558)    2.198  (1.221 - 3.956)
Friend Attempted Suicide   2.725  (2.187 - 3.395)    2.374  (2.019 - 2.791)
Trouble with People        0.999  (0.912 - 1.095)    1.027  (0.953 - 1.106)
Analysis Using Network Data: Nets as Independent Variable: Weapons
[Figure: Probability of Carrying a Weapon by Race and Gender. Y-axis: probability of carrying a weapon (0 to 0.14). X-axis: Race/Ethnicity (White, Black, Hispanic, Asian, Native American, Other). Series: Males, Females]
a) Figure represents predicted probabilities from model 6 of table 5, holding all other variables at the full sample mean.
Analysis Using Network Data: Nets as Independent Variable: Weapons
[Figure: Network Effects on Weapon Carrying. Y-axis: probability of carrying a weapon to school (0 to 0.18). X-axis: character of peer context, with ends labeled Positive and Negative. Series: Peer Group Deviance, School Oriented Peer Group, Social Outsiders]
Analysis Using Network Data: Nets as Independent Variable: Sexual Debut
[Figure: The Effect of Peer Group Composition on Sexual Debut*. Y-axis: Estimated Probability of Sexual Debut (0.00 to 0.40). X-axis: Proportion of High-Risk Adolescents in Peer Group, with categories listed as 76-100%, 51-75%, 26-50%, 1-25%, 0% and group sizes listed as N=380, N=1898, N=2026, N=660, N=88]
*Probability of experiencing sexual debut during the 18 months following the in-school survey. Controlling for age, socio-demographic characteristics, family and peer group characteristics (see table A1, model 6). Bearman and Bruckner, 1999.
Analysis Using Network Data: Nets as Independent Variable: Pregnancy
[Figure: The Effect of Close Friends' Risk Status on Pregnancy Risk*. Y-axis: Estimated Probability of Pregnancy (0.00 to 0.20). X-axis: Proportion of Low-Risk Male and Female Close Friends, with categories listed as 76-100%, 51-75%, 26-50%, 1-25%, 0%, no friends and group sizes listed as N=308, N=932, N=100, N=517, N=550, N=427]
*Probability of experiencing a pregnancy during the 18 months following the in-school survey. Controlling for age, socio-demographic and individual characteristics, family characteristics, and popularity (see table B1, model 3). Bearman and Bruckner, 1999.
Wave III Respondents
•Wave II participants
  •Main sample plus special samples
  •Aged 18-25
•Partners of original participants
  •2,000 couples
Add Health Wave III: The Transition to Adulthood
•How is what happens in adolescence related to what happens in young adulthood?
•The influence of adolescent contexts on young adult outcomes
Additional Content of Wave III
AHPVT
Social security number
Longitude and latitude
College context
Physical measurements
Biomarkers
Network Transitions
Special Features
CASI event history calendar
Preloaded data from Waves I and II
Re-interviews with STI-positive individuals
Binge-drinking sample
High school transcript data
Wave III Questionnaire Content
Family relationships
Friends
Education
Work experience
General health
Mental health
Illnesses, disabilities
Marriage/cohabitation
Sexual experiences and STDs
Relationships
Pregnancies and births
Delinquency and violence
Involvement with criminal justice system
Tobacco, alcohol, drugs, suicide
Mentoring
Civic participation
Religion and spirituality
Gambling