types of surveys


Upload: tania

Post on 31-Jan-2016



TRANSCRIPT

Page 1: Types of Surveys

1

Page 2: Types of Surveys

2

Types of Surveys

Cross-sectional
• surveys a specific population at a given point in time
• will have one or more of the design components
    • stratification
    • clustering with multistage sampling
    • unequal probabilities of selection

Longitudinal
• surveys a specific population repeatedly over a period of time
    • panel
    • rotating samples

Page 3: Types of Surveys

3

Cross Sectional Surveys

Sampling Design Terminology

Page 4: Types of Surveys

4

Methods of Sample Selection

Basic methods

• simple random sampling

• systematic sampling

• unequal probability sampling

• stratified random sampling

• cluster sampling

• two-stage sampling

Page 5: Types of Surveys

5

Simple Random Sampling

Why?
• basic building block of sampling
• sample from a homogeneous group of units

How?
• physically make draws at random of the units under study
• computer selection methods: R, Stata

[Figure: axis from 0 to 100]
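A minimal sketch of computer selection for a simple random sample, written in plain Python rather than the R or Stata named on the slide. The frame of 100 unit labels, the sample size and the seed are illustrative assumptions.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    return rng.sample(list(units), n)  # sampling without replacement

population = range(1, 101)             # hypothetical frame: units labelled 1..100
sample = simple_random_sample(population, 10, seed=2016)
print(sorted(sample))
```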

Page 6: Types of Surveys

6

Systematic Sampling

Why?
• easy
• can be very efficient depending on the structure of the population

How?
• get a random start in the population
• sample every kth unit for some chosen number k

[Figure: axis from 0 to 100]
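The two steps above (random start, then every kth unit) can be sketched as follows. The frame of 100 units and k = 10 are assumptions chosen for illustration.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k positions, then take every kth unit."""
    units = list(units)
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start: one of positions 0..k-1
    return units[start::k]     # every kth unit from the start onward

population = list(range(1, 101))   # hypothetical ordered frame of 100 units
sample = systematic_sample(population, k=10, seed=1)
print(sample)
```

The sample is only as good as the ordering of the frame: if the order is related to the measurements, treating the result as a simple random sample can mislead.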

Page 7: Types of Surveys

7

Additional Note

Simplifying assumption:

• in terms of estimation a systematic sample is often treated as a simple random sample

Key assumption:

• the order of the units is unrelated to the measurements taken on them

Page 8: Types of Surveys

8

Unequal Probability Sampling

Why?
• may want to give greater or lesser weight to certain population units
• two-stage sampling with probability proportional to size at the first stage and equal sample sizes at the second stage provides a self-weighting design (all units have the same chance of inclusion in the sample)

How?
• with replacement
• without replacement
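The self-weighting claim can be checked with a little arithmetic. The sketch below assumes a without-replacement design in which cluster i (size M_i) has first-stage inclusion probability n·M_i/M, followed by a simple random sample of m units within each selected cluster. The cluster names and sizes are made up.

```python
from fractions import Fraction

# Hypothetical cluster sizes (M_i); total population M = 500
cluster_sizes = {"A": 120, "B": 60, "C": 200, "D": 120}
n_clusters = 2        # clusters selected at the first stage
m_per_cluster = 10    # units selected within each chosen cluster
total = sum(cluster_sizes.values())

def unit_inclusion_probability(size):
    first_stage = Fraction(n_clusters * size, total)   # n * M_i / M
    second_stage = Fraction(m_per_cluster, size)       # m / M_i
    return first_stage * second_stage                  # M_i cancels: n * m / M

probs = {name: unit_inclusion_probability(size)
         for name, size in cluster_sizes.items()}
print(probs)   # the same probability for every cluster
```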

Page 9: Types of Surveys

9

With or Without Replacement?

• in practice sampling is usually done without replacement

• the formula for the variance based on without replacement sampling is difficult to use

• the formula for with replacement sampling at the first stage is often used as an approximation

Assumption: the population size is large and the sample size is small – sampling fraction is less than 10%

Page 10: Types of Surveys

10

Stratified Random Sampling

Why?
• for administrative convenience
• to improve efficiency
• estimates may be required for each stratum

How?
• independent simple random samples are chosen within each stratum

[Figure: axis from 0 to 100]
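A minimal sketch of stratified random sampling: one independent simple random sample per stratum. The stratum names, membership and sample sizes are illustrative assumptions.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

# Hypothetical frame split into two strata
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
sizes = {"urban": 6, "rural": 4}   # proportional allocation, assumed here
sample = stratified_sample(strata, sizes, seed=7)
print(sample)
```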

Page 11: Types of Surveys

11

Example: Survey of Youth in Custody

• first U.S. survey of youths confined to long-term, state-operated institutions

• complemented existing Children in Custody censuses.

• companion survey to the Surveys of State Prisons

• the data contain information on criminal histories, family situations, drug and alcohol use, and peer group activities

• survey carried out in 1989 using stratified systematic sampling

Page 12: Types of Surveys

12

SYC Design

strata
• type (a) groups of smaller institutions
• type (b) individual larger institutions

sampling units
• strata type (a)
    • first stage – institution by probability proportional to size of the institution
    • second stage – individual youths in custody
• strata type (b)
    • individual youths in custody

• individuals chosen by systematic random sampling

Page 13: Types of Surveys

13

Cluster Sampling

Why?
• convenience and cost
• the frame or list of population units may be defined only for the clusters and not the units

How?
• take a simple random sample of clusters and measure all units in the cluster

[Figure: axis from 0 to 100]
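A sketch of one-stage cluster sampling as described above: a simple random sample of clusters, with every unit in a chosen cluster measured. The ten clusters of ten units each are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select clusters by simple random sampling and keep all of their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical frame: cluster c holds units 10c+1 .. 10c+10
clusters = {c: list(range(10 * c + 1, 10 * c + 11)) for c in range(10)}
sample = cluster_sample(clusters, 3, seed=3)
print(sample)
```

Only a list of clusters is needed to draw the sample, which is why the method works when no unit-level frame exists.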

Page 14: Types of Surveys

14

Two-Stage Sampling

Why?

• cost and convenience

• lack of a complete frame

How?

• take either a simple random sample or an unequal probability sample of primary units and then within each selected primary unit take a simple random sample of secondary units

[Figure: axis from 0 to 100]
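A sketch of two-stage sampling with simple random sampling at both stages (the slide also allows an unequal probability sample at the first stage, which is not shown here). The primary units and their sizes are hypothetical.

```python
import random

def two_stage_sample(primary_units, n_primary, m_secondary, seed=None):
    """Stage 1: SRS of primary units. Stage 2: SRS of secondary units in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), n_primary)
    return {p: rng.sample(primary_units[p], m_secondary) for p in chosen}

# Hypothetical frame: 10 primary units, each holding 10 secondary units
primary_units = {p: list(range(10 * p + 1, 10 * p + 11)) for p in range(10)}
sample = two_stage_sample(primary_units, n_primary=4, m_secondary=3, seed=11)
print(sample)
```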

Page 15: Types of Surveys

15

Synthesis to a Complex Design

Stratified two-stage cluster sampling

Strata
• geographical areas

First stage units
• smaller areas within the larger areas

Second stage units
• households

Clusters
• all individuals in the household

Page 16: Types of Surveys

16

Why a Complex Design?

• better cover of the entire region of interest (stratification)

• efficient for interviewing: less travel, less costly

Problem: estimation and analysis are more complex

Page 17: Types of Surveys

17

Ontario Health Survey

• carried out in 1990

• health status of the population was measured

• data were collected relating to the risk factors associated with major causes of morbidity and mortality in Ontario

• survey of 61,239 persons was carried out in a stratified two-stage cluster sample by Statistics Canada

Page 18: Types of Surveys

18

OHS

Sample Selection

• strata: public health units
    – divided into rural and urban strata

• first stage: enumeration areas defined by the 1986 Census of Canada and selected by pps

• second stage: dwellings selected by SRS

• cluster: all persons in the dwelling

Page 19: Types of Surveys

19

Longitudinal Surveys

Sampling Design

Page 20: Types of Surveys

20

Schematic Representation

Panel Survey

[Figure: schematic of a panel survey; axes: Respondents and Time; tick labels 0 to 4]

Page 21: Types of Surveys

21

Schematic Representation

Rotation Survey

[Figure: schematic of a rotation survey; axes: Respondents and Time; tick labels 0 to 4]

Page 22: Types of Surveys

22

British Household Panel Survey

Objectives of the survey

• to further understanding of social and economic change at the individual and household level in Britain

• to identify, model and forecast such changes, their causes and consequences in relation to a range of socio-economic variables.

Page 23: Types of Surveys

23

BHPS: Target Population and Frame

Target population

• private households in Great Britain

Survey frame

• small users Postcode Address File (PAF)

Page 24: Types of Surveys

24

BHPS: Panel Sample

• designed as an annual survey of each adult (16+) member of a nationally representative sample

• 5,000 households approximately

• 10,000 individual interviews approximately

• the same individuals are re-interviewed in successive waves

• if individuals split off from original households, all adult members of their new households are also interviewed.

• children are interviewed once they reach the age of 16

• 13 waves of the survey from 1991 to 2004

Page 25: Types of Surveys

25

BHPS: Sampling Design

Uses implicit stratification embedded in two-stage sampling
• postcode sector ordered by region
• within a region postcode sector ordered by socio-economic group as determined from census data and then divided into four or five strata

Sample selection
• systematic sampling of postcode sectors from ordered list
• systematic sampling of delivery points (≈ addresses or households)

Page 26: Types of Surveys

26

BHPS: Schema for Sampling

Page 27: Types of Surveys

27

Survey Weights

Page 28: Types of Surveys

28

Survey Weights: Definitions

initial weight
• equal to the inverse of the inclusion probability of the unit

final weight
• initial weight adjusted for nonresponse, poststratification and/or benchmarking
• interpreted as the number of units in the population that the sample unit represents
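A small numeric sketch of the two definitions. The initial weight follows the slide directly. The nonresponse step uses a simple weighting-class adjustment within a single class, which is an assumption made here because the slides do not say which adjustment was used. The inclusion probabilities and response pattern are made up.

```python
def initial_weight(inclusion_probability):
    """Initial (design) weight: inverse of the unit's inclusion probability."""
    return 1.0 / inclusion_probability

def nonresponse_adjusted(weights, responded):
    """Scale respondents' weights so they carry the nonrespondents' share.

    Assumes a single adjustment class; nonrespondents get weight 0.
    """
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]

probs = [0.5, 0.2, 0.2, 0.1]              # hypothetical inclusion probabilities
w0 = [initial_weight(p) for p in probs]   # 2, 5, 5 and 10 population units
responded = [True, True, False, True]     # hypothetical response pattern
w1 = nonresponse_adjusted(w0, responded)
print(w0, w1, sum(w1))
```

The adjusted weights still sum to the same estimated population size as the initial weights, which keeps the "number of units represented" interpretation intact.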

Page 29: Types of Surveys

29

Interpretation

• the survey weight for a particular sample unit is the number of units in the population that the unit represents

[Figure legend: Not sampled, Wt = 2, Wt = 5, Wt = 6, Wt = 7]

Page 30: Types of Surveys

30

Effect of the Weights

• Example: age distribution, Survey of Youth in Custody

Age       Counts    Sum of Weights
11             1                28
12             9               149
13            53               764
14           167              2143
15           372              3933
16           622              5983
17           634              5189
18           334              2778
19           196              1763
20           122              1164
21            57               567
22            27               273
23            14               150
24            13               128

Totals      2621             25012
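The effect of the weights can be seen by turning the table into proportions. The sketch below uses the counts and sums of weights exactly as tabulated; the variable names are illustrative.

```python
ages = list(range(11, 25))
counts = [1, 9, 53, 167, 372, 622, 634, 334, 196, 122, 57, 27, 14, 13]
weight_sums = [28, 149, 764, 2143, 3933, 5983, 5189,
               2778, 1763, 1164, 567, 273, 150, 128]

n = sum(counts)            # number of sampled youths
N_hat = sum(weight_sums)   # estimated population size

# Bar heights for the unweighted and weighted histograms
unweighted = [c / n for c in counts]
weighted = [w / N_hat for w in weight_sums]

mean_unweighted = sum(a * c for a, c in zip(ages, counts)) / n
mean_weighted = sum(a * w for a, w in zip(ages, weight_sums)) / N_hat

for a, u, w in zip(ages, unweighted, weighted):
    print(f"{a:>3}  {u:.3f}  {w:.3f}")
print(f"mean age: unweighted {mean_unweighted:.2f}, weighted {mean_weighted:.2f}")
```

The unweighted mean age comes out higher than the weighted mean, so the raw sample over-represents older youths relative to the population it stands for.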

Page 31: Types of Surveys

31

Unweighted Histogram

[Figure: histogram, Age Distribution of Youth in Custody; x-axis: Age, 11 to 24; y-axis: Proportion, 0 to 0.3]

Page 32: Types of Surveys

32

Weighted Histogram

[Figure: histogram, Age Distribution of Youth in Custody; x-axis: Age, 11 to 24; y-axis: Proportion, 0 to 0.3]

Page 33: Types of Surveys

33

Weighted versus Unweighted

[Figure: Weighted and Unweighted Histograms; x-axis: Age, 11 to 24; y-axis: Proportion, 0 to 0.3; legend: Weighted, Unweighted]

Page 34: Types of Surveys

34

Observations

• the histograms are similar but significantly different
    • the design probably utilized approximate proportional allocation

• the distribution of ages in the unweighted case tends to be shifted to the right when compared to the weighted case
    • older ages are over-represented in the dataset

Page 35: Types of Surveys

35

Survey Data Analysis

Issues and Simple Examples from Graphical Methods

Page 36: Types of Surveys

36

Basic Problem in Survey Data Analysis

Page 37: Types of Surveys

37

Issues

iid (independent and identically distributed) assumption

• the assumption does not hold in complex surveys because of correlations induced by the sampling design or because of the population structure

• blindly applying standard programs to the analysis can lead to incorrect results

Page 38: Types of Surveys

38

Example: Rank Correlation Coefficient

Pay equity survey dispute: Canada Post and PSAC

• two job evaluations on the same set of people (and same set of information) carried out in 1987 and 1993

• rank correlation between the two sets of job values obtained through the evaluations was 0.539

• assumption to obtain a valid estimate of correlation: pairs of observations are iid

Page 39: Types of Surveys

39

Scatterplot of Evaluations

• Rank correlation is 0.539

[Figure: scatterplot; x-axis: Rank in 1987, 0 to 200; y-axis: Rank in 1993, 0 to 200]

Page 40: Types of Surveys

40

A Stratified Design with Distinct Differences Between Strata

• the pay level increases with each pay category (four in number)

• the job value also generally increases with each pay category

• therefore the observations are not iid
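A toy illustration of why differences between strata matter for a rank correlation. The data below are entirely made up (they are not the Canada Post evaluations): within each of two strata the two rankings disagree completely, yet pooling the strata gives a clearly positive coefficient because one stratum sits above the other on both variables.

```python
def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical job values from two evaluations, in two pay strata
low_x, low_y = [1, 2, 3], [3, 2, 1]
high_x, high_y = [4, 5, 6], [6, 5, 4]

within_low = spearman(low_x, low_y)
within_high = spearman(high_x, high_y)
pooled = spearman(low_x + high_x, low_y + high_y)
print(within_low, within_high, round(pooled, 3))
```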

Page 41: Types of Surveys

41

Scatterplot by Pay Category

[Figure: scatterplot by pay category (2, 3, 4, 5); x-axis: Rank in 1987, 0 to 200; y-axis: Rank in 1993, 0 to 200]

Page 42: Types of Surveys

42

Correlations within Level

Correlations within each pay level

• Level 2: –0.293

• Level 3: –0.010

• Level 4: 0.317

• Level 5: 0.496

Only Level 4 is significantly different from 0

Page 43: Types of Surveys

43

Graphical Displays

first rule of data analysis
• always try to plot the data to get some initial insights into the analysis

common tools
• histograms
• bar graphs
• scatterplots

Page 44: Types of Surveys

44

Histograms

unweighted
• height of the bar in the ith class is proportional to the number in the class

weighted
• height of the bar in the ith class is proportional to the sum of the weights in the class

Page 45: Types of Surveys

45

Body Mass Index

measured by
• weight in kilograms divided by square of height in meters

• 7.0 < BMI < 45.0
• BMI < 20: health problems such as eating disorders
• BMI > 27: health problems such as hypertension and coronary heart disease

Page 46: Types of Surveys

46

BMI: Women

[Figure: histogram of BMI for women; x-axis: BMI, 11 to 44; y-axis: 0 to 0.16]

Page 47: Types of Surveys

47

BMI: Men

[Figure: histogram of BMI for men; x-axis: BMI, 11 to 44; y-axis: 0 to 0.16]

Page 48: Types of Surveys

48

BMI: Comparisons

[Figure: histograms of BMI compared; x-axis: BMI, 11 to 44; y-axis: 0 to 0.16; legend: Women, Men]

Page 49: Types of Surveys

49

Bar Graphs

Same principle as histograms

unweighted
• size of the ith bar is proportional to the number in the class

weighted
• size of the ith bar is proportional to the sum of the weights in the class

Page 50: Types of Surveys

50

Ontario Health Survey

Distribution of Levels of Happiness by Marital Status

[Figure: bar graph; categories (Marital Status): Married, Single, Widowed, Divorced; x-axis: Percentage, 0% to 100%; legend: Happy, Somewhat happy, Somewhat unhappy, Unhappy, Very unhappy]

Page 51: Types of Surveys

51

Scatterplots

unweighted

• plot the outcomes of one variable versus another

problem in complex surveys

• there are often several thousand respondents

Page 52: Types of Surveys

52

[Figure: Scatterplot of BMI by Age and Sex; x-axis: Age, 20 to 60; y-axis: BMI, 10 to 40]

Page 53: Types of Surveys

53

Solution

• bin the data on one variable and find a representative value

• at a given bin value the representative value for the other variable is the weighted sum of the values in the bin divided by the sum of the weights in the bin
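The binning rule above (weighted sum of the values in the bin divided by the sum of the weights in the bin) can be sketched as follows. The ages, BMI values, weights and the 10-year bin width are illustrative assumptions, not Ontario Health Survey data.

```python
from collections import defaultdict

def binned_weighted_means(x, y, w, bin_width):
    """For each bin of x, return sum(w*y) / sum(w) over the units in that bin."""
    sums = defaultdict(lambda: [0.0, 0.0])      # bin start -> [sum w*y, sum w]
    for xi, yi, wi in zip(x, y, w):
        b = int(xi // bin_width) * bin_width    # lower edge of the bin
        sums[b][0] += wi * yi
        sums[b][1] += wi
    return {b: swy / sw for b, (swy, sw) in sorted(sums.items())}

age = [21, 24, 27, 33, 38]               # hypothetical respondents
bmi = [22.0, 24.0, 23.0, 26.0, 25.0]
wt = [100, 300, 200, 150, 50]            # hypothetical survey weights
means = binned_weighted_means(age, bmi, wt, bin_width=10)
print(means)
```

Plotting one point per bin instead of several thousand raw points keeps the trend visible, and the per-bin sum of weights can drive the symbol size in a bubble plot.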

Page 54: Types of Surveys

54

[Figure: BMI Trends by Age and Sex; x-axis: Age, 0 to 40; y-axis: Binned BMI, 23 to 27]

Page 55: Types of Surveys

55

[Figure: DBMI versus BMI (binned); x-axis: BMI, 12 to 42; y-axis: DBMI, 15 to 30]

Bubble Plots
• size of the circle is related to the sum of the survey weights in the estimate
• more data in the BMI range 17 to 29 approximately

Page 56: Types of Surveys

56

Computing Packages

STATA and R

Page 57: Types of Surveys

57

Available Software for Complex Survey Analysis

• commercial packages:
    • STATA
    • SAS
    • SPSS
    • Mplus

• noncommercial package:
    • R

Page 58: Types of Surveys

58

STATA

defining the sampling design: svyset

– example
    svyset [pweight=indiv_wt], strata(newstrata) psu(ea) vce(linear)

– output:
    pweight: indiv_wt
    VCE: linearized
    Strata 1: newstrata
    SU 1: ea
    FPC 1: <zero>

Page 59: Types of Surveys

59

R: survey package

• define the sampling design: svydesign
    – wk1de<-svydesign(id=~ea,strata=~newstrata,weight=~indiv_wt,nest=T,data=work1)

• output
    > summary(wk1de)
    Stratified 1 - level Cluster Sampling design
    With (1860) clusters.
    svydesign(id = ~ea, strata = ~newstrata, weight = ~indiv_wt, nest = T, data = work1)

Page 60: Types of Surveys

60

Syntax

• STATA:
    – svy: estimate
    – Example: least squares estimation
    – svyset [pweight=indiv_wt], strata(newstrata) psu(ea)
    – svy: regress dbmi bmi

• R:
    – svy***(*, design, data=, ...)
    – Example: least squares estimation
    – wk2de<-svydesign(id=~ea,strata=~newstrata,weight=~indiv_wt,nest=T,data=work2)
    – svyglm(dbmi~bmi, data=work2,design=wk2de)

Page 61: Types of Surveys

61

Available Survey Commands

                R      STATA
Descriptive     Yes    Yes
Regression      Yes    Yes (More)
Resampling      Yes    Yes
Longitudinal    Yes    No
PMLE            Yes    Yes
Calibration     Yes    No

Page 62: Types of Surveys

62

Survey Data Analysis

Contingency Tables

and

Issues of Estimation of Precision

Page 63: Types of Surveys

63

General Effect of Complex Surveys on Precision

• stratification decreases variability (more precise than SRS)

• clustering increases variability (less precise than SRS)

• overall, the multistage design has the effect of increasing variability (less precise than SRS)

Page 64: Types of Surveys

64

Illustration Using Contingency Tables

• two categorical variables that can be set out in I rows and J columns

• can get a survey estimate of the proportion of observations in the cell defined by the ith row and jth column:

$$\hat{p}_{ij}$$

Page 65: Types of Surveys

65

Example: Ontario Health Survey

• rows: five levels describing levels of happiness that people feel

• columns: four levels describing the amount of stress people feel

• Is there an association between stress and happiness?

Page 66: Types of Surveys

66

STATA Commands

use "I:\workshopjune\work.dta", clear
svyset [pweight=indiv_wt], strata(newstrata) psu(ea)
svy: tabulate happiness stress
(running tabulate on estimation sample)

Number of strata = 72          Number of obs   = 48057
Number of PSUs   = 1860        Population size = 7961780.7
                               Design df       = 1788

Page 67: Types of Surveys

67

STATA Output

• table on stress and happiness

• estimated proportions in the table with test statistic

-------------------------------------------------------
          |                   stress
happiness |       1        2        3        4    Total
----------+--------------------------------------------
        1 |    .042    .2567    .2856     .085    .6692
        2 |    .026    .1426    .0935    .0109    .2731
        3 |   .0106    .0246    .0085  8.5e-04    .0446
        4 |    .004    .0045    .0015  8.4e-04    .0108
        5 |   .0016  3.4e-04  2.0e-04  2.1e-04    .0023
          |
    Total |   .0841    .4288    .3893    .0978        1
-------------------------------------------------------
  Key: cell proportions

  Pearson:
    Uncorrected   chi2(12)           = 3674.8280
    Design-based  F(8.66, 15484.10)  =   89.2775     P = 0.0000

Page 68: Types of Surveys

68

Possible Test Statistics

adapt the classical test statistic

• need the sampling distribution of the statistic

Wald Test

• need an estimate of the variance-covariance matrix

Page 69: Types of Surveys

69

Estimation of Variance or Precision

• variance estimation with complex multistage cluster sample design:

• exact formula for variance estimation is often too complex; use of an approximate approach required

• NOTE: taking account of the design in variance estimation is as crucial as using the sampling weights for the estimation of a statistic

Page 70: Types of Surveys

70

Some Approximate Methods

• Taylor series methods

• Replication methods
    • Balanced Repeated Replication (BRR)
    • Jackknife
    • Bootstrap

Page 71: Types of Surveys

71

Replication Methods

• you can estimate the variance of an estimated parameter by taking a large number of different subsamples from your original sample

• each subsample, called a replicate, is used to estimate the parameter

• the variability among the resulting estimates is used to estimate the variance of the full-sample estimate

• covariance between two different parameter estimates is obtained from the covariance in replicates

• the replication methods differ in the way the replicates are built
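As one concrete instance of the replication idea, here is a sketch of the delete-one-PSU jackknife for a weighted mean. It assumes an unstratified design with PSUs as the replication units; a stratified design such as the Ontario Health Survey would need the per-stratum version, which is not shown. The PSU data are made up.

```python
def weighted_mean(psus):
    """Weighted mean over all (value, weight) pairs in the given PSUs."""
    num = sum(w * y for psu in psus for y, w in psu)
    den = sum(w for psu in psus for _, w in psu)
    return num / den

def jackknife_variance(psus, estimator):
    """Delete-one-PSU jackknife: (n-1)/n * sum of squared replicate deviations."""
    n = len(psus)
    full = estimator(psus)
    # Each replicate drops one PSU and re-estimates from the rest
    replicates = [estimator(psus[:j] + psus[j + 1:]) for j in range(n)]
    return (n - 1) / n * sum((r - full) ** 2 for r in replicates)

# Hypothetical sample: four PSUs, each a list of (value, weight) pairs
psus = [
    [(24.0, 100), (26.5, 120)],
    [(22.0, 80), (23.5, 90), (25.0, 110)],
    [(27.0, 150)],
    [(21.5, 60), (24.5, 70)],
]
estimate = weighted_mean(psus)
variance = jackknife_variance(psus, weighted_mean)
print(estimate, variance ** 0.5)   # estimate and its jackknife standard error
```

As a sanity check, with one equally weighted unit per PSU this reduces to the familiar s²/n for a sample mean.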

Page 72: Types of Surveys

72

Assumptions

The resulting distribution of the test statistic is based on having a large sample size with the following properties

• the total number of first stage sampled clusters (or primary sampling units) is assumed large

• the primary sample size in each stratum is small but the number of strata is large

• the number of primary units in a stratum is large

• no survey weight is disproportionately large

Page 73: Types of Surveys

73

Possible Violations of Assumptions

• the complex survey (stratified two-stage sampling, for example) was done on a relatively small scale

• a large-scale survey was done but inferences are desired for small subpopulations

• stratification in which a few strata (or just one) have very small sampling fractions compared to the rest of the strata

• The sampling design was poor resulting in large variability in the sampling weights

Page 74: Types of Surveys

74

Survey Data Analysis

Linear and Logistic Regression

Page 75: Types of Surveys

75

General Approach

• form a census statistic (model estimate or expression or estimating equation)

• for the census statistic obtain a survey estimate of the statistic

• the analysis is based on the survey estimate

Page 76: Types of Surveys

76

Regression

Use of ordinary least squares can lead to

• badly biased estimates of the regression coefficients if the design is not ignorable

• underestimation of the standard errors of the regression coefficient if clustering (and to a lesser extent the weighting) is ignored

Page 77: Types of Surveys

77

Example: Ontario Health Survey

Regress desired body mass index (DBMI) on body mass index (BMI)

                         STATA    Unweighted    Weighted
Intercept   Estimate    10.877        11.196      10.877
            S.E.         0.141         0.064       0.065

Slope       Estimate    0.4958        0.4716      0.4858
            S.E.        0.0058        0.0025      0.0026

Page 78: Types of Surveys

78

Simple Linear Regression Model

• typical regression model

• linear relationship plus random error

• errors are independent and identically distributed

$$y_i = \alpha + \beta x_i + e_i$$

$$E(e_i) = 0, \quad E(e_i^2) = \sigma^2, \quad E(e_i e_j) = 0 \ (i \neq j)$$

Page 79: Types of Surveys

79

Census Statistic

• census estimate of the slope parameter

$$B = \hat{\beta} = \frac{\sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{N} (x_i - \bar{X})^2}$$

• Problem: the assumption of independent errors in the population does not hold

• Solution: the least squares estimate is a consistent estimate of the slope

Page 80: Types of Surveys

80

Survey Estimate

• the census estimate B is now the parameter of interest
• the survey estimate is given by

$$b = \frac{\sum_{i} w_i (x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i} w_i (x_i - \bar{X})^2}$$

• estimate obtained from an estimating equation
• the estimate of variance cannot be taken from the analysis of variance table in the regression of y on x using either a weighted or unweighted analysis
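A sketch of the survey estimate of the slope: the weighted sum of cross-products of deviations divided by the weighted sum of squared x deviations. It assumes the means are estimated by the weighted sample means, and it produces the point estimate only; the variance still has to come from a design-based method. The data and weights are made up.

```python
def weighted_slope(x, y, w):
    """Survey-weighted slope: sum w(x-xbar)(y-ybar) / sum w(x-xbar)^2."""
    sw = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sw   # weighted mean of x (assumed)
    ybar = sum(wi * yi for yi, wi in zip(y, w)) / sw   # weighted mean of y (assumed)
    num = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    den = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return num / den

bmi = [18.0, 22.0, 25.0, 30.0, 35.0]     # hypothetical x values
dbmi = [10.0 + 0.5 * v for v in bmi]     # exactly linear y, for checking
wt = [10, 50, 20, 5, 15]                 # hypothetical survey weights
b = weighted_slope(bmi, dbmi, wt)
print(b)
```

With exactly linear data the weights cannot change the slope, which makes this a convenient check; with real data the weighted and unweighted slopes generally differ.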

Page 81: Types of Surveys

81

Variance Estimation

Again, estimate of the variance of b is obtained from one of the following procedures

• Taylor linearization

• Jackknife

• BRR

• Bootstrap

Page 82: Types of Surveys

82

Issues in Analysis

• application of the large sample distributional results
    • small survey
    • regression analysis on small domains of interest

• multicollinearity
    • survey data files often have many variables recorded that are related to one another

Page 83: Types of Surveys

83

Multicollinearity Example: Ontario Health Survey

Two regression models: regress desired body mass index on

• actual body mass index, age, gender, marital status, smoking habits, drinking habits, and amount of physical activity

• all of the above variables plus interaction terms: marital status by smoking habits, marital status by drinking habits, physical activity by age

Page 84: Types of Surveys

84

Partial STATA Output

No interaction terms

------------------------------------------------------------------------------
             |             Linearized
        dbmi |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .4375517    .0066716   65.58    0.000    .4244667    .4506368
         age |   .0157202    .0014647   10.73    0.000    .0128475    .0185929
 _Imarital_2 |   .1413547    .0498052    2.84    0.005    .0436718    .2390377
 _Imarital_3 |   .4752516    .1416521    3.36    0.001    .1974293    .7530739
 _Imarital_4 |  -.0349268    .0749697   -0.47    0.641   -.1819648    .1121113
     _Isex_2 |  -2.192169     .036238  -60.49    0.000   -2.263243   -2.121095

Interaction terms present

------------------------------------------------------------------------------
             |             Linearized
        dbmi |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .4369983    .0066473   65.74    0.000    .4239608    .4500357
         age |   .0027515    .0045811    0.60    0.548   -.0062335    .0117364
 _Imarital_2 |    .020803     .283399    0.07    0.941   -.5350276    .5766337
 _Imarital_3 |   .8300453    .3153888    2.63    0.009    .2114731    1.448618
 _Imarital_4 |   -.486307    .4478352   -1.09    0.278   -1.364646    .3920324
     _Isex_2 |  -2.193464    .0362143  -60.57    0.000   -2.264491   -2.122437

Page 85: Types of Surveys

85

Comparison of Domain Means

Domains and Strata
• both are nonoverlapping parts or segments of a population
• usually a frame exists for the strata so that sampling can be done within each stratum to reduce variation
• for domains the sample units cannot be separated in advance of sampling

Inferences are required for domains.

Page 86: Types of Surveys

86

Regression Approach

• use the regression commands in STATA and declare the variables of interest to be categorical

• example: DBMI relative to BMI related to sex and happiness index

STATA commands

use "I:\workshopjune\work.dta", clear
svyset [pweight=indiv_wt], strata(newstrata) psu(ea)
.
.
xi:svy: regress ratio i.sex*i.happiness
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.happiness       _Ihappiness_1-5     (naturally coded; _Ihappiness_1 omitted)
i.sex*i.happi~s   _IsexXhap_#_#       (coded as above)
(running regress on estimation sample)

Page 87: Types of Surveys

87

STATA Output

------------------------------------------------------------------------------
             |             Linearized
       ratio |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Isex_2 |  -.0555096    .0022378  -24.81    0.000   -.0598986   -.0511206
_Ihappines~2 |   .0036588    .0033689    1.09    0.278   -.0029487    .0102663
_Ihappines~3 |   .0038151    .0082526    0.46    0.644   -.0123708    .0200009
_Ihappines~4 |   .0256273    .0181474    1.41    0.158   -.0099653    .0612199
_Ihappines~5 |   .0736566     .086237    0.85    0.393   -.0954801    .2427933
_IsexXhap_~2 |  -.0088389    .0046613   -1.90    0.058   -.0179811    .0003032
_IsexXhap_~3 |  -.0292948    .0114269   -2.56    0.010   -.0517063   -.0068833
_IsexXhap_~4 |  -.0720886    .0224737   -3.21    0.001   -.1161663   -.0280108
_IsexXhap_~5 |  -.1428534    .0978592   -1.46    0.145   -.3347848    .0490779
       _cons |   .9628054    .0016317  590.05    0.000    .9596051    .9660058
------------------------------------------------------------------------------

Page 88: Types of Surveys

88

Logistic Regression

• probability of success pi for the ith individual

• vector of covariates xi associated with ith individual

• dependent variable must be 0 or 1, independent variables xi can be categorical or continuous

Does the probability of success pi depend on the covariates xi – and in what way?

Page 89: Types of Surveys

89

Census Parameter

Obtained from the logistic link function

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \mathbf{x}_i^{\top} \boldsymbol{\beta}$$

and the census likelihood equation for the regression parameters

Note: it is the log odds that is being modeled in terms of the covariate
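A small sketch of what modeling the log odds means in practice: the linear predictor is on the log-odds scale, and the inverse of the logistic link converts it back to a probability. The intercept, slope and covariate value are hypothetical, not estimates from the survey.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1 / (1 + math.exp(-eta))

alpha, beta = -2.0, 0.1     # hypothetical intercept and slope
x = 25.0                    # hypothetical covariate value
eta = alpha + beta * x      # linear predictor on the log-odds scale
p = inv_logit(eta)          # implied probability of success

# A one-unit increase in x multiplies the odds by exp(beta)
odds_ratio = math.exp(beta)
print(eta, p, odds_ratio)
```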

Page 90: Types of Surveys

90

Example: Ontario Health Survey

How does the chance of suffering from hypertension depend on:

• body mass index
• age
• gender
• smoking habits
• stress
• a well-being score that is determined from self-perceived factors such as the energy one has, control over emotions, state of morale, interest in life and so on

Page 91: Types of Surveys

91

STATA Commands

use "I:\workshopjune\work.dta", clear
svyset [pweight=indiv_wt], strata(newstrata) psu(ea)
recode hyper (1=1) (2=0)
(hyper: 24258 changes made)
xi:svy: logit hyper bmi age i.sex i.smoktype i.stress i.wellbe
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.smoktype        _Ismoktype_1-4      (naturally coded; _Ismoktype_1 omitted)
i.stress          _Istress_1-4        (naturally coded; _Istress_4 omitted)
i.wellbe          _Iwellbe_1-4        (naturally coded; _Iwellbe_1 omitted)
(running logit on estimation sample)

Page 92: Types of Surveys

92

STATA Output part I

xi:svy: logit hyper bmi age i.sex i.smoktype i.stress i.wellbe
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.smoktype        _Ismoktype_1-4      (naturally coded; _Ismoktype_1 omitted)
i.stress          _Istress_1-4        (naturally coded; _Istress_4 omitted)
i.wellbe          _Iwellbe_1-4        (naturally coded; _Iwellbe_1 omitted)
(running logit on estimation sample)

Number of strata = 72          Number of obs   = 25871
Number of PSUs   = 1849        Population size = 4341226.9
                               Design df       = 1777
                               F( 12, 1766)    = 64.99
                               Prob > F        = 0.0000

Page 93: Types of Surveys

93

STATA Output part II

------------------------------------------------------------------------------
             |             Linearized
       hyper |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         bmi |   .1029348      .00803   12.82    0.000    .0871855     .118684
         age |   .0850085    .0040016   21.24    0.000    .0771601    .0928569
     _Isex_2 |  -.0094895    .0832978   -0.11    0.909   -.1728615    .1538825
_Ismoktype_2 |  -.1068761     .100976   -1.06    0.290   -.3049203    .0911682
_Ismoktype_3 |  -.1391754    .2245528   -0.62    0.535   -.5795907    .3012399
_Ismoktype_4 |  -.1862018    .1050622   -1.77    0.077   -.3922601    .0198566
  _Istress_1 |   .4201336    .2115243    1.99    0.047     .005271    .8349961
  _Istress_2 |   .0103797    .2055384    0.05    0.960   -.3927428    .4135022
  _Istress_3 |   -.177385    .2015597   -0.88    0.379    -.572704     .217934
  _Iwellbe_2 |  -.6197166    .2755986   -2.25    0.025   -1.160248   -.0791852
  _Iwellbe_3 |  -.7841664    .2593617   -3.02    0.003   -1.292853   -.2754803
  _Iwellbe_4 |   -1.07929    .2600326   -4.15    0.000   -1.589292   -.5692879
       _cons |   -8.12002     .441972  -18.37    0.000    -8.98686    -7.25318
------------------------------------------------------------------------------

Page 94: Types of Surveys

94

GEE: Generalized Estimating Equations

Dependent or response variable
• well-being measured on a 0 to 10 scale
• focus is on women only

Independent or explanatory variables
• has responsibility for a child under age 12 (yes = 1, no = 2)
• marital status (married = 1, separated = 2, divorced = 3, never married = 5 [widowed removed from the dataset])
• employment status (employed = 1, unemployed = 2, family care = 3)

STATA syntax

tsset pid year, yearly
xi: xtgee wellbe i.mlstat i.job i.child i.sex [pweight = axrwght], family(poisson) link(identity) corr(exchangeable)

Page 95: Types of Surveys

95

GEE Results

------------------------------------------------------------------------------
             |             Semi-robust
      wellbe |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Imlstat_2 |   1.206905    .2036603    5.93    0.000    .8077382    1.606072
  _Imlstat_3 |   .3732488     .120658    3.09    0.002    .1367635    .6097342
  _Imlstat_5 |  -.0250266     .077469   -0.32    0.747   -.1768631    .1268098
   _Ichild_2 |  -.0456858     .063007   -0.73    0.468   -.1691773    .0778056
    _Ijobc_2 |   .9498503    .4045538    2.35    0.019    .1569394    1.742761
    _Ijobc_3 |   .0124392    .1827747    0.07    0.946   -.3457926     .370671
       _cons |   1.922769    .0554797   34.66    0.000    1.814031    2.031507
------------------------------------------------------------------------------

Page 96: Types of Surveys

96

For each type of initial marital status

Married

------------------------------------------------------------------------------
             |             Semi-robust
      wellbe |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ichild_2 |   .0666723    .0672237    0.99    0.321   -.0650836    .1984283
    _Ijobc_2 |    .888502     .720494    1.23    0.218   -.5236403    2.300644
    _Ijobc_3 |   .2989137    .2369747    1.26    0.207   -.1655482    .7633756
       _cons |   1.825918    .0562928   32.44    0.000    1.715586     1.93625
------------------------------------------------------------------------------

Separated or divorced

-------------+----------------------------------------------------------------
   _Ichild_2 |  -.5800375    .2041848   -2.84    0.005   -.9802324   -.1798426
    _Ijobc_2 |   .9851042    .5063179    1.95    0.052   -.0072607    1.977469
    _Ijobc_3 |  -.2799635     .290873   -0.96    0.336   -.8500642    .2901371
       _cons |      2.406    .1951377   12.33    0.000    2.023538    2.788463
------------------------------------------------------------------------------

Never married

-------------+----------------------------------------------------------------
   _Ichild_2 |  -.6732289    .1847309   -3.64    0.000   -1.035295   -.3111629
    _Ijobc_2 |   1.239189    .8163575    1.52    0.129   -.3608422     2.83922
    _Ijobc_3 |  -.2405778    .6582919   -0.37    0.715   -1.530806    1.049651
       _cons |   2.777478    .1734716   16.01    0.000     2.43748    3.117476
------------------------------------------------------------------------------

Page 97: Types of Surveys

97

Cox Proportional Hazards Model

Dependent or outcome variable
• time to breakdown of first marriage

Independent or explanatory variables
• gender
• race (white/non-white)
• age in 1991 (restricted to 18 – 60)
• financial position: comfortable = 1, doing alright = 2, just about getting by = 3, quite difficult = 4, very difficult = 5

Page 98: Types of Surveys

98

STATA Commands

• Command for survival data set up

stset tvariable [pweight = axrwght], failure(fail==1) scale(1)

• Command for Cox proportional hazards model

xi: stcox i.sex i.arace aage i.afisit

Page 99: Types of Surveys

99

STATA Output

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Isex_2 |   1.251224    .1483865    1.89    0.059    .9917185    1.578635
   _Iarace_1 |   1.979298    .7844764    1.72    0.085    .9102175    4.304047
        aage |      .9366    .0056464  -10.86    0.000    .9255984    .9477324
  _Iafisit_2 |   1.226635     .201547    1.24    0.214    .8889056    1.692682
  _Iafisit_3 |   1.519284    .2527755    2.51    0.012    1.096523     2.10504
  _Iafisit_4 |    1.95182    .3985054    3.28    0.001    1.308124    2.912263
  _Iafisit_5 |   1.936742    .5864388    2.18    0.029    1.069869    3.506006
------------------------------------------------------------------------------