c p e s nimh collaborative psychiatric epidemiology surveys cpes sample designs and analysis weights...

38
C P E S C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

Post on 15-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys

CPES Sample Designs

and Analysis Weights

Steve Heeringa

Page 2: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 204/21/23

Sample Design and WeightingDocumentation

• www.icpsr.umich.edu/CPES • Table of contents

– Sample design– Weighting

• Detailed description of sample design and weighting procedures.

• Also, Heeringa et al. (2004). “Sample Designs and Sampling Methods for the Collaborative Psychiatric Epidemiology Studies (CPES).” International Journal of Methods in Psychiatric Research, 13(4), 221- 239.

Page 3: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 304/21/23

CPES Estimation and Inference

• CPES is based on the integration of three nationally representative multi-stage area probability samples.

• Estimation will require weighted analysis to compensate for disproportionate sampling.

• Inference (design-based) will require the use of variance estimation procedures for complex sample designs.

Page 4: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 404/21/23

Complex Sample Design

• Probability sample design:– Each population element has a known, non-

zero selection probability

– Properly weighted, sample estimates are unbiased or nearly unbiased for the corresponding population statistic.

– Variance of sample statistics can be estimated from the sample data (measurability)

Page 5: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 504/21/23

Complex Sample Design

• “Complex sample” :– A probability sample developed using

sampling procedures such as stratification, clustering and weighting designed to improve statistical efficiency, reduce costs or improve precision for subgroup analyses relative to simple random sampling (SRS)

– Unbiased estimates with measurable sampling error are still possible

– Independence of observations, equal probabilities of selection may no longer hold

Page 6: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 604/21/23

Where are Complex Sample Designs Used?

• Complex sample designs are the rule and not the exception in sample-based studies in the Social Sciences, Epidemiology, Public Health, Agriculture, Natural Resources and many other scientific fields.

• Multi-stage area probability sample designs for household surveys: NCS-R, NSAL, NLAAS, NHIS, NHANES, CPS, etc.

Page 7: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 704/21/23

Multi-Stage Sample of U.S. Households

Page 8: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 804/21/23

Consortium in Psychiatric Epidemiology Studies (CPES)

• Sampled persons are interviewed about mental health and mental health-related matters.

• Separate estimates (oversampling) required for domains defined by:– Age-sex groups, African Americans,

Latinos, Asians and All Others.

Page 9: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 904/21/23

CPES SAMPLE DESIGN(Primary stage)

• The PSUs are mostly single counties or metropolitan statistical areas (MSAs).

• Stratify PSUs by region and MSA status.• Select PSUs by probability proportionate

to estimated size (PPES) sampling (a few selected with certainty).

Page 10: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1004/21/23

CPES SAMPLE DESIGN(Second Stage)

• The SSUs are area segments. Area segments are Census blocks or combinations of Census blocks with a minimum number of households, e.g. 50 for NCS-R.

• Area segments are selected with PPES-typically 6-20 area segments per PSU

Page 11: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1104/21/23

CPES SAMPLE DESIGN(Third Stage)

• Before selection can occur, field staff list the housing units in the selected area segments.

• For each segment, a subsample of housing units is selected from the prepared list.

Page 12: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1204/21/23

CPES SAMPLE DESIGN(Fourth Stage)

Conduct a screening interview to classify persons by domain.

• Subsample persons to give required sampling rates by domain.– Survey weights will reflect this subsampling

step

• Conduct interviews with sample persons (SPs).

Page 13: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1304/21/23

Basic Weighting Approach

• Suppose sample element i was selected with probability πi. Then sample element i represents (1 / πi ) elements in the population.

• That is, count the element i in the analysis by giving it a weight of

wi = (1 / πi ).

• For example, a sample element selected with probability 1/10 represents 10 elements in the population.

Page 14: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1404/21/23

Weighting

Weighting is used to compensate for• Unequal probabilities of selection• Nonresponse (typically, a unit that fails to

respond)• In poststratification to adjust weighted

sample distributions for certain variables (e.g., age and sex) to make them conform to the known population distribution.

Page 15: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1504/21/23

Overall Weight

CPES weights incorporate simultaneously all three components, unequal probabilities of selection, nonresponse, and poststratification:

sel nr pstratW W W W

Page 16: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1604/21/23

Weighted Estimation

• In probability samples in which the sample elements have been selected with different probabilities and each case has been assigned a weight, Wi=1/ πi, the unbiased estimator of the population mean is:

1

1

n

i ii

weighted n

ii

W yy

W

Page 17: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1704/21/23

Weighted Estimation

• The simple unweighted estimator of the sample mean is a special case of the weighted estimator of the population mean with Wi=1 for all cases:

1 1 1

1 1

1

1

n n n

i i i ii i i

weighted n n

ii i

W y y yy y

nW

Page 18: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1804/21/23

Weighted Estimates of Population Statistics

2

2

2 1

1

xy

1,

1

Weighted estimate of the Variance of a Random Variable: S

1

Weighted Estimate of the Covariance of Variables y and x: S

( )

1

weighted

n

i ii

n

ii

n

i i ii

xy weighted n

ii

W y ys

W

W y y x xs

W

Page 19: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 1904/21/23

Weighted Estimation

1

2

1

Weighted Estimate of the

Simple Linear Regression Coefficient:

( )ˆ

( )

.

n

i i ii

weighted n

i ii

W y y x x

W x x

Page 20: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2004/21/23

Weighted Estimation

1

1

Weighted Estimate of

Multiple Linear Regression Coefficients,

( ' ) '

where: W is an n x n diagonal matrix with weight,

W 0 0

W= 0 0

0 0 n

X WX X WY

W

Page 21: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2104/21/23

Weighted Estimation

'

'

Pseudo-Maximum Likelihood, Weights are included in the score equations, e.g. for Logistic Model

ln

h i h i h i

ih

h i h i h iih

L BU B

w x y

w x P

Page 22: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2204/21/23

Effect of Weights on Variances of Estimates of Means

LW = Relative increase in sampling variances of estimates due to weighting

Generally, oversampling for subgroups:LW,SUB < LW

22 1

2

1

~ ( ) 1

n

ii

n

i

WCV W n

W

Page 23: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2304/21/23

Example of Weighting Lossin Disproportionate Sampling

Stratum% Hispanic Population

Oversampling Rate Weight

% of Hispanic Sample

1 19.2% 1:1 4 7%

2 22.8% 2:1 2 7%

3 24.1% 3:1 1.33 26%

4 33.9% 4:1 1 50%

100% 100%

Page 24: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2404/21/23

Example of Weighting Lossin Disproportionate Sampling

21

2

1

For 1000:

2759.92,148,569

1

1000 1 .284

ni

Wn

i

n

WL n

W

Page 25: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2504/21/23

Scaling of Weights

• Analysis based on:

* **1; ;

: a and b are constants

ii i i i

i

WW W a W W

b

where

will yield the same results (provided the software program treats the weights correctly).

Page 26: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2604/21/23

Normalizing Weights

,

1

i,cent

cent

Properties of normalized weights include:

W

W 1.0

i cent i n

ii

nW W

W

n

Page 27: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2704/21/23

Example: Distribution of NCS Part 2 Weights (Normalized)

MomentsN 5877 Sum Weights 5877

Mean 1.00000007 Sum Observations

5877.00039

Std Deviation 1.16010775 Variance 1.34584999

Skewness 2.8311973 Kurtosis 7.9882

Uncorrected SS 13785.2153 Corrected SS 7908.21452

Coeff Variation 116.010767 Std Error Mean 0.01513284

Page 28: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2804/21/23

CPES Weighting Approach

• Separately, NCS-R, NSAL and NLAAS cases are assigned to 1 of 12 distinct race/ancestry groups– Vietnamese, Filipino, Chinese, All other

Asian, Cuban, Puerto Rican, Mexican, All other Hispanic, Afro-Caribbean, African-American, White, All Other

– Persons with multiple ancestry are assigned to the first applicable discrete category in this list.

Page 29: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 2904/21/23

CPES Weighting Approach

• Separately, NCS-R, NSAL and NLAAS cases are assigned to 1 of 11 distinct geographic domains defined based on the race/ancestry composition of the Census tract in which they resided at the time of interview.– Hawaii is only represented in NLAAS

Page 30: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3004/21/23

CPES Weighting Approach

• The March 2002 Current Population Survey (CPS) was used to estimate the total size of the adult population in each race/ancestry stratum.

• Within each race/ancestry stratum the distribution of the population to the 11 geographic domains was estimated based on the CPES study which could provide the most precise estimates, e.g. NSAL for African Americans.

Page 31: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3104/21/23

CPES Weighting Approach

• Separately, NCS-R, NSAL and NLAAS study-specific weights (see chart) were post-stratified to the 12 x 11 grid of population totals for race/ancestry by geographic domain.

Page 32: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3204/21/23

CPES Weighting Approach

• CPES weight values for the pooled data were constructed by scaling each individual study weight by a factor equal to the proportion of the total sample in a grid cell that originated with the component study.

• The result is a CPES weight that sums to the CPS 2002 post-stratification controls in the 12 x 11 grid.

Page 33: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3304/21/23

CPES Weighting Approach

• Long and Short form weight variables– NCS-R uses a two-phase sampling approach in which only a subsample of cases progress from Part 1 to Part 2.

• Therefore, there are 2 CPES weights:– CPESWTLG – used when one or more

variables of interest are measured only in NCS-R Part 2

– CPESWTSH – used when all variables of interest are measured in NCS-R Part 1

Page 34: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3404/21/23

CPES Weighting Approach

• Weights for pooled analysis of data from only two of the three CPES data sets:– NCNLWTLG, NCNLWTSH: NLAAS and NCS-R– NCNSWTLG, NCNSWTSH: NSAL and NCS-R– NSNLWT : NLAAS and NSAL

• Derived as CPES weights but with final scaling step limited to proportion of sample in cell from the two studies.

Page 35: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3504/21/23

Sample Sample Weight Mean S.E. CV

NCSR Short NCSRWTSH 1.00 0.0054 52.28

NCSR Long NCSRWTLG 1.00 0.0127 95.82

NLAAS --- NLAASWGT 6333.6 86.77 93.42

NSAL --- NSALWTPN 8022.2 165.84 161.22

CPES Short CPESWTSH 10468.2 75.25 101.69

CPES Long CPESWTLG 12693.4 152.03 153.49

NCSR-NLAAS Short ncnlwtsh 15061.7 100.10 78.44

NCSR-NLAAS Long ncnlwtlg 20290.6 261.12 130.87

NCSR-NSAL Short ncnswtsh 13588.6 106.91 97.52

NCSR-NSAL Long ncnswtlg 17731.9 248.46 152.04

NLAAS-NSAL --- nsnlwt 19553.1 608.91 322.59

Table 1: Descriptive Statistics for CPES Study Weights

Page 36: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3604/21/23

Sample Sample size

Un-weighted Prevalence

% (s.e.)

Weighted Prevalence

% (s.e.)

White-Non-Hispanic

NCSRa 6696 17.95(0.47) 17.75(0.55)

CPESa 7567b 18.00(0.44) 17.92(0.54)

Black

NCSRa 1176 11.99(0.95) 11.29(1.04)

NSAL 3433 10.63(0.53) 10.31(0.58)

CPESa 4609b 10.98(0.46) 10.39(0.47)

Hispanic

NLAAS 2554 15.66(0.72) 13.79(0.68)

CPESa 3615 15.60(0.60) 13.71(0.63)

Table 2: Un-weighted and Weighted Prevalence of Major Depressive Disorder

a =short sample, b= excluding missing cases s.e.=standard error

Page 37: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3704/21/23

Sample Sample size

Un-weighted Prevalence

% (s.e.)

Weighted Prevalence

% (s.e.)

White-Non-Hispanic

NCSRa 6696 6.45(0.30) 6.16(0.31)

CPESa 7566b 6.25(0.28) 6.08(0.32)

Black

NCSRa 1176 4.34(0.59) 4.14(0.74)

NSAL 3430 3.53(0.31) 3.26(0.36)

CPESa 4606b 3.73(0.28) 3.48(0.34)

Hispanic

NLAAS 2554 3.48(0.36) 2.82(0.40)

CPESa 3615 3.76(0.32) 3.03(0.34)

Table 3: Un-weighted and Weighted Prevalence of Generalized Anxiety Disorder

a =short sample, b= excluding missing cases s.e.=standard error

Page 38: C P E S NIMH Collaborative Psychiatric Epidemiology Surveys CPES Sample Designs and Analysis Weights Steve Heeringa

C P E SC P E S

NIMH Collaborative Psychiatric Epidemiology Surveys 3804/21/23

Sample Sample size

Un-weighted Prevalence

% (s.e.)

Weighted Prevalence

% (s.e.)

White-Non-Hispanic

NCSRa 4180 10.26(0.47) 6.84(0.54)

CPESb 4180b 10.26(0.47) 6.92(0.53)

Black

NCSRa 679 12.37(1.26) 7.4(0.86)

NSAL 3419c 9.15(0.49) 9.10(0.54)

CPESa 4098c 9.69(0.46) 8.76(0.49)

Hispanic

NLAAS 2554 5.29(0.44) 4.42(0.45)

CPESa 3259c 6.17(0.42) 4.74(0.39)

Table 4: Un-weighted and Weighted Prevalence of Post Traumatic Stress Disorder

a =long sample, b= only assessed in NCS-R , c= excluding missing cases s.e.=standard error