1
Redefining the Unit Nonresponse Adjustment Cells for the Survey of
Residential Alterations and Repairs (SORAR)
Laura T. Ozcoskun and
Katherine Jenny Thompson
Presented By Samson Adeshiyan
2
Outline
• Background
• The Problem
• The Authors’ Recipe for a Solution
• Some Empirical Results Interspersed
3
Survey of Residential Alterations and Repairs (SORAR) Background
• Monthly data collection
• Low unit response rates
• Key item: Total Expenditures
  • Maintenance and Repairs
  • Improvements
• Multi-stage sample of Housing Units (HUs)
  • Privately-owned vacant HUs (Vacant)
  • Rental and 5+ unit properties (Rental)
• Modified Half-Sample Variance Estimator
4
The Problem (Motivation)
• SORAR’s three-stage weighting procedure
  • Duplication control (field subsampling)
  • Unit nonresponse adjustment
  • Post-stratification adjustment
• Suspected that the variables used to define the unit nonresponse weighting cells are not highly related to
  • Response propensity or
  • Cell means
5
Response Model
• “Quasi-Randomization” (Oh & Scheuren 1983)
  • Covariate dependent, missing-at-random (MAR) response mechanism
  • Response propensity (p) is a random variable.
• Minimum requirements for weighting cells:
  1. Heterogeneous response propensities or
  2. Heterogeneous cell means
• Optimal adjustment cells satisfy both conditions.
6
The Authors’ Recipe
• Determine Eligible Sets of Classification Variables
• Determine Uncollapsed Cells (Full Model)
  • Logistic Regression Analysis
• Determine Collapsed Cells (Reduced Model)
  • General Linear Hypothesis Tests
  • Relative Efficiency Diagnostic (MSE Ratios)
  • Time Series Plots of Adjustment Factors
7
Step 1: Find Sets of Classification Variables for Cells
• Respondent requirements per cell:
  • Actual cell size ≥ 5
    • needed for logistic regression
  • Effective “sample” (cell) size ≥ 5
• Categorical variables
8
Cell Sizes
• Effective “Sample” (Cell) Size: $\tilde{r}_p = r_p / DEFF_p$
  • $r_p$ is the actual cell size of cell p
  • $DEFF_p$ is the design effect for item Y in cell p
  • $\tilde{r}_p > r_p$ indicates efficient design for item Y
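As a rough illustration of the cell-size check, the sketch below divides a cell's actual respondent count by its design effect and compares both sizes with a minimum. The function names, the default minimum of 5 with a "greater than or equal" comparison, and the example counts and design effects are assumptions made for illustration, not values from the survey.

```python
def effective_cell_size(actual_size, deff):
    """Effective cell size: actual respondent count divided by the design effect."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return actual_size / deff


def meets_size_requirements(actual_size, deff, minimum=5):
    """Check the actual and the effective cell size against a minimum.

    The default minimum of 5 and the use of >= are assumptions for illustration.
    """
    return (actual_size >= minimum
            and effective_cell_size(actual_size, deff) >= minimum)


# Hypothetical cells: same respondent count, different design effects for item Y.
print(effective_cell_size(12, 3.0))       # 4.0
print(meets_size_requirements(12, 3.0))   # False: effective size falls below 5
print(meets_size_requirements(12, 0.8))   # True: DEFF < 1 raises the effective size
```

A cell can pass on actual size and still fail on effective size when the design effect is large, which is why both checks appear.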
9
Candidate Cells (SORAR)
• Candidate cell variables (categorical)
  • Region (currently used)
  • Metropolitan Statistical Area (MSA) status (currently used)
  • Tenure (Vacant/Rental)
  • Single-unit vs. Multi-unit
• Candidate cross classifications
  • Region/MSA Status/Single or Multi-Unit
  • Region/Tenure/Single or Multi-Unit
10
Step 2: Uncollapsed Cells (Full Model)
• Response Propensity Modeling
• Logistic Regression
  • Complex survey adaptations of Roberts, Rao, and Kumar (1987) to test statistics
• Full and reduced (nested) models
  • Want all effects to be significant in full model
  • Would like to reject majority of nested models
11
Logistic Regression (SORAR)
• 18 months
• Separate full and reduced models for each month
• Between-cell covariance approximations, with assumed between-cell correlation of
  • 0 (anti-conservative)
  • -0.25
  • -0.50 (conservative)
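To make the idea of comparing full and reduced (nested) response propensity models concrete, here is a deliberately simplified sketch. It is not the complex-survey test described on these slides: it assumes independent responses with no design or between-cell covariance adjustment, fits one propensity per cell (a saturated logistic model on categorical cells), and compares it with a collapsed version using an ordinary likelihood-ratio statistic. The function names and all counts are hypothetical.

```python
import math


def binomial_loglik(respondents, sample_size, propensity):
    """Binomial log-likelihood kernel; terms with a zero count contribute nothing."""
    loglik = 0.0
    if respondents > 0:
        loglik += respondents * math.log(propensity)
    if sample_size > respondents:
        loglik += (sample_size - respondents) * math.log(1.0 - propensity)
    return loglik


def likelihood_ratio_test(cells, collapse):
    """Compare cell-specific propensities with a collapsed (nested) model.

    cells: dict mapping a full-cell label to (respondents, sample_size),
           with sample_size > 0.
    collapse: function mapping a full-cell label to its collapsed-cell label.
    Returns (G2, degrees_of_freedom). Assumes independent responses, so this
    is NOT the design-adjusted test statistic referred to in the slides.
    """
    # Reduced model: pool counts within each collapsed cell.
    pooled = {}
    for label, (r, n) in cells.items():
        key = collapse(label)
        pooled_r, pooled_n = pooled.get(key, (0, 0))
        pooled[key] = (pooled_r + r, pooled_n + n)

    ll_full = 0.0
    ll_reduced = 0.0
    for label, (r, n) in cells.items():
        pooled_r, pooled_n = pooled[collapse(label)]
        ll_full += binomial_loglik(r, n, r / n)                   # own propensity
        ll_reduced += binomial_loglik(r, n, pooled_r / pooled_n)  # shared propensity

    return 2.0 * (ll_full - ll_reduced), len(cells) - len(pooled)


# Hypothetical counts for one month: (region, unit type) -> (respondents, sampled)
cells = {
    ("South", "Single"): (30, 100),
    ("South", "Multi"): (60, 100),
}
g2, df = likelihood_ratio_test(cells, collapse=lambda label: label[0])
print(round(g2, 2), df)
```

Under these simplifying assumptions the statistic would be compared with a chi-square critical value (3.84 for one degree of freedom at the 5% level). A large value suggests the dropped variable matters for response propensity and should stay in the cell definition.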
12
Model 1: Region/MSA/Single or Multi-Unit
Counts are months (of 18) in which each hypothesis was rejected (Rej.) or not rejected (Not rej.), by assumed between-cell correlation.
Hypothesis                        Corr. = 0         Corr. = -0.25     Corr. = -0.50
                                  Rej.  Not rej.    Rej.  Not rej.    Rej.  Not rej.
REGION = MSA = HU = 0 (Full)      18    0           18    0           18    0
REGION = MSA = 0 | HU ≠ 0         14    4           13    5           10    8
REGION = HU = 0 | MSA ≠ 0         18    0           18    0           18    0
MSA = HU = 0 | REGION ≠ 0         18    0           18    0           18    0
REGION = 0 | MSA ≠ 0, HU ≠ 0      12    6           12    6           9     9
MSA = 0 | REGION ≠ 0, HU ≠ 0      8     10          8     10          8     10
HU = 0 | REGION ≠ 0, MSA ≠ 0      18    0           18    0           18    0
• Very sensitive to correlation assumptions
• Indicates necessity of including Single/Multi-Unit in weighting cells
• Region and MSA less necessary given Single/Multi-Unit
13
Model 2: Region/Tenure/Single or Multi-Unit
• Insensitive to correlation assumptions (change)
• Indicates necessity of including Single/Multi-Unit in weighting cells (unchanged)
• Region and Tenure often necessary (change)
Counts are months (of 18) in which each hypothesis was rejected (Rej.) or not rejected (Not rej.), by assumed between-cell correlation.
Hypothesis                        Corr. = 0         Corr. = -0.25     Corr. = -0.50
                                  Rej.  Not rej.    Rej.  Not rej.    Rej.  Not rej.
REGION = TEN = HU = 0 (Full)      18    0           18    0           18    0
REGION = TEN = 0 | HU ≠ 0         18    0           18    0           17    1
REGION = HU = 0 | TEN ≠ 0         18    0           18    0           18    0
TEN = HU = 0 | REGION ≠ 0         18    0           18    0           18    0
REGION = 0 | TEN ≠ 0, HU ≠ 0      14    4           14    4           11    7
TEN = 0 | REGION ≠ 0, HU ≠ 0      13    5           13    5           13    5
HU = 0 | REGION ≠ 0, TEN ≠ 0      18    0           18    0           18    0
14
Step 3: Collapsed Cells (Reduced Model)
• General Linear Hypothesis Tests
• Relative Efficiency Diagnostic
• Time Series Plots of Estimated Nonresponse Adjustment Factors
15
General Linear Hypothesis Test
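Hypothesis Tests
• H0: $\bar{y}_{11} = \bar{y}_{21}$ and $\bar{y}_{12} = \bar{y}_{22}$ (collapse rows)
• H0: $\bar{y}_{11} = \bar{y}_{12}$ and $\bar{y}_{21} = \bar{y}_{22}$ (collapse columns)
Not done with SORAR (cell estimates too variable)
Cell layout (columns: levels of classification variable k; rows: levels of classification variable k’)
  Row 1: $\bar{y}_{11}$ (cell 1), $\bar{y}_{12}$ (cell 2)
  Row 2: $\bar{y}_{21}$ (cell 3), $\bar{y}_{22}$ (cell 4)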
16
Relative Efficiency Diagnostic: MSE Ratios
• Modified from Eltinge and Yansaneh (1997)
• Definitions
  • $\hat{Y}_F$: approximately model-unbiased estimate under full model
  • $\hat{Y}_C$: model-biased estimate under a collapsed weighting procedure
  • $\widehat{MSE}(\hat{Y}_F) = \hat{V}(\hat{Y}_F)$ (under model assumption)
  • $\widehat{MSE}(\hat{Y}_C) = \hat{V}(\hat{Y}_C) + \hat{B}(\hat{Y}_C)^2$
• Mean squared error ratio: $\dfrac{\hat{V}(\hat{Y}_C) + \hat{B}(\hat{Y}_C)^2}{\hat{V}(\hat{Y}_F)}$
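A small numeric sketch of the MSE ratio may help show how values below one can arise. The function name and all input values are hypothetical. The variance and bias estimates are taken as given inputs here, since how they are estimated is not shown on the slide.

```python
def mse_ratio(var_full, var_collapsed, bias_collapsed):
    """Estimated MSE of the collapsed estimate divided by that of the full estimate.

    The full-model estimate is treated as unbiased, so its MSE is its variance.
    """
    if var_full <= 0:
        raise ValueError("full-model variance must be positive")
    return (var_collapsed + bias_collapsed ** 2) / var_full


# Hypothetical inputs: collapsing lowers the variance and adds a small bias.
small_bias = mse_ratio(var_full=100.0, var_collapsed=90.0, bias_collapsed=2.0)
# Hypothetical inputs: same variances, larger bias from collapsing.
large_bias = mse_ratio(var_full=100.0, var_collapsed=90.0, bias_collapsed=5.0)
print(small_bias, large_bias)
```

With the first set of inputs the ratio falls below one because the variance reduction outweighs the squared bias. With the second it exceeds one.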
17
SORAR MSE Ratios: Total Expenditures
• Tenure dropped: Median $R_H$ = 1.02
• HU Category dropped: Median $R_T$ = 0.93
• On average, $R_H$ is both greater than one and closer to one than $R_T$
• Not terrifically compelling evidence for either collapsing
• How can values be less than 1?
  • Function of using empirical data
  • Collapsed variances smaller or equivalent to uncollapsed variances
  • Estimated bias often “negligible”
18
Time Series Plots of Adjustment Factors
• Visual, less statistical
• Fewer assumptions
• Full procedure and collapsed procedure adjustment factors
  • Within region (SORAR)
  • Inverse of response propensities (SORAR)
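The factors being compared can be sketched as the inverse of a weighted response rate within each cell, computed once with the full cells and once with collapsed cells. This is a minimal illustration under assumed inputs: the record layout, the function name, the weights, and the response outcomes are all hypothetical, and a real series would repeat the calculation for each month.

```python
from collections import defaultdict


def adjustment_factors(units, cell_of):
    """Nonresponse adjustment factor per cell: inverse weighted response rate.

    units: iterable of (label, weight, responded) records.
    cell_of: function mapping a unit's label to its weighting cell.
    Cells with no respondents are omitted because their factor is undefined.
    """
    total_weight = defaultdict(float)
    respondent_weight = defaultdict(float)
    for label, weight, responded in units:
        cell = cell_of(label)
        total_weight[cell] += weight
        if responded:
            respondent_weight[cell] += weight
    return {cell: total_weight[cell] / respondent_weight[cell]
            for cell in total_weight if respondent_weight[cell] > 0}


# Hypothetical single-month sample: label = (region, tenure, unit type)
units = [
    (("South", "Vacant", "Single"), 10.0, True),
    (("South", "Vacant", "Single"), 10.0, False),
    (("South", "Rental", "Single"), 10.0, True),
    (("South", "Rental", "Single"), 10.0, False),
    (("South", "Rental", "Single"), 10.0, False),
    (("South", "Rental", "Single"), 10.0, False),
]

full = adjustment_factors(units, cell_of=lambda label: label)
# Collapse over tenure: cells become (region, unit type).
collapsed = adjustment_factors(units, cell_of=lambda label: (label[0], label[2]))
print(full)
print(collapsed)
```

In this made-up example the two full-cell factors differ, and the collapsed factor lands between them, which is the kind of gap the time series plots are meant to expose.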
19
Candidate Cells: Region by Single/Multi for Vacant Properties
• Original adjustment factors very different in scale
• Collapsed factors are far from both original factors
[Figure: time series plot of adjustment factors, vertical axis 0 to 16. Series: Vacant Single-Unit Property Factors; Vacant Multi-Unit Property Factors; Collapsed Vacant Units]
20
Candidate Cells: Region by Single/Multi for Rental Properties
• Original adjustment factors very different in scale
• Collapsed factors are far from both original factors (cf. multi-unit factors)
[Figure: time series plot of adjustment factors, vertical axis 0 to 16. Series: Rental Single-Unit Property Factors; Rental Multi-Unit Property Factors; Collapsed Rental Units]
21
Candidate Cells: Region by Tenure for Single-Unit Properties
• Scale of original factors “similar” (compared to earlier slide)
• Collapsed factors different for single units
[Figure: time series plot of adjustment factors, vertical axis 0 to 16. Series: Vacant Single-Unit Property Factors; Rental Single-Unit Property Factors; Collapsed Single Unit]
22
Candidate Cells: Region by Tenure for Multi-Unit Properties
• Scale of original factors similar
• Collapsed factors similar to original factors
[Figure: time series plot of adjustment factors, vertical axis 0 to 16. Series: Vacant Multi-Unit Property Factors; Rental Multi-Unit Property Factors; Collapsed Multi Unit]
23
Final Recommendation (SORAR)
• Full weighting cells
  • Region/Tenure/Single or Multi-Unit
• Collapsed weighting cells
  • Region/Single or Multi-Unit
  • Region
24
Conclusion
• Started with a recipe
  • Model-development tools
  • Diagnostic tools
• Modified the recipe for our survey
  • Considered and dropped diagnostics (data-based)
• Ended up with a new main course
  • More statistically defensible unit nonresponse adjustment cells.