The Weighting Strategy of the Canadian Community Health Survey
Cathlin Sarafin, Methodologist
Statistics Canada
March 25, 2008




Outline

- Introduction
- Methodology
- The Canadian Community Health Survey (CCHS)
- The Multiple Frames
- The Weighting Strategy of the CCHS
- Methodology Recruitment Process


Introduction

Methodology Structure:

- You: recruits are called Junior Methodologists
- Your Unit: 2 to 7 Methodologists supervised by one Senior Methodologist
- Your Section: 3 to 6 units working on related projects, managed by a Chief
- Your Division: a division has roughly 100 people, usually all together on one floor of the building


Introduction

- Every person has their own responsibilities
- The Senior Methodologist outlines tasks
- Options and approaches are discussed as a team


Introduction

Survey Methodology:

- Frame creation
- Sampling
- Questionnaire design
- Data collection methods
- Data processing
- Edit and imputation
- Weighting and estimation
- Variance estimation
- Data quality indicators
- Record linkage
- Time series
- Data analysis
- Disclosure control
- Research and development


The CCHS

Collects general health information on the Canadian population

Estimates produced for more than 120 Health Regions (HRs) across Canada

Produces estimates on:

- Health risk factors
- Health status
- Health care services


The CCHS

- The CCHS was introduced in 2000
  - Data were collected every second year, for a total sample size of 130,000 per year
- It was redesigned in 2007
  - Data are now collected continuously, for a total sample size of about 65,000 respondents per year
  - Annual files are released
  - Multi-year files will be produced starting in 2009


The CCHS

- A cross-sectional survey: surveys a specific population for a given period of time
- A longitudinal survey: surveys a specific population repeatedly over time


The CCHS

Target population: individuals aged 12 and over living in private dwellings

Exclusions: people living on Indian Reserves and Crown Lands, residents of institutions, full-time members of the Canadian Forces, and residents of some remote areas

The CCHS covers about 98% of the Canadian population


The CCHS

Has a complex, multi-stage, dual frame design:

- Area frame (49%)
- Telephone list frame (50%)
- Random digit dialing (RDD) frame (1%)

The telephone frames complement the area frame in most HRs


The Area Frame

- Units are geographical areas; the target sampling units are not listed
- Based on the Labour Force Survey (LFS) design:
  - 6 rotation groups
  - Stratified probability proportional to size (PPS) sample of clusters
  - Systematic sample of dwellings, with a random selection of a start
- Probabilistic sample of one individual per household


The Area Frame

[Figure: LFS Sample Selection. Province XYZ divided into geographic strata (Stratum #1, Stratum #2)]

1. Each province is divided into geographic strata
2. Clusters are selected within strata (PPS sampling): 1st stage
3. Dwellings are selected within clusters (systematic sampling): 2nd stage
4. People are selected within responding dwellings: 3rd stage
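The three selection stages can be sketched as follows. This is a minimal illustration, not the LFS or CCHS production code: the cluster sizes, the sampling interval of 10 dwellings and the equal-probability choice of the person within a household are all assumptions (the slides do not say how the person is chosen).

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic PPS: select n cluster indices with probability proportional to size.

    A cluster whose size exceeds the sampling interval can be hit more than once.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, next_point = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # every selection point that falls inside this cluster's cumulative range picks it
        while next_point < n and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected


def systematic_sample(num_dwellings, interval, rng):
    """Every `interval`-th dwelling after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return list(range(start, num_dwellings, interval))


def select_person(household, rng):
    """One person per household with equal probability (an assumption for this sketch).

    Returns the person and the within-household weight factor (household size).
    """
    return rng.choice(household), len(household)


if __name__ == "__main__":
    rng = random.Random(2008)
    cluster_sizes = [120, 80, 200, 50, 150, 100]   # hypothetical dwelling counts in one stratum
    for cluster in pps_systematic(cluster_sizes, n=2, rng=rng):        # 1st stage
        dwellings = systematic_sample(cluster_sizes[cluster], 10, rng)  # 2nd stage
        print(f"cluster {cluster}: {len(dwellings)} dwellings, first {dwellings[:3]}")
    person, factor = select_person(["adult 1", "adult 2", "teen"], rng)  # 3rd stage
    print(person, factor)
```

Each stage multiplies into the design weight: the inverse of the cluster's PPS probability, times the dwelling sampling interval, times the household-size factor.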


The Area Frame

Why use such a design?

- Stratification:
  - Better coverage of the entire region of interest
  - Increases precision
- Clustering:
  - Efficient for interviewing (less travel, less costly)
  - Decreases precision
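One standard way to quantify the precision lost to clustering is Kish's approximate design effect, 1 + (m - 1)ρ, where m is the average number of interviews per cluster and ρ is the intra-cluster correlation. This formula is textbook material rather than something stated in the slides, and the numbers below are hypothetical.

```python
def design_effect(avg_cluster_size, rho):
    """Kish's approximate design effect for a cluster sample: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * rho


def effective_sample_size(n, avg_cluster_size, rho):
    """Size of a simple random sample with the same precision as the clustered sample."""
    return n / design_effect(avg_cluster_size, rho)


if __name__ == "__main__":
    # hypothetical values: 1,000 interviews, 10 per cluster, intra-cluster correlation 0.05
    print(design_effect(10, 0.05))                 # roughly 1.45
    print(effective_sample_size(1000, 10, 0.05))   # roughly 690
```

Even a small ρ inflates the variance noticeably once clusters hold many interviews, which is the trade-off against the lower travel cost.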


The Area Frame

The CCHS selection process:

- The LFS provides a list of available starts (systematic samples) within each cluster
- The clusters are mapped to the CCHS HRs
- A random selection of starts is chosen within an HR
- Probabilistic sample of one individual per household


The Area Frame

2-phase sample:

- The 1st phase is the LFS sample of starts within the LFS strata
- The 2nd phase is the CCHS sample of starts within the HRs


The Area Frame

Why use the LFS?

- No adequate list of addresses is available
- Such a frame is costly to create and maintain
- The LFS has good coverage of the target population
- It is a monthly survey conducted at Statistics Canada
- It is continually updated


The Telephone Frame

List of telephone numbers from across Canada

Created using InfoDirect© files

Stratified by HR

Simple random sample without replacement (SRSWOR) of phone numbers

Probabilistic sample of one individual per household


The RDD Frame

Phone numbers are grouped into banks

Banks are assigned to a HR

A computer randomly generates the last 2 digits

Probabilistic sample of one individual per household
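A minimal sketch of generating RDD numbers within a bank. It assumes, hypothetically, that a bank is the block of 100 numbers sharing the same first 8 digits, so that only the last 2 digits are random; the bank value and the sampling without replacement are illustrative choices, not Statistics Canada's actual system.

```python
import random


def rdd_numbers(bank, k, rng):
    """Draw k distinct numbers from one hypothetical bank of 100 numbers.

    `bank` is the fixed 8-digit prefix; the last 2 digits are generated at
    random, without replacement within the bank.
    """
    if len(bank) != 8 or not bank.isdigit():
        raise ValueError("bank must be an 8-digit prefix")
    suffixes = rng.sample(range(100), k)
    return [f"{bank}{suffix:02d}" for suffix in suffixes]


if __name__ == "__main__":
    rng = random.Random(42)
    # hypothetical bank assigned to one HR
    print(rdd_numbers("86755512", 5, rng))
```

Many generated numbers will be out of service or non-residential, which is why the RDD frame is described as inefficient.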


Dual Frame Design

Multiple frames are used to:

- Improve the coverage of the target population
- Reduce costs

Area Frame:

- Covers the target population
- Costly to implement:
  - Listing costs
  - Face-to-face interview costs


Dual Frame Design

Telephone Frame:

- Only covers the population with listed phone numbers
  - Undercoverage may bias the estimates
  - A growing problem with the increasing popularity of cell phones
- Less costly to implement
  - Calls are made from regional offices


Dual Frame Design

RDD Frame:

- Inefficient
  - Results in a large number of out-of-scope numbers
- Used alone for 2 northern regions
  - The LFS is not adequate for these 2 regions
- Used as a complement to the area frame in Whitehorse and Yellowknife
  - The quality of the telephone frame is considered poor in these regions


The Weighting Strategy of the CCHS

Area Frame:

- A0: Initial weight
- A1: Sub-cluster adjustment
- A2: Stabilization
- A3: Out-of-scope dwellings
- A4: Household nonresponse

Telephone Frame:

- T0: Initial weight
- T1: Number of collection periods
- T2: Out-of-scope numbers
- T3: Household nonresponse
- T4: Multiple phone lines

Combined Frame:

- I1: Integration
- I2: Person selection
- I3: Person nonresponse
- I4: Winsorization
- I5: Calibration

Final CCHS Weight


Sampling Weights

The number of people in the population represented by the interviewed person. Example: $w_i = 500$ means that interviewed person $i$ represents 500 people.

Can be broken down into 3 major steps:

- Design weights
- Nonresponse adjustment
- Calibration
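To make the meaning of $w_i$ concrete, here is a minimal sketch of how final weights turn respondent answers into population estimates. The weights and the yes/no variable are hypothetical.

```python
def weighted_total(weights, values):
    """Estimated population total: sum of w_i * y_i over respondents."""
    return sum(w * y for w, y in zip(weights, values))


def weighted_proportion(weights, indicator):
    """Estimated population proportion: weighted count of 'yes' over the sum of weights."""
    return weighted_total(weights, indicator) / sum(weights)


if __name__ == "__main__":
    weights = [500, 300, 200]   # each respondent represents w_i people
    smoker = [1, 0, 1]          # hypothetical yes/no survey variable
    print(weighted_total(weights, smoker))       # 700
    print(weighted_proportion(weights, smoker))  # 0.7
```

An unweighted count would say 2 of 3 respondents smoke; the weights account for how many people each respondent stands for.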


Design Weights

Weights determined by the design of the survey

They are the inverse of the inclusion probability: a person selected with a sampling fraction of 1% will have a weight of 1/0.01 = 100

The design weights in the CCHS are calculated separately for each frame

Sampling fractions differ between HRs, therefore design weights are not uniform


List Frame Design Weights

The sample is stratified by HR, so weights are calculated within HR

It is an SRSWOR of phone numbers

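The probability of selecting phone number $i$ within HR $g$ is

$\pi_{gi} = \frac{n_g}{N_g}$

where $n_g$ is the number of phone numbers sampled in HR $g$ and $N_g$ is the number of phone numbers on the frame in HR $g$, so the design weight is $1/\pi_{gi} = N_g / n_g$.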
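A minimal sketch of the list-frame design weights: an SRSWOR is drawn independently in each HR and every selected number gets the weight $N_g / n_g$. The frame contents and sample sizes are hypothetical; they are chosen to show that equal sample sizes in HRs of different size give non-uniform weights.

```python
import random


def stratified_srswor(frame, sample_sizes, rng):
    """Select an SRSWOR of phone numbers in each HR and attach the design weight.

    frame: dict mapping HR -> list of phone numbers on the list frame (N_g = its length)
    sample_sizes: dict mapping HR -> n_g
    Returns a list of (hr, phone_number, design_weight) with weight N_g / n_g.
    """
    sample = []
    for hr, numbers in frame.items():
        N_g, n_g = len(numbers), sample_sizes[hr]
        weight = N_g / n_g                       # 1 / pi_gi, where pi_gi = n_g / N_g
        for number in rng.sample(numbers, n_g):  # without replacement
            sample.append((hr, number, weight))
    return sample


if __name__ == "__main__":
    rng = random.Random(7)
    # hypothetical frame: HR "A" has 1,000 listed numbers, HR "B" has 400
    frame = {"A": [f"A{i:04d}" for i in range(1000)],
             "B": [f"B{i:04d}" for i in range(400)]}
    sample = stratified_srswor(frame, {"A": 10, "B": 10}, rng)
    for hr in frame:
        total = sum(w for h, _, w in sample if h == hr)
        print(hr, total)   # weights in each HR sum to N_g: A 1000.0, B 400.0
```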


Area Frame Design Weights

- The LFS is redesigned every 10 years
  - A 20-year sample plan is created
- The LFS provides a list of available starts
  - It typically consists of 40 columns and 6 rows per LFS stratum
  - Each row represents a rotation group
  - Each column represents a monthly LFS sample


Area Frame Design Weights

LFS stratum  Rotation  Cluster  Start  Cluster  Start  Cluster  Start
50           1         1        1      1        2      1        3
50           2         2        4      2        5      3        6
50           3         7        8      7        9      7        10
50           4         6        1      6        2      4        3
50           5         9        4      9        5      9        6
50           6         5        16     5        12     5        13

Each (Cluster, Start) column pair is one LFS sample.


Area Frame Design Weights

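- The LFS provides a weight for one LFS sample, i.e. a weight for every start in one column
- This weight is used to assign a weight to all available starts
- The weights are then redistributed to the CCHS-selected starts within each HR:

$W_s = W_{lfs} \times \frac{R}{S}$

where $W_{lfs}$ is the weight provided by the LFS and $W_s$ is the weight of a CCHS-selected start.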
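A minimal sketch of the second-phase redistribution, under stated assumptions: R is taken to be the number of available starts in the HR, S the number of starts the CCHS selects there, and the CCHS starts are assumed to be an equal-probability subsample, so each selected start is inflated by R/S. This is one reading of the formula, not the documented production method; the start IDs and weights are hypothetical.

```python
import random


def subsample_starts(available, n_selected, rng):
    """Second phase: select n_selected of the R available starts in one HR.

    available: list of (start_id, weight) pairs, the weight already assigned to each
    available start from the LFS weight. With an equal-probability subsample, each
    selected start is inflated by R / n_selected, which preserves the HR total
    on average (exactly, when all available starts carry the same weight).
    """
    R = len(available)
    chosen = rng.sample(available, n_selected)
    return [(start_id, weight * R / n_selected) for start_id, weight in chosen]


if __name__ == "__main__":
    rng = random.Random(2007)
    available = [(start_id, 50.0) for start_id in range(12)]   # hypothetical HR
    selected = subsample_starts(available, 4, rng)
    print(selected)                          # 4 starts, each with weight 150.0
    print(sum(w for _, w in selected))       # 600.0 = 12 * 50
```

Two-phase weighting multiplies the first-phase weight by the inverse of the second-phase selection probability (here S/R), which is what the factor R/S does.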


Nonresponse Adjustments

The design weights are corrected for total nonresponse (NR), where all the variables for the selected unit are missing. Causes include:

- Complete refusal
- Unable to contact the respondent
- Respondent absent for the duration of the survey
- Language barrier
- Information obtained is unusable


Nonresponse Adjustments

There are 2 types of NR in the CCHS:

- Household level
- Person level

The weights of the nonrespondents have to be redistributed to the respondents:

- Form groups based on auxiliary information


NR Adjustments

There are several methods available for the creation of response homogeneity groups (RHGs)

The CCHS uses the scoring method:

- Logistic regression is used to obtain a probability of response ($p$) for every unit
- Groups are formed based on the values of $p$
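A minimal sketch of the grouping step of the scoring method. It assumes the response probabilities $p$ have already been fitted (e.g. by logistic regression) and, as a simpler stand-in for the SAS clustering algorithm the CCHS uses, cuts the ranked scores into equal-size groups. The scores are hypothetical.

```python
def score_groups(propensities, k):
    """Assign each unit to one of k groups of (nearly) equal size by ranking its
    estimated response probability p. Group 0 holds the lowest scores."""
    n = len(propensities)
    order = sorted(range(n), key=lambda i: propensities[i])
    groups = [0] * n
    for rank, unit in enumerate(order):
        groups[unit] = rank * k // n
    return groups


if __name__ == "__main__":
    p = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]   # hypothetical fitted response probabilities
    print(score_groups(p, 3))             # [2, 0, 1, 1, 2, 0]
```

Units in the same group have similar predicted response behaviour, which is what makes redistributing weights within the group reasonable.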


NR Adjustments

Logistic regression models:

- Variables include geographic information, process data and socio-economic indicators
- Variables derived from process data include:
  - Number of attempts
  - Time/day of attempt
  - Called on weekday/weekend


NR Adjustments

Initial groups are formed using a clustering algorithm in SAS

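These groups are then collapsed to ensure:

- A response rate of at least 50%
- At least 20 observations

The adjustment within each RHG is

$a^{NR} = \frac{\sum_{i=1}^{n} W_{D_i}}{\sum_{i=1}^{r} W_{D_i}}$

where the RHG contains $n$ units, of which $r$ respond (indexed $i = 1, \dots, r$), and $W_{D_i}$ is the design weight of unit $i$.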
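A minimal sketch of the adjustment factor $a^{NR}$ together with the two collapsing rules. Two interpretations are assumptions: the response rate is taken as unweighted, and "observations" is taken to mean all units in the group. Groups that fail a rule are only flagged here; how they are merged is not shown. The data are hypothetical.

```python
from collections import defaultdict


def nr_adjustment_factors(units, min_response_rate=0.5, min_obs=20):
    """Compute a_NR = (sum of design weights of all units) / (sum over respondents), per RHG.

    units: iterable of (rhg, design_weight, responded) tuples.
    Returns (factors, to_collapse): factors for RHGs meeting both rules, and the
    RHGs that would first have to be collapsed with another group.
    """
    w_all, w_resp = defaultdict(float), defaultdict(float)
    n_all, n_resp = defaultdict(int), defaultdict(int)
    for rhg, weight, responded in units:
        w_all[rhg] += weight
        n_all[rhg] += 1
        if responded:
            w_resp[rhg] += weight
            n_resp[rhg] += 1
    factors, to_collapse = {}, []
    for rhg in w_all:
        if n_all[rhg] >= min_obs and n_resp[rhg] / n_all[rhg] >= min_response_rate:
            factors[rhg] = w_all[rhg] / w_resp[rhg]
        else:
            to_collapse.append(rhg)
    return factors, to_collapse


if __name__ == "__main__":
    units = ([(1, 10.0, i < 30) for i in range(40)]     # RHG 1: 30 of 40 respond
             + [(2, 25.0, i < 8) for i in range(20)])   # RHG 2: 8 of 20 respond
    factors, to_collapse = nr_adjustment_factors(units)
    print(factors)      # RHG 1 factor is 400/300, about 1.33
    print(to_collapse)  # [2]: a 40% response rate is below 50%
```

Multiplying each respondent's design weight by its group's factor makes the respondents' weights add up to the design-weight total of the whole group.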


Integration of Frames

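[Figure: the area frame covers households with no phone line, with an unlisted phone number and with a listed phone number; the telephone frame covers households with a listed phone number]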


Integration of Frames

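Area frame: population = $A$, sample = $S_A$

Telephone frame: population = $B$, sample = $S_B$

$\hat{Y}_{int} = \theta\,\hat{Y}^{A}_{S_{AB}} + (1-\theta)\,\hat{Y}^{B}_{S_{AB}}$

where $\hat{Y}^{A}_{S_{AB}}$ and $\hat{Y}^{B}_{S_{AB}}$ are the estimates for the overlapping population $A \cap B$ from the area and telephone samples, and $\theta$ is the integration factor.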


Integration

Integration factor:

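A number between 0 and 1; for the CCHS it is based on sample size:

$\theta = \frac{n_A}{n_A + n_B}$

where $n_A$ and $n_B$ are the area-frame and telephone-frame sample sizes.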
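A minimal sketch of the integration factor and the composite estimate for the overlapping population. The sample sizes and the two overlap estimates are hypothetical; the point is that the frame with the larger sample gets the larger share of the weight.

```python
def integration_factor(n_a, n_b):
    """theta = n_A / (n_A + n_B), always between 0 and 1."""
    return n_a / (n_a + n_b)


def composite_overlap(y_hat_area, y_hat_tel, theta):
    """theta * Y^A_{S_AB} + (1 - theta) * Y^B_{S_AB} for the overlapping population."""
    return theta * y_hat_area + (1 - theta) * y_hat_tel


if __name__ == "__main__":
    theta = integration_factor(400, 600)                    # hypothetical sample sizes, 0.4
    print(theta, composite_overlap(10_000, 12_000, theta))  # 0.4 and about 11200
```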


Integration

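Parameter of interest: $Y_{AB}$, the total for the overlapping population $A \cap B$

Unbiased estimates: $E(\hat{Y}^{A}_{S_{AB}}) = Y_{AB}$ and $E(\hat{Y}^{B}_{S_{AB}}) = Y_{AB}$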


Integration

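Composite estimation:

$E(\hat{Y}_{int}) = E\left[\theta\,\hat{Y}^{A}_{S_{AB}} + (1-\theta)\,\hat{Y}^{B}_{S_{AB}}\right] = \theta\,E(\hat{Y}^{A}_{S_{AB}}) + (1-\theta)\,E(\hat{Y}^{B}_{S_{AB}}) = \theta\,Y_{AB} + (1-\theta)\,Y_{AB} = Y_{AB}$

so the composite estimator is unbiased for any fixed $\theta$ between 0 and 1.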


Integration of Frames

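- It is possible to integrate only the overlapping population covered by the 2 frames
- Problem: identifying the overlapping portion for the area frame is difficult due to nonresponse
  - It is possible to impute these cases

$\hat{Y}_{int} = \hat{Y}^{A}_{S_A} + \theta\,\hat{Y}^{A}_{S_{AB}} + (1-\theta)\,\hat{Y}^{B}_{S_{AB}}$

where $\hat{Y}^{A}_{S_A}$ is the area-frame estimate for the non-common part (covered by the area frame only).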


Integration of Frames

[Figure: area frame and telephone frame samples, with the parts labelled $S_B$, $S_{AB}$, $S_A$ and $S_{AU}$]


Integration of Frames

Logistic regression is used to assign a probability $p$ of belonging to the non-common part $S_A$

The final integration method is

$\hat{Y}_{int} = p\,\hat{Y}^{A}_{S_A} + \theta(1-p)\,\hat{Y}^{A}_{S_{AB}} + (1-\theta)\,\hat{Y}^{B}_{S_{AB}}$
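One way to apply the final integration formula unit by unit, assuming $p$ is attached to each area-frame unit (1 when the unit is known to be in the non-common part, 0 when known to be in the overlap, a fitted value otherwise). Each area-frame unit then contributes to both parts in proportion to $p$ and $1-p$. This unit-level reading and the data below are illustrative assumptions.

```python
def integrated_total(area_units, tel_units, theta):
    """Unit-level form of Y_int = p*Y^A_{S_A} + theta*(1-p)*Y^A_{S_AB} + (1-theta)*Y^B_{S_AB}.

    area_units: (weight, y, p) where p is the probability the unit belongs to the
                non-common part S_A.
    tel_units:  (weight, y); every telephone-frame unit is in the overlap.
    """
    area_part = sum(w * y * (p + theta * (1 - p)) for w, y, p in area_units)
    tel_part = sum(w * y * (1 - theta) for w, y in tel_units)
    return area_part + tel_part


if __name__ == "__main__":
    area = [(100, 1, 1.0), (100, 1, 0.0), (100, 0, 0.5)]   # hypothetical area-frame units
    tel = [(150, 1)]                                       # hypothetical telephone unit
    print(integrated_total(area, tel, 0.4))   # about 230: 100 + 40 + 0 from area, 90 from telephone
```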


Calibration

Weights are adjusted to match population projection counts:

- Based on the Census
- Adjusted to account for births, deaths, immigration and emigration

The rounded average of the monthly projection counts is used within each post-stratum
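A minimal sketch of calibration in its simplest form, post-stratification: each weight is multiplied by $N_h / \sum w$ for its post-stratum $h$, where $N_h$ is the rounded average of the monthly projections. The slides do not say which calibration procedure or software the CCHS uses, so this is an illustration only; the counts, labels and weights are hypothetical.

```python
from collections import defaultdict


def projection_control(monthly_counts):
    """Rounded average of the monthly projection counts for one post-stratum.
    Note: Python's round() rounds exact halves to the nearest even integer."""
    return round(sum(monthly_counts) / len(monthly_counts))


def poststratify(weights, post_strata, controls):
    """Multiply each weight by N_h / (sum of weights in post-stratum h) so that
    the weights in every post-stratum add up to its control count N_h."""
    totals = defaultdict(float)
    for w, h in zip(weights, post_strata):
        totals[h] += w
    return [w * controls[h] / totals[h] for w, h in zip(weights, post_strata)]


if __name__ == "__main__":
    print(projection_control([1000, 1001, 1003]))   # 1001
    print(poststratify([10, 20, 30], ["F 12-19", "F 12-19", "M 12-19"],
                       {"F 12-19": 60, "M 12-19": 90}))   # [20.0, 40.0, 90.0]
```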


Calibration

Why is calibration used?

- Gives confidence when estimating totals, since weighted totals match known population counts
- Improves the precision of the estimates, if the auxiliary variables are well correlated with the survey variables
- Adjusts for coverage inadequacies when the survey population differs from the target population


Calibration

In the CCHS, all post-strata with at least 20 observations are calibrated at the HR by age by sex level:

- HR: 120 across Canada
- Age groups: 12-19, 20-29, 30-44, 45-64 and 65+
- Sex: male and female


Calibration

Example: HR 2 (number of observations)

Age group   Females   Males
12-19       15        25
20-29       40        40
30-44       53        53
45-64       18        22
65+         31        31

Post-stratum levels: HR by age by sex; HR by sex; province by age by sex

In this example, the female 12-19 (15) and 45-64 (18) post-strata have fewer than 20 observations.


Final Weights

- Master: contains all variables for all respondents
- Share: contains all variables for the subset of people who agreed to share (subset of records)
- PUMF: contains a subset of variables for all respondents (subset of variables)
- Dummy: contains a subset of records from the master file; scrambled data used for testing and remote access purposes
- Bootstrap: created for variance estimation purposes
- Special requests: linkage, different geographies, etc.
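A minimal sketch of how bootstrap weights are typically used for variance estimation: the estimate is recomputed with each set of replicate weights, and the variance is the average squared deviation of the replicate estimates from the full-sample estimate. This is one common form (some variants center on the mean of the replicates instead); the values and replicate weights are hypothetical.

```python
def weighted_total(weights, values):
    """Weighted total: sum of w_i * y_i."""
    return sum(w * y for w, y in zip(weights, values))


def bootstrap_variance(values, full_weights, replicate_weights):
    """Variance of a weighted total from bootstrap replicate weights:
    average squared deviation of the replicate estimates from the full-sample estimate."""
    estimate = weighted_total(full_weights, values)
    replicates = [weighted_total(rw, values) for rw in replicate_weights]
    return sum((r - estimate) ** 2 for r in replicates) / len(replicates)


if __name__ == "__main__":
    y = [1, 0, 1]
    full = [10, 10, 10]                                   # full-sample estimate: 20
    reps = [[20, 10, 0], [0, 10, 30], [10, 20, 10]]       # replicate estimates: 20, 30, 20
    print(bootstrap_variance(y, full, reps))              # about 33.33
```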


Methodology

Typical tasks:

- Write computer programs to solve problems or explore data
- Attend meetings
- Write documentation
- Present our work at seminars
- Work on different committees


Methodology

Working conditions:

- Permanent job
- Continuous learning:
  - Computer courses
  - Statistics and methodology courses
  - Language courses
  - Seminars, conferences and publications


Methodology

All methodologists work at the Head Office in Ottawa


Recruitment

Our recruitment campaign takes place each fall

Detailed presentations are given at the universities by early October

It is a 3-step process:

1. Online application: starts in September, deadline in mid-October
2. Written exam: early November
3. Interview: January


Recruitment

Who can apply?

- Persons residing in Canada and Canadian citizens residing abroad
- Preference will be given to Canadian citizens

Bilingualism:

- No preference is given to those who speak both English and French


For more information please contact

www.statcan.ca, under: About Us > Employment opportunities > Mathematical statisticians (MA)

Email: [email protected]

Telephone: 1-888-321-3089


Thank you

[email protected]

Canadian Community Health Survey [email protected]