Post on 04-Aug-2015

1icfi.com |

Combining a Probability Based Telephone Sample with an Opt-in Web Panel

Randal ZuWallack

James Dayton

Naomi Freedner-Maguire

ICF International

Katherine J. Karriker-Jaffe, PhD Thomas K. Greenfield, PhD

Alcohol Research Group, Public Health Institute

Outline

• National Alcohol Survey

• Motivation

• Data Fusion

• Data and Methods

• Results

• Summary

Acronyms to Know

• RR – Response Rate
• NAS – National Alcohol Survey
• N1-N14 – the iteration of the NAS (N1 = the first iteration, N14 = the 14th iteration)
• CATI – Computer Assisted Telephone Interview
• RDD – Random Digit Dial
• F2F – Face to Face interview

National Alcohol Survey (NAS)

• Surveys adults ages 18 and older in the United States to measure a range of alcohol-related topics

• Questionnaire:
– Alcohol consumption and behavior
– Effects of alcohol on individual lives and the lives of others
– Perceptions about alcohol
– Emotional well-being
– …and more.

• Length averages 45 min

Motivation

1. RDD: cost and RR
• Decrease in coop
• Decrease in contacts
• Cell phones

[Figure: RDD cost ($$) and RR, 2000 to 2015, with cell phones marked]

Motivation

2. Survey length

Current drinkers: 48 min

Former drinkers: 41 min

Abstainers: 34 min

[Figure: Break-off rate (0% to 50%) by survey length (0 to 60 minutes) for current drinkers, abstainers, and former drinkers. Markers show where drinker status is determined and the average length for partial versus mid-terminate interviews.]

Motivation

3. Modernizing NAS

[Timeline, 1965 to 2020]
• N1, 1964: first NAS, F2F
• N10, 2000: RDD, CATI
• N12, 2009: Dual frame, CATI
• N14, 2019: ??

New technologies

Emerging methods

“Fit-for-purpose”

Proposed approach

• Combine probability-based RDD with nonprobability Web panel using data fusion (aka statistical matching)

• Why Web panel? Cost

• Why data fusion? Not ALL IN on Web

• IDEA:
  1. Use a probability-based RDD to identify who’s who in the population
  2. Use Web panel to measure behaviors, attitudes (what, where, when?)

Data Fusion

[Diagram: "Who?" splits 18+ adults into current drinkers, former drinkers, and abstainers, with current drinkers marked as wine, beer, and/or spirits drinkers (✓ all that apply). "What? Where? When?" applies to the same groups.]

• RDD measures population: % wine drinkers, etc.

• Web measures depth of information conditional on who’s who

Data Fusion

Survey A: X, Y

Survey B: X, Z

Matched on XA = XB

Matched data: X, Y, Z

• Critical: Conditional independence
– Y is independent of Z given X
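The conditional independence requirement can be written out as below. This is the usual textbook statement for statistical matching; the density notation f(·) is an assumption of this note, not something shown on the slides.

```latex
f(y, z \mid x) = f(y \mid x)\, f(z \mid x)
\quad\Longrightarrow\quad
f(x, y, z) = f(x)\, f(y \mid x)\, f(z \mid x)
```

Survey A supplies f(y | x) and Survey B supplies f(z | x). Because no respondent answers both Y and Z, the factorization cannot be checked from the matched file alone, which is why the slides test it on a survey that collected everything.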

Data

• NAS N13 extract
– Dual-frame RDD, CATI
– Average interview length: nearly 45 minutes
– National sample, oversamples in geos with high black or Hispanic densities
– Data collection ongoing since October 2014. Data extracted on December 23, 2014
– 3358 completed interviews (1336 cell phone, 2022 landline)

• NAS Web experiment
– Shortened version of the N13 questionnaire focused on alcohol consumption
– Average interview length: 20 minutes
– Conducted Jan 23-27, 2015 by Schlesinger Associates (http://www.schlesingerassociates.com/online_solutions.aspx)
– 841 completed surveys

Methods

• Focus on
– Current drinkers
  • RDD: 1932 interviews out of 3358 (57%)
  • Web: 657 out of 841 (78%)
– Alcohol consumption and behaviors
  • How often drinking wine in past 12 months
  • How often drinking beer in past 12 months
  • How often drinking spirits in past 12 months
  • How often drinking any alcohol in past 12 months
  • Typical number of drinks when drinking wine
  • Typical number of drinks when drinking spirits
  • Typical number of drinks when drinking beer
  • Number of times drinking 12+ drinks
  • Number of times drinking 8-11 drinks
  • Number of times drinking 5-7 drinks
  • Number of times getting drunk
  • Maximum number of drinks in a single day
  • How often do you drink when spending a quiet evening at home?
  • How often do you drink at bars, taverns, or cocktail lounges?
  • How often do you drink when spending time with friends in a public place, such as a park, street, or parking lot?
  • How often do you drink at a party in someone’s home?

• Question 1: Are alcohol consumption and behaviors conditionally independent of other topics on the survey?

Conditional Independence

Other topics on the survey:
• Effects of drinking
• Help for drinking problem
• Drug use
• Perceptions of drinking
• Drinking injuries & illnesses
• Emotional health
• Education
• Other people’s drinking
• Stressful events
• Neighborhood characteristics
• Ethnic experiences

Correlation Analysis

1. We created 91 “other” variables (+16 drinking behavior variables)

2. Formed variable clusters using oblique principal component cluster analysis (SAS PROC VARCLUS)
– Variables assigned to clusters based on correlation with other variables in the cluster
– Used Spearman rank correlation
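As a rough illustration of step 2, the sketch below computes Spearman rank correlations and groups variables by them. It does not reproduce SAS PROC VARCLUS: a simple greedy average-linkage grouping on |rho| is swapped in as a stand-in. The variable names, the toy responses, and the 0.5 threshold are all hypothetical.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def spearman(a, b):
    """Spearman rank correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def cluster_variables(data, threshold=0.5):
    """Stand-in for VARCLUS: repeatedly merge the two clusters with the highest
    mean |rho| until no pair exceeds `threshold` (a hypothetical cutoff)."""
    names = list(data)
    rho = {(p, q): abs(spearman(data[p], data[q]))
           for p in names for q in names if p != q}
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = mean(rho[(p, q)] for p in clusters[i] for q in clusters[j])
                if link > best:
                    best, pair = link, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

# Made-up ordinal responses for six respondents (illustration only).
toy = {
    "beer_freq":  [1, 2, 3, 4, 5, 6],
    "drunk_freq": [1, 3, 2, 4, 6, 5],
    "wine_freq":  [6, 1, 5, 2, 4, 3],
    "home_drink": [5, 1, 6, 2, 3, 4],
}
clusters = cluster_variables(toy)
print(clusters)
```

In this toy data the beer and drunkenness items end up in one group and the wine and at-home items in another, loosely echoing the cluster labels reported on the next slide.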

Conditional Independence

• As expected, the 16 drinking behaviors clustered together

• 1 perception variable clustered with them: “How much do you agree or disagree with the following statements…I drink to be sociable.”

• 3 clusters:
– Beer consumption and heavy drinking
– Wine consumption and home drinking
– Social drinking (bars, with friends, at parties)

• “Other” variables formed 22 other clusters

Conditional Independence

3. Calculated partial correlations of drinking clusters with “other” clusters
– Used 1st principal component scores for each cluster (outcome from the clustering algorithm)
– Imputation
– 3 stages for partial correlations
  1. Design variables
  2. + Wine, beer, spirits indicators and demos and general health status
  3. + Key drinking variables

Squared Correlations                            Mean     Min      Max
Design variables                                0.0260   0.0000   0.1889
Design and demographics                         0.0147   0.0000   0.1158
Design, demographics, and drinking variables    0.0069   0.0000   0.0452

Control variables:
• Gender, Age, Race/ethnicity, Tenure, FT student under 30, Presence of children, Marital status, Educational attainment, Employment status
• General health, Quality of life, Physical activity, Beer Drinker, Wine Drinker, Spirits Drinker, Drink quiet eve at home, Kept drinking wanted to stop
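To show what adding controls in stages does to a correlation, here is a minimal sketch of a partial correlation computed with the standard recursion on lower-order partial correlations. The data are constructed by hand so the answer is known; the variable names are hypothetical, and the slides' actual computation (on principal component scores, with imputation) is not reproduced.

```python
from statistics import mean

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def partial_corr(y, z, controls):
    """Correlation of y and z after removing the linear effect of `controls`.
    Peels off one control at a time using the first-order recursion."""
    if not controls:
        return pearson(y, z)
    *rest, last = controls
    r_yz = partial_corr(y, z, rest)
    r_yl = partial_corr(y, last, rest)
    r_zl = partial_corr(z, last, rest)
    return (r_yz - r_yl * r_zl) / ((1 - r_yl ** 2) * (1 - r_zl ** 2)) ** 0.5

# Hand-built example: both scores depend on the same indicator x,
# plus noise terms that are orthogonal to x and to each other.
x  = [1, 1, 1, 1, -1, -1, -1, -1]          # e.g. a drinker-type indicator
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
other_score    = [2 * a + b for a, b in zip(x, e1)]   # an "other" cluster score
drinking_score = [2 * a + b for a, b in zip(x, e2)]   # a drinking cluster score

raw = pearson(other_score, drinking_score)
adjusted = partial_corr(other_score, drinking_score, [x])
print(round(raw ** 2, 4), round(adjusted ** 2, 4))
```

The raw squared correlation is sizeable, but it vanishes once x is controlled for. That is the pattern the table above looks for: squared partial correlations shrinking toward zero as linking variables are added.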

The Split

• Linking variables (X): Wine, beer, spirits indicators and demos and general health status and key drinking variables

• RDD (Y): 52 total variables conditionally independent of drinking behaviors
– “Other” clusters where 0 of the 3 partial correlations were significant at 0.05 level (7 clusters representing 28 variables)
– Visual inspection: 7 more clusters (24 variables) were marginally correlated (max<0.08)

• Web (Z): 55 total variables
– 3 drinking behavior clusters (17 variables)
– 8 clusters correlated with drinking behaviors (38 variables)

• Question 1: Are alcohol consumption and behaviors conditionally independent of other topics on the survey? YES and NO

• Question 2: Conditional on X, does Web = Phone (Zw=Zp)?

Web = Phone?

• We explore differences in phone response versus web response using adjusted means and frequencies

• Dependent: drinking variables

• Independent:
– Controls: wine, beer, spirits indicators and demos and general health status and key drinking variables
– Web or phone

• Are there differences between the RDD phone sample and the Web panel after controlling for the matching variables?
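The slides do not say how the adjusted means were produced; a regression with a Web-or-phone indicator is one likely route. As a simpler stand-in, the sketch below uses direct standardization: each mode's stratum means are reweighted to the pooled distribution of a control variable. The strata, values, and function name are hypothetical.

```python
from collections import defaultdict

def adjusted_means(records):
    """records: iterable of (mode, stratum, value).
    Returns {mode: mean standardized to the pooled stratum distribution}.
    Assumes every stratum is observed in every mode."""
    sums = defaultdict(float)    # (mode, stratum) -> sum of values
    counts = defaultdict(int)    # (mode, stratum) -> number of cases
    pooled = defaultdict(int)    # stratum -> number of cases across modes
    for mode, stratum, value in records:
        sums[(mode, stratum)] += value
        counts[(mode, stratum)] += 1
        pooled[stratum] += 1
    total = sum(pooled.values())
    modes = {m for m, _ in counts}
    # Weight each stratum mean by that stratum's share of the pooled sample.
    return {
        m: sum(pooled[s] / total * sums[(m, s)] / counts[(m, s)] for s in pooled)
        for m in modes
    }

# Toy data: typical drinks per occasion, with drinker type as the only control.
records = [
    ("phone", "wine", 1), ("phone", "wine", 2), ("phone", "wine", 3),
    ("phone", "beer", 4),
    ("web", "wine", 2),
    ("web", "beer", 4), ("web", "beer", 4), ("web", "beer", 4),
]
raw_phone = sum(v for m, _, v in records if m == "phone") / 4
raw_web = sum(v for m, _, v in records if m == "web") / 4
adj = adjusted_means(records)
print(raw_phone, raw_web, adj["phone"], adj["web"])
```

Here the raw means differ only because the two toy samples have different mixes of drinker types, so the adjusted means coincide. A gap that survives adjustment, as in the results that follow, points to something other than composition.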

Adjusted drinker distribution RDD vs Web

• Differences in the distribution of drinker types—web consistently higher

• But drinker type will be a control in the matching

• What about consumption for these drinker types?

Drinker types (adjusted)

            RDD    Web
Wine        79%    86%
Beer        65%    72%
Spirits     67%    76%

Adjusted Frequencies

• Higher weekly consumption for Web vs RDD
– Spirits (not shown) is same pattern

Frequency of beer drinking

                        RDD      Web
<1 per mo               26.6%    22.3%
1 per mo                20.5%    19.1%
2-3 per mo              19.9%    20.3%
1 per wk                21.7%    24.5%
3-4 per wk or more      11.3%    13.8%

Frequency of wine drinking

                        RDD      Web
<1 per mo               31.0%    23.3%
1 per mo                20.8%    18.7%
2-3 per mo              18.9%    19.9%
1 per wk                16.9%    20.7%
3-4 per wk or more      12.5%    17.4%

Adjusted Means

Typical drinks on drink day

                  RDD     Web
Wine Drnkrs       2.01    2.27
Beer Drnkrs       2.57    2.75
Spirits Drnkrs    2.38    2.56
Max Drnks         3.51    3.51

p-values (chart order): <.0001, 0.9733, 0.0028, 0.0195

Adjusted Frequencies

Heavy drinking in past 12 months

               RDD      Web
12+ drnks      2.6%     1.9%
8-11 drnks     5.2%     4.7%
5-7 drnks      14.8%    15.8%
Drunk          38.3%    54.0%

• Question 2: Conditional on X, does Web = Phone (Zw=Zp)? YES AND NO

Matching

• Used propensity score matching due to small sample sizes and many combinations of age, race/ethnicity, educational attainment, etc.
– Allowed multiple matches to Web, but only if the Pscore distance was <= 0.01
– Not all Web cases used (distance too far)

• Ex.: Female wine drinkers: 831 RDD, 399 Web

                                        RDD      RDD/Web
Typical number of wine drinks           1.46     1.79
Drank enough to feel drunk (past yr)    35.5%    49.9%

Frequency of wine drinking

                        RDD      RDD/Web
<1 per mo               36.2%    27.6%
1 per mo                19.0%    13.2%
2-3 per mo              16.6%    18.0%
1 per wk                14.1%    20.2%
3-4 per wk or more      14.1%    21.0%
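The matching step can be sketched as follows, assuming the propensity score is the modeled probability that a case comes from the Web panel given the linking variables, and that a match requires the two scores to differ by at most a caliper of 0.01. The logistic fit is a bare-bones gradient ascent, the two covariates are hypothetical binary indicators, and none of this is the authors' code.

```python
import math

def predict(w, row):
    """Logistic probability for one covariate row; w[0] is the intercept."""
    z = w[0] + sum(wk * v for wk, v in zip(w[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Plain gradient-ascent logistic regression; returns [intercept, b1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - predict(w, row)
            grad[0] += err
            for k, v in enumerate(row):
                grad[k + 1] += err * v
        w = [wk + lr * g / len(X) for wk, g in zip(w, grad)]
    return w

def caliper_match(rdd, web, caliper=0.01):
    """For each RDD case, list every Web donor whose propensity score is within
    `caliper`. Web cases too far from every RDD case are simply never used."""
    w = fit_logistic(rdd + web, [0] * len(rdd) + [1] * len(web))
    p_rdd = [predict(w, r) for r in rdd]
    p_web = [predict(w, r) for r in web]
    return {i: [j for j, pw in enumerate(p_web) if abs(pw - pr) <= caliper]
            for i, pr in enumerate(p_rdd)}

def fuse(matches, web_z):
    """Give each matched RDD case the mean Z of its Web donors (None if unmatched)."""
    return {i: (sum(web_z[j] for j in donors) / len(donors) if donors else None)
            for i, donors in matches.items()}

# Hypothetical linking variables: (female, age 50+).
rdd = [(0, 0), (1, 0), (1, 1)]
web = [(1, 0), (1, 1), (1, 1), (0, 1)]
web_drinks = [2, 1, 3, 5]          # made-up Web-only measure (Z)

matches = caliper_match(rdd, web)
fused = fuse(matches, web_drinks)
print(matches)
print(fused)
```

Cases with identical linking variables always match, because their scores are equal. The caliper decides how far beyond exact agreement a donor may be, which is where the slide's "distance too far" exclusions come from.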

Matching

• Ex.: Male beer drinkers
– 682 RDD, 165 Web

                                        RDD      RDD/Web
Typical number of wine drinks           2.66     2.66
Drank enough to feel drunk (past yr)    52.8%    65.4%

Frequency of beer drinking

                        RDD      RDD/Web
<1 per mo               19.9%    18.5%
1 per mo                17.6%    12.0%
2-3 per mo              18.6%    14.7%
1 per wk                26.4%    27.5%
3-4 per wk or more      17.5%    27.3%

Cost Benefit

RDD survey length (min)       45         25
Landline                      0.265      0.375
Cell phone                    0.175      0.205
Wt avg 40% cell               0.229      0.307
RDD CPI ($30/hr)              $131.00    $97.72
Web CPI                       $0.00      $7.50
Total CPI                     $131.00    $105.22
Savings                       $1.00      $0.80

Recruit more than needed for matching; assume 50% more for cost analysis.
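The arithmetic behind the table can be reproduced with a few lines, under the assumption that the landline and cell phone figures are completed interviews per interviewer hour (the slide does not label the unit, but dividing $30 by the blended rate gives the CPI shown). The function name is hypothetical.

```python
def cost_per_interview(landline_rate, cell_rate, cell_share=0.40,
                       hourly_cost=30.0, web_cpi=0.0):
    """Blend completes-per-hour across frames, convert to cost per interview,
    and add the Web cost per fused interview (assumed unit: dollars)."""
    blended = (1 - cell_share) * landline_rate + cell_share * cell_rate
    return hourly_cost / blended + web_cpi

full = cost_per_interview(0.265, 0.175)                  # 45-minute RDD only
fused = cost_per_interview(0.375, 0.205, web_cpi=7.50)   # 25-minute RDD plus Web
print(round(full, 2), round(fused, 2), round(fused / full, 2))
```

Read this way, the last row of the table says the fused design costs about $0.80 for every $1.00 spent on the full-length RDD design.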

Summary

• Data fusion is a model-based process of putting two (or more) disparate data sets together if they are conditionally independent

• The second condition for our data fusion model is that the data sources must represent the same population
– Web and phone responses are different even after adjusting for the demographic differences and drinker types
  • Is it mode effects or model failure?

• Benefits of this model:
– Average interview length reduced
– Considerable cost savings

Mode Effects

• Higher consumption on the web
– Higher frequency of drinking on web
– No “January effect”: phone data collected in fall 2014, web in Jan 2015
– Order effects?
  • Web: primacy
  • Phone: recency
– Social desirability?

B2c. And how often do you usually have beer or malt beverages?

02 More than once a day
03 Once a day
04 Nearly every day
05 Three or Four times a week
06 Once or twice a week
07 Two or three times a month
08 About once a month
09 Less than once a month but at least once a year
10 Less than once a year
11 Never

Mode effects

• Frequency of getting drunk
– +16 points on web
– Social desirability?
  • self-administered vs interviewer-administered
  • direction and magnitude consistent with literature

B21. How often in the last twelve months did you drink enough to feel drunk?

01 Every day or nearly every day
02 Three to four times a week
03 Once or twice a week
04 Once to three times a month
05 Less than once a month
06 Once in those 12 months
07 Never in those 12 months

Next Steps

• Confirm the conditional independence for the 2 variable sets
– Based on a second larger sample

• Understand mode effects
– Is Web doing a better job at measuring consumption?
– Ask the NAS questions to a split sample: 50% CATI, 50% Web
  • Must be same population

• Panel variance
– Want to explore the consistency of the NAS measures
  • Multiple samples from the same panel (within panel variability)
  • For different panels (between panel variability)
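One crude way to frame the panel variance question is to compare the spread of an estimate across repeated samples from one panel with the spread of panel-level averages. The sketch below does that with made-up numbers. The panel names and estimates are hypothetical, and a real analysis would use a proper variance components model.

```python
from statistics import mean, variance

def variance_components(panels):
    """panels: {panel name: [estimate from sample 1, sample 2, ...]}.
    Returns (average within-panel variance, variance of the panel means)."""
    within = mean(variance(samples) for samples in panels.values())
    between = variance([mean(samples) for samples in panels.values()])
    return within, between

# Made-up estimates of "% drank enough to feel drunk" from repeated samples.
panels = {
    "panel_A": [20, 22, 24],
    "panel_B": [30, 32, 34],
}
within, between = variance_components(panels)
print(within, between)
```

In this toy case the panels disagree with each other far more than repeated samples from the same panel do, which is the kind of contrast the proposed follow-up would look for.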

Acknowledgements

• Thanks to Schlesinger Associates for recruiting web panelists free of charge for this experiment
– Special thanks to Svetla Ninova and Jason Horine from Schlesinger Associates

• For more information, please contact:
– [email protected]

• Visit: icfi.com/SurveyResearch