Cognitive Evaluation of Survey Instruments: State of the Science (Art?) and Future Directions

Gordon Willis, PhD

Applied Research Program, Division of Cancer Control and Population Sciences

National Cancer Institute
[email protected]

Presented at NSF: November 8, 2012

Gordon Willis NSF 2012 1

Context: Total Survey Error

•  Total Survey Error perspective (Groves & Lyberg, 2010; Weisberg, 2005):
   –  Coverage Error
   –  Sampling Error
   –  Nonresponse Error
   –  Measurement Error due to Interviewer
   –  Measurement Error due to Respondents
   –  Post-survey Error (coding/analysis)


“Response Error” – controllable through question(naire) design


Test -> Analyze -> Revise

Unresolved Issues:

•  Is Cognitive Testing Reliable/Valid?
   –  Overall effectiveness
   –  Parametric evaluation

•  Is it useful for The Survey of the Future?
   –  Mixed modes/Novel administration methods
   –  Cross-cultural applications


Research Question: Are the Results of Cognitive Interviewing Reliable?

•  FACT: Cognitive testing to pretest/evaluate survey questionnaires is widespread (gov’t laboratories)

•  PROBLEM: We don’t know if independent practitioners/labs testing the same questionnaire would come to the same conclusions

•  PREVIOUS RESEARCH:
   –  Rothgeb, Willis, & Forsyth (2001) – In a 3-lab comparison, found the same questions problematic, but for different reasons
   –  DeMaio & Landreth (2004) – Which of three C.I. teams did best? Not clear – but found that listening to recordings is very helpful
   –  Willis, Han, Miller, Levin, Kudela, Miller, Willson, Whitaker, & Zahnd (2010); 4-lab comparison ->

NCI / Westat / NCHS / Public Health Institute Parallel Cognitive Interviewing Study

•  Functioning of the tested self-administered questionnaire [Perception of breast/prostate cancer risk] was unknown in advance

•  Four “Labs” conducted parallel testing across multiple cultural/linguistic groups (148 interviews, in English, Spanish, Chinese, Korean)

•  We determined whether the written results and recommendations were similar between groups/labs, or wildly different

Number of Cognitive Interviews, by Lab and by Language

Lab       English   Spanish   Chinese   Korean   TOTAL
NCI            16         9         0        0      25
Westat         18        36         9        9      72
NCHS           15         0         0        0      15
PHI            18         0         0       18      36
TOTAL          67        45         9       27     148
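
As a quick consistency check on the table above, the cell counts can be summed to confirm the row, column, and grand totals, and to count the Lab-by-Language groups that actually conducted interviews. This is only an illustrative sketch: the variable names are hypothetical, and the numbers are transcribed from the table rather than taken from the study's data files.

```python
# Interview counts by lab and language, transcribed from the table above.
# All names here are illustrative; they do not come from the study itself.
counts = {
    "NCI":    {"English": 16, "Spanish": 9,  "Chinese": 0, "Korean": 0},
    "Westat": {"English": 18, "Spanish": 36, "Chinese": 9, "Korean": 9},
    "NCHS":   {"English": 15, "Spanish": 0,  "Chinese": 0, "Korean": 0},
    "PHI":    {"English": 18, "Spanish": 0,  "Chinese": 0, "Korean": 18},
}
languages = ["English", "Spanish", "Chinese", "Korean"]

# Row totals: interviews per lab.
row_totals = {lab: sum(by_lang.values()) for lab, by_lang in counts.items()}

# Column totals: interviews per language.
col_totals = {
    lang: sum(counts[lab][lang] for lab in counts) for lang in languages
}

# Grand total across all labs and languages.
grand_total = sum(row_totals.values())

# Lab-by-Language groups that conducted at least one interview.
groups = [
    (lab, lang)
    for lab, by_lang in counts.items()
    for lang, n in by_lang.items()
    if n > 0
]

print(row_totals)
print(col_totals)
print(grand_total, len(groups))
```

The list of non-empty groups is the set of Lab-by-Language combinations whose summary notes could be compared in a study of this design.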

How likely this is?

How much this has occurred?

Something else? (other than “How concerned I am”)

“It would do Very much to my body”

“I have Somewhat of a chance of dying from breast cancer”

Results: This finding occurred for ALL labs, ALL populations

•  Summary notes from every Lab-by-Language group combination revealed a consistent theme:

–  The questionnaire approach did not measure perceptions of degree of ‘Concern’ about (X), because the critical element of Concern was ignored

•  In NO case was this predicted prior to cognitive testing

•  Constitutes evidence of reliability of independent cognitive interviewing tests

•  Opposite results have also been reported (Miller et al., NCHS)

•  So, the issue still needing study is: “Under what conditions are C.I. results reliable, and what do we need to do to enhance those conditions?”

Unresolved Issues:



“…additional interviews continued to produce observations of new problems, although the rate of new problems per interview decreased” (p. 654)


Unresolved Issues:

•  Is it useful for The Survey of the Future?
   –  Mixed modes/Novel administration methods

•  C.I. has a history of attention to administration mode (Interviewer-based versus Self-Administered)

•  We have increasingly focused on web usability

•  New research is being conducted in Skype/Internet C.I. (Jennifer Edgar, Bureau of Labor Statistics)



How can we address the issue of Cross-Cultural Comparability?

•  Hugely important, given diversity and need for comparative estimates
   –  With 2 groups: Need to avoid apples-to-oranges
   –  With 5 groups (Non-Hispanics, Hispanics, Chinese, Vietnamese, Koreans): Avoid apples-to-oranges-to-pears-to-guanabana-to-rambutan…

•  Can we use our evaluation and pretesting techniques to obtain cross-cultural comparability?

•  Issues tend to be both (a) cultural and (b) linguistic

•  Even basic translation can go very badly, if we ignore good evaluation/pretesting practice ->

Methodological research into cross-cultural applications

•  Do cognitive interviews ‘function’ similarly for Asians, Hispanics? (Pan; Goerman – US Census Bureau)

•  What types of modifications to analysis procedures are necessary for a multi-country study? (Miller, National Center for Health Statistics)

•  For Behavior Coding (observe interviews and listen for problems in the interaction):
   –  Do cultural differences preclude comparability?
   –  Especially: Do groups vary in willingness to say “that makes no sense!”?
   –  Critical study by Johnson, Holbrook, Cho, Shavitt, Chávez & Weiner (2011) – (NSF/NIH sponsored) ->


Johnson, et al.: Behavior Coding Study

Question (Nonexistent policies or objects)                                    Comprehension Difficulty (%)
In the past 10 years, how frequently have you visited a serrerium?            60.8%
Do you support or oppose a law to ban the import of fotams into the U.S.?     82.6%

Nonexistent policies or objects: In the past 10 years, how frequently have you visited a serrerium?

[Bar chart: Comprehension (p=.054), by group, on a 0% to 100% axis. Bar labels in extracted order: 56.7%, 73.4%, 54.5%, 61.2%. Groups in extracted order: African American, Korean American, Mexican American, White.]

Nonexistent policies or objects: Do you support or oppose a law to ban the import of fotams into the U.S.?

[Bar chart: Comprehension (p=.044), by group, on a 0% to 100% axis. Bar labels in extracted order: 76.3%, 91.1%, 79.2%, 85.4%. Groups in extracted order: African American, Korean American, Mexican American, White.]

To reiterate/proselytize: Issues that I believe should be studied (funded):

•  Is Cognitive Testing Reliable/Valid?
   –  Overall effectiveness – Vital for continued support
   –  Parametric evaluation – To develop best practices

•  Is it useful for The Survey of the Future?
   –  Mixed modes/Novel admin methods – To stay current
   –  Cross-cultural applications – For ‘ecological validity’

