TRANSCRIPT
Cognitive Evaluation of Survey Instruments: State of the Science (Art?) and Future Directions
Gordon Willis, PhD
Applied Research Program, Division of Cancer Control and Population Sciences
National Cancer Institute
[email protected]
Presented at NSF: November 8, 2012
Gordon Willis NSF 2012 1
Context: Total Survey Error
• Total Survey Error perspective (Groves & Lyberg, 2010; Weisberg, 2005):
  – Coverage Error
  – Sampling Error
  – Nonresponse Error
  – Measurement Error due to Interviewer
  – Measurement Error due to Respondents
  – Post-survey Error (coding/analysis)
“Response Error” – controllable through question(naire) design
Test -> Analyze -> Revise
Unresolved Issues:
• Is Cognitive Testing Reliable/Valid?
  – Overall effectiveness
  – Parametric evaluation
• Is it useful for The Survey of the Future?
  – Mixed modes/Novel administration methods
  – Cross-cultural applications
Research Question: Are the Results of Cognitive Interviewing Reliable?
• FACT: Cognitive testing to pretest/evaluate survey questionnaires is widespread (gov’t laboratories)
• PROBLEM: We don’t know if independent practitioners/labs testing the same questionnaire would come to the same conclusions
• PREVIOUS RESEARCH:
  – Rothgeb, Willis, & Forsyth (2001): In a 3-lab comparison, found the same questions problematic, but for different reasons
  – DeMaio & Landreth (2004): Which of three C.I. teams did best? Not clear, but found that listening to recordings is very helpful
  – Willis, Han, Miller, Levin, Kudela, Miller, Willson, Whitaker, & Zahnd (2010): 4-lab comparison ->
NCI / Westat / NCHS / Public Health Institute Parallel Cognitive Interviewing Study
• Functioning of the tested self-administered questionnaire [Perception of breast/prostate cancer risk] was unknown in advance
• Four “Labs” conducted parallel testing across multiple cultural/linguistic groups (148 interviews, in English, Spanish, Chinese, Korean)
• We determined whether the written results and recommendations were similar between groups/labs, or wildly different
Number of Cognitive Interviews, by Lab and by Language
Lab      English  Spanish  Chinese  Korean  TOTAL
NCI           16        9        0       0     25
Westat        18       36        9       9     72
NCHS          15        0        0       0     15
PHI           18        0        0      18     36
TOTAL         67       45        9      27    148
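The marginal totals in the table above can be checked with a short tally. This is only a sketch: the counts are copied from the slide, while the variable names are illustrative and not part of the original study.

```python
# Interview counts by lab and language, copied from the table above.
LANGUAGES = ["English", "Spanish", "Chinese", "Korean"]
counts = {
    "NCI":    [16, 9, 0, 0],
    "Westat": [18, 36, 9, 9],
    "NCHS":   [15, 0, 0, 0],
    "PHI":    [18, 0, 0, 18],
}

# Row totals: interviews conducted by each lab.
lab_totals = {lab: sum(row) for lab, row in counts.items()}

# Column totals: interviews conducted in each language.
language_totals = {
    lang: sum(row[i] for row in counts.values())
    for i, lang in enumerate(LANGUAGES)
}

grand_total = sum(lab_totals.values())

# Lab-by-language combinations in which at least one interview took place.
tested_cells = [
    (lab, lang)
    for lab, row in counts.items()
    for lang, n in zip(LANGUAGES, row)
    if n > 0
]

print(lab_totals)
print(language_totals)
print(grand_total, len(tested_cells))
```

The list `tested_cells` enumerates the lab-by-language combinations that actually produced interviews, which is the unit the results are summarized over.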
How likely this is?
How much this has occurred?
Something else? (other than “How concerned I am”)
“It would do Very much to my body”
“I have Somewhat of a chance of dying from breast cancer”
Results: This finding occurred for ALL labs, ALL populations
• Summary notes from every Lab-by-Language group combination revealed a consistent theme:
– The questionnaire approach did not measure perceptions of degree of ‘Concern’ about (X), because the critical element of Concern was ignored
• In NO case was this predicted prior to cognitive testing
• Constitutes evidence of reliability of independent cognitive interviewing tests
• Opposite results have also been reported (Miller et al., NCHS)
• So, the issue still needing study is: “Under what conditions are C.I. results reliable, and what do we need to do to enhance those conditions?”
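One simple way to quantify whether independent labs reach the same conclusions is to compare the sets of questions each lab flagged as problematic. The sketch below uses Jaccard overlap as an assumed agreement measure; it is not the method used in the studies cited, and the lab names and question IDs are hypothetical placeholders, not data from any of them.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of flagged questions that two labs have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# HYPOTHETICAL data: questions each lab flagged as problematic.
flagged = {
    "Lab A": {"Q1", "Q3", "Q4", "Q7"},
    "Lab B": {"Q1", "Q3", "Q7", "Q9"},
    "Lab C": {"Q1", "Q4", "Q7"},
}

# Agreement for every pair of labs.
pairwise = {
    (x, y): jaccard(flagged[x], flagged[y])
    for x, y in combinations(sorted(flagged), 2)
}

# Questions that every lab independently flagged.
flagged_by_all = set.intersection(*flagged.values())

for pair, score in pairwise.items():
    print(pair, round(score, 2))
print(sorted(flagged_by_all))
```

Overlap in which questions were flagged says nothing about whether the labs agreed on why a question failed, so a fuller comparison would also need to code the reasons each lab gave.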
Unresolved Issues:
• Is Cognitive Testing Reliable/Valid?
  – Overall effectiveness
  – Parametric evaluation
“…additional interviews continued to produce observations of new problems, although the rate of new problems per interview decreased” (p. 654)
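The pattern in the quotation (new problems keep appearing, but at a falling rate per interview) is what a simple detection model would predict. The sketch below assumes each problem is detected independently in any one interview with a fixed probability; both the model and the probabilities are illustrative assumptions, not figures from the quoted study.

```python
def expected_problems_found(detect_probs, n_interviews):
    """Expected number of distinct problems seen at least once in n interviews."""
    return sum(1 - (1 - p) ** n_interviews for p in detect_probs)


def marginal_gain(detect_probs, n_interviews):
    """Expected number of NEW problems contributed by the n-th interview."""
    return (expected_problems_found(detect_probs, n_interviews)
            - expected_problems_found(detect_probs, n_interviews - 1))


# HYPOTHETICAL per-interview detection probabilities, one per questionnaire problem.
HYPOTHETICAL_PROBS = [0.5, 0.2, 0.1, 0.05, 0.02]

for n in (5, 10, 20, 40):
    found = expected_problems_found(HYPOTHETICAL_PROBS, n)
    gain = marginal_gain(HYPOTHETICAL_PROBS, n)
    print(f"n={n:2d}  expected found={found:.2f}  new from last interview={gain:.3f}")
```

Under this model the gain from each added interview shrinks but never reaches zero while rare problems remain undetected, which is why the choice of sample size is a parametric question rather than one with a fixed answer.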
Unresolved Issues:
• Is it useful for The Survey of the Future?
  – Mixed modes/Novel administration methods
• C.I. has a history of attention to administration mode (Interviewer-based versus Self-Administered)
• We have increasingly focused on web usability
• New research is being conducted in Skype/Internet C.I. (Jennifer Edgar, Bureau of Labor Statistics)
Unresolved Issues:
• Is it useful for The Survey of the Future?
  – Mixed modes/Novel administration methods
  – Cross-cultural applications
How can we address the issue of Cross-Cultural Comparability?
• Hugely important, given diversity and need for comparative estimates
  – With 2 groups: Need to avoid apples-to-oranges
  – With 5 groups (Non-Hispanics, Hispanics, Chinese, Vietnamese, Koreans): Avoid apples-to-oranges-to-pears-to-guanabana-to-rambutan…
• Can we use our evaluation and pretesting techniques to obtain cross-cultural comparability?
• Issues tend to be both (a) cultural and (b) linguistic
• Even basic translation can go very badly, if we ignore good evaluation/pretesting practice ->
Methodological research into cross-cultural applications
• Do cognitive interviews ‘function’ similarly for Asians, Hispanics? (Pan; Goerman -- US Census Bureau)
• What types of modifications to analysis procedures are necessary for a multi-country study? (Miller, National Center for Health Statistics)
• For Behavior Coding (observe interviews and listen for problems in the interaction):
  – Do cultural differences preclude comparability?
  – Especially: Do groups vary in willingness to say “that makes no sense!”?
  – Critical study by Johnson, Holbrook, Cho, Shavitt, Chávez & Weiner (2011) (NSF/NIH sponsored) ->
Johnson, et al.: Behavior Coding Study
Question                                                                      Comprehension Difficulty (%)
Nonexistent policies or objects
  In the past 10 years, how frequently have you visited a serrerium?          60.8%
  Do you support or oppose a law to ban the import of fotams into the U.S.?   82.6%
Nonexistent policies or objects: In the past 10 years, how frequently have you visited a serrerium?
[Bar chart: comprehension difficulty (%) by group, axis 0% to 100%. Groups: African American, Korean American, Mexican American, White. Bar values in slide order: 56.7%, 73.4%, 54.5%, 61.2%. Comprehension (p=.054)]
Nonexistent policies or objects: Do you support or oppose a law to ban the import of fotams into the U.S.?
[Bar chart: comprehension difficulty (%) by group, axis 0% to 100%. Groups: African American, Korean American, Mexican American, White. Bar values in slide order: 76.3%, 91.1%, 79.2%, 85.4%. Comprehension (p=.044)]
To reiterate/proselytize: Issues that I believe should be studied (funded):
• Is Cognitive Testing Reliable/Valid?
  – Overall effectiveness: Vital for continued support
  – Parametric evaluation: To develop best practices
• Is it useful for The Survey of the Future?
  – Mixed modes/Novel admin methods: To stay current
  – Cross-cultural applications: For ‘ecological validity’