New Experiments on the Design of Complex Survey Questions
Paul Beatty, National Center for Health Statistics
Collaborators: Jack Fowler and Carol Cosenza, Center for Survey Research, University of Massachusetts-Boston
Optimal structure and presentation of explanatory material in survey questions
- Many survey questions are complex, particularly in behavioral surveys.
- This complexity is driven by:
  - The desire for very specific data points
  - The need to collect data as efficiently as possible (i.e., single questions if possible)
- A few common practices:
  - Presentation of material that follows the question mark
  - The use of examples to illustrate complex concepts
  - Detailed wording to capture relatively rare events
- What alternatives do we have? Are they better?
Methods
- Split-ballot experimentation in a random-digit-dial (RDD) survey (n=425)
  - Original questions drawn from federal health surveys; we constructed alternative questions
  - Do responses differ across versions? If so, can we judge which distribution is more plausible?
- Behavior coding of a random subset of tape-recorded interviews (n=313)
  - How often were initial responses inadequate?
  - How often did respondents interrupt the question?
  - How often did the interviewer have to do more than just read the question to get a response?
  - How often did respondents ask for repeats, clarification, and so on?
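The two methods can be sketched in Python as a minimal illustration under stated assumptions, not the study's actual procedure. The slides do not say how respondents were assigned to versions or how behavior codes were stored, so `assign_versions`, `behavior_code_rates`, and the code labels (`interrupt`, `request_help`) are hypothetical. Assignment shuffles respondents and deals versions in turn so the two ballots differ in size by at most one; the rate function reports, for each version, the percentage of coded administrations showing each behavior.

```python
import random
from collections import Counter, defaultdict


def assign_versions(respondent_ids, versions=("V1", "V2"), seed=None):
    """Split-ballot assignment (hypothetical helper).

    Shuffles respondents, then deals versions in turn, so group sizes
    differ by at most one.
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    return {rid: versions[i % len(versions)] for i, rid in enumerate(ids)}


def behavior_code_rates(coded):
    """Percent of coded administrations showing each behavior code, by version.

    `coded` is a list of (version, set_of_codes) pairs, one per coded
    administration. Codes never observed for a version are omitted.
    """
    totals = Counter(version for version, _ in coded)
    hits = defaultdict(Counter)
    for version, codes in coded:
        hits[version].update(codes)
    return {
        v: {code: 100 * k / totals[v] for code, k in hits[v].items()}
        for v in totals
    }


if __name__ == "__main__":
    ballots = assign_versions(range(425), seed=1)
    print(Counter(ballots.values()))  # V1: 213, V2: 212

    # Hypothetical code labels, for illustration only.
    coded = [
        ("V1", {"interrupt"}),
        ("V1", set()),
        ("V2", {"request_help"}),
        ("V2", set()),
    ]
    print(behavior_code_rates(coded))
```

Dealing versions after a shuffle keeps the ballots balanced; drawing a version independently for each respondent would also be random but can leave the groups unequal in size.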
Issue #1: Info after the question mark
It is common for questions to apparently end but then add some more material:
In the past 12 months, how many times have you talked to any health professional about your own health? Include in-person visits, telephone calls, or times you were a patient in a hospital.
Concern: Do respondents pay adequate attention to this material? Failure to consider it could lead to under-reports.
Alternative: People talk to health professionals in person, over the phone, or as a patient in a hospital. Including any of those, in the past 12 months how many times have you talked to a health professional about your own health?
Results: Experiment 1
                                       V1          V2
Qualifier:                             after Q     beginning of Q   signif
Contacts w/ health prof., 12 months    6.6         5.9              n.s.
                                       (n=214)     (n=206)
Initial response inadequate            32.5%       25.5%            n.s.
Respondent requested help              20.0%       13.1%            p<.1
                                       (n=160)     (n=153)
Issue #2 (related experiment): Definition after the question mark
Definitions are sometimes presented after the question mark as well. For example:
V1: Have any of your immediate blood relatives ever been told by a doctor that they have diabetes? By "immediate blood relatives", we mean your parents, your children, and your brothers and sisters, whether or not they are still living.
V2: The next question is about immediate blood relatives. By that, we mean your parents, your children, and your brothers and sisters, whether or not they are still living. Have any of your immediate blood relatives ever been told by a doctor that they have diabetes?
If the definition is easier to ignore in V1, respondents might interpret “blood relatives” more broadly than intended, leading to (erroneously) higher reports in V1.
Results: Experiment 2
                                       V1          V2
Definition:                            after Q     beginning of Q   signif
Relative w/ diabetes                   42.6%       34.4%            p<.1
                                       (n=209)     (n=215)
Initial response inadequate            7.2%        2.5%             p<.1
Interrupted                            16.5%       0.6%             p<.01
Interviewer intervention               9.2%        3.1%             p<.05
                                       (n=152)     (n=159)
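The slides report significance levels without naming the test. One common choice for comparing a proportion across two split-ballot versions is a pooled two-proportion z-test (its square equals the Pearson chi-square for the 2x2 table without continuity correction). The sketch below is an assumption, not the authors' analysis: it works from the rounded percentages and group sizes shown above and treats each ballot as a simple random sample, ignoring any survey design effects. For the diabetes item it gives z of about 1.7 (two-sided p of roughly .08), in line with the reported p<.1.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test.

    p1, p2: sample proportions (0-1); n1, n2: group sizes.
    Returns (z, two-sided p-value) from the normal approximation.
    Treats the samples as simple random samples (no design effect).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Experiment 2, V1 (definition after Q) vs. V2 (definition at beginning)
for label, p1, n1, p2, n2 in [
    ("Relative w/ diabetes", 0.426, 209, 0.344, 215),
    ("Interrupted", 0.165, 152, 0.006, 159),
]:
    z, p = two_proportion_z(p1, n1, p2, n2)
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.3g}")
```

When expected cell counts are small, an exact test is the more cautious choice than this normal approximation.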
Issue #3: Administration of response categories
Conventional wisdom dictates that you administer the question before offering response categories:
V1: The last time you went to see a doctor, which of the following best describes the main reason for your visit?
  Medical treatment for a new condition
  Follow-up care for an existing condition
  Or, a routine checkup
But what if this design encourages respondents to gravitate toward the first seemingly acceptable response rather than considering the whole list?
V2: People schedule doctor visits for a variety of reasons, including getting medical treatment for a new condition, follow-up care for an existing condition, or a routine checkup. Which of those best describes the main reason for your visit the last time you went to see a doctor?
Results: Experiment 3
                                       V1          V2
Response categories:                   after Q     before Q         signif
New condition                          21.5%       23.6%            n.s.
Follow-up                              41.0%       34.6%
Routine exam                           37.4%       41.9%
                                       (n=195)     (n=191)
Initial response inadequate            10.6%       23.2%            p<.01
                                       (n=141)     (n=142)
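The "n.s." on this slide presumably refers to the distribution of reasons as a whole. Under that assumption, one way to check it is a Pearson chi-square test of homogeneity on the 2 x 3 table of versions by categories. The sketch below rebuilds approximate counts from the rounded percentages (an assumption; raw counts are not shown) and uses the closed-form tail exp(-x/2), which is exact only for 2 degrees of freedom. It gives a statistic near 1.7 (p around .4), consistent with no detectable shift, though it need not match the test the authors actually ran.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square with exactly 2 df: exp(-stat / 2)."""
    return math.exp(-stat / 2)


# Counts rebuilt from rounded percentages (new condition, follow-up, routine).
v1 = [round(pct * 195) for pct in (0.215, 0.410, 0.374)]  # categories after Q
v2 = [round(pct * 191) for pct in (0.236, 0.346, 0.419)]  # categories before Q
stat, df = chi_square_homogeneity([v1, v2])
assert df == 2  # 2 versions x 3 categories
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value_df2(stat):.2f}")
```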
Issue #4: Examples vs. definitions to illustrate complex concepts
Complex concepts such as “strenuous activity” are often illustrated through examples:
The next question is about strenuous tasks done around your home. By "strenuous tasks," we mean things like shoveling soil in a garden, chopping wood, major carpentry projects, cleaning the garage, scrubbing floors, or moving furniture. In the past 30 days, on how many days did you do strenuous tasks in or around your home?
- Although examples are designed to express a range of possibilities, we hypothesize that they have the opposite effect, focusing attention on a few specifics that might not be well chosen.
- We expect that a good definition will produce higher reports and be easier to administer.
- However, previous attempts were not successful, presumably because our definition was too complex.
Examples vs. definitions
V1: The next question is about strenuous tasks done around your home. By "strenuous tasks," we mean things like shoveling soil in a garden, chopping wood, major carpentry projects, cleaning the garage, scrubbing floors, or moving furniture. In the past 30 days, on how many days did you do strenuous tasks in or around your home?
V2: The next question is about strenuous tasks done around your home. By "strenuous tasks," we mean any chores or projects that made you feel very tired by the time you finished them. In the past 30 days, on how many days did you do strenuous tasks in or around your home?
Results: Experiment 4
                                       V1          V2
                                       example     definition       signif
Strenuous activity (days/month)        4.9         3.9              n.s.
Reported “zero times”                  29.3%       37.7%            p<.1
                                       (n=208)     (n=215)
Initial response inadequate            27.0%       25.1%            n.s.
                                       (n=153)     (n=159)
Issue #5: Question wording to capture rare events
One reason questions are very complex is that their authors want to prompt respondents to think of a broadly inclusive range of situations:
In the past 12 months, how many times have you seen or talked on the telephone about your physical or mental health with a family doctor or general practitioner?
- The practice has a downside: respondents may miss the forest for the trees.
- Cognitive interview evaluation of the question above suggested that respondents thought it was exclusively about telephone contact with doctors.
- If true, the question would generate substantial undercounts.
A simplified comparison
Introduction (both versions): “The next question is specifically about primary care doctors….”
V1: In the past 12 months, how many times have you seen or talked on the telephone with a primary care doctor about your health?
V2: In the past 12 months, how many times have you seen or talked with a primary care doctor about your health?
The only difference between these two questions is the inclusion of “on the telephone.”
Results: Experiment 5
                                       V1          V2
                                       telephone   no phone         signif
Mean contacts                          3.4         3.6              n.s.
“Zero” responses                       24.7%       9.2%             p<.01
                                       (n=194)     (n=195)
Initial response inadequate            14.9%       21.1%            n.s.
Respondent requested help              5.7%        11.3%            p<.1
                                       (n=120)     (n=121)
Issue #6: Question decomposition
Food consumption example:
“During the last 30 days, how many times did you eat cheese, including cheese as snacks, and cheese in sandwiches, burgers, lasagna, pizza, or casseroles? Do NOT count cream cheese.”
- Clearly a challenging response task in general; we had little confidence in the accuracy of reports.
- Cognitive testing: when probed about details (“Did you include cheese in other dishes/sandwiches/etc.?” If no: “Would that have changed your overall answer?”), some participants increased their reports.
Question decomposition (2)
Alternative: multiple response tasks, divided into reasonable sub-components:
The next questions are about cheese you have eaten in the last 30 days. Please do NOT include any cream cheese you may have eaten.
- During the last 30 days, how many times have you eaten cheese on a sandwich, including burgers?
- During the last 30 days, how many times have you eaten cheese in lasagna, pizza, casseroles, or mixed in with other dishes?
- During the last 30 days, how many times have you eaten cheese as a snack or appetizer?
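The slides do not say how the three answers were combined into the single 30-day figure that is compared with the original item; a simple sum is one plausible approach. The sketch below assumes that, using hypothetical field names (`sandwich`, `mixed_dishes`, `snack`). Summing is only safe if respondents treat the sub-items as mutually exclusive; an occasion counted under two sub-items would be double-counted.

```python
# Hypothetical field names for the three decomposed cheese items.
SUB_ITEMS = ("sandwich", "mixed_dishes", "snack")


def combined_count(answers, items=SUB_ITEMS):
    """Combine decomposed sub-item counts into one total by simple summation.

    Assumes the sub-items are mutually exclusive (no occasion counted twice).
    Returns None when any sub-item is unanswered, so a partial series is not
    silently treated as zeros.
    """
    values = [answers.get(item) for item in items]
    if any(v is None for v in values):
        return None
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    return sum(values)


print(combined_count({"sandwich": 4, "mixed_dishes": 3, "snack": 5}))  # 12
print(combined_count({"sandwich": 4, "snack": 5}))  # None (item missing)
```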
Results: Experiment 6 (responses)
                                       V1          V2
                                       single      multiple         signif
Mean times                             13.9        19.0             p<.01
                                       (n=218)     (n=228)
Results: Experiment 6 (behavior coding)
The individual “decomposed” questions consistently outperform the single item on virtually all measures:
                                       Orig    Alt1    Alt2    Alt3
Inadequate initial response            15.9     9.9     8.3     3.1
Probes used                            13.7     7.8     6.3     2.1
Requested help/repeat                  19.1    15.1     3.1     2.1
(All expressed as %; most significant at p<.05)
Some other considerations
- Mean time to administer the original was 28 seconds; mean for the alternative was 51 seconds.
- If we compare the amount of probing, inadequate responses, etc. needed to reach our desired data points (i.e., across all three questions), the behavior coding rates become very similar. For example, 13.5% of original questions were probed; 15.1% of the alternative series was ever probed.
- Some research suggests that responses to decomposed questions are less accurate (but…)
- Next steps: split-ballot experiment on various food and exercise questions (global vs. decomposed) with diary validation.
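The "ever probed" comparison can be illustrated with simple probability arithmetic. Taking the per-item "Probes used" rates for Alt1 to Alt3 from the behavior-coding table (7.8%, 6.3%, 2.1%), the chance that at least one item in the series is probed must lie between the largest single rate and the sum of the rates, whatever the dependence between items. If probing were independent across items, it would be 1 - (1 - .078)(1 - .063)(1 - .021), about 15.4%. Independence is an assumption made only for this illustration (probing is plausibly correlated within a respondent), and the helper names are hypothetical.

```python
import math


def prob_any_independent(rates):
    """P(at least one item probed), assuming probing is independent across items."""
    return 1 - math.prod(1 - r for r in rates)


def prob_any_bounds(rates):
    """Assumption-free bounds on P(at least one item probed): max rate to sum of rates."""
    return max(rates), min(1.0, sum(rates))


probe_rates = [0.078, 0.063, 0.021]  # "Probes used", Alt1-Alt3
low, high = prob_any_bounds(probe_rates)
print(f"independent: {prob_any_independent(probe_rates):.1%}")  # 15.4%
print(f"bounds: {low:.1%} to {high:.1%}")  # 7.8% to 16.2%
```

The independence figure lands close to the 15.1% reported for the alternative series, and inside the 7.8% to 16.2% bounds.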
Conclusions
- Qualifiers and definitions that dangle after the question mark should be avoided, provided there is a reasonable way to do so.
- Conventional wisdom about presenting response categories after the question seems to stand.
- In spite of our reservations about examples, we have failed to find evidence that they limit respondents' frame of reference. They don't perform wonderfully, but the alternatives we have tried don't do better.
- Details in questions have the potential to distract respondents from the overall meaning. Additional words may help a few respondents, but simpler wording may have a more profound impact.
Conclusions (2)
- Experiments presented here involve single, interviewer-administered questions.
- Complexity can often be reduced by asking multiple, smaller questions.
- However, the pressure to ask fewer questions is real. Hopefully these results provide some guidance for how to structure questions given such constraints.