Understanding User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah,
Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Eindhoven University of Technology, Pennsylvania State University,
University of Massachusetts Amherst, Microsoft
CHIIR’16, Chapel Hill, NC, USA
User’s dialogue with Cortana. Task: “Finding a hotel in Chicago”
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
User’s dialogue with Cortana. Task: “Finding a pharmacy”
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
Research Questions
• RQ1: What are characteristic types of scenarios of use?
Controlling Device
• Call a person
• Send a text message
• Check on-device calendar
• Open an application
• Turn on/off wi-fi
• Play music
[Figure: web search result page annotated with its answer types: Knowledge Pane, Image Answer, Location Answer, and Organic Results]
Search Dialogue
User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”
Search Dialogue
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best restaurants near me”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”
Research Questions
• RQ1: What are characteristic types of scenarios of use?
• RQ2: How can we measure different aspects of user satisfaction?
• RQ3: What are key factors determining user satisfaction for the different scenarios?
• RQ4: How to characterize abandonment in the web search scenario?
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
User Study
User Study Participants
• 60 participants
• Age: 25.53 ± 5.42 years
• Gender (Male / Female): 75% / 25%
• Language (English / Other): 55% / 45%
• Education (Computer Science / Electrical Engineering / Mathematics / Other): 82% / 8% / 2% / 8%
User Study Design
• Video instructions (same for all participants)
• Tasks are realistic, mined from Cortana logs:
  o Device-control tasks
  o Queries where users don’t click
  o Search dialogue tasks, mostly location-based queries
Find out the hair color of your favorite celebrity.
You are planning a vacation. Pick a place. Check whether the weather will be good enough during the period of your planned vacation. Find a hotel that suits you. Find the driving directions to this place.
Questionnaire: Controlling Device
• Were you able to complete the task?
  o Yes/No
• How satisfied are you with your experience in this task?
  o 5-point Likert scale
• How well did Cortana recognize what you said?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale
5 tasks, 20 minutes
Questionnaire: Good Abandonment
• Were you able to complete the task?
  o Yes/No
• Where did you find the answer?
  o Answer Box, Image, SERP, Visited Website
• Which query led you to finding the answer?
  o First, Second, Third, >= Fourth
• How satisfied are you with your experience in this task?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale
5 tasks, 20 minutes
Questionnaire: Search Dialogue
• Were you able to complete the task?
  o Yes/No
• How satisfied are you with your experience in this task?
  o If the task has sub-tasks, participants indicate their graded satisfaction with each one, e.g.:
    a. How satisfied are you with your experience in finding a hotel?
    b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale
8 tasks (1 simple, 4 with 2 sub-tasks, 3 with 3 sub-tasks), 30 minutes
Search Dialogue Dataset
• 540 tasks, comprising 2,040 queries, of which 1,969 were unique
• The average query length is 7.07
• The simple task generated 130 queries in total
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries
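Statistics like these can be computed from a raw query log along the following lines. This is a minimal sketch, assuming a hypothetical log of `(task_id, query)` pairs. The slides do not say whether query length is counted in words or characters, so the sketch counts whitespace-separated words, and it treats two queries as duplicates after trimming and lowercasing (also an assumption).

```python
from statistics import mean


def dataset_stats(log):
    """Summarize a query log given as (task_id, query_text) pairs.

    Hypothetical format. Query length is counted in whitespace-separated
    words, and uniqueness is judged after trimming and lowercasing.
    """
    queries = [query for _, query in log]
    return {
        "tasks": len({task_id for task_id, _ in log}),
        "queries": len(queries),
        "unique_queries": len({q.strip().lower() for q in queries}),
        # Guard against an empty log, where the mean is undefined.
        "avg_query_length": mean(len(q.split()) for q in queries) if queries else 0.0,
    }


if __name__ == "__main__":
    toy_log = [  # hypothetical toy data, not the study's log
        (1, "show restaurants near me"),
        (1, "show the best restaurants near me"),
        (2, "show restaurants near me"),
    ]
    print(dataset_stats(toy_log))
```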
Factors Determining Satisfaction
RQ3: What are key factors determining user satisfaction for the different scenarios?
[Figure: Results over scenarios, mean of satisfaction. Mean satisfaction level and effort across all scenarios combined and separately for Device Control, Web Search, and Structured Dialog.]
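One simple way to summarize questionnaire answers like those behind this figure is to average the Likert ratings per scenario and check how strongly effort tracks satisfaction. The sketch is illustrative only: the record format and the toy ratings are hypothetical, and Pearson correlation is a stand-in chosen here, not necessarily the analysis the authors used.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either input is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def summarize(responses):
    """Mean (satisfaction, effort) per scenario, plus their overall correlation.

    `responses` is a hypothetical list of dicts with keys 'scenario',
    'satisfaction' and 'effort', each rated on a 5-point Likert scale.
    """
    by_scenario = defaultdict(list)
    for r in responses:
        by_scenario[r["scenario"]].append(r)
    means = {
        scenario: (mean(r["satisfaction"] for r in rs), mean(r["effort"] for r in rs))
        for scenario, rs in by_scenario.items()
    }
    r_sat_effort = pearson(
        [r["satisfaction"] for r in responses], [r["effort"] for r in responses]
    )
    return means, r_sat_effort


if __name__ == "__main__":
    toy = [  # hypothetical ratings
        {"scenario": "Device Control", "satisfaction": 5, "effort": 1},
        {"scenario": "Device Control", "satisfaction": 4, "effort": 2},
        {"scenario": "Web Search", "satisfaction": 2, "effort": 4},
        {"scenario": "Web Search", "satisfaction": 3, "effort": 3},
    ]
    print(summarize(toy))
```

A strongly negative coefficient on real data would be consistent with effort being a key component of satisfaction, though correlation alone does not establish that.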
Results: ‘Good Abandonment’
RQ4: How to characterize abandonment in the web search scenario?
[Figure: Mean of satisfaction. Satisfaction level by the query that led to the answer (First, Second, Third, >= Fourth Query) and by where the answer was found (Answer Box, Image, SERP, Visited Website).]
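A common operational definition treats a query as abandoned when it receives no clicks; good abandonment is then an abandoned query after which the user was nonetheless satisfied. A minimal sketch under those assumptions, with a hypothetical `(clicks, satisfaction)` record per query and an assumed cut-off of 4 on the 5-point scale:

```python
def good_abandonment_rate(records, sat_threshold=4):
    """Fraction of abandoned (zero-click) queries that left the user satisfied.

    `records` is a hypothetical iterable of (clicks, satisfaction) pairs;
    `sat_threshold` is an assumed cut-off on a 5-point Likert scale.
    Returns 0.0 when no query was abandoned.
    """
    abandoned = [sat for clicks, sat in records if clicks == 0]
    if not abandoned:
        return 0.0
    good = sum(1 for sat in abandoned if sat >= sat_threshold)
    return good / len(abandoned)


if __name__ == "__main__":
    toy = [(0, 5), (0, 2), (3, 4), (0, 4)]  # hypothetical per-query records
    print(good_abandonment_rate(toy))
```

Click logs alone cannot tell the satisfied and unsatisfied no-click queries apart, so here the satisfaction label has to come from an explicit signal such as the questionnaire.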
Search Dialogue Satisfaction
RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you” (SAT?)
User: “show the best restaurants near me”
Cortana: “Here are ten restaurants near you that have good reviews” (SAT?)
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine” (SAT?)
Overall: SAT?
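To make the question concrete, one can collapse query-level labels into a session score in several ways and see which one tracks the overall rating best. In this sketch the aggregators (mean, minimum, last query), the 1 to 5 labels, and the toy sessions are all hypothetical; the toy data is constructed so that a single poor step lowers the overall rating.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical ways to collapse per-query labels into one session score.
AGGREGATORS = {
    "mean": mean,
    "min": min,
    "last": lambda labels: labels[-1],
}


def compare_aggregators(sessions):
    """Correlate each aggregated query-level score with the overall rating.

    `sessions` is a hypothetical list of (query_labels, overall_sat) pairs,
    where query_labels holds one satisfaction label per query, in order.
    """
    overall = [o for _, o in sessions]
    return {
        name: pearson([agg(labels) for labels, _ in sessions], overall)
        for name, agg in AGGREGATORS.items()
    }


if __name__ == "__main__":
    toy_sessions = [([5, 5, 5], 5), ([5, 5, 1], 2), ([4, 4, 4], 4), ([2, 5, 5], 3)]
    for name, r in compare_aggregators(toy_sessions).items():
        print(f"{name}: r = {r:.3f}")
```

Taking the minimum encodes the view that a session is only as good as its weakest step; whether that holds for real sessions is what RQ5 asks, and toy data cannot answer it.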
Satisfaction Over Different Tasks
[Figure: Number of answers at each satisfaction level (1 to 5) for the Weather Task, Mission Task (2 sub-tasks), and Mission Task (3 sub-tasks).]
Combination of scenarios
User’s dialogue with Cortana related to the ‘stomach ache’ problem, spanning General Search and Search Dialog:
Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
Q3: show me the nearest pharmacy
Q4: more information on the second one
Q5: do they have a stool softener
Q6: does Fred Meyer have stool softeners
Conclusions (1)
• RQ1: What are characteristic types of scenarios of use?
  o We proposed three main types of scenarios: device control, web search, and search dialogue
• RQ2: How can we measure different aspects of user satisfaction?
  o We designed a series of user studies tailored to the three scenarios
• RQ3: What are key factors determining user satisfaction for the different scenarios?
  o Effort is a key component of user satisfaction across the different intelligent assistant scenarios
Conclusions (2)
• RQ4: How to characterize abandonment in the web search scenario?
  o We concluded that to measure good abandonment, we need to investigate other forms of interaction signals that are not based on clicks or reformulations
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
  o We looked at user satisfaction as ‘a user journey towards an information goal where each step is important,’ and showed the importance of session context on user satisfaction
Questions?