
Page 1: sigir16

Predicting User Engagement with Direct Displays Using Mouse Cursor Information

Ioannis Arapakis (Eurecat), Luis Leiva (Sciling)

Page 2: sigir16

Contents

§ Introduction & motivation
§ Merits of mouse cursor analysis
§ Experimental setup
§ Predictive modelling
§ Performance assessment
§ Results
§ Conclusions

Page 3: sigir16

Introduction

§ In recent years, direct displays (DDs) have become a standard component on the SERPs of all major web search engines

§ DDs serve two main purposes:
• Provide a well-structured summary of information that is difficult or time-consuming to access
• Help tidy up the SERP section that contains the universal search results

Page 4: sigir16

Knowledge Module

§ One such prominent example is the Knowledge Module (KM) display, which provides users with information about the named entities they are searching for

§ The content presented in the KM display is obtained in a semi-structured format from curated entity databases (e.g., Freebase, Wikipedia)

§ This raw information is further enriched by the search engine, e.g., with a ranking of related entities, explanations of their relationships, or related multimedia and social media content

Page 5: sigir16

Motivation

§ In this context, most research has focused on general backend system tasks, the most important being knowledge base construction, or on more specific backend tasks such as related entity recommendation

§ This work attempts to understand how users engage with a DD like the KM display in entity-centric search tasks

§ We are interested in predicting user engagement with a DD in the absence of explicit feedback (e.g., self-report data)

Page 6: sigir16

Addressing the gap

§ Existing modelling techniques make a simplifying assumption when analysing web search log data: the user is assumed to be equally engaged with all parts of the SERP

§ In practice this assumption is not always true:
• A user may click on certain links on the page, but not all links
• May read a certain result snippet in the SERP, but not necessarily the entire list of results
• May ignore the SERP content completely and focus only on the images shown in the KM display or other DDs

Page 7: sigir16

Mouse cursor tracking

§ Navigation & interaction with a digital environment usually involves the use of a mouse (i.e., selecting, hovering, clicking)

§ Can be easily performed in a non-invasive manner, without removing users from their natural setting

§ Several works have shown that the mouse cursor is a proxy of gaze (attention)

§ Low-cost, scalable alternative to eye-tracking

Page 8: sigir16
Page 9: sigir16

Crowdsourcing study

§ We conducted a crowdsourcing study and examined how users engage with DDs like the Knowledge Module (KM) display

§ We collected and analysed more than 115K mouse cursor positions from 300 users

§ With this study we aim to predict:
• When a user notices the KM display on the SERP
• If it is perceived as a useful aid to their search tasks
• Whether interacting with the KM display alters the users’ perception of how fast they complete the search tasks

Page 10: sigir16

Experimental design

§ Repeated-measures design
§ One independent variable: KM display (with two levels: “visible” or “hidden”)
§ Three dependent variables: (i) KM display noticeability, (ii) KM display usefulness, and (iii) perceived task accomplishment speed

§ Two short search tasks were completed using the Yahoo search engine: one task with the KM display on the SERP and one without it*

* The KM display visibility was controlled with client-side scripting.

Page 11: sigir16

Search UI

§ Participants accessed the search engine through a custom proxy which did not alter the original look and feel of the SERPs

§ This allowed us to capture user interactions with the SERP without interfering with the actual web search engine interface in production

§ For each search task, participants were presented with a question and were suggested a search query to begin with

Page 12: sigir16

Search query sample

§ Query set consisted of 32 unique query patterns (144 different queries in total)
§ The selected query patterns belonged to four different topics (famous people, movies, athletes, sport teams) and required either single or multiple answers

Page 13: sigir16

Mouse cursor tracking tool

§ To collect mouse cursor data we used EVTRACK*, an open source JavaScript event tracking library that is part of the smt2ε system

§ EVTRACK allows specifying which browser events should be captured and how they should be captured, i.e., via event listeners (the event is captured as soon as it is fired) or via event polling (the event state is sampled at fixed time intervals)

* https://github.com/luileito/evtrack

Page 14: sigir16

Self-reported measures of engagement

§ A mini-questionnaire on the SERPs gathered ground truth labels for the mouse cursor data

§ The mini-questionnaire was initially hidden and was shown to the user just before leaving the SERP

§ It comprised 3 questions:
• Did you notice the knowledge module?
• To what extent did you find the knowledge module useful in answering the question?
• To what extent did the knowledge module help you answer the question faster?

Page 15: sigir16

Procedure

§ Participants were asked to evaluate two different backend systems of Yahoo web search by performing two search tasks
§ For each task, participants had to answer a question by searching for relevant information on the proxified search engine
§ In one task the KM display would be hidden (control condition) and in the other task it would be visible (experimental condition)
§ The order of the tasks was randomized for each participant
§ Participants were presented with a suggested query to begin their search but were free to submit additional queries
§ We used informational, entity-centric queries to introduce a common starting point across all participants

Page 16: sigir16

Modelling user engagement

§ Our final dataset consists of ~115K cursor positions, collected during 600 search task sessions

§ Out of those 600 search task sessions we analysed the 300 cases that correspond to the experimental condition with the visible KM in the SERP

§ Our dataset is generally balanced, with 176 users having reported noticing the KM display

§ We normalised the values of each feature so that values falling in greater numeric ranges do not dominate those in smaller numeric ranges
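[Note] A minimal sketch of the normalisation described above, assuming min-max scaling to [0, 1] (the slides do not state the exact scheme); the toy feature values are made up for illustration.

```python
import numpy as np

def minmax_normalise(X):
    """Scale each feature (column) of X to the [0, 1] range so that
    features with large numeric ranges do not dominate the others."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant features
    return (X - col_min) / col_range

# Toy example: two cursor features on very different scales
# (e.g., total cursor distance in pixels vs. number of clicks).
X = np.array([[12000.0, 2], [3400.0, 0], [800.0, 5]])
print(minmax_normalise(X))
```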

Page 17: sigir16

Feature Engineering

* These functions are computed for most base and meta-features.
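[Note] The feature table on this slide is an image in the original deck, so the concrete feature set is not reproduced here. As an illustration only, the sketch below shows the general idea of deriving base features from raw cursor positions and applying typical aggregate functions (min, max, mean, standard deviation); it is not the authors' exact feature set.

```python
import numpy as np

def trajectory_features(xs, ys, ts):
    """Illustrative cursor-trajectory features from raw (x, y, t) samples.
    A sketch of base features plus aggregate functions, not the paper's set."""
    xs, ys, ts = map(np.asarray, (xs, ys, ts))
    dx, dy, dt = np.diff(xs), np.diff(ys), np.diff(ts)
    dt = np.where(dt == 0, 1e-9, dt)      # guard against zero time deltas
    step_dist = np.hypot(dx, dy)          # distance per consecutive sample
    speed = step_dist / dt                # instantaneous cursor speed

    def aggregate(name, values):
        # Typical summary functions applied to a base feature.
        return {
            f"{name}_min": float(values.min()),
            f"{name}_max": float(values.max()),
            f"{name}_mean": float(values.mean()),
            f"{name}_std": float(values.std()),
        }

    feats = {"total_distance": float(step_dist.sum()),
             "dwell_time": float(ts[-1] - ts[0])}
    feats.update(aggregate("speed", speed))
    return feats

# Toy trajectory: x, y in pixels, t in seconds.
print(trajectory_features([0, 10, 40, 45], [0, 5, 5, 30], [0.0, 0.1, 0.3, 0.4]))
```

Note that all of these features are computed in a single pass over the cursor samples, which is consistent with the linear/quasilinear cost discussed later.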

Page 18: sigir16

[Figure: users who did not engage with the KM display vs. users who engaged with it]

Page 19: sigir16

Feature Engineering (cont.)

Page 20: sigir16

Predictive Modelling

§ We trained 10 RF* models (each on 90% of the data) and used them to obtain predictions for the corresponding held-out set (10% of the data) across the ten folds**

§ Excluded highly correlated and linearly dependent features

§ Performed feature selection using recursive feature elimination

§ We used a subset of our training data for fine-tuning the classifier’s hyperparameters

* R packages “caret” and “randomForest”.
** With stratified sampling.
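[Note] The slides report using R's caret and randomForest packages; as a rough, hedged equivalent, the sketch below shows the same general recipe in Python with scikit-learn (stratified 10-fold cross-validation, recursive feature elimination, and a small hyperparameter search). It uses synthetic data and does not reproduce the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.pipeline import Pipeline

# X: normalised mouse-cursor features, y: binary engagement labels
# (e.g., "noticed the KM display" or not). Synthetic data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 2, size=300)

pipe = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=100, random_state=0),
                n_features_to_select=10)),      # recursive feature elimination
    ("rf", RandomForestClassifier(random_state=0)),
])

# Small hyperparameter grid, tuned on the training folds only.
grid = GridSearchCV(pipe,
                    {"rf__n_estimators": [100, 500],
                     "rf__max_features": ["sqrt", "log2"]},
                    cv=3, scoring="roc_auc")

outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in outer_cv.split(X, y):  # 90% train / 10% held out
    grid.fit(X[train_idx], y[train_idx])
    preds = grid.predict(X[test_idx])
    # ... accumulate per-fold predictions and compute the metrics afterwards
```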

Page 21: sigir16

Performance evaluation

§ Baselines:
• If the user clicked on the KM display (hasClickedKM, binary)
• If the mouse cursor hovered over the KM display (hasHoveredKM, binary)
• Time spent on the page (dwellTime) as a feature to the RF classifier

§ Performance evaluation:
• Precision / Recall
• Accuracy
• F-Measure
• AUC
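[Note] For reference, a minimal sketch of computing the reported metrics with scikit-learn (a tooling assumption; the slides do not say how the metrics were computed). `y_true`, `y_pred` and `y_score` are hypothetical per-fold labels, hard predictions, and class probabilities.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]                   # ground-truth engagement labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard predictions from the classifier
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # predicted probability of class 1

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F-Measure:", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
```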

Page 22: sigir16

Results

Page 23: sigir16

Attention

[Bar chart: Precision, Recall, F-Measure, Accuracy and AUC for predicting Attention, comparing the Click, Hover and DwellTime baselines with our model (“Ours”)]

Page 24: sigir16

Usefulness

[Bar chart: Precision, Recall, F-Measure, Accuracy and AUC for predicting Usefulness, comparing the Click, Hover and DwellTime baselines with our model (“Ours”)]

Page 25: sigir16

Perceived Task Duration

[Bar chart: Precision, Recall, F-Measure, Accuracy and AUC for predicting perceived Task Duration, comparing the Click, Hover and DwellTime baselines with our model (“Ours”)]

Page 26: sigir16

Computational complexity

§ Mouse gesture techniques that rely on PCA preprocessing and k-means clustering:
• Covariance matrix computation + eigenvalue decomposition ☞ O(p²N + p³)
• k-means ☞ O(icN)

§ Cursor motifs that use Dynamic Time Warping (DTW) and k-nearest neighbours (kNN):
• DTW ☞ O(N²)
• kNN ☞ O(N²k²)

§ The proposed method has linear ☞ O(N) or quasilinear ☞ O(N log N) cost
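[Note] To make the gap concrete, a back-of-the-envelope comparison (illustrative only: constants are ignored and, in practice, quadratic methods like DTW operate per pair of sequences rather than on the whole dataset) using the ~115K cursor positions mentioned earlier.

```python
import math

# Rough operation counts for N cursor samples, showing why linear or
# quasilinear feature extraction scales better than quadratic alternatives.
N = 115_000
print(f"O(N)       ~ {N:,}")                       # proposed method, single linear pass
print(f"O(N log N) ~ {int(N * math.log2(N)):,}")   # quasilinear variant
print(f"O(N^2)     ~ {N**2:,}")                    # e.g., DTW over a sequence of length N
```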

Page 27: sigir16

Conclusions

§ We conducted a crowdsourcing study that revealed the potential benefits of using mouse cursor data to predict user engagement with DDs

§ We demonstrated that our feature selection model outperforms the standard baselines for measuring three user engagement proxies with the KM display

§ Our initial results suggest that it is possible to predict when the user’s attention is captured by a DD using only simple, yet highly discriminative features derived from mouse cursor activity

Page 28: sigir16

Conclusions (cont.)

§ Accurately predicting whether a DD was truly noticed can:
• Increase the true negative prediction rate
• Reduce the false negative rate

§ Knowing when a user finds a DD useful has important implications for the methodology used to understand the impact of launching a new DD or modifying its existing design, and how that change may affect search UIs

Page 29: sigir16

Conclusions (cont.)

§ Information about perceived task duration can be combined with the previous ground truths to better understand how users engage with ads or multimedia content

§ The main practical use of our models is perhaps to automatically select or lay out the DDs

§ DDs are optional on SERPs, and user behaviour could provide signals about whether DDs should be shown or not for particular queries

§ Our method offers a computationally efficient way to analyse mouse cursor data

Page 30: sigir16

Acknowledgments

§ We thank B. Barla Cambazoglu and Marios Koulakis for fruitful discussions

Page 31: sigir16

Thank you for your attention!

iarapakis

[email protected]

https://es.linkedin.com/in/ioannisarapakis

http://www.slideshare.net/iarapakis/sigir16