From “Selena Gomez” to “Marlon Brando”: Understanding Explorative Entity Search
Iris Miliaraki, Roi Blanco, Mounia Lalmas | May 2015
24th International World Wide Web Conference (WWW 2015), Florence, Italy
Spark system
▪ What is Spark? Given a query submitted to the Yahoo search engine, Spark suggests related entities for the query, exploiting public knowledge bases from the Semantic Web & proprietary data
▪ So, what does the young actress Selena Gomez have to do with Marlon Brando? This is a path that can be explored (and was explored) by a user following suggestions made by Spark
Example navigation patterns
Star behavior: user clicks on many related entities given a single entity query
Path behavior: user follows a path of related entities issuing different successive queries
Goals
● Study how users interact with Spark recommendations
○ Which types of queries & entities do users interact with the most?
○ What are the characteristics of these sessions?
○ What is the interplay between typical search results & Spark entity recommendation results?
○ Does Spark promote an explorative behavior?
● Predict user click behavior
○ Exploit the insights from the study to develop a set of query- and user-based features that reflect users' click behavior, and explore their impact on click prediction on Spark
Talk outline
● Motivation & Goals
● Analysis
○ Dataset
○ Query-based analysis
○ User-based analysis
○ Other trends
● Prediction task
○ Experimental setup
○ Features
○ Results
● Sum up & contributions
Analysis: Dataset & metrics
▪ Dataset: collected a sample of 2M users focusing on activity related to Spark (queries triggering Spark)
▪ Metrics: Search & Spark CTR (click-through rate) for evaluating “user satisfaction”
▪ Due to confidentiality, all raw CTR values have been normalized via a linear transformation and all reported values are relative.
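A minimal sketch of what such a normalization could look like. The function names, the counts, and the constants `scale` and `offset` are illustrative assumptions; the actual transformation is not disclosed. The point is only that, assuming a positive scale, a linear map preserves the relative ordering of CTR values, which is why relative comparisons remain meaningful.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions (0.0 if never shown)."""
    return clicks / impressions if impressions else 0.0


def linear_normalize(values, scale, offset=0.0):
    """Apply the same linear map v -> scale * v + offset to every raw CTR."""
    return [scale * v + offset for v in values]


# Hypothetical (clicks, impressions) counts for three groups of queries.
raw = [ctr(30, 1000), ctr(60, 1000), ctr(90, 1000)]

# Hypothetical constants; the real ones are confidential.
normalized = linear_normalize(raw, scale=7.0, offset=0.5)

# With a positive scale, the groups keep the same ranking before and after.
order_raw = sorted(range(len(raw)), key=lambda i: raw[i])
order_norm = sorted(range(len(normalized)), key=lambda i: normalized[i])
```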
Query-based analysis I: Search vs Spark CTR
[Figure: Search CTR vs Spark CTR, with three annotated regions: mutual-growth area; relatively low Spark & Search CTR; high Search CTR, low Spark CTR]
Query-based analysis III: Search & Spark CTR for different query context
A user submitting a query with context looks for a more specialized set of results
A user submitting a query without any surrounding context is more likely to click on Spark results
User-based analysis I Session duration effect
Shorter sessions have the highest search CTR: users come, find what they are looking for and leave
As session length increases, search CTR decreases, likely because users try various queries to find what they are looking for
Spark shows a different behavior: users are willing to explore the recommendations
Other trends User age vs. Person entity age
Users are enticed to explore people of a closer age to them (Pearson correlation is equal to 0.859 with p<0.0001)
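For reference, the Pearson correlation coefficient can be computed as sketched below. The age values are made up purely for illustration and are not the data behind the reported 0.859; the sketch also does not compute the p-value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: user age vs. age of the person entity they clicked on.
user_age = [20, 30, 40, 50, 60]
entity_age = [25, 31, 38, 52, 55]
r = pearson(user_age, entity_age)
```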
Main insights
▪ Spark promotes explorative behavior
▪ Users are more likely to navigate through the recommendations for specific types of queries and when no specific context (e.g., “pictures”) is specified
▪ Contrary to standard search behavior, where users find the information they need as soon as possible, users interacting with Spark entity recommendations explore the results, leading to longer sessions
▪ Next: we build a model to predict whether users will click on Spark results
Prediction task: setup
▪ Dataset:
› sample of 100k users whose actions we collect over a 6-month period
› 2 cases: users with any number of actions / users with at least 3 actions
▪ Task & Method: Given a user, her previous interactions and a newly issued query, predict whether the user will interact with the Spark module, using logistic regression
▪ Evaluation metrics: precision, negative predictive value, recall, specificity, accuracy and AUC
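The evaluation metrics listed above can be sketched as follows for a binary click / no-click prediction. The labels, scores and the 0.5 decision threshold are made-up assumptions for illustration; they are not the talk's data or its model output.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means a Spark click."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": safe_div(tp, tp + fp),    # predicted clicks that were clicks
        "npv": safe_div(tn, tn + fn),          # predicted non-clicks that were non-clicks
        "recall": safe_div(tp, tp + fn),       # actual clicks that were found
        "specificity": safe_div(tn, tn + fp),  # actual non-clicks that were found
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = clicked on Spark) and made-up predicted click probabilities.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.3, 0.1, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Reporting negative predictive value and specificity alongside precision and recall is useful when clicks are rare, since accuracy alone can look high for a model that almost never predicts a click.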
Prediction task: performance
❏ User-based features significantly improve accuracy
❏ Recall is low, showing that the circumstances under which a user will engage with the Spark module are diverse & cannot be easily captured
Prediction task: user history
Previous actions of users (i=1,2,3)
❏ The more recent the previous action used is, the more accurate the prediction (i=3 corresponds to the most recent action)
Sum up & contributions
● Large-scale analysis: the types of queries and entities that users interact with, who the users interacting with Spark are, the characteristics of their sessions, and the interplay between typical search results and Spark entity recommendation results
● Spark click prediction: developed a set of query- and user-based features that reflect users' click behavior and explored their impact on click prediction on Spark using logistic regression