Collaborative Personalized Twitter Search with Topic-Language Models
DESCRIPTION
The vast amount of real-time and social content in microblogs results in information overload for users searching microblog data. Given the user’s search query, delivering content that is relevant to her interests is a challenging problem. Traditional methods for personalized Web search are insufficient in the microblog domain because of the diversity of topics, the sparseness of user data and the highly social nature of microblogs. In particular, social interactions between users need to be considered in order to accurately model the user’s interests, alleviate data sparseness and tackle the cold-start problem. In this paper, we therefore propose a novel framework for Collaborative Personalized Twitter Search. At its core, we develop a collaborative user model, which exploits the user’s social connections in order to obtain a comprehensive account of her preferences. We then propose a novel user model structure to manage the topical diversity in Twitter and to enable semantic-aware query disambiguation. Our framework integrates a variety of information about the user’s preferences in a principled manner.
TRANSCRIPT
Collaborative Personalized
Twitter Search with Topic-Language Models
Jan Vosecky
Kenneth Wai-Ting Leung
Wilfred Ng
Supported by SIGIR Travel Grant
Microblogs
[Figure: two example tweets (Tweet 1, Tweet 2)]
User-generated content
– Short length
– Informal language, free-form
– Diverse topics
Very high volume
Information overload
Searching on Twitter
“When you've got 5 minutes to fill,
Twitter is a great way to fill 35 minutes”
@mattcutts
Searching for “ipad” on Twitter
Around 50 tweets mentioning “iPad” posted within 1 minute
Personalizing Twitter Search
Microblog data
• Compared with traditional domains
(e.g. web search, news search):
– Explicitly stated user interests
• tweets, conversations, re-tweets
– Social network structure
• following
Personalization challenge
• Individual user’s data
– Diverse
– Sparse: short messages, few messages, few social connections, little search history
• User’s social connections
– Diverse friends, topics
– Need to carefully organize friends’ information for it to be useful
Putting all kinds of information into a single user model makes it inaccurate and noisy.
→ Organize the information around topics
Contributions
• Novel user model structure
• Collaborative user model
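Language modeling IR
• Query likelihood model
– Given a query Q and a document D, rank D by the likelihood that the document’s language model θ_D generates Q:
$P(Q \mid D) = \prod_{w \in Q} P(w \mid \theta_D)$
– where $P(w \mid \theta_D)$ is estimated from D and smoothed with a background (collection) model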
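A minimal sketch of query likelihood scoring in log space, assuming whitespace-tokenized tweets and Jelinek-Mercer smoothing (the smoothing named for the query likelihood baseline in the evaluation). Here `lam` weights the collection model; the function name and the toy tweets are illustrative, not taken from the paper.

```python
import math
from collections import Counter
from typing import List

def jm_log_likelihood(query: List[str], doc: List[str], collection: List[str],
                      lam: float = 0.5) -> float:
    """log P(Q|D) with Jelinek-Mercer smoothing:
    sum over query terms w of log[(1 - lam) * P_ml(w|D) + lam * P(w|C)]."""
    d, c = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = d[w] / len(doc) if doc else 0.0
        p_col = c[w] / len(collection) if collection else 0.0
        p = (1 - lam) * p_doc + lam * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in D and in C: zero likelihood
        score += math.log(p)
    return score

# Toy tweets; the collection is simply their concatenation.
docs = {
    "t1": "new ipad released today".split(),
    "t2": "doing some android coding".split(),
}
collection = [w for tokens in docs.values() for w in tokens]
ranking = sorted(docs, key=lambda t: jm_log_likelihood(["ipad"], docs[t], collection),
                 reverse=True)
print(ranking)  # ['t1', 't2']
```

Working with log probabilities avoids numerical underflow when many small term probabilities are multiplied; smoothing keeps t2 at a finite score even though it never mentions “ipad”.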
Topic Models
A latent topic in LDA:
“Information Technology”
Google 0.00040
Android 0.00020
Microsoft 0.00010
App 0.00010
Security 0.00009
Email 0.00008
Login 0.00005
Virus 0.00004
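An LDA topic is a probability distribution over words, so the slide’s “Information Technology” topic can be held as a word-to-probability map. The sketch below lists a topic’s top words and computes a simple log P(Q|k) for a query; only the eight words shown on the slide are included, and the `floor` value for unseen words is a hypothetical choice.

```python
import math
from typing import Dict, List

# The "Information Technology" topic from the slide: P(w|k) for its top words only.
IT_TOPIC: Dict[str, float] = {
    "Google": 0.00040, "Android": 0.00020, "Microsoft": 0.00010, "App": 0.00010,
    "Security": 0.00009, "Email": 0.00008, "Login": 0.00005, "Virus": 0.00004,
}

def top_words(topic: Dict[str, float], n: int) -> List[str]:
    """The n most probable words of a topic; sorted() is stable, so ties keep dict order."""
    ranked = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:n]]

def log_query_given_topic(query: List[str], topic: Dict[str, float],
                          floor: float = 1e-9) -> float:
    """log P(Q|k) = sum over query words of log P(w|k).
    `floor` is a hypothetical stand-in for words outside this excerpt of the topic."""
    return sum(math.log(topic.get(w, floor)) for w in query)

print(top_words(IT_TOPIC, 3))  # ['Google', 'Android', 'Microsoft']
```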
Scope of our approach
• Input to our algorithm:
– Set of n documents returned by Twitter given
query Q
• Our task:
– Rank the documents according to:
• Query
• User model
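The task is re-ranking: the n tweets returned for Q are given, and only their order changes. A minimal sketch with a pluggable scoring function; the word-to-weight user model and `toy_score` are hypothetical placeholders for the topic-based models introduced later, and no Twitter API call is shown.

```python
from typing import Callable, Dict, List, Sequence

# A scorer takes (query, tweet, user model) and returns a relevance score.
Scorer = Callable[[str, str, Dict[str, float]], float]

def rerank(query: str, tweets: Sequence[str], user_model: Dict[str, float],
           score: Scorer) -> List[str]:
    """Re-rank the n tweets returned for `query`, highest score first (stable for ties)."""
    return sorted(tweets, key=lambda t: score(query, t, user_model), reverse=True)

def toy_score(query: str, tweet: str, user_model: Dict[str, float]) -> float:
    """Hypothetical scorer: query-term overlap plus summed user interest in the tweet's words."""
    words = tweet.lower().split()
    overlap = sum(1 for q in query.lower().split() if q in words)
    return overlap + sum(user_model.get(w, 0.0) for w in words)

tweets = ["top 10 restaurants in australia", "macs hacked for ransom in australia"]
result = rerank("australia", tweets, {"hacked": 0.6, "macs": 0.4}, toy_score)
print(result)  # ['macs hacked for ransom in australia', 'top 10 restaurants in australia']
```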
Proposed Framework
At a Glance: Proposed User Model
Individual User Model
ID | Tweet | Time | Topic
1 | Manchester playing tonight | 1. 1. | Sport
2 | Doing some android coding | 2. 1. | IT
3 | Great game, great win for manchester! | 5. 1. | Sport
4 | Had a great apple cake with chocolate | 6. 1. | Food
5 | My java code keeps throwing exceptions | 10. 1. | IT
Resulting topic weights (W) and word counts:
• Sport: W = 2/5 = 40%; Manchester: 5, Play: 4, Win: 2
• IT: W = 2/5 = 40%; Android: 6, Coding: 2, Java: 2
• Food: W = 1/5 = 20%; Cake: 6, Apple: 5, Oven: 2
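The topic weight W is the fraction of the user’s tweets assigned to each topic. A minimal sketch that reproduces the 40% / 40% / 20% split from the five example tweets, assuming each tweet’s topic label is already known (e.g. inferred by LDA) and using a naive lowercase tokenizer. The word counts on the slide are illustrative and are not what tokenizing these five tweets produces.

```python
from collections import Counter
from typing import Dict, List, Tuple

# The example tweets: (text, assigned topic).
TWEETS: List[Tuple[str, str]] = [
    ("Manchester playing tonight", "Sport"),
    ("Doing some android coding", "IT"),
    ("Great game, great win for manchester!", "Sport"),
    ("Had a great apple cake with chocolate", "Food"),
    ("My java code keeps throwing exceptions", "IT"),
]

def topic_weights(tweets: List[Tuple[str, str]]) -> Dict[str, float]:
    """W(k) = (number of the user's tweets on topic k) / (total number of tweets)."""
    counts = Counter(topic for _, topic in tweets)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

def topic_word_counts(tweets: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Per-topic bag of words built from the user's own tweets."""
    words: Dict[str, Counter] = {}
    for text, topic in tweets:
        tokens = [t.strip("!,.?").lower() for t in text.split()]
        words.setdefault(topic, Counter()).update(t for t in tokens if t)
    return words

weights = topic_weights(TWEETS)
print(weights)  # {'Sport': 0.4, 'IT': 0.4, 'Food': 0.2}
```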
Individual User Model (IM)
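• Is u interested in word w from topic k? This combines:
– Is u interested in topic k?
– Is word w related to topic k? (estimated from the user’s own tweets and from the topic model)
– Prior probability of topic k
• Recent interest is more important: recent tweets count more than older ones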
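The slide’s formula is not recoverable from the transcript, so the following is only a sketch of its two ingredients under stated assumptions: recency is modeled with a hypothetical exponential half-life decay, and “from user / from topic model” with a linear mixture weighted by a hypothetical `lam`. The Time column of the example is read as day numbers (an assumption).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def decayed_topic_interest(tweets: Iterable[Tuple[float, str]], now: float,
                           half_life: float) -> Dict[str, float]:
    """P(k|u) with recency weighting: a tweet posted at time t adds 0.5 ** ((now - t) / half_life)
    to its topic, so a tweet one half-life old counts half as much as one posted now."""
    scores: Dict[str, float] = defaultdict(float)
    for t, topic in tweets:
        scores[topic] += 0.5 ** ((now - t) / half_life)
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def word_given_topic_user(w: str, k: str, user_counts: Dict[str, Dict[str, int]],
                          topic_model: Dict[str, Dict[str, float]], lam: float) -> float:
    """P(w|k,u) = lam * P_user(w|k,u) + (1 - lam) * P(w|k): user evidence mixed with the topic model."""
    counts = user_counts.get(k, {})
    n = sum(counts.values())
    p_user = counts.get(w, 0) / n if n else 0.0
    return lam * p_user + (1 - lam) * topic_model.get(k, {}).get(w, 0.0)

tweets = [(1, "Sport"), (2, "IT"), (5, "Sport"), (6, "Food"), (10, "IT")]
interest = decayed_topic_interest(tweets, now=10, half_life=5)
print(max(interest, key=interest.get))  # IT
```

Without decay, Sport and IT tie at 40%; with decay, IT wins because its latest tweet is the most recent one.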
Personalization using IM
• Is the query relevant to topic k?
– Is Q related to topic k in general?
– Is the user interested in topic k?
– Is Q related to the words in topic k that the user is interested in?
• Is the document relevant to topic k?
– Is D related to topic k in general?
– Is the user interested in topic k?
– Is D related to the words in topic k that the user is interested in?
• Prior document probability
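One plausible way to combine these factors, sketched under assumptions and not the paper’s exact formula: the relevance of a text (query or tweet) to topic k multiplies P(k|u) by a per-word mixture of the user’s word preferences and the general topic model, and the final score multiplies the document prior by a sum over topics of query relevance times document relevance. The mixture weight `lam` and the unseen-word `floor` are hypothetical.

```python
from typing import Dict, List

Dist = Dict[str, Dict[str, float]]  # topic -> word -> probability

def topic_relevance(tokens: List[str], k: str, p_topic_user: Dict[str, float],
                    p_word_user: Dist, p_word_topic: Dist,
                    lam: float = 0.5, floor: float = 1e-6) -> float:
    """rel(X, k, u) = P(k|u) * prod over words of [lam * P(w|k,u) + (1 - lam) * P(w|k)]."""
    rel = p_topic_user.get(k, 0.0)
    for w in tokens:
        p_user = p_word_user.get(k, {}).get(w, 0.0)
        p_topic = p_word_topic.get(k, {}).get(w, floor)  # floor: hypothetical smoothing
        rel *= lam * p_user + (1 - lam) * p_topic
    return rel

def personalized_score(query: List[str], doc: List[str], prior_doc: float,
                       p_topic_user: Dict[str, float], p_word_user: Dist,
                       p_word_topic: Dist, lam: float = 0.5) -> float:
    """P(D) * sum over topics k of rel(Q, k, u) * rel(D, k, u)."""
    return prior_doc * sum(
        topic_relevance(query, k, p_topic_user, p_word_user, p_word_topic, lam)
        * topic_relevance(doc, k, p_topic_user, p_word_user, p_word_topic, lam)
        for k in p_word_topic
    )

# Toy example: a user interested only in IT, with no personal word preferences yet.
topics = {"IT": {"hacked": 0.01, "australia": 0.001},
          "Travel": {"restaurants": 0.01, "australia": 0.01}}
user_topics = {"IT": 1.0, "Travel": 0.0}
s_it = personalized_score(["australia"], ["hacked", "australia"], 1.0, user_topics, {}, topics)
s_travel = personalized_score(["australia"], ["restaurants", "australia"], 1.0, user_topics, {}, topics)
```

Because the per-word product shrinks as a tweet gets longer, comparing tweets of very different lengths would need some normalization; this sketch leaves that out.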
Personalization using IM
Q = australia
User: “I’m interested in IT and travel. I’ve never tweeted about Australia.”
Topics: …, TV, Music, IT, Travel, Politics, Business, …
Topic weights shown for Q: IT 0.1, Travel 0.3
Candidate tweets (D):
– Top 10 restaurants in Australia
– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile
Personalization using IM
Q = australia
User: “I’m interested in IT and travel. I have tweeted about IT in Australia.”
Topics: …, TV, Music, IT, Travel, Politics, Business, …
Topic weights shown for Q: IT 0.6, Travel 0.3
Candidate tweets (D):
– Top 10 restaurants in Australia
– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile
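A worked illustration of why the two examples rank differently. It reuses the query topic weights from the slides (assuming 0.1/0.6 belong to IT and 0.3 to Travel) and pairs them with tweet-topic affinities that are made up for illustration; the score is a simple sum over topics of query weight times tweet affinity.

```python
from typing import Dict

def topic_mix_score(query_topics: Dict[str, float], doc_topics: Dict[str, float]) -> float:
    """Sum over topics of (topic weight of Q for this user) * (affinity of the tweet to the topic)."""
    return sum(w * doc_topics.get(k, 0.0) for k, w in query_topics.items())

# Hypothetical tweet-topic affinities; these are NOT values from the slides.
TWEETS: Dict[str, Dict[str, float]] = {
    "Top 10 restaurants in Australia": {"Travel": 0.8},
    "iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia": {"IT": 0.9, "Travel": 0.1},
}

def top_tweet(query_topics: Dict[str, float]) -> str:
    return max(TWEETS, key=lambda t: topic_mix_score(query_topics, TWEETS[t]))

print(top_tweet({"IT": 0.1, "Travel": 0.3}))  # restaurants tweet (0.24 vs 0.12)
print(top_tweet({"IT": 0.6, "Travel": 0.3}))  # hacking tweet (0.57 vs 0.24)
```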
Collaborative User Model
• Friend 1: Sport (Manchester: 5, Play: 4, Win: 2); Food (Cake: 6, Apple: 5, Oven: 2)
• Friend 2: Sport (Manchester: 5, Play: 4, Win: 2)
• Friend 3: IT (Android: 6, Coding: 2, Java: 2); Music (Radiohead: 4, Listen: 2, Song: 5)
• Collaborative Model: Sport (Manchester: 5, Play: 4, Win: 2); IT (Android: 6, Coding: 2, Java: 2); Music (Radiohead: 4, Listen: 2, Song: 5); Food (Cake: 6, Apple: 5, Oven: 2)
Collaborative User Model
• Weighted sum of the IMs of the top-n friends, selected by the amount of interaction (re-tweets, mentions, conversations)
• Weight of each friend f:
– wP(f): Popularity of f
– wA(u,f): Affinity of u and f
• Weight of each f’s topic k:
– wB(u,k): Topic bias
– wI(u,f,k): Topic-interaction between u and f
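A minimal sketch of the collaborative model as a weighted sum of friends’ individual models. The slides name the four weights but not how they combine, so multiplying them (and defaulting missing topic weights to 1) is an assumption, as are the helper names and the interaction counts in the example. The friend models reuse the word counts from the figure, normalized per topic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TopicModel = Dict[str, Dict[str, float]]  # topic -> word -> weight

def top_friends(interactions: Dict[str, int], n: int) -> List[str]:
    """The n friends with the most interactions (re-tweets, mentions, conversations) with u."""
    return sorted(interactions, key=interactions.get, reverse=True)[:n]

def collaborative_model(friend_ims: Dict[str, TopicModel], friends: List[str],
                        w_pop: Dict[str, float], w_aff: Dict[str, float],
                        w_bias: Dict[str, float],
                        w_inter: Dict[Tuple[str, str], float]) -> TopicModel:
    """Sum of friends' IMs, weighted by wP(f) * wA(u,f) per friend and wB(u,k) * wI(u,f,k)
    per topic, then normalized within each topic."""
    cm: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for f in friends:
        friend_weight = w_pop[f] * w_aff[f]
        for k, words in friend_ims[f].items():
            topic_weight = w_bias.get(k, 1.0) * w_inter.get((f, k), 1.0)
            for w, p in words.items():
                cm[k][w] += friend_weight * topic_weight * p
    result: TopicModel = {}
    for k, words in cm.items():
        total = sum(words.values())
        if total > 0:
            result[k] = {w: v / total for w, v in words.items()}
    return result

friend_ims = {
    "friend1": {"Sport": {"Manchester": 5, "Play": 4, "Win": 2},
                "Food": {"Cake": 6, "Apple": 5, "Oven": 2}},
    "friend2": {"Sport": {"Manchester": 5, "Play": 4, "Win": 2}},
    "friend3": {"IT": {"Android": 6, "Coding": 2, "Java": 2},
                "Music": {"Radiohead": 4, "Listen": 2, "Song": 5}},
}
interactions = {"friend1": 12, "friend2": 3, "friend3": 8}  # hypothetical counts
friends = top_friends(interactions, 3)
ones = {f: 1.0 for f in friends}
cm = collaborative_model(friend_ims, friends, ones, ones, {}, {})
```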
Personalization using IM and CM
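• Word probabilities combine evidence from the user, from the topic model and from friends (CM)
• Dirichlet smoothing: the weight given to the user’s own data depends on the amount of the user’s tweets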
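A sketch of how Dirichlet smoothing makes the user’s own weight depend on how much she has tweeted. The background here mixes the collaborative model and the topic model with a hypothetical weight `beta`; that arrangement is an assumption, while the Dirichlet formula itself is the standard one, with `mu` as its prior strength.

```python
from typing import Dict

def mix_backgrounds(cm: Dict[str, float], tm: Dict[str, float], beta: float) -> Dict[str, float]:
    """Background for topic k: beta * P_CM(w|k) + (1 - beta) * P_TM(w|k)."""
    vocab = set(cm) | set(tm)
    return {w: beta * cm.get(w, 0.0) + (1 - beta) * tm.get(w, 0.0) for w in vocab}

def dirichlet_prob(w: str, user_counts: Dict[str, int], background: Dict[str, float],
                   mu: float) -> float:
    """(c(w) + mu * P_bg(w)) / (N + mu), where N is the user's word count in topic k.
    Small N keeps the estimate near the background; large N moves it toward c(w) / N."""
    n = sum(user_counts.values())
    return (user_counts.get(w, 0) + mu * background.get(w, 0.0)) / (n + mu)

bg = mix_backgrounds({"android": 0.2}, {"android": 0.0002, "google": 0.0004}, beta=0.5)
few = dirichlet_prob("android", {"android": 1, "java": 1}, bg, mu=10)        # sparse user
many = dirichlet_prob("android", {"android": 100, "java": 100}, bg, mu=10)   # active user
```

Both users have the same relative frequency (0.5) for “android”, but only the active user’s estimate gets close to it; the sparse user leans on friends and the topic model, which is how this kind of smoothing helps with cold-start users.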
Search User Model (SM)
• Feedback sources: queries + clicks
• What does a ‘click’ mean? A URL click, a re-tweet or a favorite
• Feedback from a ‘click’:
– Query-topic: preference for topic k when issuing Q
– Topic-word: preference for words in topic k
– Topic: user’s search bias towards topic k
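The three kinds of click feedback can be kept as simple counters. A minimal sketch, assuming the topic of each clicked tweet is already known and that a URL click, re-tweet and favorite are all recorded the same way; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

class SearchUserModel:
    """Hypothetical counters for the three kinds of feedback a click provides."""

    def __init__(self) -> None:
        self.query_topic: Dict[str, Counter] = defaultdict(Counter)  # query -> topic -> clicks
        self.topic_word: Dict[str, Counter] = defaultdict(Counter)   # topic -> word -> count
        self.topic: Counter = Counter()                               # topic -> clicks

    def record_click(self, query: str, tweet_tokens: List[str], topic: str) -> None:
        """Update all three counters for one clicked tweet on `topic`."""
        self.query_topic[query][topic] += 1
        self.topic_word[topic].update(tweet_tokens)
        self.topic[topic] += 1

    def search_bias(self) -> Dict[str, float]:
        """User's search bias towards each topic: share of clicks landing on that topic."""
        total = sum(self.topic.values())
        return {k: c / total for k, c in self.topic.items()} if total else {}

sm = SearchUserModel()
sm.record_click("australia", ["macs", "hacked", "australia"], "IT")
sm.record_click("apple", ["apple", "cake"], "Food")
sm.record_click("ipad", ["ipad", "sale"], "IT")
bias = sm.search_bias()
```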
Evaluation
Query log collection
• Evaluation interface
– Submit query, returns tweets from Twitter API
– Rate relevant tweets
Datasets
• Controlled user study (Log_CoS)
– 11 users
• In-the-wild user study (Log_IwS)
– 24 users
[Figure panels: Log_CoS, Log_IwS]
Ranking Results
Baselines:
• Query likelihood (J-M smoothing)
• Topic model-based IR
• Personalized search (user-specific language models)
• Collaborative search (cluster-specific language models)
Proposed:
• Collaborative Personalized search
Ranking Results
[Figure: average per-user ranking performance after processing i of the user’s queries]
Comparison of models
[Figure panels: (a) Log_CoS, (b) Log_IwS]
Query types
[Figure: performance by query type; panels (a) Log_CoS, (b) Log_IwS]
In summary
• Collaborative Personalized Twitter Search
– User’s tweets
– User’s friends’ tweets
– User’s search activity
– Organized around topics
• topic-specific language models
Future work
• Query-dependent personalization
strategies
• Selection of an optimal set of friends for
collaborative model
• Integrating spatial and temporal features
Thank You!