Collaborative Personalized Twitter Search with Topic-Language Models

Jan Vosecky

Kenneth Wai-Ting Leung

Wilfred Ng

Supported by SIGIR Travel Grant

Microblogs


User-generated content

– Short length

– Informal language, free-form

– Diverse topics

Very high volume

Information overload

Searching on Twitter

“When you've got 5 minutes to fill, Twitter is a great way to fill 35 minutes” (@mattcutts)

Searching for “ipad” on Twitter

Around 50 tweets mentioning “iPad” posted within 1 minute

Personalizing Twitter Search

Microblog data

• Compared with traditional domains (e.g. web search, news search):

– Explicitly stated user interests

• tweets, conversations, re-tweets

– Social network structure

• following


Personalization challenge

• Individual user’s data

– Diverse

– Sparse: short messages, few messages, few social connections, little search history

• User’s social connections

– Diverse friends, topics

– Need to carefully organize friends’ information for it to be useful

Putting all kinds of information into a single user model: inaccurate, noisy
Topics

Contributions

Novel User Model structure

Collaborative User Model

Language modeling IR

Query likelihood model

– Given a query Q and a document D, rank D by the likelihood P(Q|D) that D’s language model generates Q
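The formula on this slide did not survive extraction. As a minimal sketch of the standard query likelihood model, the snippet below scores documents by log P(Q|D) with Jelinek-Mercer smoothing, the variant named later among the baselines. The function name, the toy documents and the value of `lam` are illustrative assumptions, not taken from the talk.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(Q|D) under a unigram model with Jelinek-Mercer smoothing.

    lam (hypothetical value) is the weight given to the collection model.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for w in query:
        p_doc = doc_tf[w] / len(doc) if doc else 0.0
        p_coll = coll_tf[w] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The word is unseen in the whole collection.
            return float("-inf")
        score += math.log(p)
    return score

# Toy data (hypothetical): two tokenized tweets and a one-word query.
docs = [
    "new ipad review".split(),
    "manchester win tonight".split(),
]
collection = [w for d in docs for w in d]
query = ["ipad"]

ranked = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, d, collection),
    reverse=True,
)
```

The tweet that contains the query word is ranked first, while the other tweet still receives a non-zero probability through the collection model.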

Topic Models

A latent topic in LDA:

“Information Technology”

Google 0.00040

Android 0.00020

Microsoft 0.00010

App 0.00010

Security 0.00009

Email 0.00008

Login 0.00005

Virus 0.00004

Scope of our approach

• Input to our algorithm:

– Set of n documents returned by Twitter given query Q

• Our task:

– Rank the documents according to:

• Query

• User model


Proposed Framework


At a Glance: Proposed User Model


Individual User Model

ID  Tweet                                    Time   Topic
1   Manchester playing tonight               1.1.   Sport
2   Doing some android coding                2.1.   IT
3   Great game, great win for manchester!    5.1.   Sport
4   Had a great apple cake with chocolate    6.1.   Food
5   My java code keeps throwing exceptions   10.1.  IT

Sport (W = 2/5 = 40%): Manchester: 5, Play: 4, Win: 2

IT (W = 2/5 = 40%): Android: 6, Coding: 2, Java: 2

Food (W = 1/5 = 20%): Cake: 6, Apple: 5, Oven: 2
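The topic weights in this example (2/5, 2/5, 1/5) can be reproduced with a few lines of code. The sketch below assumes each tweet already carries a topic label (in the talk this would come from the topic model), and its tokenizer is a deliberately naive, hypothetical one. The per-topic word counts on the slide appear to be illustrative, so the counts computed here from the five tweets will differ from them.

```python
from collections import Counter, defaultdict

# The five example tweets with their topic labels.
tweets = [
    ("Manchester playing tonight", "Sport"),
    ("Doing some android coding", "IT"),
    ("Great game, great win for manchester!", "Sport"),
    ("Had a great apple cake with chocolate", "Food"),
    ("My java code keeps throwing exceptions", "IT"),
]

def build_individual_model(labelled_tweets):
    """Return (topic weights, per-topic word counts) for one user."""
    topic_counts = Counter(topic for _, topic in labelled_tweets)
    total = sum(topic_counts.values())
    # Topic weight W = share of the user's tweets assigned to the topic.
    topic_weights = {k: c / total for k, c in topic_counts.items()}

    word_counts = defaultdict(Counter)
    for text, topic in labelled_tweets:
        # Naive tokenizer (assumption): split on spaces, strip punctuation.
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        word_counts[topic].update(t for t in tokens if t)
    return topic_weights, word_counts

weights, words = build_individual_model(tweets)
```

Here `weights` maps Sport and IT to 0.4 and Food to 0.2, matching the percentages on the slide.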

Individual User Model (IM)


Is u interested in word w from topic k?

Is u interested in topic k?

Is word w related to topic k?

Prior prob. of topic k

Recent interest is more important:

From user From topic model
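The slide states that recent interest should count more, but the decay function itself is not recoverable from the extracted text. As one possible illustration, the sketch below uses an exponential decay with a hypothetical half-life. The function name, the half-life and the reading of the example tweets' Time column as day numbers (1, 2, 5, 6, 10) are all assumptions.

```python
def recency_topic_weights(dated_topics, now, half_life_days=7.0):
    """Topic weights in which older tweets contribute less.

    Exponential decay with a 7-day half-life is an assumption made for
    illustration; the talk does not give the actual decay function here.
    """
    raw = {}
    for day, topic in dated_topics:
        decay = 0.5 ** ((now - day) / half_life_days)
        raw[topic] = raw.get(topic, 0.0) + decay
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

# (day, topic) pairs for the five example tweets.
dated = [(1, "Sport"), (2, "IT"), (5, "Sport"), (6, "Food"), (10, "IT")]
recent = recency_topic_weights(dated, now=10)
```

Without decay, Sport and IT are tied at 40% each. With decay, IT comes out ahead because the most recent tweet is about IT.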

Personalization using IM


Is the Query relevant to topic k?

– Is Q related to topic k in general?

– Is the User interested in topic k?

– Is Q related to the words in topic k that the User is interested in?

Is the Document relevant to topic k?

– Is D related to topic k in general?

– Is the User interested in topic k?

– Is D related to the words in topic k that the User is interested in?

Prior Document probability


Example 1: Q = australia

User: I’m interested in IT and travel. I’ve never tweeted about Australia.

Topics: …, TV, Music, IT, Travel, Politics, Business, …

Topic weights shown: 0.1, 0.3, …

Tweets (D):

– Top 10 restaurants in Australia

– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile

Example 2: Q = australia

User: I’m interested in IT and travel. I have tweeted about IT in Australia.

Topics: …, TV, Music, IT, Travel, Politics, Business, …

Topic weights shown: 0.6, 0.3, …

Tweets (D):

– Top 10 restaurants in Australia

– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile
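The "australia" example can be mimicked with a small scoring sketch. It is only one plausible reading of the questions listed on the "Personalization using IM" slide: the three signals per topic are multiplied, and query and document relevance are summed over shared topics. Every number below is hypothetical. The 0.6 and 0.3 echo values shown in the talk, but assigning them to IT and Travel is an assumption.

```python
def topic_relevance(general, user_interest, user_word_match):
    # Assumption: the three signals are combined by simple multiplication.
    return general * user_interest * user_word_match

def personalized_score(doc_prior, query_factors, doc_factors):
    """Sum over topics of query relevance times document relevance,
    scaled by the prior document probability."""
    total = 0.0
    for topic, q in query_factors.items():
        d = doc_factors.get(topic)
        if d is not None:
            total += topic_relevance(*q) * topic_relevance(*d)
    return doc_prior * total

# Hypothetical user interest per topic.
user_interest = {"IT": 0.6, "Travel": 0.3}

# Per topic: (related in general, user interest, match with the user's words).
query_factors = {
    "IT": (0.2, user_interest["IT"], 0.5),
    "Travel": (0.5, user_interest["Travel"], 0.1),
}
restaurants_tweet = {
    "IT": (0.05, user_interest["IT"], 0.1),
    "Travel": (0.6, user_interest["Travel"], 0.3),
}
hacked_tweet = {
    "IT": (0.8, user_interest["IT"], 0.6),
    "Travel": (0.1, user_interest["Travel"], 0.1),
}

score_restaurants = personalized_score(1.0, query_factors, restaurants_tweet)
score_hacked = personalized_score(1.0, query_factors, hacked_tweet)
```

With these numbers the IT-related tweet scores higher than the restaurant tweet, which is the behaviour the second example suggests for a user who has tweeted about IT in Australia.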

Collaborative User Model

Friend 1: Sport (Manchester: 5, Play: 4, Win: 2); Food (Cake: 6, Apple: 5, Oven: 2)

Friend 2: Sport (Manchester: 5, Play: 4, Win: 2)

Friend 3: IT (Android: 6, Coding: 2, Java: 2); Music (Radiohead: 4, Listen: 2, Song: 5)

Collaborative Model: Sport (Manchester: 5, Play: 4, Win: 2); IT (Android: 6, Coding: 2, Java: 2); Music (Radiohead: 4, Listen: 2, Song: 5); Food (Cake: 6, Apple: 5, Oven: 2)


• Weighted sum of the IMs of the top-n friends

– based on the amount of interactions (re-tweets, mentions, conversations)

• Weight of each friend f:

– wP(f): Popularity of f

– wA(u,f): Affinity of u and f

• Weight of each f’s topic k:

– wB(u,k): Topic bias

– wI(u,f,k): Topic-interaction between u and f
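A weighted sum of friends' models can be sketched as follows. How the talk combines popularity, affinity, topic bias and topic-interaction into final weights is not recoverable from the slide, so the sketch takes one precomputed weight per friend and an optional weight per (friend, topic) pair and simply multiplies them. The function name and the weight values are hypothetical. The friends' word counts are those of the three-friend example.

```python
from collections import Counter, defaultdict

def collaborative_model(friend_models, friend_weights, topic_weights=None):
    """Weighted sum of friends' per-topic word counts.

    friend_weights: one weight per friend (could combine popularity and affinity).
    topic_weights: optional weight per (friend, topic) pair; defaults to 1.0.
    Multiplying the two is an assumption made for illustration.
    """
    topic_weights = topic_weights or {}
    cm = defaultdict(Counter)
    for friend, model in friend_models.items():
        w_f = friend_weights[friend]
        for topic, counts in model.items():
            w_k = topic_weights.get((friend, topic), 1.0)
            for word, count in counts.items():
                cm[topic][word] += w_f * w_k * count
    return cm

friend_models = {
    "friend1": {
        "Sport": {"manchester": 5, "play": 4, "win": 2},
        "Food": {"cake": 6, "apple": 5, "oven": 2},
    },
    "friend2": {"Sport": {"manchester": 5, "play": 4, "win": 2}},
    "friend3": {
        "IT": {"android": 6, "coding": 2, "java": 2},
        "Music": {"radiohead": 4, "listen": 2, "song": 5},
    },
}
# Hypothetical friend weights.
friend_weights = {"friend1": 0.5, "friend2": 0.3, "friend3": 0.2}

cm = collaborative_model(friend_models, friend_weights)
```

The resulting model covers the union of the friends' topics (Sport, Food, IT, Music), and a topic shared by several friends, such as Sport, accumulates weight from each of them.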


Personalization using IM and CM

From user, from topic model, from friends

Dirichlet smoothing: depends on the amount of the user’s tweets
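Dirichlet smoothing has the property noted on this slide: the more data the user has, the less the estimate leans on the background distribution. The sketch below applies the standard Dirichlet-smoothed estimate to a user's word counts within one topic. Using a friends-derived distribution as the background, and the value of `mu`, are assumptions made for illustration.

```python
def smoothed_word_prob(word, user_counts, background_probs, mu=10.0):
    """Dirichlet-smoothed estimate of P(word | topic, user).

    user_counts: the user's word counts within the topic.
    background_probs: background distribution for the topic (here assumed
    to come from the friends' model).
    mu: hypothetical smoothing parameter.
    """
    n = sum(user_counts.values())
    prior = background_probs.get(word, 0.0)
    return (user_counts.get(word, 0) + mu * prior) / (n + mu)

background = {"android": 0.1, "java": 0.9}

# A user with one tweet word versus a user with a thousand.
sparse_user = {"android": 1}
active_user = {"android": 1000}

p_sparse = smoothed_word_prob("android", sparse_user, background)
p_active = smoothed_word_prob("android", active_user, background)
```

For the sparse user the estimate stays close to the background value, while for the active user it is dominated by the user's own counts.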

Search User Model (SM)

• Feedback sources: Queries + clicks

• What does a ‘click’ mean?

– URL click, re-tweet, favorite

• Feedback from a ‘click’:

– Query-topic: preference for topic k when issuing Q

– Topic-word: preference for words in topic k

– Topic: user’s search bias towards topic k
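The three kinds of click feedback listed above can be kept as simple counters. This is a minimal sketch rather than the authors' data structure: the class and method names are hypothetical, and the topic of the clicked tweet is assumed to be supplied by the topic model.

```python
from collections import Counter, defaultdict

class SearchUserModel:
    """Accumulates feedback from clicks (URL click, re-tweet, favorite)."""

    def __init__(self):
        self.query_topic = defaultdict(Counter)  # preference for topic k when issuing Q
        self.topic_word = defaultdict(Counter)   # preference for words in topic k
        self.topic = Counter()                   # search bias towards topic k

    def record_click(self, query, tweet_tokens, topic):
        # 'topic' is assumed to be the clicked tweet's dominant topic.
        self.query_topic[query][topic] += 1
        self.topic_word[topic].update(tweet_tokens)
        self.topic[topic] += 1

sm = SearchUserModel()
sm.record_click("australia", ["iphones", "hacked", "australia"], "IT")
sm.record_click("australia", ["android", "security", "australia"], "IT")
sm.record_click("ipad", ["ipad", "review"], "IT")
```

After these three clicks the model records that "australia" led to IT tweets twice, and that the user's overall search bias towards IT is 3.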


Evaluation


Query log collection

• Evaluation interface

– Submit a query; the interface returns tweets from the Twitter API

– Rate relevant tweets


Datasets

• Controlled user study (Log_CoS)

– 11 users

• In-the-wild user study (Log_IwS)

– 24 users


Ranking Results


Baselines:

Query likelihood (Jelinek-Mercer smoothing)

Topic model-based IR

Personalized search (User-specific language models)

Collaborative search (Cluster-specific language models)

Collaborative Personalized search


Average per-user ranking performance after processing i of the user’s queries

Comparison of models

[Figure: (a) Log_CoS, (b) Log_IwS]

Query types

[Figure: performance by query type, (a) Log_CoS, (b) Log_IwS]

In summary

• Collaborative Personalized Twitter Search

– User’s tweets

– User’s friends’ tweets

– User’s search activity

– Organized around topics

• topic-specific language models


Future work

• Query-dependent personalization strategies

• Selection of an optimal set of friends for the collaborative model

• Integrating spatial and temporal features

Thank You!

