Collaborative Personalized Twitter Search with Topic-Language Models

Posted on 02-Jul-2015


DESCRIPTION

The vast amount of real-time and social content in microblogs results in information overload for users searching microblog data. Given a user's search query, delivering content that is relevant to her interests is a challenging problem. Traditional methods for personalized Web search are insufficient in the microblog domain because of the diversity of topics, the sparseness of user data and the highly social nature of microblogs. In particular, social interactions between users need to be considered in order to accurately model the user's interests, alleviate data sparseness and tackle the cold-start problem. In this paper, we therefore propose a novel framework for Collaborative Personalized Twitter Search. At its core, we develop a collaborative user model, which exploits the user's social connections in order to obtain a comprehensive account of her preferences. We then propose a novel user model structure to manage the topical diversity in Twitter and to enable semantic-aware query disambiguation. Our framework integrates a variety of information about the user's preferences in a principled manner.

TRANSCRIPT

Collaborative Personalized Twitter Search with Topic-Language Models

Jan Vosecky

Kenneth Wai-Ting Leung

Wilfred Ng

Supported by SIGIR Travel Grant

Microblogs

(Figure: two example tweets, Tweet 1 and Tweet 2)

User-generated content

– Short length

– Informal language, free-form

– Diverse topics

Very high volume

Information overload

Searching on Twitter


“When you've got 5 minutes to fill, Twitter is a great way to fill 35 minutes”

@mattcutts

Searching for “ipad” on Twitter: around 50 tweets mentioning “iPad” were posted within a single minute.

Personalizing Twitter Search

Microblog data

• Compared with traditional domains (e.g. web search, news search), microblogs provide:

– Explicitly stated user interests (tweets, conversations, re-tweets)

– Social network structure (following)

Personalization challenge

• Individual user’s data

– Diverse

– Sparse: short messages, few messages, few social connections, little search history

• User’s social connections

– Diverse friends, topics

– Need to carefully organize friends’ information for it to be useful

• Putting all kinds of information into a single user model makes it inaccurate and noisy

→ Organize the information around topics

Contributions

• Novel user model structure

• Collaborative user model

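Language modeling IR

Query likelihood model

– Given a query Q and a document D, rank D by $P(Q \mid D) = \prod_{w \in Q} P(w \mid D)$, where $P(w \mid D)$ is a smoothed unigram language model of D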
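A minimal Python sketch of query-likelihood scoring, assuming Jelinek-Mercer smoothing (the variant named for the first baseline in the ranking results) with a hypothetical mixing weight `lam=0.5`. The function name, the toy tweets and whitespace tokenization are illustrative, not the authors' implementation.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.5):
    """Return log P(Q|D) for a tokenized query and document.

    P(w|D) is a unigram document model with Jelinek-Mercer smoothing:
    P(w|D) = (1 - lam) * c(w, D) / |D| + lam * c(w, C) / |C|
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:  # a repeated query word contributes once per occurrence
        p = (1 - lam) * doc_tf[w] / len(doc) + lam * coll_tf[w] / len(collection)
        if p == 0.0:
            return float("-inf")  # word unseen even in the collection
        score += math.log(p)
    return score


# Toy "tweets" (hypothetical), tokenized by whitespace.
docs = [
    "ipad mini review".split(),
    "new ipad apps for kids".split(),
    "manchester win tonight".split(),
]
collection = [w for d in docs for w in d]
query = ["ipad", "apps"]
ranked = sorted(docs, key=lambda d: query_likelihood(query, d, collection), reverse=True)
for d in ranked:
    print(" ".join(d), round(query_likelihood(query, d, collection), 3))
```

Working in log space avoids underflow when queries get longer; the collection term keeps a tweet that misses one query word from scoring zero.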

Topic Models

A latent topic in LDA: “Information Technology”

Google 0.00040

Android 0.00020

Microsoft 0.00010

App 0.00010

Security 0.00009

Email 0.00008

Login 0.00005

Virus 0.00004
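To show how a topic model can act as a document language model, the sketch below computes $P(w \mid D) = \sum_k P(w \mid k)\,P(k \mid D)$, a common LDA-based formulation; whether the talk's topic-model baseline uses exactly this form is an assumption. The IT probabilities come from the slide, while the Sport topic, the document's topic mixture and the name `topic_word_prob` are hypothetical.

```python
# P(w|k) for two latent topics. The "IT" values are the excerpt shown on the
# slide; the "Sport" topic is hypothetical. Only a few words are listed, so
# these rows do not sum to 1.
TOPIC_WORDS = {
    "IT": {"google": 0.00040, "android": 0.00020, "microsoft": 0.00010, "app": 0.00010},
    "Sport": {"manchester": 0.00050, "win": 0.00030, "play": 0.00020},
}


def topic_word_prob(word, doc_topics):
    """P(w|D) = sum_k P(w|k) * P(k|D), where P(k|D) comes from LDA inference."""
    return sum(p_k * TOPIC_WORDS[k].get(word, 0.0) for k, p_k in doc_topics.items())


# Hypothetical topic mixture for a tweet such as "Doing some android coding".
doc_topics = {"IT": 0.8, "Sport": 0.2}
print(topic_word_prob("google", doc_topics))  # non-zero although "google" is absent
```

The benefit for short tweets is that related words which never appear in the text still receive probability through the shared topic.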

Scope of our approach

• Input to our algorithm:

– Set of n documents returned by Twitter given query Q

• Our task:

– Rank the documents according to:

• Query

• User model

Proposed Framework

At a Glance: Proposed User Model

Individual User Model

ID | Tweet | Time | Topic
1 | Manchester playing tonight | 1 | Sport
2 | Doing some android coding | 2 | IT
3 | Great game, great win for manchester! | 5 | Sport
4 | Had a great apple cake with chocolate | 6 | Food
5 | My java code keeps throwing exceptions | 10 | IT

Sport: W = 2/5 = 40% (Manchester: 5, Play: 4, Win: 2)

IT: W = 2/5 = 40% (Android: 6, Coding: 2, Java: 2)

Food: W = 1/5 = 20% (Cake: 6, Apple: 5, Oven: 2)
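The topic weights above are simple shares of the user's tweets. A hedged Python sketch of that estimate follows; the exponential time decay (parameter `decay`) is an assumed way of favoring recent interests, which the talk motivates but whose exact form is not given here. With `decay=0.0` it reproduces the 40/40/20 split.

```python
import math
from collections import defaultdict

# The example tweets from the slide: (time, topic).
TWEETS = [(1, "Sport"), (2, "IT"), (5, "Sport"), (6, "Food"), (10, "IT")]


def topic_weights(tweets, now, decay=0.0):
    """Estimate P(k|u) as the weighted share of the user's tweets on topic k.

    Each tweet is weighted by exp(-decay * (now - time)), so with decay > 0
    recent tweets count more (assumed decay form). decay=0.0 gives plain counts.
    """
    scores = defaultdict(float)
    for time, topic in tweets:
        scores[topic] += math.exp(-decay * (now - time))
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}


print(topic_weights(TWEETS, now=10))             # plain shares
print(topic_weights(TWEETS, now=10, decay=0.1))  # the recent IT tweet gains weight
```

With decay switched on, IT overtakes Sport because the user's latest tweet (time 10) is about Java.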

Individual User Model (IM)

• Is u interested in word w from topic k?

• Is u interested in topic k?

• Is word w related to topic k?

• Prior prob. of topic k

• Recent interest is more important

• Evidence sources: from the user, from the topic model
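One hypothetical way to answer “Is u interested in word w from topic k?” is to factor it as $P(w, k \mid u) = P(k \mid u)\,P(w \mid k, u)$ and to interpolate the user's own word counts in topic k with the topic model's $P(w \mid k)$. The decomposition, the interpolation weight `lam` and the topic-model probabilities are assumptions; the Sport word counts and topic shares come from the example slide.

```python
# Slide example: the user's word counts within the "Sport" topic.
USER_TOPIC_COUNTS = {"Sport": {"manchester": 5, "play": 4, "win": 2}}
# Hypothetical P(w|k) from the topic model.
TOPIC_MODEL = {"Sport": {"manchester": 0.3, "play": 0.3, "win": 0.4}}


def word_given_topic_user(word, k, lam=0.5):
    """P(w|k,u): user's word distribution in topic k, interpolated with P(w|k)."""
    counts = USER_TOPIC_COUNTS.get(k, {})
    total = sum(counts.values())
    p_user = counts.get(word, 0) / total if total else 0.0  # "from user"
    p_topic = TOPIC_MODEL.get(k, {}).get(word, 0.0)         # "from topic model"
    return (1 - lam) * p_user + lam * p_topic


def word_topic_interest(word, k, p_topic_given_user, lam=0.5):
    """P(w,k|u) = P(k|u) * P(w|k,u): is u interested in word w from topic k?"""
    return p_topic_given_user[k] * word_given_topic_user(word, k, lam)


p_k_u = {"Sport": 0.4, "IT": 0.4, "Food": 0.2}  # topic shares from the slide
print(word_topic_interest("manchester", "Sport", p_k_u))
```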

Personalization using IM

• Is the Query relevant to topic k?

– Is Q related to topic k in general?

– Is the User interested in topic k?

– Is Q related to the words in topic k that User is interested in?

• Is the Document relevant to topic k?

– Is D related to topic k in general?

– Is the User interested in topic k?

– Is D related to the words in topic k that User is interested in?

• Prior Document probability
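The questions above suggest a score that multiplies a document prior by the query's and the document's topic relevance, summed over topics. The Python sketch below is one hypothetical instantiation: `relevance` multiplies P(k | text), the user's topic interest and a word-match bonus `(1 + word_match)`, an assumed form chosen so that unseen words do not zero the score. All numbers are toy values loosely modeled on the “australia” example, not taken from the slides.

```python
def relevance(topics, words, k, user):
    """How relevant a text (query or tweet) is to topic k for this user."""
    general = topics.get(k, 0.0)           # is the text related to topic k in general?
    interest = user["topics"].get(k, 0.0)  # is the user interested in topic k?
    prefs = user["words"].get(k, {})       # user's preferred words within topic k
    word_match = sum(prefs.get(w, 0.0) for w in words)
    return general * interest * (1.0 + word_match)


def personalized_score(query, doc, user, doc_prior=1.0):
    """Score(D | Q, u) = P(D) * sum_k rel(Q, k, u) * rel(D, k, u)."""
    (q_topics, q_words), (d_topics, d_words) = query, doc
    return doc_prior * sum(
        relevance(q_topics, q_words, k, user) * relevance(d_topics, d_words, k, user)
        for k in set(q_topics) | set(d_topics)
    )


# Toy inputs: (topic distribution, words). All numbers are hypothetical.
query = ({"IT": 0.3, "Travel": 0.7}, ["australia"])
restaurants = ({"Travel": 0.9, "Food": 0.1}, ["restaurants", "australia"])
hacked = ({"IT": 0.9, "Travel": 0.1}, ["ipads", "hacked", "australia"])
new_user = {"topics": {"IT": 0.5, "Travel": 0.5}, "words": {}}
it_user = {"topics": {"IT": 0.5, "Travel": 0.5}, "words": {"IT": {"australia": 0.5}}}

for name, user in [("new_user", new_user), ("it_user", it_user)]:
    print(name, personalized_score(query, restaurants, user), personalized_score(query, hacked, user))
```

In this toy setup the two users share the same topic interests, yet the user whose IT vocabulary includes “australia” gets the IT tweet ranked first, which is the kind of word-level disambiguation the slide's questions point at.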

Personalization using IM

Q = australia

User: “I’m interested in IT and travel. I’ve never tweeted about Australia.”

(Figure: topics TV, Music, IT, Travel, Politics, Business, with values 0.1 and 0.3 shown)

Tweets (D):

– Top 10 restaurants in Australia

– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile

Personalization using IM

Q = australia

User: “I’m interested in IT and travel. I have tweeted about IT in Australia.”

(Figure: topics TV, Music, IT, Travel, Politics, Business, with values 0.6 and 0.3 shown)

Tweets (D):

– Top 10 restaurants in Australia

– iPhones, iPads, and Macs Hacked and Hijacked for Ransom in Australia - Gotta Be Mobile

Collaborative User Model

(Figure: individual topic-word models for Friend 1, Friend 2 and Friend 3, with topics Sport, Food, IT and Music, are combined into a Collaborative Model covering all four topics)

Collaborative User Model

• Weighted sum of the IMs of the top-n friends, ranked by the amount of interaction (re-tweets, mentions, conversations)

• Weight of each friend f:

– wP(f): Popularity of f

– wA(u,f): Affinity of u and f

• Weight of each f’s topic k:

– wB(u,k): Topic bias

– wI(u,f,k): Topic-interaction between u and f
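A Python sketch of the weighted sum, assuming the four weights combine multiplicatively and that each topic's mixture is renormalized; the slides name the weights but not how they are combined. Friends are selected by interaction count (`top_n`), and the data, weights and the function name `collaborative_model` are hypothetical.

```python
from collections import defaultdict


def collaborative_model(friend_models, interactions, w_pop, w_aff, w_bias, w_inter, top_n=2):
    """Combine friends' individual models into one collaborative model.

    friend_models: friend -> topic k -> {word: P(w|k, f)}
    Only the top_n friends by interaction count are used.
    Assumed weighting: wP(f) * wA(u,f) * wB(u,k) * wI(u,f,k), normalized per topic.
    """
    top = sorted(interactions, key=interactions.get, reverse=True)[:top_n]
    mixed = defaultdict(lambda: defaultdict(float))
    norm = defaultdict(float)
    for f in top:
        friend_weight = w_pop[f] * w_aff[f]
        for k, words in friend_models[f].items():
            weight = friend_weight * w_bias.get(k, 0.0) * w_inter.get((f, k), 0.0)
            norm[k] += weight
            for w, p in words.items():
                mixed[k][w] += weight * p
    return {
        k: {w: s / norm[k] for w, s in words.items()}
        for k, words in mixed.items()
        if norm[k] > 0
    }


friend_models = {
    "f1": {"Sport": {"manchester": 0.5, "win": 0.5}},
    "f2": {"Sport": {"manchester": 1.0}, "IT": {"android": 1.0}},
    "f3": {"Music": {"radiohead": 1.0}},
}
interactions = {"f1": 10, "f2": 5, "f3": 1}  # re-tweets + mentions + conversations
ones = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
cm = collaborative_model(
    friend_models, interactions,
    w_pop=ones, w_aff=ones,
    w_bias={"Sport": 1.0, "IT": 1.0, "Music": 1.0},
    w_inter={("f1", "Sport"): 1.0, ("f2", "Sport"): 1.0, ("f2", "IT"): 1.0, ("f3", "Music"): 1.0},
)
print(cm)
```

Here the least-interacting friend (f3) falls outside the top 2, so the Music topic does not enter the collaborative model.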


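Personalization using IM and CM

• Word probabilities combine evidence from the user, from the topic model and from friends

• Dirichlet smoothing: how much weight the user's own data receives depends on the amount of the user's tweets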
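Dirichlet smoothing lets the amount of the user's data decide how far to trust it. The sketch assumes $P(w \mid k, u) = \frac{c(w, k, u) + \mu\, P_{bg}(w \mid k)}{N_{k,u} + \mu}$, with a background $P_{bg}$ that mixes the friends' (collaborative) model and the topic model through a hypothetical weight `beta`; `mu`, `beta` and the toy probabilities are illustrative.

```python
def smoothed_word_prob(word, user_counts, friend_probs, topic_probs, mu=10.0, beta=0.5):
    """Dirichlet-smoothed P(w | k, u) for one topic k.

    user_counts:  the user's own word counts in topic k   ("from user")
    friend_probs: P(w | k) from the collaborative model  ("from friends")
    topic_probs:  P(w | k) from the topic model           ("from topic model")
    The background mixes friends and topic model with weight beta (assumed).
    """
    background = beta * friend_probs.get(word, 0.0) + (1 - beta) * topic_probs.get(word, 0.0)
    n = sum(user_counts.values())
    # With few tweets (small n) the background dominates; with many, the user's counts do.
    return (user_counts.get(word, 0) + mu * background) / (n + mu)


friends = {"android": 0.6, "java": 0.4}
topic = {"android": 0.2, "java": 0.3, "google": 0.5}
print(smoothed_word_prob("android", {}, friends, topic))  # new user: background only
print(smoothed_word_prob("android", {"android": 1000, "java": 1000}, friends, topic))  # active user
```

This is what addresses the cold-start case: a user with no tweets in a topic falls back entirely on friends and the topic model.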

Search User Model (SM)

• Feedback sources: Queries + clicks

• What does a ‘click’ mean? A URL click, a re-tweet or a favorite

• Feedback from a ‘click’:

– Query-topic: preference for topic k when issuing Q

– Topic-word: preference for words in topic k

– Topic: user’s search bias towards topic k
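A minimal count-based Python sketch of the three kinds of click feedback, under the assumptions that each clicked tweet has one dominant topic and that URL clicks, re-tweets and favorites all count equally; the class and method names are hypothetical.

```python
from collections import Counter


class SearchModel:
    """Count-based search user model built from click feedback."""

    def __init__(self):
        self.query_topic = Counter()  # (query, topic) -> clicks
        self.topic_word = Counter()   # (topic, word) -> clicks
        self.topic = Counter()        # topic -> clicks

    def record_click(self, query, doc_words, doc_topic):
        """A 'click' is any positive action: URL click, re-tweet or favorite."""
        self.query_topic[(query, doc_topic)] += 1
        for w in doc_words:
            self.topic_word[(doc_topic, w)] += 1
        self.topic[doc_topic] += 1

    def query_topic_pref(self, query, k):
        """Preference for topic k when issuing this query."""
        total = sum(c for (q, _), c in self.query_topic.items() if q == query)
        return self.query_topic[(query, k)] / total if total else 0.0

    def topic_bias(self, k):
        """User's overall search bias towards topic k."""
        total = sum(self.topic.values())
        return self.topic[k] / total if total else 0.0


sm = SearchModel()
sm.record_click("australia", ["ipads", "hacked"], "IT")
sm.record_click("australia", ["restaurants"], "Travel")
sm.record_click("android", ["java", "exceptions"], "IT")
print(sm.query_topic_pref("australia", "IT"), sm.topic_bias("IT"))
```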

Evaluation


Query log collection

• Evaluation interface

– Submit a query; the interface returns tweets from the Twitter API

– Rate relevant tweets


Datasets

• Controlled user study (Log_CoS)

– 11 users

• In-the-wild user study (Log_IwS)

– 24 users

(Table comparing the Log_CoS and Log_IwS datasets)

Ranking Results


Baselines:

Query likelihood (J-M smoothing)

Topic model-based IR

Personalized search (User-specific language models)

Collaborative search (Cluster-specific language models)

Collaborative Personalized search


(Figure: average per-user ranking performance after processing i of the user’s queries)

Comparison of models

(Figure panels: (a) Log_CoS, (b) Log_IwS)

Query types

(Figure: performance by query type; panels (a) Log_CoS, (b) Log_IwS)

In summary

• Collaborative Personalized Twitter Search

– User’s tweets

– User’s friends’ tweets

– User’s search activity

– Organized around topics

• topic-specific language models


Future work

• Query-dependent personalization strategies

• Selection of an optimal set of friends for the collaborative model

• Integrating spatial and temporal features

Thank You!

