Social Priors to Estimate Relevance of a Resource

Ismaïl BADACHE, Mohand BOUGHANEM IRIT, Toulouse University, France {badache, boughanem}@irit.fr

Upload: ismail-badache

Post on 13-Jun-2015


DESCRIPTION

In this paper we propose an approach that exploits the social data associated with a Web resource to measure its a priori relevance. We show how the interaction traces left by users on resources, in the form of social signals such as the number of likes and shares, can be exploited to quantify social properties such as popularity and reputation. We propose to model these properties as a priori probabilities that we integrate into a language model. We evaluated the effectiveness of our approach on an IMDb dataset containing 167438 resources and their social signals collected from several social networks. Our experimental results are statistically significant and show the benefit of integrating social properties into a search model to enhance information retrieval.

TRANSCRIPT

Page 1: Social Priors to Estimate Relevance of a Resource

Ismaïl BADACHE, Mohand BOUGHANEM

IRIT, Toulouse University, France

{badache, boughanem}@irit.fr

Page 2: Social Priors to Estimate Relevance of a Resource

Presentation Plan

1. Introduction

2. Related Work

3. Approach of Social Information Retrieval

4. Experimental Results

5. Conclusion

Page 3: Social Priors to Estimate Relevance of a Resource

1.1 Emergence of social Web

[Chart: "Number of active users 2013" and "Number of Internet users", years 2011 to 2014, plotted values 1.2, 1.4, 1.7 and 2.4.]

Social content per minute on Facebook: 41000 publications, 1.8 million likes, ~350 GB of data.

Source: blogdumoderateur.com, quantcast.com, semiocast.com

1. Introduction 2. Related Work

5. Conclusion

3. Approach of SIR

4. Experimental Results


Page 4: Social Priors to Estimate Relevance of a Resource

Fig 1. Global presentation of our work

[Diagram: Web resources (video, photo, Web page) receive interactions on social networks (Like/+1, Share/Recommend, Comment, Bookmark, Motion/Vote). These social signals (source of evidence) feed the extraction and quantification of social properties (popularity, reputation, freshness), which are integrated into the information retrieval model (ranking) that maps a query to results.]

Page 5: Social Priors to Estimate Relevance of a Resource

1.2 Example of Social Signals


Page 6: Social Priors to Estimate Relevance of a Resource

1.3 Research Issues

1. How can social signals be translated into social properties?

2. What are the most useful signals and properties to evaluate the a priori relevance (importance) of a resource?

3. Which theoretical model can combine the a priori relevance of a resource with its topical relevance?

4. What is the impact of social properties on IR system performance?

5. Which signals and properties are most favored by attribute selection algorithms, and which are most correlated with document relevance?

Page 7: Social Priors to Estimate Relevance of a Resource


2.1 Related Work

Sources of evidence (social features) | Properties | Models | Authors

Number of clicks, votes, records and recommendations | Popularity, Importance | Linear combination | (Karweg et al., 2011)

Number of likes, dislikes and comments on YouTube; the playcount (number of times a user listens to a track on lastfm); presence of a URL in a tweet | Importance | Machine learning and linear combination | (Chelaru et al., 2012), (Khodaei et al. 2012), (Alonso et al., 2010)

Number of retweets; number of annotations (tags) | Popularity | Machine learning | (Yang et al., 2012), (Hong et al., 2011), (Pantel et al., 2012)

Social approval votes | Importance | Machine learning | (Kazai and Milic-Frayling, 2009)

Page 8: Social Priors to Estimate Relevance of a Resource

3.1 A Modular Approach for Social IR

• Our IR approach exploits various heterogeneous social signals from different social networks and takes them into account in the retrieval model. In addition, instead of considering social features separately, as done in previous work, we propose to combine them to measure specific social properties, namely the popularity and the reputation of a resource. We also evaluate the impact of signal freshness on performance. In our work, we use a language model, which provides a theoretically founded way to take into account the a priori probability of a document.

Page 9: Social Priors to Estimate Relevance of a Resource

3.2 Social Signals and Social Properties

• We assume that a resource D can be represented both by a set of textual keywords $D_w=\{w_1, w_2, \dots, w_n\}$ and by a set of social actions (signals) performed on this resource, $D_a=\{a_1, a_2, \dots, a_m\}$.

• We consider a set X={Popularity, Reputation, Freshness} of 3 social properties that characterize a resource D. Each property is quantified by a specific group of actions. These properties are considered as a priori knowledge about a resource.

[Diagram: a Web resource carries textual keywords and social signals (Like, +1, Share, Comment, dates of actions); the signals are grouped into the properties Reputation, Popularity and Freshness.]

Page 10: Social Priors to Estimate Relevance of a Resource

3.3 Query Likelihood and Document Priors

• The language modelling approach computes the probability $P(D|Q)$ of a document D given a query Q by using Bayes' theorem:

$$Score(Q, D) = P(D|Q) = \frac{P(D) \cdot P(Q|D)}{P(Q)} \quad (1)$$

• $P(D)$ is the document prior probability. It is useful for incorporating other sources of information into the retrieval process.

• $P(Q)$ can be ignored because it does not affect the ranking of documents:

$$Score(Q, D) = P(D|Q) \propto P(D) \cdot P(Q|D) \quad (2)$$

where $P(D)$ is the document prior probability and $P(Q|D)$ is the query-likelihood score.
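As a minimal sketch of Eq. (2), the ranking can be computed in log space, which avoids numerical underflow when the probabilities are small. The function names and the example probabilities are hypothetical, and how $P(Q|D)$ is estimated is left to the underlying language model.

```python
import math


def log_score(prior, query_likelihood):
    """Rank-equivalent form of Eq. (2): log P(D) + log P(Q|D)."""
    if prior <= 0.0 or query_likelihood <= 0.0:
        raise ValueError("both probabilities must be strictly positive")
    return math.log(prior) + math.log(query_likelihood)


def rank_documents(candidates):
    """Sort document ids by decreasing score.

    `candidates` maps a document id to a (prior, query_likelihood) pair.
    """
    return sorted(
        candidates,
        key=lambda doc_id: log_score(*candidates[doc_id]),
        reverse=True,
    )


# Hypothetical values: d2 matches the query best, but d1 has a stronger prior.
example = {"d1": (0.20, 0.01), "d2": (0.05, 0.03), "d3": (0.10, 0.01)}
print(rank_documents(example))
```

With a uniform prior the order reduces to the plain query-likelihood ranking, so the prior is the only place where social evidence enters the score.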

Page 11: Social Priors to Estimate Relevance of a Resource

3.4 Estimating Priors: Popularity and Reputation

• Popularity P: The popularity of a resource can be estimated from the rate at which it is shared on social networks.

• Reputation R: The reputation of a resource can be estimated from social activities that have a positive meaning, such as a Facebook like. Indeed, resource reputation depends on the degree of users' appreciation on social networks.

The general formula is the following:

$$P_x(D) = \prod_{a_i^x \in A} P_x(a_i^x) \quad (3)$$

Where:

$$P_x(a_i^x) = \frac{Count(a_i^x, D)}{Count(a_\cdot^x, D)} \quad (4)$$

Page 12: Social Priors to Estimate Relevance of a Resource

3.5 Estimating Priors: Popularity P and Reputation R

• To avoid zero probabilities, we smooth $P_x(a_i^x)$ by the collection C using Dirichlet smoothing. The formula becomes as follows:

$$P_x(D) = \prod_{a_i^x \in A} \frac{Count(a_i^x, D) + \mu \cdot P(a_i^x|C)}{Count(a_\cdot^x, D) + \mu} \quad (5)$$

Where:

$$P(a_i^x|C) = \frac{Count(a_i^x, C)}{Count(a_\cdot^x, C)} \quad (6)$$

• $Count(a_i^x, D)$ represents the number of occurrences of the specific action $a_i^x$ performed on a resource.

• $a_i^x$ denotes the action $a_i$ used to measure a property $x$. $a_\cdot^x$ stands for all social signals associated with property $x$, so $Count(a_\cdot^x, \cdot)$ is their total number in document D or in collection C.
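The smoothed prior of Eqs. (5) and (6) can be sketched as follows. This is an illustration under stated assumptions: the per-action probabilities are multiplied together, the counts are passed as plain dictionaries, and the value of `mu` and all names are hypothetical because the slides do not give them.

```python
def smoothed_action_probability(count_doc, total_doc, count_coll, total_coll, mu):
    """Dirichlet-smoothed probability of one action, as in Eqs. (5) and (6)."""
    p_collection = count_coll / total_coll  # Eq. (6)
    return (count_doc + mu * p_collection) / (total_doc + mu)


def property_prior(doc_counts, coll_counts, mu=10.0):
    """Prior of a document for one property (popularity or reputation).

    `doc_counts` and `coll_counts` map an action name to its count in the
    document and in the whole collection. Actions missing from the document
    count as zero, which is the case smoothing is meant to handle.
    """
    total_doc = sum(doc_counts.values())
    total_coll = sum(coll_counts.values())
    prior = 1.0
    for action, count_coll in coll_counts.items():
        prior *= smoothed_action_probability(
            doc_counts.get(action, 0), total_doc, count_coll, total_coll, mu
        )
    return prior


# Hypothetical popularity counts for one film and for the collection.
collection = {"share": 60, "comment": 40}
print(property_prior({"share": 3, "comment": 1}, collection))  # 45/196
print(property_prior({}, collection))  # falls back to the collection model
```

A document with no recorded action receives the product of the collection probabilities rather than zero, so it can still be ranked by its query likelihood.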

Page 13: Social Priors to Estimate Relevance of a Resource

3.6 Estimating Priors Considering Freshness F

• In addition to the simple counting of social actions, we propose to consider the time associated with each signal. We assume that resources associated with fresh signals should be promoted compared to those associated with old signals. Therefore, instead of counting each occurrence of a given signal, we bias this count, noted $Count_B$, by the date of occurrence of the signal. The corresponding formula is as follows:

$$Count_B(t_{j,a_i^x}, D) = \sum_{j=1}^{k} f_F(t_{j,a_i^x}, D) = \sum_{j=1}^{k} \exp\left(-\frac{\|t_{current} - t_{j,a_i^x}\|^2}{2\sigma^2}\right) \quad (7)$$

• $T_{a_i}=\{t_{1,a_i}, t_{2,a_i}, \dots, t_{k,a_i}\}$ is the set of k datetimes at which action $a_i$ was produced.

• $f_F(t_{j,a_i^x}, D)$ is the freshness function, estimated with a Gaussian kernel; it depends on the distance between the current time $t_{current}$ and the action time $t_{j,a_i^x}$.
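A minimal sketch of the freshness-biased count of Eq. (7) is given below. The time unit (days), the value of `sigma_days` and the function name are assumptions made for illustration; the slides do not specify them.

```python
import math
from datetime import datetime, timedelta


def freshness_biased_count(action_times, current_time, sigma_days):
    """Sum of Gaussian-kernel weights, one per occurrence of an action.

    An action performed at `current_time` contributes 1; older actions
    contribute less, at a rate controlled by `sigma_days`.
    """
    if sigma_days <= 0:
        raise ValueError("sigma_days must be positive")
    total = 0.0
    for action_time in action_times:
        age_days = (current_time - action_time).total_seconds() / 86400.0
        total += math.exp(-(age_days ** 2) / (2.0 * sigma_days ** 2))
    return total


# Hypothetical example: one action today, one action 30 days ago.
now = datetime(2014, 8, 26)
times = [now, now - timedelta(days=30)]
print(freshness_biased_count(times, now, sigma_days=30.0))  # 1 + exp(-0.5)
```

The biased count can replace the raw count in the prior estimation, so two resources with the same number of actions are separated by how recent those actions are.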

Page 14: Social Priors to Estimate Relevance of a Resource

3.7 Combining Priors

• In our case, various sources of social information influence the a priori probability of relevance. This probability is calculated by combining the two main social properties (popularity and reputation). The problem can be formalized as follows:

$$P_{P \oplus R}(D) = P_P(D) \cdot P_R(D) \quad (8)$$

• $P_P(D)$ and $P_R(D)$ are the a priori probabilities related to popularity P and reputation R, which include the freshness function.

• $P_{P \oplus R}(D)$ is the probability of the combined priors.
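Eq. (8) amounts to a product of the per-property priors. The short sketch below illustrates it with hypothetical prior values and a hypothetical function name; it accepts any number of properties, although the slides combine only popularity and reputation.

```python
import math


def combined_prior(property_priors):
    """Product of per-property priors, as in Eq. (8).

    `property_priors` maps a property name to its prior for one document.
    """
    if not property_priors:
        raise ValueError("at least one property prior is required")
    return math.prod(property_priors.values())


# Hypothetical priors for one document.
print(combined_prior({"popularity": 0.2, "reputation": 0.5}))
```

Because the combination is a product, a document needs a reasonable prior on both properties to rank well: a very low value on either one pulls the combined prior down.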

Page 15: Social Priors to Estimate Relevance of a Resource

4.1 Experimental Evaluation

• Objectives

1. First, to evaluate whether social signals taken from different social networks improve the search.

2. Second, to evaluate the impact of each signal, taken separately and grouped to represent a certain property.

3. Finally, to measure the impact of freshness.

• Evaluation challenges

1. Absence of a standard evaluation framework in social IR.

2. Collecting social signals from 5 social networks and setting up the experiments.

Page 16: Social Priors to Estimate Relevance of a Resource

4.2 Description of DataSet

• Textual Content: 167438 Documents from INEX IMDb.

Field | Description | Status

ID | Identifier of the film (document) | -

Title | Film's title | indexed

Year | Year of the film release | indexed

Rated | Film classification by content type | -

Released | Date of making the film | indexed

Runtime | Length of the film | indexed

Genre | Film genre (Action, Drama, etc.) | indexed

Director | Director of the film project | indexed

Writer | Writers of the film | indexed

Actors | Main actors of the film | indexed

Plot | Text summary of the film | indexed

Poster | URL of the poster | -

url | URL of the Web source document | -

UGC | Social data recovered | -

Page 17: Social Priors to Estimate Relevance of a Resource

4.2 Description of DataSet (continued)

• Social Content: 8 social data from 5 social networks.

- Facebook: Like, Share, Comment, Date of last action

- Twitter: Tweet

- Google+: +1

- LinkedIn: Share

- Delicious: Bookmark

• Query and Relevance Judgment: from INEX IMDb

- 30 queries (topics) and their Qrels from the set of INEX IMDb.

- Top 1000 documents returned for each topic.

Page 18: Social Priors to Estimate Relevance of a Resource

4.3 Quantifying Social Properties

• Each social property is quantified based on social signals according to their nature and meaning.

Social Properties | Social Signals | Social Networks

Popularity P | Number of « Comment » (C1) | Facebook

Popularity P | Number of « Tweet » (C2) | Twitter

Popularity P | Number of « Share » (C3) | LinkedIn

Popularity P | Number of « Share » (C4) | Facebook

Reputation R | Number of « Like » (C5) | Facebook

Reputation R | Number of « +1 » (C6) | Google+

Reputation R | Number of « Bookmark » (C7) | Delicious

Freshness F | Dates of last actions (C8) | Facebook

Page 19: Social Priors to Estimate Relevance of a Resource

4.4 Results: Single Priors and Combination Priors

[Three bar charts reporting P@10, P@20, nDCG and MAP. (a) Baselines (topical models): Lucene Solr, ML.Hiemstra. (b) Results of individual integration of social signals: Like, Share, Comment (Facebook signals), Tweet, Mention +1, Bookmark, Share (LIn). (c) Different combinations of social signals (social properties): Popularity, Reputation, All Criteria, All Properties.]
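The results are reported with P@10, P@20, nDCG and MAP. As a reminder of what two of these metrics measure, here is a minimal sketch of precision at k and of average precision (whose mean over all topics gives MAP), using the standard definitions. The function names and the toy ranking are hypothetical and are not taken from the experiments.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank holding a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Toy example: relevant documents appear at ranks 1 and 3.
toy_ranking = ["a", "b", "c", "d"]
toy_relevant = {"a", "c"}
print(precision_at_k(toy_ranking, toy_relevant, 2))
print(average_precision(toy_ranking, toy_relevant))
```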

Page 20: Social Priors to Estimate Relevance of a Resource

4.4 Results: Impact of the Freshness

[Three bar charts reporting P@10, P@20, nDCG and MAP. (a) Baselines (topical models): Lucene Solr, ML.Hiemstra. (b) Without integration of freshness: Share, Comment, Share+Comment, Popularity, All Criteria, All Properties. (c) With integration of freshness (marked F): the same configurations.]

Page 21: Social Priors to Estimate Relevance of a Resource

4.5 Results: Feature Selection Algorithms Study

Table 1. Selected Social Signals With Attribute Selection Algorithms

[Table image; legend: highly selected, moderately selected, less favored.]

Page 22: Social Priors to Estimate Relevance of a Resource

4.6 Results: Ranking Correlation Analysis

Fig 1. Spearman correlation between social signals and relevance

Fig 2. Spearman correlation between social properties and relevance
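For readers unfamiliar with the measure, the sketch below computes Spearman's rank correlation with the classical formula $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2-1)}$, which is valid only when there are no tied values. The function names and the toy data are hypothetical; real signal counts usually contain ties and call for a tie-aware implementation.

```python
def ranks_descending(values):
    """Rank of each value, 1 for the largest. Assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(x, y):
    """Spearman's rho for two equally long sequences without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    n = len(x)
    squared_diffs = sum(
        (rx - ry) ** 2
        for rx, ry in zip(ranks_descending(x), ranks_descending(y))
    )
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# Toy data: like counts of four documents and graded relevance labels.
likes = [120, 45, 300, 10]
relevance = [2, 1, 3, 0]
print(spearman_rho(likes, relevance))
```

A value close to 1 means that documents with more of a given signal tend to be judged more relevant; a value close to -1 means the opposite.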

Page 23: Social Priors to Estimate Relevance of a Resource

4.6 Results: Ranking Correlation Analysis (continued)

Fig 3. Spearman's Rho correlation values for the social signals pairs

• The social signal pairs (tweet, share (LIn)), (bookmark, tweet) and (mention +1, bookmark) are highly correlated, i.e., the Spearman's Rho values of these pairs are higher than 0.70.

• Bookmark and share (LIn) are the least important criteria, followed by mention +1.

Page 24: Social Priors to Estimate Relevance of a Resource

5. Conclusion

• Social Information Retrieval based on Language Model

- Topical relevance (retrieval model based on content only).

- Social relevance (retrieval model based on content and social features).

• Experimental Evaluation

- Superiority of the proposed approach compared to textual models (baselines).

- Positive ranking correlation between social signals and relevance.

- Attribute selection algorithms.

• Perspectives

- Integration of other social features.

- Further study of the impact of the temporal property.

- Comparison of the proposed models with other social models.

- Experimental evaluation on other types of datasets.

Page 25: Social Priors to Estimate Relevance of a Resource

http://www.irit.fr/~Ismail.Badache/

Thank you @IIiX2014 for travel support