personalized diversification of search results

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet , Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker: Shun-Chen, Cheng


Page 1: Personalized Diversification of Search Results

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS

Date: 2013/04/15

Author: David Vallet , Pablo Castells

Source: SIGIR’12

Advisor: Dr.Jia-ling, Koh

Speaker: Shun-Chen, Cheng

Page 2: Personalized Diversification of Search Results

Outline

Introduction

Personalized Diversity

• IA-Select, xQuAD

• Personalized IA-Select

• Personalized xQuAD

Evaluation

Experiment Results

Conclusions

Page 3: Personalized Diversification of Search Results

Introduction

• Search personalization: adapt the search results to a specific aspect that may interest the user.

[Figure: Query and Result list → ranking by similarity between the query and the result list → Ranked Result list]

Page 4: Personalized Diversification of Search Results

Introduction

• Diversification: cover multiple aspects in order to maximize the probability that some query aspect is relevant to the user.

[Figure: Query → Result list → Clustering into c1, c2, c3 → Clustered Result list]

Page 5: Personalized Diversification of Search Results

Introduction

Goal: we question the view that personalization and diversification are antagonistic, and hypothesize that these two directions may in fact be effectively combined and enhance each other.



Page 8: Personalized Diversification of Search Results

IA-Select, xQuAD

• Both use an explicit representation of query intents for diversification.

• IA-Select:

• xQuAD (eXplicit Query Aspect Diversification):
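Both algorithms are usually described as greedy re-rankers over explicit query aspects. The following is a minimal Python sketch under stated assumptions, not the formulation shown on the slide: it uses the commonly published scoring forms, where IA-Select scores a candidate d by Σ_c p(c|q) · p(d|q,c) · Π_{d' ∈ S}(1 − p(d'|q,c)) and xQuAD mixes that diversity term with the baseline relevance p(d|q) through a parameter λ. The function name `greedy_diversify`, the dictionary-based inputs, and the toy probabilities are illustrative assumptions.

```python
from typing import Dict, List


def greedy_diversify(
    docs: List[str],
    p_c_q: Dict[str, float],              # p(c|q): aspect distribution of the query
    p_d_qc: Dict[str, Dict[str, float]],  # p(d|q,c), indexed as [doc][aspect]
    p_d_q: Dict[str, float],              # p(d|q): baseline relevance (xQuAD only)
    k: int,
    lam: float = 0.5,                     # xQuAD mixing weight (assumed default)
    method: str = "xquad",                # "xquad" or "ia-select"
) -> List[str]:
    """Greedily pick k documents, discounting aspects that are already covered."""
    selected: List[str] = []
    candidates = list(docs)
    # not_covered[c] = product over selected d' of (1 - p(d'|q,c))
    not_covered = {c: 1.0 for c in p_c_q}

    def score(d: str) -> float:
        diversity = sum(
            p_c_q[c] * p_d_qc[d].get(c, 0.0) * not_covered[c] for c in p_c_q
        )
        if method == "ia-select":
            return diversity
        return (1.0 - lam) * p_d_q[d] + lam * diversity

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects satisfied by the chosen document count less from now on.
        for c in p_c_q:
            not_covered[c] *= 1.0 - p_d_qc[best].get(c, 0.0)
    return selected


# Toy data (hypothetical): "a" and "b" cover aspect c1, "c" covers aspect c2.
aspects = {"c1": 0.5, "c2": 0.5}
per_aspect = {"a": {"c1": 0.9}, "b": {"c1": 0.8}, "c": {"c2": 0.7}}
baseline = {"a": 0.9, "b": 0.8, "c": 0.3}

diverse = greedy_diversify(["a", "b", "c"], aspects, per_aspect, baseline, k=2,
                           method="ia-select")
plain = greedy_diversify(["a", "b", "c"], aspects, per_aspect, baseline, k=2,
                         lam=0.0)
```

With the toy data, the IA-Select variant picks "c" second because aspect c1 is already largely covered by "a", while xQuAD with λ = 0 reduces to the baseline order.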

Page 9: Personalized Diversification of Search Results

Personalized IA-Select

• A personalized search system: p(q|d,u)

• The personalized query aspect distribution: p(c|q,u)

• The personalized aspect distribution over documents: p(c|d,u)

p(q|d,u)

= position of document d in the order induced by the retrieval system scores s(d,q), for d ∈ R_q

assume q and u are conditionally independent given a document

Page 10: Personalized Diversification of Search Results

p(c|d,u)

assume conditional independence between documents and users given a query aspect

assume conditional independence between aspects and users given a document.

Page 11: Personalized Diversification of Search Results

Two ways to calculate p(d|u):

w: a tag in the folksonomy (Delicious)

tf(w,u): the number of times a user used the tag in their profile bookmark annotations.

tf(w,d): the number of times a tag was used (by any user) to annotate a document.

Δ: the document collection

1. User preference model by an adaptation of the BM25 probabilistic model:

iuf(w): the inverse user frequency of term w in the set of users.

|u|: the size of the user profile, calculated as Σ_w tf(w,u).

b = 0.75, k1 = 2

2.
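The quantities listed for the BM25 variant can be combined as in the following minimal Python sketch. The slide's own formula is not available here, so the sketch falls back on the standard BM25 term-saturation form applied to a user profile; the logarithmic definition of iuf(w), the average profile size used for length normalization, and the dot-product score against tf(w,d) are assumptions, as are all function and parameter names.

```python
import math
from typing import Dict


def bm25_user_weights(
    profile: Dict[str, int],         # tf(w,u) for one user
    users_with_tag: Dict[str, int],  # number of users who used each tag (assumed input)
    n_users: int,                    # total number of users (assumed input)
    avg_profile_size: float,         # average |u| over all users (assumed input)
    k1: float = 2.0,                 # value given on the slide
    b: float = 0.75,                 # value given on the slide
) -> Dict[str, float]:
    """BM25-style weight of every tag in a user profile."""
    size = sum(profile.values())  # |u| = sum over w of tf(w,u)
    length_norm = k1 * (1.0 - b + b * size / avg_profile_size)
    weights: Dict[str, float] = {}
    for w, tf in profile.items():
        iuf = math.log(n_users / users_with_tag[w])  # assumed form of iuf(w)
        weights[w] = iuf * tf * (k1 + 1.0) / (tf + length_norm)
    return weights


def user_doc_score(user_weights: Dict[str, float], doc_tags: Dict[str, int]) -> float:
    """Unnormalized preference score: user tag weights times tf(w,d)."""
    return sum(weight * doc_tags.get(w, 0) for w, weight in user_weights.items())


# Hypothetical profile: "news" is used by every user, so its iuf is zero.
weights = bm25_user_weights(
    profile={"python": 3, "news": 1},
    users_with_tag={"python": 10, "news": 100},
    n_users=100,
    avg_profile_size=4.0,
)
score = user_doc_score(weights, {"python": 2, "cooking": 5})
```

Turning such a score into a probability p(d|u) would still require normalizing over the document collection Δ, which the sketch leaves out.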

Page 12: Personalized Diversification of Search Results

p(c|q,u)

A convenient way to develop p(c|q,u) is to marginalize over the set of documents, because this makes it possible to reuse the computation of the two previous top-level components in equations 1 and 2.

assume the conditional independence of query aspects and queries given a user and a document.
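Written out, the marginalization could take the following form. This is a reconstruction from the stated assumption rather than a copy of the slide's equation; p(d|q,u), the personalized document distribution, is a symbol introduced here for the derivation.

```latex
\begin{align*}
p(c \mid q,u) &= \sum_{d} p(c,d \mid q,u) \\
              &= \sum_{d} p(c \mid d,q,u)\, p(d \mid q,u) \\
              &= \sum_{d} p(c \mid d,u)\, p(d \mid q,u)
\end{align*}
```

The last step is where the conditional independence of query aspects and queries, given a user and a document, is applied: p(c|d,q,u) reduces to p(c|d,u).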

Page 13: Personalized Diversification of Search Results

Personalized xQuAD

• The personalized search system: p(q|d,u)

• The personalized query aspect distribution: p(c|q,u)

• The personalized, aspect-dependent document distribution: p(d|c,u)

p(d|c,u)

P(c|d): obtained from the Textwise ODP classification service, which returns up to three possible ODP classifications for a document, ranked by a score in [0,1] that reflects the degree of confidence in the classification.

assumed documents and users are conditionally independent given a query aspect.


Page 15: Personalized Diversification of Search Results

Evaluation

• Crowdsourcing services: Amazon Mechanical Turk, CrowdFlower

• Data set: Delicious

• Assessment collection: four weeks

• Number of test users: 35

• A total of 180 topics and 3,800 individual results

• Randomly generated an equal number of topics of size K = 1 and K = 2

• Top P = 5

Page 16: Personalized Diversification of Search Results

Evaluation

interactive evaluation interface

Page 17: Personalized Diversification of Search Results

Evaluation

• Q1 (user): how relevant the result is to the user’s interests.

• Q2 (topic): how relevant the result is to the evaluated topic.

• Q3 (subtopic): workers assign each result to a specific subtopic related to the evaluated topic.

• Q1 measures the accuracy of the evaluated approaches with respect to the user’s interests.

• Q2: a successful reordering technique will rank highly the results that are assessed as relevant both to the topic and to the user’s interests.


Page 19: Personalized Diversification of Search Results

Experiment Results

• Nine different approaches:

• Baseline

• IA-Select

• xQuAD

• plain personalized search approach based on social tagging profiles and BM25 (PersBM25)

• xQuADBM25

• PIA-Select (probabilistic calculation of p(d|u))

• PIA-SelectBM25 (BM25 of p(d|u))

• PxQuAD

• PxQuADBM25

Page 20: Personalized Diversification of Search Results

Experiment Results

• For diversity: the intent-aware version of expected reciprocal rank (ERR-IA), α-nDCG, and subtopic recall (S-recall)

• For accuracy: nDCG and precision

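Page 21: Personalized Diversification of Search Results

α-nDCG

[Table: relevance judgments J(d,i) of documents D1, D2, D3, D4 for subtopics C1-1, C1-2, C1-3]

α = 0.5, $r_{i,0} = 0$

$G[1] = J(d_1,1)(1-0.5)^{r_{1,0}} + J(d_1,2)(1-0.5)^{r_{2,0}} + J(d_1,3)(1-0.5)^{r_{3,0}} = 1 \cdot 0.5^{0} = 1$

$G[2] = J(d_2,1)(1-0.5)^{r_{1,1}} + J(d_2,2)(1-0.5)^{r_{2,1}} + J(d_2,3)(1-0.5)^{r_{3,1}} = 1 \cdot 0.5^{0} + 1 \cdot 0.5^{1} + 1 \cdot 0.5^{0} = \frac{5}{2}$

$G[3] = J(d_3,1)(1-0.5)^{r_{1,2}} + J(d_3,2)(1-0.5)^{r_{2,2}} + J(d_3,3)(1-0.5)^{r_{3,2}} = 1 \cdot 0.5^{1} = \frac{1}{2}$

$G[4] = J(d_4,1)(1-0.5)^{r_{1,3}} + J(d_4,2)(1-0.5)^{r_{2,3}} + J(d_4,3)(1-0.5)^{r_{3,3}} = 1 \cdot 0.5^{1} = \frac{1}{2}$

Page 22: Personalized Diversification of Search Results

$CG[1] = G[1] = 1$

$CG[2] = G[1] + G[2] = 1 + \frac{5}{2} = \frac{7}{2}$

$CG[3] = CG[2] + G[3] = \frac{7}{2} + \frac{1}{2} = 4$

$CG[4] = CG[3] + G[4] = 4 + \frac{1}{2} = \frac{9}{2}$

$DCG[1] = G[1] / \log_2(1+1) = 1/1 = 1$

$DCG[2] = DCG[1] + (G[2] / \log_2(1+2)) = 1 + (2.5/1.585) = 2.577$

$DCG[3] = DCG[2] + (G[3] / \log_2(1+3)) = 2.577 + (0.5/2) = 2.827$

$DCG[4] = DCG[3] + (G[4] / \log_2(1+4)) = 2.827 + (0.5/2.322) = 3.042$

IG: 2.5, 1, 0.5, 0.5

ICG: 2.5, 3.5, 4, 4.5

IDCG: 2.5, 4.708, 6.708, 8.646

α-nDCG: 0.4, 0.547, 0.421, 0.352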
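The gain and DCG steps of this worked example can be checked with a short Python sketch. It is a sketch under assumptions: the judgment table is not available, so the subtopic sets below are chosen to be consistent with the exponents in the example (D1 relevant to subtopic 2 only, D2 relevant to all three); assigning D3 to subtopic 1 and D4 to subtopic 3 is one of two equally consistent choices. The function names are illustrative, and the ideal-ranking normalization is left out.

```python
import math
from typing import Dict, List, Set


def alpha_gains(ranking: List[Set[int]], alpha: float = 0.5) -> List[float]:
    """G[k]: each relevant subtopic contributes (1 - alpha) ** (times already seen)."""
    seen: Dict[int, int] = {}  # r_{i,k-1}: earlier documents relevant to subtopic i
    gains: List[float] = []
    for subtopics in ranking:
        gains.append(sum((1.0 - alpha) ** seen.get(s, 0) for s in subtopics))
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return gains


def discounted_cumulative(gains: List[float]) -> List[float]:
    """DCG[k] = DCG[k-1] + G[k] / log2(1 + k)."""
    total, out = 0.0, []
    for k, g in enumerate(gains, start=1):
        total += g / math.log2(1 + k)
        out.append(total)
    return out


# Assumed judgments: subtopics each ranked document is relevant to.
ranking = [{2}, {1, 2, 3}, {1}, {3}]
gains = alpha_gains(ranking, alpha=0.5)   # 1, 2.5, 0.5, 0.5
dcg = discounted_cumulative(gains)        # about 1, 2.577, 2.827, 3.04
```

The third and fourth gains drop to 0.5 because each of those documents repeats a subtopic that D2 already covered once.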

Page 23: Personalized Diversification of Search Results

Subtopic recall (S-recall)

For a topic T with n_A subtopics, let subtopics(d_i) be the set of subtopics to which d_i is relevant.

T = {s1, s2, s3, s4, s5, s6, s7, s8, s9, s10}

Document   subtopics(d_i)
D1         s1, s3, s10
D2         s3, s4, s6
D3         s2, s5
D4         s2, s7, s9

S-recall(1) = 3/10
S-recall(2) = 5/10
S-recall(3) = 7/10
S-recall(4) = 9/10
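The S-recall values in this example can be reproduced with a few lines of Python. This is a minimal sketch that takes S-recall at K to be the fraction of the topic's subtopics covered by the union of subtopics(d_i) over the top K documents; the function name `s_recall` and the integer encoding of subtopics (s1 as 1, and so on) are illustrative assumptions.

```python
from typing import List, Set


def s_recall(ranking: List[Set[int]], n_subtopics: int, k: int) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[int] = set().union(*ranking[:k])
    return len(covered) / n_subtopics


# subtopics(d_i) for D1..D4, with s1..s10 encoded as the integers 1..10.
ranking = [{1, 3, 10}, {3, 4, 6}, {2, 5}, {2, 7, 9}]
values = [s_recall(ranking, 10, k) for k in range(1, 5)]  # 0.3, 0.5, 0.7, 0.9
```

D2 adds only two new subtopics (s4 and s6) because s3 is already covered by D1, and D4 adds two because s2 is already covered by D3.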

Page 24: Personalized Diversification of Search Results

Diversity metric values for the evaluated approaches

Bold: the best value for each metric.

Underlined: a statistically significant difference with respect to the baseline.

Double underlined: a statistically significant difference with respect to xQuAD (Wilcoxon, p < 0.05).

PxQuADBM25 performs significantly better than the baseline and the plain diversification approaches in terms of ERR-IA and α-nDCG@5.

The probabilistic estimate of the personalized factor has a negative effect on the overall behavior of the PIA-Select algorithm.

Page 25: Personalized Diversification of Search Results

Accuracy metrics for evaluated approaches

User relevance: PersBM25 appears to be on par with PxQuADBM25.

Topic relevance: PersBM25 underperforms the baseline, while PxQuADBM25 improves on the baseline in this regard, with statistical significance.


Page 27: Personalized Diversification of Search Results

Conclusions

Presented a number of approaches that combine both personalization and diversification components.

Investigated the introduction of the user as an explicit random variable in two state-of-the-art diversification models: IA-Select and xQuAD.

Achieved statistically significant improvements over the baselines, ranging between 3% and 11% in terms of accuracy values and between 3% and 8% in terms of diversity values.