Optimized Interleaving for Online Retrieval Evaluation


Page 1: Optimized interleaving for online retrieval evaluation

Optimized Interleaving for Online Retrieval Evaluation

(Best paper in WSDM'13)

Authors: Filip Radlinski, Nick Craswell

Slides by: Han Jiang

Page 2: Optimized interleaving for online retrieval evaluation

Agenda

Basic concepts

Previous algorithms

Framework
  Invert Problem
  Refine Problem

Theoretical benefits

Illustration

Evaluation

Discussion

Page 3: Optimized interleaving for online retrieval evaluation

Basic concepts

What is interleaving?

Merge results from different retrieval algorithms.

Only the combined list is shown to the user.

The relative quality of the algorithms can then be inferred from clickthrough data.

[Diagram: a query goes to Search Engine A and Search Engine B; their source lists A and B are merged by the interleaving algorithm into a single interleaved list, which is shown to the user; the user's clicks, together with the assignment and a credit function, produce the evaluation result.]
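As a concrete picture of this pipeline, here is a minimal sketch in Python. The function names and the shape of the credit function are my assumptions for illustration, not the paper's API; any of the interleaving methods from the following slides can be plugged in.

```python
from typing import Callable, List, Tuple

def run_interleaving_experiment(
    queries: List[str],
    ranker_a: Callable[[str], List[str]],
    ranker_b: Callable[[str], List[str]],
    interleave: Callable[[List[str], List[str]], Tuple[List[str], list]],
    get_clicks: Callable[[str, List[str]], List[int]],
    credit: Callable[[List[str], List[str], list, List[int]], int],
) -> float:
    """Run an interleaving evaluation over a stream of queries.

    `credit` returns +1 if the clicks favour ranker A, -1 for B, 0 for a tie.
    Returns the fraction of non-tied impressions won by ranker A.
    """
    wins_a = wins_b = 0
    for q in queries:
        list_a, list_b = ranker_a(q), ranker_b(q)
        merged, assignment = interleave(list_a, list_b)  # shown to the user
        clicked_ranks = get_clicks(q, merged)            # observed behaviour
        outcome = credit(list_a, list_b, assignment, clicked_ranks)
        if outcome > 0:
            wins_a += 1
        elif outcome < 0:
            wins_b += 1
    total = wins_a + wins_b
    return wins_a / total if total else 0.5
```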

Page 4: Optimized interleaving for online retrieval evaluation

Basic concepts +

Ah, that's easy… how about:

Interleaving method = pick the best results from each algorithm?

Wait… how do we know whether d1 is better than d4?

OK, then toss a coin instead, and

Credit function = if a clicked di is ranked higher by ranker A, prefer A.

Urgh… when a user randomly clicks on (d1, d2, d3), A is always preferred…

Page 5: Optimized interleaving for online retrieval evaluation

Basic concepts ++

So, what is a good interleaving algorithm?

Intuitively*, a good one should:

Be blind to the user; be blind to the retrieval functions.

Be robust to biases in the user's decision process (that do not relate to retrieval quality).

Not substantially alter the search experience.

Lead to clicks that reflect the user's preference.

[*] Joachims, Optimizing Search Engines Using Clickthrough Data, KDD'02

Page 6: Optimized interleaving for online retrieval evaluation

Agenda

Basic concepts √

Previous algorithms

Framework
  Invert Problem
  Refine Problem

Theoretical benefits

Illustration

Evaluation

Discussion

Page 7: Optimized interleaving for online retrieval evaluation

Previous Algorithms

Balanced Interleaving: toss a coin once, then pick the best remaining item from each list in turn.

Team Draft Interleaving: toss a coin every two picks; the winner picks its best remaining item first.

Probabilistic Interleaving: toss a coin every time, then sample an item from the winner. A weight function ensures that documents at higher ranks have a higher probability of being picked.
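Of these, Team Draft is the easiest to state precisely. Here is a minimal sketch, assuming plain document-id lists; the helper names are mine, not the paper's:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=None):
    """Merge two rankings with Team Draft interleaving.

    Returns the interleaved list and the team assignment
    (which input ranker contributed each document).
    """
    length = length or max(len(ranking_a), len(ranking_b))
    interleaved, teams, picked = [], [], set()
    count_a = count_b = 0
    while len(interleaved) < length:
        # Toss a coin whenever both teams have picked equally often;
        # otherwise the team that is behind picks next.
        if count_a < count_b or (count_a == count_b and random.random() < 0.5):
            source, team = ranking_a, 'A'
        else:
            source, team = ranking_b, 'B'
        # The chosen team takes its best not-yet-picked document.
        candidates = [d for d in source if d not in picked]
        if not candidates:
            break
        interleaved.append(candidates[0])
        teams.append(team)
        picked.add(candidates[0])
        if team == 'A':
            count_a += 1
        else:
            count_b += 1
    return interleaved, teams

def team_draft_credit(teams, clicked_ranks):
    """Each click is credited to the team that contributed the document."""
    wins_a = sum(1 for i in clicked_ranks if teams[i] == 'A')
    wins_b = sum(1 for i in clicked_ranks if teams[i] == 'B')
    return 'A' if wins_a > wins_b else 'B' if wins_b > wins_a else 'tie'
```

For Probabilistic interleaving, the deterministic "best remaining item" pick would be replaced by sampling from a rank-weighted distribution over the winner's remaining documents.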

Page 8: Optimized interleaving for online retrieval evaluation

Previous Algorithms +

For the credit functions, only documents that are clicked by users are considered.

Balanced Interleaving (coin = A)

A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3

Clicks on: d1, d3

The deepest clicked document, d3, is at rank 3 in A and rank 4 in B, so compare the top k = 3 of each input:

A's top 3: d1 d2 d3 → 2 clicked
B's top 3: d4 d1 d2 → 1 clicked

A wins

Team Draft Interleaving (coin = AA)

A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3 (d1 and d2 picked by team A; d4 and d3 picked by team B)

Clicks on: d1, d3

One click per team: d1 belongs to team A, d3 to team B

Tie

Probabilistic Interleaving (possible coin sequences: AA, AB)

A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3

Clicks on: d1, d3

Marginalizing over the coin sequences and assignments that could have produced M, the clicks favour A in every case:

A wins with p = 100%
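The Balanced outcome above follows Joachims' counting rule. A minimal sketch, assuming list-of-id inputs (function and variable names are mine):

```python
def balanced_credit(ranking_a, ranking_b, clicked_docs):
    """Joachims' credit rule for Balanced interleaving.

    k is the smaller of the two ranks of the deepest clicked document;
    whichever input ranking has more clicked documents in its top k wins.
    """
    def rank(doc, ranking):
        # A document absent from a ranking counts as ranked one past the end.
        return ranking.index(doc) + 1 if doc in ranking else len(ranking) + 1

    k = min(max(rank(d, ranking_a) for d in clicked_docs),
            max(rank(d, ranking_b) for d in clicked_docs))
    clicks_a = sum(rank(d, ranking_a) <= k for d in clicked_docs)
    clicks_b = sum(rank(d, ranking_b) <= k for d in clicked_docs)
    return 'A' if clicks_a > clicks_b else 'B' if clicks_b > clicks_a else 'tie'

# Reproduces the slide's example: clicks on d1 and d3 give k = 3
# and counts 2 vs 1, so A wins.
print(balanced_credit(['d1', 'd2', 'd3', 'd4'],
                      ['d4', 'd1', 'd2', 'd3'],
                      ['d1', 'd3']))  # -> 'A'
```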

Page 9: Optimized interleaving for online retrieval evaluation

Agenda

Basic concepts √

Previous algorithms √

Framework
  Invert Problem
  Refine Problem

Theoretical benefits

Illustration

Evaluation

Discussion

Page 10: Optimized interleaving for online retrieval evaluation

Invert the problem

Why the previous algorithms are not good enough:

Balanced interleaving & Team Draft interleaving: biased. Even a random click on a document raises a winner.

Probabilistic interleaving: degrades the user experience. For example, A = (d1, d2) and B = (d1, d2), but M = (d2, d1).

Therefore, the problem of interleaving should be more constrained.

A good way is to start from the principles…

Page 11: Optimized interleaving for online retrieval evaluation

Refine the problem

Again, what is a good interleaving algorithm?

Be blind to the user; be blind to the retrieval functions.

Be robust to biases in the user's decision process (that do not relate to retrieval quality).

Not substantially alter the search experience.

Lead to clicks that reflect the user's preference.

Refined into concrete criteria:

Not substantially alter the search experience (show one of the rankings, or a ranking "in between" the two).

Lead to clicks that reflect the user's preference: if document d is clicked, the input ranker that ranked d higher is given more credit, and a randomly clicking user doesn't create a preference for either ranker.

Be sensitive to the input data (the fewest user queries should show a significant preference).

Page 12: Optimized interleaving for online retrieval evaluation

Refine the problem +

Again, what is a good interleaving algorithm?

Be blind to the user; be blind to the retrieval functions.

Be robust to biases in the user's decision process (that do not relate to retrieval quality).

Not substantially alter the search experience (show one of the rankings, or a ranking "in between" the two).

Lead to clicks that reflect the user's preference: if document d is clicked, the input ranker that ranked d higher is given more credit, and a randomly clicking user doesn't create a preference for either ranker.

Be sensitive to the input data (the fewest user queries should show a significant preference).

Page 13: Optimized interleaving for online retrieval evaluation

Refine the problem ++

Not substantially alter the search experience (show one of the rankings, or a ranking "in between" the two):

If A = (d1, d2) and B = (d1, d2), then M = (d1, d2).

Lead to clicks that reflect the user's preference: if document d is clicked, the input ranker that ranked d higher is given more credit.

A randomly clicking user doesn't create a preference for either ranker. Formally, let $L_i$ be a possible interleaved list under the previous constraints, shown with probability $p_i$; let $\delta(\cdot)$ be a score function that assigns credit to A when positive and to B otherwise; and let $n$ be the length of the list. Then for every number of clicks $k$:

$$\forall k \in \{1, \ldots, n\}: \qquad \sum_{i} p_i \sum_{j=1}^{k} \delta\big(L_i[j]\big) = 0$$
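As a small executable restatement of this constraint (my own helper, not from the paper): given the per-rank credits of each allowed list and the probabilities of showing them, every prefix of the merged lists must carry zero expected credit.

```python
def unbiased_for_random_user(lists_deltas, probs, tol=1e-9):
    """Check the framework's unbiasedness constraint.

    lists_deltas[i][j] -- credit delta of the document at rank j of
                          allowed interleaved list i
    probs[i]           -- probability of showing list i
    For every prefix length k, the expected prefix credit must vanish,
    so a user clicking at random on any top-k creates no preference.
    """
    n_ranks = len(lists_deltas[0])
    for k in range(1, n_ranks + 1):
        expected = sum(p * sum(deltas[:k])
                       for deltas, p in zip(lists_deltas, probs))
        if abs(expected) > tol:
            return False
    return True
```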

Page 14: Optimized interleaving for online retrieval evaluation

Refine the problem +++

Be sensitive to the input data (the fewest user queries should show a significant preference).

Page 15: Optimized interleaving for online retrieval evaluation

Refine the problem ++++

So the constraints are:

$$p_i \ge 0, \qquad \sum_i p_i = 1, \qquad \forall k: \; \sum_i p_i \sum_{j=1}^{k} \delta\big(L_i[j]\big) = 0$$

And the target is:

$$\max_{p} \; \sum_i p_i \, S(L_i)$$

where $S(L_i)$ is the sensitivity of list $L_i$. One variable remains: the definition of $\delta$.

Page 16: Optimized interleaving for online retrieval evaluation

Define the credit function $\delta$:

Linear rank difference:

$$\delta(d) = \operatorname{rank}(d, B) - \operatorname{rank}(d, A)$$

Inverse rank:

$$\delta(d) = \frac{1}{\operatorname{rank}(d, A)} - \frac{1}{\operatorname{rank}(d, B)}$$

Either way, $\delta(d) > 0$ exactly when A ranks $d$ higher.

Since this is an optimization problem, the existence of a solution should be guaranteed theoretically; in the paper it is only verified empirically.
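With $\delta$ fixed, solving for the $p_i$ is a linear program. A minimal sketch using SciPy; the function name and input format are my assumptions, and it presumes the prefix credit sums and per-list sensitivity scores have already been computed:

```python
import numpy as np
from scipy.optimize import linprog

def solve_interleaving_lp(prefix_credits, sensitivities):
    """Solve for the distribution p over allowed interleaved lists.

    prefix_credits[i][k] -- sum of delta over the top-(k+1) docs of list i
    sensitivities[i]     -- sensitivity score of list i
    Maximizes expected sensitivity subject to the unbiasedness constraints.
    """
    prefix_credits = np.asarray(prefix_credits, dtype=float)
    n_lists, n_ranks = prefix_credits.shape
    # Unbiasedness: for every prefix length k, sum_i p_i * prefix_credit = 0,
    # plus the requirement that the probabilities sum to 1.
    a_eq = np.vstack([prefix_credits.T, np.ones(n_lists)])
    b_eq = np.append(np.zeros(n_ranks), 1.0)
    # linprog minimizes, so negate the sensitivities to maximize.
    res = linprog(c=-np.asarray(sensitivities, dtype=float),
                  A_eq=a_eq, b_eq=b_eq, bounds=[(0, 1)] * n_lists)
    return res.x if res.success else None
```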

Page 17: Optimized interleaving for online retrieval evaluation

Theoretical Benefits

PROPERTY 1: Balanced interleaving ⊆ this framework

PROPERTY 2: Team Draft interleaving ⊆ this framework

PROPERTY 3: This framework ⊆ Probabilistic interleaving

PROPERTY 4: The merged list is always something "in between" the two inputs

Page 18: Optimized interleaving for online retrieval evaluation

Theoretical Benefits +

PROPERTY 5: The breaking case of Balanced interleaving is avoided

PROPERTY 6: The insensitivity of Team Draft interleaving is improved

PROPERTY 7: Probabilistic interleaving degrades the user experience more

Page 19: Optimized interleaving for online retrieval evaluation

Illustration

$L_1$ is unbiased towards a random user: $3 \cdot 25\% + (-1) \cdot (35\% + 40\%) = 0$

Note: there are 5 constraints but 6 unknowns, so the solution is not unique.

It is a maximization problem, and the goal is to maximize $\sum_i p_i \cdot S(L_i)$.

The remaining freedom is spent on sensitivity.
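Plugging a reduced version of these numbers into the earlier `solve_interleaving_lp` sketch shows how that freedom is used; the credits echo the slide's arithmetic, while the sensitivity values are hypothetical:

```python
# Three allowed lists whose rank-1 credits are 3, -1 and -1. The
# unbiasedness constraint forces p1 = 0.25 and leaves p2 + p3 = 0.75
# free; the LP gives all of that freedom to the more sensitive list.
p = solve_interleaving_lp(prefix_credits=[[3], [-1], [-1]],
                          sensitivities=[0.4, 0.5, 0.7])
print(p)  # -> approximately [0.25, 0.0, 0.75]
```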

Page 20: Optimized interleaving for online retrieval evaluation

Agenda

Basic concepts √

Previous algorithms √

Framework √
  Invert Problem √
  Refine Problem √

Theoretical benefits √

Illustration √

Evaluation

Discussion

Page 21: Optimized interleaving for online retrieval evaluation

Evaluation: summary

Construct a dataset to simulate interleaving and user interaction.

Evaluate the Pearson correlation between each pair of algorithms, and analyze the cases where the algorithms disagree.

Evaluate result quality with expert judgments.

Analyze bias and sensitivity across the algorithms.

Page 22: Optimized interleaving for online retrieval evaluation

Evaluation +: construction of the dataset

Collect all queries and their top-4 results from a search engine. Since the web and the algorithm keep changing, there are many distinct rankings for the same query.

For each query, make sure there are at least 4 distinct rankings, each shown to users at least 10 times, with at least 1 click.

The most frequent ranking is taken as A; the most dissimilar one is taken as B.

Further filter the log so that the lists produced by Balanced interleaving and Team Draft interleaving occur frequently.
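A sketch of the per-query filtering step, assuming a hypothetical flat log of (query, ranking, impressions, clicks) records rather than any format from the paper:

```python
from collections import defaultdict

def eligible_queries(log_records):
    """Keep queries meeting the slide's criteria: at least 4 distinct
    rankings, each shown at least 10 times with at least 1 click.

    log_records: iterable of (query, ranking_tuple, impressions, clicks),
    one record per distinct (query, ranking) pair.
    """
    rankings_by_query = defaultdict(set)
    for query, ranking, impressions, clicks in log_records:
        if impressions >= 10 and clicks >= 1:
            rankings_by_query[query].add(ranking)
    return {q: rs for q, rs in rankings_by_query.items() if len(rs) >= 4}
```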

Page 23: Optimized interleaving for online retrieval evaluation

Evaluation ++

Page 24: Optimized interleaving for online retrieval evaluation

Evaluation +++

Page 25: Optimized interleaving for online retrieval evaluation

Evaluation ++++

Bias comparison among the different algorithms

Page 26: Optimized interleaving for online retrieval evaluation

Evaluation +++++

Sensitivity comparison among the different algorithms

Page 27: Optimized interleaving for online retrieval evaluation

Agenda

Basic concepts √

Previous algorithms √

Framework √
  Invert Problem √
  Refine Problem √

Theoretical benefits √

Illustration √

Evaluation √

Discussion

Page 28: Optimized interleaving for online retrieval evaluation

Discussion

Contributions of this paper:

Inverts the question of obtaining an interleaving algorithm into a constrained optimization problem; the solution is very intuitive, and general.

Many interesting examples illustrate the breaking cases of the previous approaches.

Note:

The evaluation is simulated on logs from only one search engine; for interleaving, we would expect an evaluation based on different search engines.

And that may be why the human-evaluation results are not good across all the algorithms.

Page 29: Optimized interleaving for online retrieval evaluation

Discussion +

"A and B are not shown to users as they have low sensitivity."

This is intuitive; however, it contradicts the result shown in Table 1: (a, b, c, d) has sensitivity 0.83, which is high.

Page 30: Optimized interleaving for online retrieval evaluation

Thank You!