Learning to Reinforce Search Effectiveness

Jiyun Luo, Xuchu Dong, Grace Hui Yang, Georgetown University, ICTIR 2015

Upload: grace-yang

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Learning to Reinforce Search Effectiveness

Learning to Reinforce Search Effectiveness

Jiyun Luo, Xuchu Dong, Grace Hui Yang Georgetown University

ICTIR 2015


Page 2: Learning to Reinforce Search Effectiveness

Search by ‘Test-the-Water’


Page 3: Learning to Reinforce Search Effectiveness

But you are not alone: Search with a Partner

• Teamwork

• Share a common goal

• find relevant documents

• satisfy long term goal

• Equal partners

• not just being an assistant to the user

• but also providing influence

• Cooperative exploration


Page 4: Learning to Reinforce Search Effectiveness

Key Idea: Cooperative Exploration

• The two parties

• They talk and they listen

• keep exchanging their ideas

• Take turns leading the search in the direction each of them would like the collaboration to go

• While also considering the other’s opinion

Page 5: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!

Example is from TREC 2014 Session 52

Page 6: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!
Search engine: Retrieved docs D1 “…renewable energy…”
Search engine message: Check it out! Documents I’ve ranked high are relevant.

Page 7: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!
Search engine: Retrieved docs D1 “…renewable energy…”
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=2

User: Clicked d2 in D1; Query q2=“hydropower environment”
User message: Documents I’ve clicked look relevant! My new query is on another subtopic. Let’s explore!

Page 8: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!
Search engine: Retrieved docs D1 “…renewable energy…”
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=2

User: Clicked d2 in D1; Query q2=“hydropower environment”
User message: Documents I’ve clicked look relevant! My new query is on another subtopic. Let’s explore!
Search engine: Retrieved docs D2
Search engine message: Check it out! Documents I’ve ranked high are relevant.

Page 9: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!
Search engine: Retrieved docs D1 “…renewable energy…”
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=2

User: Clicked d2 in D1; Query q2=“hydropower environment”
User message: Documents I’ve clicked look relevant! My new query is on another subtopic. Let’s explore!
Search engine: Retrieved docs D2
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=3

User: Clicked d2 in D2; Query q3=“hydropower damage”
User message: Documents I’ve clicked look relevant! My new query is still on the same subtopic. Let’s find out more about it.

Page 10: Learning to Reinforce Search Effectiveness

t=1

User: Query q1=“hydropower efficiency”
User message: See my new query. Let’s explore!
Search engine: Retrieved docs D1 “…renewable energy…”
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=2

User: Clicked d2 in D1; Query q2=“hydropower environment”
User message: Documents I’ve clicked look relevant! My new query is on another subtopic. Let’s explore!
Search engine: Retrieved docs D2
Search engine message: Check it out! Documents I’ve ranked high are relevant.

t=3

User: Clicked d2 in D2; Query q3=“hydropower damage”
User message: Documents I’ve clicked look relevant! My new query is still on the same subtopic. Let’s find out more about it.
Search engine: Retrieved docs D3
Search engine message: Want to explore? I’ve diversified my results.

… t=4

Page 11: Learning to Reinforce Search Effectiveness

Opinions about Two Things

• Relevance

• Which documents (that you have just marked/retrieved/recommended) are relevant

• Desire of Exploration

• How exploratory I want us to be, as a team


Page 12: Learning to Reinforce Search Effectiveness

How to Express the Opinions/Feedback

• Relevance is “demonstrated by examples”:

• Query is a piece of short text sent from the user

• Clicked snippets/documents are long pieces of text sent from the user

• Documents are long text sent from the search engine

• Desire of Exploration is shown by

• Query changes

• Diversified results

Page 13: Learning to Reinforce Search Effectiveness

A Contextual Bandit Formulation of a Decision-Making Distribution

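$P(\text{relevant}) = 1 - \epsilon, \qquad P(\text{explore}) = \mu$

$P(J \mid o, a, \pi^*) = P(\text{relevant}) \, P(\text{explore})$

$P(J = \mathrm{RE} \mid o, a, \pi^*) = (1 - \epsilon)\,\mu$

$P(J = \mathrm{NRE} \mid o, a, \pi^*) = \epsilon\,\mu$

$P(J = \mathrm{RNE} \mid o, a, \pi^*) = (1 - \epsilon)(1 - \mu)$

$P(J = \mathrm{NRNE} \mid o, a, \pi^*) = \epsilon\,(1 - \mu)$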
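The four products on this slide can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function name `decision_distribution` is hypothetical, and reading RE, NRE, RNE and NRNE as relevant or non-relevant crossed with explore or not-explore is an assumption inferred from how the four probabilities factorise.

```python
from typing import Dict


def decision_distribution(eps: float, mu: float) -> Dict[str, float]:
    """Joint distribution over the four decisions J.

    Assumes P(relevant) = 1 - eps and P(explore) = mu are independent,
    as the product form on the slide suggests.
    """
    if not (0.0 <= eps <= 1.0 and 0.0 <= mu <= 1.0):
        raise ValueError("eps and mu must lie in [0, 1]")
    p_relevant, p_explore = 1.0 - eps, mu
    return {
        "RE": p_relevant * p_explore,                  # relevant, explore
        "NRE": (1.0 - p_relevant) * p_explore,         # non-relevant, explore
        "RNE": p_relevant * (1.0 - p_explore),         # relevant, no explore
        "NRNE": (1.0 - p_relevant) * (1.0 - p_explore),
    }


# Illustrative values only; they are not taken from the slides.
dist = decision_distribution(eps=0.2, mu=0.75)
most_likely = max(dist, key=dist.get)
```

Because the four cases are exhaustive, the probabilities sum to 1 for any valid pair of parameters, which is a quick sanity check on estimated values of ε and µ.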

Page 14: Learning to Reinforce Search Effectiveness

Relevance Feedback from the User

• 1 SAT-Clicked out of 10 retrieved,

$\epsilon = 1 - \dfrac{\#\text{ of SAT-Clicked documents} \in D_{t-1}}{\#\text{ of returned documents} \in D_{t-1}}$

$\epsilon = 1 - \dfrac{1}{10} = 0.9$

q2: smoking quitting hypnosis

D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help …
…
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction …
…
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs…
…
Rank 10: …

SAT-Clicked. Dwell time: 40 seconds
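The user-side relevance estimate on this slide is a simple ratio, sketched below in Python. This is an illustration under assumptions: the function name and the 30-second dwell threshold for a SAT click are hypothetical (the slide only reports a 40-second dwell time), and the position of the click in the list is arbitrary.

```python
from typing import Optional, Sequence

# Assumed threshold for a "satisfied" click; the slide does not state one.
SAT_DWELL_SECONDS = 30.0


def user_relevance_epsilon(
    dwell_times: Sequence[Optional[float]],
    sat_threshold: float = SAT_DWELL_SECONDS,
) -> float:
    """eps = 1 - (# SAT-clicked docs in D_{t-1}) / (# returned docs in D_{t-1}).

    dwell_times has one entry per returned document: None if the document
    was not clicked, otherwise the dwell time in seconds.
    """
    if not dwell_times:
        raise ValueError("at least one returned document is required")
    sat_clicked = sum(
        1 for d in dwell_times if d is not None and d >= sat_threshold
    )
    return 1.0 - sat_clicked / len(dwell_times)


# Ten returned documents, one click with a 40-second dwell time.
dwell = [None] * 10
dwell[5] = 40.0
eps = user_relevance_epsilon(dwell)
```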

Page 15: Learning to Reinforce Search Effectiveness

Exploration Feedback from the User

• 1 query change, 3 terms in the new query in total

$\mu = 1 - \dfrac{\#\text{ of query changes} \in D_{t-1}}{\#\text{ of permutations of query terms} \in D_{t-1}}$

$\mu = 1 - \dfrac{1}{3!} \approx 0.83$

q2: smoking quitting hypnosis (hypnosis: $+\Delta q \in D_{t-1}$)

D1:
Rank 1: Easy Ways to Quit Smoking | Quit Smoking Help …
…
Rank 3: Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction …
…
Rank 6: Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs…
…
Rank 10: …

Query reformulation using words in previous search results
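A small Python sketch of the user-side exploration estimate follows. It rests on assumptions the slide does not spell out: a "query change" is counted here as a term that is new in the current query and also occurs in the previous results, the previous query is taken to be "smoking quitting", and the helper names are hypothetical.

```python
import math
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def user_exploration_mu(
    prev_query: str, new_query: str, prev_results: Iterable[str]
) -> float:
    """mu = 1 - (# query changes found in D_{t-1}) / (# term permutations)."""
    prev_terms = set(tokenize(prev_query))
    new_terms = tokenize(new_query)
    if not new_terms:
        raise ValueError("new query has no terms")
    seen = {tok for text in prev_results for tok in tokenize(text)}
    # Added terms that the user could have picked up from D_{t-1}.
    changes = [t for t in new_terms if t not in prev_terms and t in seen]
    # A query with k terms has k! orderings of those terms.
    return 1.0 - len(changes) / math.factorial(len(new_terms))


d1 = [
    "Easy Ways to Quit Smoking | Quit Smoking Help",
    "Quit Smoking Toolbox - Quit Smoking - Nicotine Addiction",
    "Quit Smoking Hypnosis, Stop Smoking Hypnosis CDs",
]
mu = user_exploration_mu("smoking quitting", "smoking quitting hypnosis", d1)
```

With one added term and three query terms, the call evaluates 1 − 1/3!, matching the worked example on the slide.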

Page 16: Learning to Reinforce Search Effectiveness

Relevance Feedback from the Search Engine

• Highly scored documents

• Needs consistency in ranking scores

• Could be hard to get

• Highly ranked documents

$\epsilon = 1 - \dfrac{\#\text{ of relevant documents} \in \text{top } n \text{ retrieved}}{n}$

Page 17: Learning to Reinforce Search Effectiveness

Relevance Feedback from the Search Engine

q2: smoking quitting hypnosis

D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro…
…
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart…
…
Rank 10: …

• 8 out of 10 top retrieved documents are relevant

• $\epsilon = 1 - \dfrac{8}{10} = 0.2$
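The search-engine-side relevance estimate can be sketched the same way. In this hedged Python example the function name is hypothetical, and the per-document relevance flags are assumed to come from the engine's own judgement of its top-ranked documents; how that judgement is made is not specified on the slides.

```python
from typing import Sequence


def engine_relevance_epsilon(is_relevant: Sequence[bool], n: int = 10) -> float:
    """eps = 1 - (# relevant documents in the top n retrieved) / n.

    If fewer than n documents were retrieved, the missing positions
    count as non-relevant because the denominator stays at n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    relevant_in_top_n = sum(1 for flag in is_relevant[:n] if flag)
    return 1.0 - relevant_in_top_n / n


# 8 of the top 10 retrieved documents are considered relevant.
flags = [True] * 8 + [False] * 2
eps = engine_relevance_epsilon(flags, n=10)
```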

Page 18: Learning to Reinforce Search Effectiveness

Exploration Feedback from the Search Engine

• More diversified results show more mixed results

• Observe the word distribution

• Higher perplexity

$\mu = 1 - \dfrac{\text{total } \#\text{ of the top } m \text{ frequent non-stop-words} \in D_t}{\text{total } \#\text{ of non-stop-words} \in D_t}$

Page 19: Learning to Reinforce Search Effectiveness

Exploration Feedback from the Search Engine

q2: smoking quitting hypnosis

D2:
Rank 1: Quit Smoking Hypnosis | Stop Smoking Hypnosis CDs Quit Smoking Hypnosis Neuro…
…
Rank 4: Quit Smoking with Video Hypnosis Home Shopping Cart…
…
Rank 10: …

• 428 non-stop-words in the top 10 snippets

• the 5 most frequent words: “smoke” (59), “quit” (34), “hypnosis” (30), “stop” (19), “button” (7)

• $\mu = 1 - \dfrac{59 + 34 + 30 + 19 + 7}{428} = 1 - \dfrac{149}{428} \approx 0.65$
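The word-distribution measure can be sketched in Python with a frequency counter. This is an illustration only: the function name and the tiny stop-word list are hypothetical, no stemming is applied (the slide's "smoke" suggests the original pipeline stems words), and the filler tokens are synthetic, added just to reach the 428 non-stop-words reported on the slide.

```python
from collections import Counter
from typing import AbstractSet, Iterable

# Tiny hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "for", "with"})


def engine_exploration_mu(
    tokens: Iterable[str], m: int = 5, stopwords: AbstractSet[str] = STOPWORDS
) -> float:
    """mu = 1 - (mass of the m most frequent non-stop-words) / (all non-stop-words)."""
    counts = Counter(t.lower() for t in tokens if t.lower() not in stopwords)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-stop-words to count")
    top_mass = sum(count for _, count in counts.most_common(m))
    return 1.0 - top_mass / total


# Word counts from the slide, padded with unique synthetic filler words.
tokens = ["smoke"] * 59 + ["quit"] * 34 + ["hypnosis"] * 30
tokens += ["stop"] * 19 + ["button"] * 7
tokens += [f"filler{i}" for i in range(428 - len(tokens))]
mu = engine_exploration_mu(tokens, m=5)
```

The call evaluates 1 − 149/428. A result list dominated by a few words gives a small µ, while a more mixed vocabulary pushes µ toward 1.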

Page 20: Learning to Reinforce Search Effectiveness

Put into a POSG Framework

• Partially Observable Stochastic Games (POSGs)

• multi-agent version of POMDP

• A tuple <S, G, T, R> for States, Agents, Transitions, Rewards

• G is a tuple too: a set of agents, each of which is <A, O, B>

• Actions, Observations, and Beliefs
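The nested tuple structure can be written down as plain data types. The Python sketch below is only one possible encoding: the class and field names, the two toy states, the action and observation labels, and the uniform transition and zero reward are all hypothetical placeholders rather than the model used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

JointAction = Tuple[str, ...]  # one action per agent, in a fixed agent order


@dataclass
class Agent:
    """One element of G: the tuple <A, O, B>."""
    actions: FrozenSet[str]       # A
    observations: FrozenSet[str]  # O
    belief: Dict[str, float]      # B: probability assigned to each state


@dataclass
class POSG:
    """The tuple <S, G, T, R>."""
    states: FrozenSet[str]                                      # S
    agents: Dict[str, Agent]                                    # G
    transition: Callable[[str, JointAction], Dict[str, float]]  # T
    reward: Callable[[str, JointAction], float]                 # R


states = frozenset({"s0", "s1"})
uniform = {s: 1.0 / len(states) for s in states}

game = POSG(
    states=states,
    agents={
        "user": Agent(
            actions=frozenset({"reformulate", "click"}),
            observations=frozenset({"ranked_list"}),
            belief=dict(uniform),
        ),
        "search_engine": Agent(
            actions=frozenset({"rank", "diversify"}),
            observations=frozenset({"query", "clicks"}),
            belief=dict(uniform),
        ),
    },
    transition=lambda state, joint_action: dict(uniform),  # placeholder T
    reward=lambda state, joint_action: 0.0,                # placeholder R
)
```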

Page 21: Learning to Reinforce Search Effectiveness

Observation-Action Pairs

$(o_t, a_t)$

• $o_t$ indicates that, at time t, we can observe how the user has browsed the previously retrieved search results, clicked the documents, and reformulated the query at the current search iteration.

• $a_t$ indicates that, at time t, the search engine selects among its search algorithm options, executes the search algorithms, and provides a ranked list of search results.
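One way to picture an observation-action pair is as a pair of small records appended to a session history. In this Python sketch the class names, field names, the algorithm labels and the document identifiers other than d2 are hypothetical; the queries and the click on d2 follow the hydropower session shown earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Observation:
    """o_t: what is observed about the user at time t."""
    query: str                                                 # current (reformulated) query
    clicked_doc_ids: List[str] = field(default_factory=list)   # clicks on previous results


@dataclass
class Action:
    """a_t: what the search engine does at time t."""
    algorithm: str              # which search algorithm option was selected
    ranked_doc_ids: List[str]   # the ranked list returned to the user


history: List[Tuple[Observation, Action]] = [
    (
        Observation(query="hydropower efficiency"),
        Action(algorithm="hypothetical-ranker", ranked_doc_ids=["d1", "d2", "d3"]),
    ),
    (
        Observation(query="hydropower environment", clicked_doc_ids=["d2"]),
        Action(algorithm="hypothetical-diversifier", ranked_doc_ids=["d4", "d5", "d6"]),
    ),
]
```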

Page 22: Learning to Reinforce Search Effectiveness

Expectation Maximization (EM) to Learn the Model

• Starts with a random policy

• At the Expectation step

• Compute the decision-making distribution

• Index the most likely decision by j

• A new policy is estimated by finding the best policy at step t given the current estimates of model parameters and

• At the Maximization step

• Re-compute model parameters based on new estimate of the policy
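The alternation described above can be outlined as a loop. The Python skeleton below mirrors only the control flow on the slide and is not the authors' algorithm: `estimate_params` and `score` are hypothetical callables standing in for parts the slide leaves unspecified, the model parameters are assumed to be the pair (ε, µ), and the toy policies and numbers are invented for illustration.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Params = Tuple[float, float]  # assumed to be (epsilon, mu)


def decision_distribution(eps: float, mu: float) -> Dict[str, float]:
    return {
        "RE": (1 - eps) * mu,
        "NRE": eps * mu,
        "RNE": (1 - eps) * (1 - mu),
        "NRNE": eps * (1 - mu),
    }


def learn_policy(
    policies: Sequence[str],
    estimate_params: Callable[[str], Params],
    score: Callable[[str, str, Params], float],
    max_iterations: int = 20,
    seed: int = 0,
) -> Tuple[str, Params]:
    rng = random.Random(seed)
    policy = rng.choice(list(policies))  # start with a random policy
    params = estimate_params(policy)
    for _ in range(max_iterations):
        # Expectation step: decision distribution, most likely decision j,
        # then the best policy given the current parameter estimates.
        dist = decision_distribution(*params)
        j = max(dist, key=dist.get)
        new_policy = max(policies, key=lambda p: score(p, j, params))
        # Maximization step: re-compute parameters under the new policy.
        params = estimate_params(new_policy)
        if new_policy == policy:  # policy stopped changing
            break
        policy = new_policy
    return policy, params


# Toy stand-ins, invented for this example.
toy_params = {"exploit": (0.2, 0.3), "diversify": (0.4, 0.8)}
preferred = {"RE": "diversify", "NRE": "diversify", "RNE": "exploit", "NRNE": "exploit"}
policy, params = learn_policy(
    ["exploit", "diversify"],
    estimate_params=lambda p: toy_params[p],
    score=lambda p, j, prm: 1.0 if preferred[j] == p else 0.0,
)
```

The loop stops once the selected policy no longer changes between iterations; that stopping rule is also an assumption, since the slide does not give a convergence criterion.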


Page 23: Learning to Reinforce Search Effectiveness

23

Page 24: Learning to Reinforce Search Effectiveness

Experiments

• TREC 2012, 2013, 2014 Session Track data

• Immediate Search Effectiveness

• nDCG@10 at each search iteration

• TREC used nDCG@10 at the last search interaction
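The difference between the two evaluation settings is easiest to see in code. The Python sketch below computes nDCG@10 for every iteration of a session and then picks out the last one. It assumes one common nDCG variant (gain 2^rel − 1 with a log2 rank discount), which may differ from the exact variant in the official evaluation scripts; the function names and the graded judgments are made up for illustration.

```python
import math
from typing import List, Sequence


def dcg_at_k(gains: Sequence[int], k: int = 10) -> float:
    """Discounted cumulative gain over the first k ranks."""
    return sum(
        (2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains[:k])
    )


def ndcg_at_k(
    ranked_gains: Sequence[int], judged_gains: Sequence[int], k: int = 10
) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def ndcg_per_iteration(
    session: Sequence[Sequence[int]], judged_gains: Sequence[int], k: int = 10
) -> List[float]:
    """Immediate effectiveness: one nDCG@k value per search iteration."""
    return [ndcg_at_k(ranking, judged_gains, k) for ranking in session]


judged = [3, 2, 1, 0]            # all judged relevance grades for the topic
session = [
    [0, 1, 0, 2],                # grades of the ranking at iteration 1
    [2, 3, 0, 1],                # iteration 2
    [3, 2, 1, 0],                # iteration 3 (ideal order)
]
scores = ndcg_per_iteration(session, judged)
last_only = scores[-1]           # the single number used when only the last iteration counts
```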


Page 25: Learning to Reinforce Search Effectiveness

Baselines

• Lemur: Lemur worked on the last query in a session

• Lemur+all: Lemur concatenating all the queries in a session

• QCM: query change model

• Win-win short: Win-Win uses short-term feedback, e.g. user clicks, as rewards

• Win-win long: Win-Win uses long-term feedback, nDCG, as rewards

• served as a performance upper bound


Page 26: Learning to Reinforce Search Effectiveness

TREC 2012 Session


• fl performs the best besides winwin-long

• lemur+all, qcm, winwin-long and fl monotonically increase over iterations

• winwin-long > fl, qcm, lemur+all > winwin-short >lemur > original

Page 27: Learning to Reinforce Search Effectiveness

TREC 2013 Session

• Performance is boosted at around the 2nd iteration and converges at the 5th to 6th iterations

• The first few queries are more representative

Page 28: Learning to Reinforce Search Effectiveness

TREC 2014 Session


• fl achieves significant nDCG@10 improvement over qcm on TREC’13 and TREC’14

Page 29: Learning to Reinforce Search Effectiveness

“The search engine and the user are equal partners.”

– A new way of thinking

Page 30: Learning to Reinforce Search Effectiveness

Based on that, this paper

• Models the two-way communication between the two partners on

• relevance

• desire to explore

• Proposes an EM algorithm for learning the best policy in this framework


Page 31: Learning to Reinforce Search Effectiveness

Look into the future

• Reinforcement-learning-style methods are good for modeling information seeking

• A lot of room to study the user and the search engine interaction in a generative way

• The thinking of equal partnership and two-way communication could generate a set of new methods and algorithms

• not only for retrieval, but also for other related fields

• Exciting!!

Page 32: Learning to Reinforce Search Effectiveness

Thank You!

• Email: [email protected]

• Group Page: Infosense at http://infosense.cs.georgetown.edu/

• Dynamic IR Website: http://www.dynamic-ir-modeling.org/

• Live Online Search Engine: http://dumplingproject.org

• Upcoming Book: Dynamic Information Retrieval Modeling

• TREC 2015 Dynamic Domain Track: http://trec-dd.org/
