
Query Reformulation as a Predictor of Search Satisfaction

Ahmed Hassan, Xiaolin Shi, Nick Craswell and Bill Ramsey

Online Satisfaction Measurement

• Satisfying users is the main objective of any search system

• Measuring user satisfaction is essential for improving the system

Satisfaction and Implicit Behavior

• How can we model user satisfaction?

– Implicit behavior

• Clicks are the best-known implicit signal

– Clickthrough (e.g., Joachims, 2002; Agichtein et al., SIGIR’06; Carterette and Jones, NIPS’07)

– Dwell Time (e.g., Fox et al., TOIS’05)

– Interleaving (e.g., Joachims, KDD’02; Radlinski et al., CIKM’08)

Why not Just Use Clicks?

[Figure: example session. Labels: query "greenfield, mn accident"; result "Woman dies in a fatal accident in greenfield, minnesota"; time spent on page: 38 seconds; session ends. Timeline: Query, Click, Query.]

• User performed this search on July 1st

• User was probably looking for

• User clicked on a result

• The dwell time is long

• But the user was not satisfied

Clicks do not always mean satisfaction

Lack of clicks does not always mean dissatisfaction

Weather in san francisco

Query Reformulation

• What do users do when they do not like the results?

– Give up

• OR:

– Reformulate the query, e.g., "reformulation satisfaction" → "reformulation search satisfaction"

• Query reformulation is another implicit feedback signal, one that has not received as much attention

Query Reformulation

• Query reformulation is the act of submitting a query that modifies a previous query in the hope of retrieving better results

• Reformulations vs. Related Queries

– A reformulation: "reformulation satisfaction" → "reformulation search satisfaction"

– Not a reformulation: "food in san francisco" → "weather in san francisco"

Clicks and Reformulation

• Clickthrough Rate (CTR) of different sets of pairs relative to CTR of all pairs

Rows: time difference between queries; columns: query similarity

                     Overall    Not Similar    Similar
Overall                 0%         11%          -21%
Short time diff.      -29%        -17%          -39%
Long time diff.        25%         24%           29%

• Queries are similar if they share a non-stop-word term

• Queries have a short time difference if the difference between their timestamps is less than 5 minutes

- Similar pairs had 21% below-average CTR

- Pairs where Q1 and Q2 are not similar had 11% above-average CTR

- Pairs with short time diff. had 29% below-average CTR

- Pairs with long time diff. had 25% above-average CTR

- Similar pairs with short time diff. had 39% below-average CTR

- Pairs that are not similar and had long time diff. had 24% above-average CTR

- CTRs of pairs with long time diff. are very similar (25%, 24%, 29%), indicating that query similarity has little effect if the time between queries is large
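The similarity and time-difference definitions used on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the stop-word list is hypothetical (the slides do not say which list was used), and the `relative_ctr` formula is an assumed reading of "CTR relative to CTR of all pairs".

```python
from datetime import datetime, timedelta

# Hypothetical stop-word list; the slides do not specify one.
STOP_WORDS = {"a", "an", "the", "in", "of", "for", "to", "and", "on"}


def content_terms(query):
    """Lower-cased terms of a query with stop words removed."""
    return {t for t in query.lower().split() if t not in STOP_WORDS}


def is_similar(q1, q2):
    """Queries are similar if they share at least one non-stop-word term."""
    return bool(content_terms(q1) & content_terms(q2))


def is_short_gap(t1, t2, limit=timedelta(minutes=5)):
    """Short time difference: timestamps less than 5 minutes apart."""
    return abs(t2 - t1) < limit


def relative_ctr(subset_ctr, overall_ctr):
    """CTR of a subset relative to the CTR of all pairs (assumed formula)."""
    return (subset_ctr - overall_ctr) / overall_ctr
```

Under this definition, "food in san francisco" and "weather in san francisco" count as similar because they share "san" and "francisco", even though the second query is not a reformulation of the first. That gap is presumably why a learned reformulation predictor is needed on top of simple similarity.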

Approach

• Query Representation

• Query Reformulation Prediction

• Query Success Prediction

– Using clicks only

– Using reformulation only

– Using both clicks and reformulation

Query Representation

• Query Normalization

– Lower-casing

– Replacing runs of whitespace with a single space

– Word breaking (using a character-level n-gram model)

southjeseycraigslist → south jesey craigslist

VerizonWireless → verizon wireless
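The normalization steps above can be sketched as follows. Note that the slides use a character-level n-gram model for word breaking; the sketch below swaps in a much simpler dictionary-based dynamic program, and the `VOCAB` counts are made up for illustration.

```python
import re

# Toy vocabulary with made-up counts; an assumption for illustration only.
VOCAB = {"south": 50, "jesey": 1, "craigslist": 30, "verizon": 40, "wireless": 35}


def normalize(query):
    """Lower-case and collapse runs of whitespace into a single space."""
    return re.sub(r"\s+", " ", query.strip().lower())


def break_words(token, vocab=VOCAB):
    """Split a run-together token into vocabulary words.

    best[i] holds (score, words) for the best split of token[:i], where the
    score is the sum of the toy counts of the words used. If no complete
    split exists, the token is returned unchanged.
    """
    best = [None] * (len(token) + 1)
    best[0] = (0, [])
    for end in range(1, len(token) + 1):
        for start in range(end):
            word = token[start:end]
            if best[start] is not None and word in vocab:
                score = best[start][0] + vocab[word]
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[-1][1] if best[-1] else [token]


def normalize_query(query):
    """Apply lower-casing, whitespace collapsing and word breaking."""
    tokens = normalize(query).split(" ")
    return " ".join(w for t in tokens for w in break_words(t))
```

With this toy vocabulary, `normalize_query("VerizonWireless")` yields `"verizon wireless"`, matching the slide's example; tokens with no complete split pass through untouched.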

Query Representation

• Queries to Keywords

– For a query $x = (x_1, x_2, \ldots, x_n)$, find a mapping $x \to y \in Y_n$, where $y$ is a segmentation from the set $Y_n$

– A segment break is introduced whenever the pointwise mutual information (PMI) between two consecutive words drops below a certain threshold $\tau$:

$$\mathrm{PMI}(x_i, x_{i+1}) = \log \frac{p(x_i, x_{i+1})}{p(x_i)\, p(x_{i+1})}$$

Query → Keywords

hotels in san francisco → hotels in san_francisco

Hyundai roadside assistance phone number → hyundai roadside_assistance phone_number

kodak easyshare recharger chord → kodak_easyshare recharger_chord

user reviews for apple ipad → user_reviews for apple_ipad
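A minimal sketch of PMI-based segmentation, assuming probabilities are estimated from raw unigram and bigram counts. The counts, the corpus size `TOTAL` and the threshold value are invented for illustration; the slides do not report the value of the threshold or the corpus used.

```python
import math

# Toy counts (assumptions for illustration only).
TOTAL = 1000
UNIGRAM = {"hotels": 20, "in": 200, "san": 10, "francisco": 8}
BIGRAM = {("hotels", "in"): 5, ("in", "san"): 3, ("san", "francisco"): 8}


def pmi(w1, w2, unigram, bigram, total):
    """log p(w1, w2) / (p(w1) p(w2)), estimated from raw counts."""
    joint = bigram.get((w1, w2), 0) / total
    if joint == 0:
        return float("-inf")  # unseen pair: always introduce a break
    return math.log(joint / ((unigram[w1] / total) * (unigram[w2] / total)))


def segment(words, unigram, bigram, total, tau):
    """Join consecutive words with '_' unless their PMI drops below tau."""
    if not words:
        return []
    segments = [words[0]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur, unigram, bigram, total) < tau:
            segments.append(cur)         # PMI below threshold: segment break
        else:
            segments[-1] += "_" + cur    # keep the words in one keyword
    return segments
```

With these toy counts and `tau=2.0`, `segment("hotels in san francisco".split(), UNIGRAM, BIGRAM, TOTAL, 2.0)` keeps "san" and "francisco" together and breaks everywhere else, mirroring the first row of the table above.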

Matching Keywords

• Exact Match

– The two phrases match exactly.

• Approximate Match

– To capture spelling variants and misspellings, we allow two keywords to match if the Levenshtein edit distance between them is less than 2.

• Semantic Match

– Using the depth of the Least Common Subsumer (LCS) in the WordNet hierarchy:

$$\mathrm{wup}(t_i, t_j) = \frac{2 \cdot \mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(t_i) + \mathrm{depth}(t_j)}$$
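The exact and approximate match rules, and the wup score, can be sketched as below. This is an illustrative sketch only: `match_type` is a hypothetical helper name, and `wup` takes the three depths as plain numbers rather than looking them up in WordNet. The slides do not state the wup threshold that counts as a semantic match, so none is assumed here.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_type(k1, k2):
    """Return 'exact', 'approximate' (edit distance < 2) or None."""
    if k1 == k2:
        return "exact"
    if levenshtein(k1, k2) < 2:
        return "approximate"
    return None


def wup(depth_lcs, depth_i, depth_j):
    """wup score from the depths of the LCS and of the two terms."""
    return 2 * depth_lcs / (depth_i + depth_j)
```

For example, `match_type("jesey", "jersey")` is an approximate match because a single insertion turns one keyword into the other.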

Query Reformulation Prediction

Textual Features

normalized Levenshtein edit distance

1 if lev > 2, 0 otherwise

num. characters in common starting from the left

num. characters in common starting from the right

num. words in common starting from the left

num. words in common starting from the right

num. words in common

Jaccard distance between sets of words

Adapted from (Jones and Klinkner, CIKM’08)

Keyword Features

num. of “exact match” keywords in common

num. of “approximate match” keywords in common

num. of “semantic match” keywords in common

num. of keywords in Q1

num. of keywords in Q2

num. of keywords in Q1 but not in Q2

num. of keywords in Q2 but not in Q1

1 if Q1’s keywords contain all of Q2’s keywords

1 if Q2’s keywords contain all of Q1’s keywords

Other Features

time between queries in seconds

time between queries as a binary feature (5 mins, 30 mins, 60 mins, 120 mins)

cosine distance between vectors derived from the first 10 search results for the query terms
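A few of the textual and time features listed above can be computed as in the sketch below. The feature names are hypothetical, and reading the binary time features as "less than N minutes" is an assumption; the slides only list the four cut-offs.

```python
def common_prefix_len(a, b):
    """Number of leading items (characters or words) shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def textual_features(q1, q2):
    """Subset of the textual features for a pair of normalized queries."""
    w1, w2 = q1.split(), q2.split()
    s1, s2 = set(w1), set(w2)
    union = s1 | s2
    return {
        "chars_common_left": common_prefix_len(q1, q2),
        "chars_common_right": common_prefix_len(q1[::-1], q2[::-1]),
        "words_common_left": common_prefix_len(w1, w2),
        "words_common_right": common_prefix_len(w1[::-1], w2[::-1]),
        "words_common": len(s1 & s2),
        "jaccard_distance": 1 - len(s1 & s2) / len(union) if union else 0.0,
    }


def time_features(seconds):
    """Raw time gap plus binary indicators (assumed to mean 'less than')."""
    feats = {"time_seconds": seconds}
    for minutes in (5, 30, 60, 120):
        feats[f"within_{minutes}_min"] = int(seconds < minutes * 60)
    return feats
```

For the pair "reformulation satisfaction" and "reformulation search satisfaction", the two queries share 2 of 3 distinct words, so the Jaccard distance is 1/3.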

Query Reformulation Performance

[Figure: reformulation prediction performance for Heuristic, Textual, Keywords, and All]

- Keyword features outperform textual features

- Best performance when all features are combined

Query Satisfaction Prediction

1. Clicks Only: a query Q is successful if it receives at least one click

2. SAT Clicks Only: a query Q is successful if it receives at least one long-dwell-time click (thresholds: 10, 30 and 50 seconds)

3. Reformulation Only: predict success using reformulation features only (i.e., assume users will always reformulate their queries when not successful)

4. Reformulation + Clicks (classifier): train a classifier using both reformulation and click features
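The first three methods are simple rules and can be sketched directly. The function and parameter names are hypothetical; treating "long dwell time" as dwell time at or above the threshold is an assumption, and the Reformulation Only rule takes the output of a reformulation predictor as given. The fourth method requires a trained classifier and is not sketched here.

```python
def clicks_only(dwell_times):
    """Method 1: successful if the query received at least one click.

    dwell_times: dwell time in seconds of each click on the query's results.
    """
    return len(dwell_times) > 0


def sat_clicks_only(dwell_times, threshold=30):
    """Method 2: successful if at least one click has a long dwell time.

    threshold is in seconds; the slides evaluate 10, 30 and 50.
    """
    return any(d >= threshold for d in dwell_times)


def reformulation_only(next_is_reformulation):
    """Method 3: successful unless the user went on to reformulate."""
    return not next_is_reformulation
```

On the earlier example with a single 38-second click, `clicks_only([38])` and `sat_clicks_only([38], 30)` both predict success, which illustrates how click-based rules can mislabel an unsatisfied user.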

Results

• Clicks Only method performs poorly

• Many queries that receive a click still end up being unsuccessful

[Figure: results for Clicks Only, SAT Clicks Only, Reformulation Only, and Reformulation + Clicks]

• Accuracy improves when only SAT clicks are considered

• Better performance if we use reformulation only

• Best performance when we learn a classifier using both the reformulation and the click features

Reformulation Only vs. Reformulation + Clicks

• Reformulation Only achieves high DSAT precision but low SAT precision

• Reformulation + Clicks achieves good performance for both SAT and DSAT cases

Reformulation Behavior and Search Tasks

• Queries in successful tasks

• Queries in unsuccessful tasks

Queries in unsuccessful tasks have higher similarity than queries in successful tasks

Data from (Hassan et al., CIKM’11)

Conclusions

• We can reliably identify query reformulations

• Query reformulation is a strong predictor of search success

• Best results when using both query reformulation and clicks

• Reformulation behavior differs in successful and unsuccessful tasks

Thanks!

Ahmed Hassan

hassanam@microsoft.com
