

Sandeep Pandey (1), Sourashis Roy (2), Christopher Olston (1), Junghoo Cho (2), Soumen Chakrabarti (3)

(1) Carnegie Mellon, (2) UCLA, (3) IIT Bombay

Shuffling a Stacked Deck

The Case for Partially Randomized Ranking of Search Engine Results


@Carnegie MellonDatabases

Popularity as a Surrogate for Quality

Search engines want to measure the “quality” of pages

Quality is hard to define and measure

Various “popularity” measures are used in ranking
– e.g., in-links, PageRank, user traffic

[Figure: a ranked list of search results (1., 2., 3.)]


Relationship Between Popularity and Quality

Popularity: depends on the number of users who “like” a page
– relies on both quality and awareness of the page

Popularity is different from quality
– But strongly correlated when awareness is large

[Figure: users, those aware of page p, and those who like page p]
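A minimal sketch of why popularity can lag behind quality. It assumes a simple multiplicative relation, popularity = awareness × quality, which is an illustrative assumption here (the slide only says popularity relies on both). The function name and the sample numbers are hypothetical.

```python
def popularity(awareness: float, quality: float) -> float:
    """Fraction of all users who like a page.

    Assumption (for illustration only): a user can like a page only if
    they are aware of it, and an aware user likes it with probability
    equal to the page's quality.
    """
    if not (0.0 <= awareness <= 1.0 and 0.0 <= quality <= 1.0):
        raise ValueError("awareness and quality must lie in [0, 1]")
    return awareness * quality


# Hypothetical pages: an established mediocre page and a young excellent one.
established = popularity(awareness=0.9, quality=0.4)
young = popularity(awareness=0.05, quality=0.9)
```

Under these made-up numbers the young page is the better one, yet it looks far less popular, because few users are aware of it so far.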


Problem

Popularity/quality correlation is weak for young pages
– Even if of high quality, they may not (yet) be popular due to lack of user awareness

Plus, the process of gaining popularity is inhibited by the “entrenchment effect”
– [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02], and many others


Entrenchment Effect

Search engines show entrenched (already-popular) pages at the top

Users discover pages via search engines; tend to focus on top results

[Figure: ranked result list (1, 2, 3, 4, 5, 6, …) annotated with “entrenched pages”, “user attention”, and “new unpopular pages”]


Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary


Alternative Approaches to Counter-act Entrenchment Effect

Weight links to young pages more
– [Baeza-Yates et al. SPIRE ’02]
– Proposed an age-based variant of PageRank

Extrapolate quality based on increase in popularity
– [Cho et al. SIGMOD ’05]
– Proposed an estimate of quality based on the derivative of popularity


Our Approach: Randomized Rank Promotion

Select random (young) pages to promote to good rank positions

Rank position to promote to is chosen at random

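[Figure: ranked list before promotion (1, 2, 3, ..., 500, 501, ...) and after promotion, with pages 500 and 499 moved up among the top results]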


Our Approach: Randomized Rank Promotion

Consequence: Users visit promoted pages; improves ability to estimate quality via popularity

Compared with previous approaches:
• Does not rely on temporal measurements (+)
• Sub-optimal (-)


Exploration/Exploitation Tradeoff

Exploration/exploitation tradeoff
– exploit known high-quality pages by assigning good rank positions
– explore quality of new pages by promoting them in rank

Existing search engines only exploit (to our knowledge)


Possible Objectives for Rank Promotion

Fairness
– Give each page an equal chance to become popular
– Incentive for search engines to be fair?

Quality (our choice)
– Maximize quality of search results seen by users (in aggregate)
– Quality of page p: extent to which users “like” p
– Q(p) ∈ [0,1]


Quality-Per-Click Metric (QPC)

V(p,t) : number of visits made to page p at time t through search engine

QPC : average quality of pages viewed by users, amortized over time

$$\mathrm{QPC} = \frac{\sum_{t} \sum_{p} V(p,t)\, Q(p)}{\sum_{t} \sum_{p} V(p,t)}$$
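As a rough illustration of the metric, the sketch below computes QPC as the visit-weighted average of page quality over all time steps. The function name, the data layout (one dictionary of visit counts per time step), and the sample numbers are assumptions made for this example.

```python
from typing import Dict, List


def quality_per_click(
    visits: List[Dict[str, float]], quality: Dict[str, float]
) -> float:
    """Visit-weighted average quality.

    visits[t][p] plays the role of V(p, t); quality[p] plays the role
    of Q(p) and is expected to lie in [0, 1].
    """
    weighted = 0.0  # sum over t and p of V(p, t) * Q(p)
    total = 0.0     # sum over t and p of V(p, t)
    for visits_at_t in visits:
        for page, count in visits_at_t.items():
            weighted += count * quality[page]
            total += count
    if total == 0:
        raise ValueError("no visits recorded")
    return weighted / total


# Hypothetical data: two pages observed over two time steps.
quality = {"a": 0.9, "b": 0.3}
visits = [{"a": 10, "b": 30}, {"a": 20, "b": 20}]
qpc = quality_per_click(visits, quality)
```

With these numbers the weighted sum is 42 over 80 visits, so QPC is 0.525. Shifting visits from page "b" to the higher-quality page "a" would raise it.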


Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary


Desiderata for Randomized Rank Promotion

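Want ability to:
– Control exploration/exploitation tradeoff
– “Select” certain pages as candidates for promotion
– “Protect” certain pages from demotion

[Figure: ranked list before promotion (1, 2, 3, ..., 500, 501, ...) and after promotion, with pages 500 and 499 moved up among the top results]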


Randomized Rank Promotion Scheme

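[Figure: the result set W is split into the promotion pool Wm and the remainder W − Wm. The promotion pool is put in random order to form list Lm; the remainder is ordered by popularity to form list Ld]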


Randomized Rank Promotion Scheme

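[Figure: merging the remainder list Ld and the promotion list Lm into the final result list, shown for k = 3 and r = 0.5. The first k−1 positions come from Ld; each later position is filled from Lm with probability r and from Ld with probability 1−r]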
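The merge shown on this slide can be sketched as follows. This is one reading of the diagram, not a definitive implementation: the first k−1 results are taken from the popularity-ordered list Ld, and each later position is filled from the randomly ordered list Lm with probability r, otherwise from Ld. The function name, the argument layout, and the rule for what happens once one list runs out are assumptions.

```python
import random
from typing import Dict, List, Optional, Set


def randomized_rank_promotion(
    popularity: Dict[str, float],
    promotion_pool: Set[str],
    k: int,
    r: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a result list that mixes promoted pages into the ranking."""
    if k < 1:
        raise ValueError("start rank k must be at least 1")
    if not 0.0 <= r <= 1.0:
        raise ValueError("degree of randomization r must lie in [0, 1]")
    if rng is None:
        rng = random.Random()

    # Lm: the promotion pool in random order.
    lm = [p for p in popularity if p in promotion_pool]
    rng.shuffle(lm)

    # Ld: the remainder, ordered by descending popularity.
    ld = sorted(
        (p for p in popularity if p not in promotion_pool),
        key=lambda p: popularity[p],
        reverse=True,
    )

    # Positions 1 .. k-1 are protected: they always come from Ld.
    result = ld[: k - 1]
    ld = ld[k - 1:]

    # From position k onward, draw from Lm with probability r, else Ld.
    # Assumption: when one list is exhausted, the other fills the rest.
    i = j = 0
    while i < len(lm) or j < len(ld):
        take_promoted = i < len(lm) and (j >= len(ld) or rng.random() < r)
        if take_promoted:
            result.append(lm[i])
            i += 1
        else:
            result.append(ld[j])
            j += 1
    return result
```

For example, calling it with `k=3` and `r=0.5` keeps the two most popular pages in place and gives each later slot an even chance of holding a promoted page. Setting `r=0` reduces to plain popularity ranking with the pool appended at the end.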


Parameters

Promotion pool (Wm)
– Uniform rank promotion: give an equal chance to each page
– Selective rank promotion: exclusively target zero-awareness pages

Start rank (k)
– rank to start randomization from

Degree of randomization (r)
– controls the tradeoff between exploration and exploitation
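The two promotion-pool rules above could be sketched like this. The slide only says that uniform promotion gives each page an equal chance, so the `inclusion_prob` parameter is a hypothetical knob introduced here, as are the function names and the way awareness is passed in.

```python
import random
from typing import Dict, Iterable, Optional, Set


def uniform_pool(
    pages: Iterable[str],
    inclusion_prob: float,
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Uniform rule: every page has the same chance of entering Wm.

    inclusion_prob is an assumed parameter, not taken from the slides.
    """
    if rng is None:
        rng = random.Random()
    return {p for p in pages if rng.random() < inclusion_prob}


def selective_pool(awareness: Dict[str, float]) -> Set[str]:
    """Selective rule: Wm holds exactly the pages with zero awareness."""
    return {p for p, a in awareness.items() if a == 0.0}


# Hypothetical awareness levels for three pages.
awareness = {"old": 0.8, "new1": 0.0, "new2": 0.0}
pool = selective_pool(awareness)
```

Here `pool` contains only the two pages nobody has seen yet, so an established page is never displaced by another established page.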


Tuning the Parameters

Objective: maximize quality-per-click (QPC)

Two ways to tune
– Real-world experiment
– Analytical modeling


Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary


Popularity Evolution Cycle

[Figure: cycle linking Popularity P(p,t) → Rank R(p,t) → Visit rate V(p,t) → Awareness A(p,t) → back to Popularity P(p,t)]


Popularity Evolution Cycle

Rank R(p,t) = F_PR(P(p,t))

Visit rate V(p,t) = F_RV(R(p,t))

Awareness A(p,t) = F_VA(V(p,t))

Popularity P(p,t) = F_AP(A(p,t))
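One pass around this cycle can be sketched as a single deterministic update. Everything concrete below is an assumption made for illustration: popularity is taken as awareness × quality, visits fall off with rank as a power law with exponent −3/2, and each visit is treated as coming from a user drawn uniformly at random from `num_users` users. The function and parameter names are hypothetical.

```python
from typing import List


def evolve_one_step(
    awareness: List[float],
    quality: List[float],
    total_visits: float,
    num_users: int,
) -> List[float]:
    """Apply F_AP, F_PR, F_RV and F_VA once; return the new awareness."""
    n = len(awareness)

    # F_AP (assumed): popularity = awareness * quality.
    popularity = [a * q for a, q in zip(awareness, quality)]

    # F_PR: rank 1 goes to the most popular page; ties keep input order.
    order = sorted(range(n), key=lambda i: popularity[i], reverse=True)
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position

    # F_RV (assumed): visits proportional to rank^(-3/2), summing to
    # total_visits across all pages.
    norm = sum(j ** -1.5 for j in range(1, n + 1))
    visits = [total_visits * rank[i] ** -1.5 / norm for i in range(n)]

    # F_VA (assumed): an unaware user stays unaware of a page only if
    # none of its visits in this step came from that user.
    stay_unaware = 1.0 - 1.0 / num_users
    return [
        1.0 - (1.0 - a) * stay_unaware ** v
        for a, v in zip(awareness, visits)
    ]


# Hypothetical community: an entrenched mediocre page and a new good one.
quality = [0.4, 0.9]
start = [0.5, 0.0]
after = evolve_one_step(start, quality, total_visits=10, num_users=100)
```

In this toy run the entrenched page holds rank 1 and collects most of the visits, so the new page gains awareness much more slowly even though its quality is higher.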


Deriving Popularity Evolution Curve

[Figure: popularity P(p,t) plotted against time (t)]

Next step: derive formula for popularity evolution curve

Assumptions
– Number of pages constant
– Pages are created and retired according to a Poisson process with rate parameter λ
– Quality distribution of pages is stationary


Deriving Popularity Evolution Curve

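Doing the steady-state analysis, we get

$$f(a_i \mid q) = \frac{\lambda}{\bigl(\lambda + F_{RV}(F_{PR}(0))\bigr)\,(1 - a_i)} \prod_{j=1}^{i} \frac{F_{RV}(F_{PR}(a_{j-1} \cdot q))}{\lambda + F_{RV}(F_{PR}(a_j \cdot q))}$$

where λ is the rate parameter of the Poisson process for page creation and retirement, a_i = i/m is the awareness level at which i of the m users are aware of the page, and

$$F_{PR}(x) = 1 + \sum_{p \in P} \; \sum_{i = 1 + \lfloor m \cdot x / Q(p) \rfloor}^{m} f\!\left(\tfrac{i}{m} \,\middle|\, Q(p)\right)$$

$$F_{RV}(x) = v \cdot \frac{x^{-3/2}}{\sum_{i=1}^{n} i^{-3/2}}$$

with n the number of pages and v the total visit rate.

[Equation not recoverable from the transcript: it relates E(x, q) to f(x | q) and q]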


Use Popularity Evolution Model to Tune Parameters

Model of popularity evolution process (see paper)
– Complex dynamic process
– To study, we combine approximate analysis with simulation

Next step: use model to tune rank promotion scheme
– Parameters: k, r and Wm
– Objective: maximize QPC


Tuning: Promotion Pool (Wm)

[Figure: no promotion vs. uniform promotion vs. selective promotion, for k=1 and r=0.2]


Tuning: k and r

k: start rank

r: degree of randomization


Tuning: k and r

Maximize QPC (quality-per-click)

Avoid excessive “junk”

Preserve #1 result for navigational searches


Model of the Web

[Figure: two example communities, “Squash” and “Linux”]

Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.)

A community is made up of a set of pages, interested users and related queries


Robustness Across Different Web Communities

[Figure: four panels plotting quality-per-click (0 to 1) against # pages (1E+03 to 1E+06), # users (1E+02 to 1E+06), page lifetime (0.75 to 5.25), and visit rate (1E+01 to 1E+07)]


Summary

Entrenchment effect hurts search result quality

Solution : Randomized rank promotion

Model of Web evolution and QPC metric
– Used to tune & evaluate randomized rank promotion

Results:
– New high-quality pages become popular much faster
– Aggregate search result quality significantly improved


THE END

Paper available at: www.cs.cmu.edu/~spandey