Page 1:

Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3

1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay

Shuffling a Stacked Deck

The Case for Partially Randomized Ranking of Search Engine Results

Page 2:

@Carnegie Mellon Databases

Popularity as a Surrogate for Quality

Search engines want to measure the “quality” of pages

Quality is hard to define and measure

Various “popularity” measures are used in ranking
– e.g., in-links, PageRank, user traffic


Page 3:

Relationship Between Popularity and Quality

Popularity: depends on the number of users who “like” a page
– relies on both quality and awareness of the page

Popularity is different from quality
– But strongly correlated when awareness is large

[Figure: nested sets of users: the users aware of page p, and within them the users who like page p]
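The relationship on this slide can be written compactly. The product form below is an assumption that is consistent with the slide's wording, not a formula taken from the slide itself: A(p,t) is read as the fraction of users aware of page p at time t, and Q(p) as the fraction of aware users who like p.

```latex
P(p,t) = A(p,t) \cdot Q(p),
\qquad
A(p,t) \to 1 \;\Longrightarrow\; P(p,t) \to Q(p)
```

Under this reading, popularity tracks quality closely only once awareness is large, which is the point the slide makes.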

Page 4:

Problem

Popularity/quality correlation is weak for young pages
– Even if of high quality, they may not (yet) be popular due to lack of user awareness

Plus, the process of gaining popularity is inhibited by the “entrenchment effect”
– [Cho et al. WWW ’04], [Chakrabarti et al. SODA ’05], [Mowshowitz et al. Communications of the ACM ’02], and many others

Page 5:

Entrenchment Effect

Search engines show entrenched (already-popular) pages at the top

Users discover pages via search engines; tend to focus on top results

[Figure: ranked result list (1, 2, 3, ...) in which user attention falls on the entrenched pages at the top, while new unpopular pages sit lower down]

Page 6:

Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary

Page 7:

Alternative Approaches to Counteract the Entrenchment Effect

Weight links to young pages more
– [Baeza-Yates et al. SPIRE ’02]
– Proposed an age-based variant of PageRank

Extrapolate quality based on increase in popularity
– [Cho et al. SIGMOD ’05]
– Proposed an estimate of quality based on the derivative of popularity

Page 8:

Our Approach: Randomized Rank Promotion

Select random (young) pages to promote to good rank positions

Rank position to promote to is chosen at random

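[Figure: original ranking 1, 2, 3, ..., 500, 501 next to a randomized ranking in which low-ranked pages such as 500 and 499 appear near the top]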

Page 9:

Our Approach: Randomized Rank Promotion

Consequence: Users visit promoted pages; improves ability to estimate quality via popularity

Compared with previous approaches:
• Does not rely on temporal measurements (+)
• Sub-optimal (-)

Page 10:

Exploration/Exploitation Tradeoff

Exploration/exploitation tradeoff
– exploit known high-quality pages by assigning good rank positions
– explore quality of new pages by promoting them in rank

Existing search engines only exploit (to our knowledge)

Page 11:

Possible Objectives for Rank Promotion

Fairness
– Give each page an equal chance to become popular
– Incentive for search engines to be fair?

Quality (our choice)
– Maximize quality of search results seen by users (in aggregate)
– Quality of page p: extent to which users “like” p
– Q(p) ∈ [0,1]

Page 12:

Quality-Per-Click Metric (QPC)

V(p,t) : number of visits made to page p at time t through search engine

QPC : average quality of pages viewed by users, amortized over time

$$\mathrm{QPC} = \frac{\sum_{t} \sum_{p} V(p,t)\, Q(p)}{\sum_{t} \sum_{p} V(p,t)}$$
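As an illustration of the metric, the sketch below computes QPC as the visit-weighted average quality over a finite window of time steps. The function name and the data layout (one dict of visit counts per time step) are assumptions made for this example; the slides do not prescribe a representation.

```python
def quality_per_click(visits, quality):
    """Visit-weighted average quality over a finite observation window.

    visits  -- list with one dict per time step, mapping page -> V(p, t)
    quality -- dict mapping page -> Q(p), each value in [0, 1]
    """
    # Numerator: sum over t and p of V(p, t) * Q(p).
    weighted = sum(v * quality[p] for step in visits for p, v in step.items())
    # Denominator: total number of visits, sum over t and p of V(p, t).
    total = sum(v for step in visits for v in step.values())
    if total == 0:
        raise ValueError("QPC is undefined when there are no visits")
    return weighted / total


# Two time steps, two pages: page "a" has quality 1.0, page "b" has 0.5.
example = quality_per_click(
    [{"a": 3, "b": 1}, {"a": 1, "b": 3}],
    {"a": 1.0, "b": 0.5},
)
```

Shifting visits toward higher-quality pages raises the value, which is why the metric rewards rankings that surface good pages early.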

Page 13:

Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary

Page 14:

Desiderata for Randomized Rank Promotion

Want ability to:
– Control exploration/exploitation tradeoff
– “Select” certain pages as candidates for promotion
– “Protect” certain pages from demotion

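[Figure: original ranking 1, 2, 3, ..., 500, 501 next to a randomized ranking in which low-ranked pages such as 500 and 499 appear near the top]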

Page 15:

Randomized Rank Promotion Scheme

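[Figure: the result set W is split into the promotion pool Wm and the remainder W − Wm. The promotion pool is put in random order to form the list Lm; the remainder is ordered by popularity to form the list Ld.]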

Page 16:

Randomized Rank Promotion Scheme

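[Figure: merging the promotion list Lm with the remainder list Ld, for k = 3 and r = 0.5. The first k−1 positions are taken from Ld; the figure labels the two branches of the merge with r (promotion list) and 1−r (remainder).]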

Page 17:

Parameters

Promotion pool (Wm)
– Uniform rank promotion: give an equal chance to each page
– Selective rank promotion: exclusively target zero-awareness pages

Start rank (k)
– rank to start randomization from

Degree of randomization (r)
– controls the tradeoff between exploration and exploitation
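The three parameters can be tied together in a short sketch of the merge. This is one plausible reading of the scheme, not the authors' implementation: the function name, the `in_pool` predicate, and the rule for what happens when one list runs out are assumptions made for illustration.

```python
import random


def randomized_rank_promotion(pages, popularity, in_pool, k, r, rng=None):
    """Return one randomized ranking of `pages` (requires k >= 1).

    pages      -- list of page ids (the result set W)
    popularity -- dict mapping page id -> popularity score
    in_pool    -- predicate selecting the promotion pool Wm (hypothetical)
    k          -- start rank: positions 1 .. k-1 are never randomized
    r          -- degree of randomization, in [0, 1]
    """
    rng = rng or random.Random()
    # Lm: the promotion pool in random order.
    lm = [p for p in pages if in_pool(p)]
    rng.shuffle(lm)
    # Ld: the remainder, most popular first.
    ld = sorted((p for p in pages if not in_pool(p)),
                key=lambda p: popularity[p], reverse=True)
    # The top k-1 pages of Ld keep their positions.
    result = ld[:k - 1]
    ld = ld[k - 1:]
    i = j = 0
    while i < len(lm) or j < len(ld):
        # With probability r take the next page from Lm, otherwise from Ld.
        # Assumption: once one list is exhausted, the other fills the rest.
        take_promoted = i < len(lm) and (j >= len(ld) or rng.random() < r)
        if take_promoted:
            result.append(lm[i])
            i += 1
        else:
            result.append(ld[j])
            j += 1
    return result
```

Selective promotion corresponds to an `in_pool` that accepts only zero-awareness pages, and uniform promotion to one that accepts every page. Setting r = 0 places the whole remainder list ahead of the pool, while larger r pushes pool pages higher.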

Page 18:

Tuning the Parameters

Objective: maximize quality-per-click (QPC)

Two ways to tune
– Real-world experiment
– Analytical modeling

Page 19:

Outline

Problem introduction

Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation

Summary

Page 20:

Popularity Evolution Cycle

Popularity P(p,t)

Rank R(p,t)

Awareness A(p,t)

Visit rate V(p,t)

Page 21:

Popularity Evolution Cycle

Popularity P(p,t) → Rank R(p,t), via F_PR(P(p,t))

Rank R(p,t) → Visit rate V(p,t), via F_RV(R(p,t))

Visit rate V(p,t) → Awareness A(p,t), via F_VA(V(p,t))

Awareness A(p,t) → Popularity P(p,t), via F_AP(A(p,t))
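To make the cycle concrete, here is a minimal sketch of one pass around it for a small set of pages. Every functional form in it is an assumption chosen for illustration: popularity is taken as awareness times quality, rank as position when sorted by popularity, visit rate as proportional to rank^(-3/2), and awareness as growing when visits come from uniformly random users. The name `cycle_step` and its parameters are hypothetical.

```python
def cycle_step(awareness, quality, total_visits, num_users):
    """One pass around the popularity cycle; lists are indexed by page."""
    n = len(quality)
    # F_AP (assumed): popularity = awareness * quality.
    popularity = [a * q for a, q in zip(awareness, quality)]
    # F_PR: rank 1 goes to the most popular page; ties keep index order.
    order = sorted(range(n), key=lambda p: -popularity[p])
    rank = [0] * n
    for position, p in enumerate(order, start=1):
        rank[p] = position
    # F_RV (assumed): visits fall off as rank^(-3/2), normalized so that
    # the visit rates sum to total_visits.
    norm = sum(i ** -1.5 for i in range(1, n + 1))
    visits = [total_visits * rank[p] ** -1.5 / norm for p in range(n)]
    # F_VA (assumed): each visit comes from a uniformly random user, so an
    # unaware user becomes aware with probability 1 - (1 - 1/m)^V.
    new_awareness = [
        a + (1 - a) * (1 - (1 - 1 / num_users) ** v)
        for a, v in zip(awareness, visits)
    ]
    return new_awareness, rank, visits


# Page 1 has the highest quality but zero awareness, so it ranks below
# the already-known page 0 and receives fewer visits.
new_a, rank, visits = cycle_step(
    awareness=[0.5, 0.0, 0.0],
    quality=[0.4, 0.9, 0.2],
    total_visits=100,
    num_users=1000,
)
```

In this toy run the known page keeps the top rank and the largest share of visits, which is the entrenchment effect the deck describes.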

Page 22:

Deriving Popularity Evolution Curve

[Figure: popularity evolution curve, plotting popularity P(p,t) against time (t)]

Next step: derive formula for popularity evolution curve

Assumptions
– Number of pages constant
– Pages are created and retired according to a Poisson process with rate parameter λ
– Quality distribution of pages is stationary
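These assumptions can be mimicked in a simulation step. The sketch below is one hedged interpretation, not the paper's procedure: it treats page lifetimes as exponentially distributed with rate `lam`, replaces each retired page with a fresh zero-awareness page so the page count stays constant, and draws the new page's quality from a caller-supplied distribution. The names `retire_and_replace` and `draw_quality` are hypothetical.

```python
import math
import random


def retire_and_replace(pages, lam, dt, draw_quality, rng):
    """Advance the page population by one interval of length dt.

    pages        -- list of dicts with keys "quality" and "awareness"
    lam          -- retirement rate; mean page lifetime is 1 / lam
    draw_quality -- hypothetical callable: rng -> quality in [0, 1]
    rng          -- a random.Random instance
    """
    # With exponentially distributed lifetimes, a page alive at the start
    # of the interval is retired during it with probability 1 - exp(-lam*dt).
    p_retire = 1.0 - math.exp(-lam * dt)
    next_pages = []
    for page in pages:
        if rng.random() < p_retire:
            # A retired page is replaced by a new page that no user knows
            # yet, which keeps the total number of pages constant.
            next_pages.append({"quality": draw_quality(rng), "awareness": 0.0})
        else:
            next_pages.append(page)
    return next_pages
```

Drawing every replacement's quality from the same distribution is what keeps the overall quality distribution stationary in this sketch.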

Page 23:

Deriving Popularity Evolution Curve

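Doing the steady-state analysis, we get

$$f(a_i \mid q) = \frac{\lambda}{\bigl(\lambda + F_{RV}(F_{PR}(0))\bigr)\,(1 - a_i)} \prod_{j=1}^{i} \frac{F_{RV}(F_{PR}(a_{j-1} \cdot q))}{\lambda + F_{RV}(F_{PR}(a_j \cdot q))}$$

$$F_{PR}(x) = 1 + \sum_{p \in P} \; \sum_{i = 1 + \lfloor m \cdot x / Q(p) \rfloor}^{m} f\bigl(i/m \mid Q(p)\bigr)$$

$$F_{RV}(x) = \frac{v \cdot x^{-3/2}}{\sum_{i=1}^{n} i^{-3/2}}$$

[One further expression, involving f, E, x and q, is not recoverable from the transcript]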

Page 24:

Use Popularity Evolution Model to Tune Parameters

Model of popularity evolution process (see paper)
– Complex dynamic process
– To study, we combine approximate analysis with simulation

Next step: use model to tune rank promotion scheme
– Parameters: k, r and Wm
– Objective: maximize QPC

Page 25:

Tuning: Promotion Pool (Wm)

[Figure: plot comparing no promotion, uniform promotion, and selective promotion, with k = 1 and r = 0.2]

Page 26:

Tuning: k and r

k: start rank

r: degree of randomization

Page 27:

Tuning: k and r

Maximize QPC (quality-per-click)

Avoid excessive “junk”

Preserve #1 result for navigational searches

Page 28:

Model of the Web

[Figure: two disjoint communities, labeled “Squash” and “Linux”]

Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.)

A community is made up of a set of pages, interested users and related queries

Page 29:

Robustness Across Different Web Communities

[Figure: four panels plotting quality-per-click (y-axis, 0 to 1) against # pages (1E+03 to 1E+06), # users (1E+02 to 1E+06), page lifetime (0.75 to 5.25), and visit rate (1E+01 to 1E+07)]

Page 30:

Summary

Entrenchment effect hurts search result quality

Solution: Randomized rank promotion

Model of Web evolution and QPC metric
– Used to tune & evaluate randomized rank promotion

Results:
– New high-quality pages become popular much faster
– Aggregate search result quality significantly improved

Page 31:

THE END

Paper available at: www.cs.cmu.edu/~spandey