Post on 15-Jan-2016

Quicklink Selection for Navigational Query Results
Deepayan Chakrabarti (deepay@yahoo-inc.com)
Ravi Kumar (ravikuma@yahoo-inc.com)
Kunal Punera (kpunera@yahoo-inc.com)

What are quicklinks?
[Figure: search result screenshot with the labels "Quicklinks" and "Result Website"]

Quicklinks
- Quicklinks (QLs) = URLs within the search result website
- Enable fast navigation to important parts of the website
- Which URLs should be QLs?
[Figure: search result screenshot with the labels "Quicklinks" and "Result Website"]

Quicklink Selection
- Some obvious strategies don't work very well
  - Top clicked URLs in search engine
    - URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on "maps" and not for searches on "Univ. of Texas"
    - URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com
    - URL popularity may be time sensitive: nytimes.com/election-guide/2008/ for nytimes.com

Quicklink Selection
- Some obvious strategies don't work very well
  - Top clicked URLs in search engine
  - Top visited URLs in toolbar data
    - May not relate to search activity, e.g., for nytimes.com:
      - #3 is nytimes.com/mem/emailthis.html
      - #6 is nytimes.com/auth/login
      - #8 is nytimes.com/gst/regi.html

Quicklink Selection
- Some obvious strategies don't work very well
  - Top clicked URLs in search engine
  - Top visited URLs in toolbar data
  - Top URLs from analysis of hyperlink graph
    - Ignores preferences of search users
    - Toolbar data is more representative
  - Heavily tagged URLs (e.g., del.icio.us/digg)
    - Low coverage: too few websites

Quicklink Selection
- Need a combined approach
  - Search logs
  - Toolbar data
  - Web-server logs
  - Website hyperlink graph
  - User tags
[Slide annotation: "This paper"]

Related Work
- Sitemap generation [Perkowitz+/00]
- Detection of hard-to-find URLs [Srikant+/01]
- Improving website navigability [Doerr+/07]
- Mining Web usage patterns [Buchner/99, Cadez+/03]
- BrowseRank [Liu+/08]
- Post-search browsing behavior [Bilenko+/08]
We focus on QLs in the context of search

Outline
- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions

Problem Formulation
- Which k URLs should be QLs?
- "The greatest good for the greatest number"
- QLs save clicks
  - Maximize the total number of clicks saved using at most k QLs
  - But when exactly is a click "saved"?

Problem Formulation
- When does a QL get clicked by the user?
[Figure: graph of click trails (toolbar data) rooted at nasa.gov, with nodes including "Hubble telescope" and "Photos"; one node is marked "Say we pick this node as a QL"]

Problem Formulation
- Assumption: the user recognizes if SearchResult → QL → Destination, that is, if the QL lies on the way from the search result to the user's destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov, with nodes including "Hubble telescope" and "Photos"; one node is marked "Say we pick this node as a QL"]

Problem Formulation
- Assumption: the user recognizes if SearchResult → QL → Destination, that is, if the QL lies on the way from the search result to the user's destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked "Say we pick this node as a QL", and one group of trails through it is marked "(saves 1 click each)"]

Problem Formulation
- Assumption: the user recognizes if SearchResult → QL → Destination, that is, if the QL lies on the way from the search result to the user's destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked "Say we pick this node as a QL". Trails are marked "(saves 1 click each)", "(saves 2 clicks each)", "(saves 0)", and "(saves 0)"]
- Total savings = 1*3 + 2*2 = 7 clicks
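
The savings count on this slide can be reproduced with a small sketch. The trail data below is hypothetical, since the slide's figure is not available here. The sketch also assumes that a QL saves one click per hop between the website's root and the QL on a given trail, which is one reading of the "saves 1" and "saves 2" labels rather than a definition taken from the slides.

```python
from typing import Sequence


def clicks_saved(trail: Sequence[str], quicklink: str) -> int:
    """Clicks saved on one trail if `quicklink` is shown with the search result.

    Assumption: the trail starts at the website's root (index 0), and jumping
    straight to the quicklink saves one click per hop between the root and
    the first occurrence of the quicklink on the trail.
    """
    if quicklink not in trail:
        return 0  # the quicklink is not on this user's way to the destination
    return trail.index(quicklink)


# Hypothetical trails; page names are invented for illustration only.
trails = (
    [["nasa.gov", "hubble", "photos"]] * 3                # QL is 1 click from the root
    + [["nasa.gov", "missions", "hubble", "photos"]] * 2  # QL is 2 clicks from the root
    + [["nasa.gov", "news"], ["nasa.gov", "jobs"]]        # QL is not on the trail
)

total_saved = sum(clicks_saved(t, "hubble") for t in trails)
print(total_saved)  # 1*3 + 2*2 = 7
```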

Problem Formulation
- However…
  - Unknown pages might become QLs
[Figure: lyrics.com linking to pages A, B, C, …, Z; these could become the "best" QLs]

Problem Formulation
- However…
  - Unknown pages might become QLs
  - Automatic-redirect pages might become QLs:
    - nytimes.com forces logging in
    - aaa.com forces zipcode entry
- We need QLs that are "noticeable" in a search context

Problem Formulation
- How can we estimate noticeability? Via search click-logs
- Noticeability of a URL u: the user notices a useful QL with probability α(u)
[Formula for α(u) not recoverable from the slide text; its labeled parts are a tuning parameter (≈ 2) and the fraction of search clicks for u on the website]

Problem Formulation
- Assumption: the user picks the best QL that he/she notices
[Figure: graph of click trails rooted at nasa.gov with two nodes marked QL1 and QL2; two trails are marked "(saves 0)"]

| # trails | prob. | # clicks saved |
|---|---|---|
| 2 | α1 | 2 |
| 1 | α1 | 1 |
| 2 | (1-α2)α1 | 1 |
| 2 | α2 | 2 |

Total = 5α1 + 4α2 + 2(1-α2)α1

Problem Formulation
[Figure: graph of click trails rooted at nasa.gov with two nodes marked QL1 and QL2; two trails are marked "(saves 0)"]

| # trails | prob. | # clicks saved |
|---|---|---|
| 2 | α1 | 2 |
| 1 | α1 | 1 |
| 2 | (1-α2)α1 | 1 |
| 2 | α2 | 2 |

Total = 5α1 + 4α2 + 2(1-α2)α1
- If only QL1 is perfectly noticeable (α1=1, α2=0): Total = 7 clicks (as if 1 QL only)
- If both QLs are perfectly noticeable (α1=1, α2=1): Total = 9 clicks
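
A minimal sketch of this expected-savings computation follows. The trails are hypothetical and chosen only to match the trail counts on the slide. The function assumes that QLs are noticed independently, that the user takes the noticed QL with the largest saving, and that a QL at position d on a trail saves d clicks. The slides state the "best noticed QL" assumption, and the other two are simplifications made here.

```python
from typing import Dict, Sequence, Tuple


def expected_clicks_saved(
    trails: Sequence[Tuple[Sequence[str], int]],
    alpha: Dict[str, float],
) -> float:
    """Expected clicks saved over (trail, count) pairs for the QLs in `alpha`.

    `alpha[u]` is the probability that a user notices quicklink u.
    Assumptions: QLs are noticed independently, the user takes the noticed
    QL with the largest saving, and a QL at index d of a trail saves d clicks.
    """
    total = 0.0
    for trail, count in trails:
        # Quicklinks on this trail, largest saving first.
        on_trail = sorted(
            ((trail.index(u), u) for u in alpha if u in trail), reverse=True
        )
        no_better_noticed = 1.0  # probability that no better QL was noticed
        for saving, u in on_trail:
            total += count * saving * alpha[u] * no_better_noticed
            no_better_noticed *= 1.0 - alpha[u]
    return total


# Hypothetical trails matching the slide's trail counts ("x", "y" are invented).
trails = [
    (["nasa.gov", "x", "QL1"], 2),    # QL1 saves 2 clicks
    (["nasa.gov", "QL1"], 1),         # QL1 saves 1 click
    (["nasa.gov", "QL1", "QL2"], 2),  # QL2 saves 2 clicks, QL1 saves 1
    (["nasa.gov", "y"], 2),           # no QL on the trail: saves 0
]

only_ql1 = expected_clicks_saved(trails, {"QL1": 1.0, "QL2": 0.0})
both_qls = expected_clicks_saved(trails, {"QL1": 1.0, "QL2": 1.0})
print(only_ql1, both_qls)  # 7.0 9.0
```

For general values the function evaluates 5α1 + 4α2 + 2(1-α2)α1, which gives 5.0 at α1 = α2 = 0.5.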

Problem Formulation
- Which k URLs should be QLs?
- Maximize the expected number of clicks saved using at most k QLs while incorporating "noticeability"

Algorithms
- Maximize the expected number of saved clicks using k QLs: NP-hard
- Theorem: this objective is non-decreasing submodular
  1. Non-negative
  2. Adding QLs never hurts
  3. "Diminishing returns": for QL sets $S \subseteq S'$ and a URL $u$, with $f$ denoting the objective,
     $f(S \cup \{u\}) - f(S) \ge f(S' \cup \{u\}) - f(S')$
     (marginal improvement to set S ≥ marginal improvement to superset S')

Algorithms
- Greedy algorithm: iteratively pick the QL that increases the number of saved clicks the most
  - Within a factor (1-1/e) of OPT [Nemhauser+/’78]
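
The greedy step can be sketched as below. This is an illustrative sketch rather than the authors' implementation. The trail data is hypothetical, and the toy objective treats every QL as perfectly noticeable (α = 1), so the saving on a trail is the depth of its deepest QL.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_select(
    candidates: Iterable[str],
    k: int,
    objective: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Pick up to k URLs, each time adding the one with the largest marginal gain."""
    chosen: List[str] = []
    remaining = set(candidates)
    for _ in range(k):
        base = objective(frozenset(chosen))
        best_gain, best_url = 0.0, None
        for u in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = objective(frozenset(chosen + [u])) - base
            if gain > best_gain:
                best_gain, best_url = gain, u
        if best_url is None:  # no candidate improves the objective
            break
        chosen.append(best_url)
        remaining.remove(best_url)
    return chosen


# Hypothetical (trail, count) data; page names are invented.
TRAILS = [
    (["root", "a"], 3),
    (["root", "a", "b"], 2),
    (["root", "c", "d"], 2),
    (["root", "e"], 1),
]


def saved(qls: FrozenSet[str]) -> float:
    """Toy objective with alpha = 1: each trail saves the depth of its deepest QL."""
    total = 0
    for trail, count in TRAILS:
        depths = [trail.index(u) for u in qls if u in trail]
        total += count * max(depths, default=0)
    return total


picked = greedy_select(["a", "b", "c", "d", "e"], 2, saved)
print(picked, saved(frozenset(picked)))  # ['a', 'd'] 9
```

Each round re-evaluates the objective for every remaining candidate, which is simple but not tuned for large candidate sets.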

Algorithms
- However…
  - Inhomogeneous results: QLs for ea.com are
    - fifa08.ea.com and battlefield.ea.com (two games made by EA)
    - 6 webpages deep inside thesim2.ea.com
  - Redundant results: QLs for senate.gov include
    - obama.senate.gov
    - obama.senate.gov/about
    - obama.senate.gov/contact
    - obama.senate.gov/votes
    (the parent URL makes the child URLs redundant)

Algorithms
- Both can be specified as pairwise constraints on URLs allowed to belong to a QL set
- Pairwise-constrained QL selection is NP-hard
- Two-step process:
  - Heuristically find a large subset of trails that form a tree
  - Enforce constraints on tree
- Dynamic program optimal on tree
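
One way such a pairwise constraint might be encoded is sketched below. The slides do not give the actual predicates, so the parent/child test (a plain URL path-prefix check) and both function names are assumptions made for illustration. The inhomogeneity constraint and the tree dynamic program are not shown.

```python
from itertools import combinations
from typing import Callable, Iterable


def is_ancestor(parent: str, child: str) -> bool:
    """True if `child` lies strictly below `parent` in the URL path hierarchy."""
    return child.startswith(parent.rstrip("/") + "/")


def redundant_pair(u: str, v: str) -> bool:
    """Hypothetical redundancy constraint: one URL is a parent of the other."""
    return is_ancestor(u, v) or is_ancestor(v, u)


def satisfies_constraints(
    ql_set: Iterable[str], forbidden: Callable[[str, str], bool]
) -> bool:
    """A QL set is allowed if no pair of its URLs is forbidden."""
    return not any(forbidden(u, v) for u, v in combinations(list(ql_set), 2))


parent_and_child = ["obama.senate.gov", "obama.senate.gov/about"]
siblings = ["obama.senate.gov/about", "obama.senate.gov/contact"]
print(satisfies_constraints(parent_and_child, redundant_pair))  # False
print(satisfies_constraints(siblings, redundant_pair))          # True
```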

Experiments
- Baseline methods
  - TopClicked: URL score = # search clicks on URL
  - TopVisited: URL score = # occurrences on toolbar trails
  - PageRank:
    - Build a weighted graph on URLs, where weight(i,j) = # trails using the i→j edge
    - URL score = PageRank on this graph
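
Two of these baselines can be sketched from toolbar trails alone. The trail data is hypothetical, and the damping factor, the iteration count, and the uniform handling of pages without out-edges are conventional choices assumed here, not details given on the slides. TopClicked is omitted because it needs search click logs.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def top_visited_scores(trails: Sequence[Sequence[str]]) -> Counter:
    """TopVisited: number of occurrences of each URL on the toolbar trails."""
    return Counter(url for trail in trails for url in trail)


def trail_pagerank(
    trails: Sequence[Sequence[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """PageRank by power iteration, with weight(i, j) = # trails using edge i -> j.

    Assumes at least one non-empty trail; `damping` and `iterations` are
    illustrative defaults.
    """
    weight = defaultdict(Counter)
    nodes = set()
    for trail in trails:
        nodes.update(trail)
        for src, dst in zip(trail, trail[1:]):
            weight[src][dst] += 1
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not weight.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for src, out in weight.items():
            out_total = sum(out.values())
            for dst, w in out.items():
                new_rank[dst] += damping * rank[src] * w / out_total
        rank = new_rank
    return rank


# Hypothetical trails; page names are invented.
trails = [["root", "a", "b"], ["root", "a"], ["root", "c"]]
visits = top_visited_scores(trails)
ranks = trail_pagerank(trails)
print(visits.most_common(2))
print(sorted(ranks, key=ranks.get, reverse=True))
```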

Experiments
- Live Traffic dataset
  - Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset)
- Measure:
  - Pick two equal-sized subsets of QLs
  - Use sum-of-scores and sum-of-CTRs to predict the better subset
  - Measure how often the predictions match
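
The measure can be sketched for a single website as follows. The scores and CTRs are made-up numbers. Enumerating every pair of subsets (rather than sampling) and skipping tied pairs are assumptions made here, since the slides do not say how pairs were drawn or how ties were handled.

```python
from itertools import combinations
from typing import Dict


def agreement_rate(
    scores: Dict[str, float], ctrs: Dict[str, float], size: int
) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs pick the same winner.

    Enumerates all pairs of distinct subsets of the given size. Pairs tied
    under either measure are skipped (an assumption made for this sketch).
    """
    subsets = list(combinations(sorted(scores), size))
    agree = compared = 0
    for a, b in combinations(subsets, 2):
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue
        compared += 1
        agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / compared if compared else float("nan")


# Hypothetical per-QL click-through rates and two hypothetical scoring methods.
ctrs = {"a": 0.08, "b": 0.04, "c": 0.02, "d": 0.01}
aligned_scores = {"a": 8, "b": 4, "c": 2, "d": 1}
reversed_scores = {"a": 1, "b": 2, "c": 4, "d": 8}

aligned_rate = agreement_rate(aligned_scores, ctrs, 2)
reversed_rate = agreement_rate(reversed_scores, ctrs, 2)
print(aligned_rate, reversed_rate)
```

A scoring method whose subset sums order the subsets exactly as the CTR sums do reaches a rate of 1.0, and a method that ranks the QLs in the opposite order scores far lower.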

Experiments: Live Traffic Data
[Figure: fraction of subset-pairs where predictions agree with live traffic (y-axis) versus subset sizes (x-axis)]
- QL-ALG > TopVisited > PageRank > TopClicked

Experiments
- Tree-structured trails
  - Most dropped trails are very short
  - Tree-structured trails improve accuracy
[Figure: Live Traffic prediction quality comparison]
[Figure: Distribution of dropped trails; x-axis: length of trail, y-axis: number of trails dropped]

Conclusions
- Proposed a formulation for the QL selection problem
  - Both toolbar and search logs are used intuitively
- Proposed two algorithms:
  - Greedy: (1-1/e)-optimal
  - Tree-structured: empirically better
- Improvement of 22% over competing baselines