Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom (1)
Jon Kleinberg (1)
Ravi Kumar (2)
(1) Cornell University
(2) Yahoo! Research
June 30, 2009
Lars Backstrom, Jon Kleinberg, Ravi Kumar Optimizing Web Traffic via the Media Scheduling Problem
Introduction
Featured items are common on many web pages
Amazon.com has a featured product
Flickr.com has a featured photo
YouTube.com has featured video(s)
yahoo.com has a featured news story
Utility
In all these cases, some utility is gained through the featured items
General user interest
Ad revenue on linked pages
Product sales
Abstractly, some utility is gained per impression
At a high level, our goal is to maximize the utility gained from the featured item slot
In our study of yahoo.com, we consider the unit of utility to be clicks
Framework
All visitors to the site will view the same featured item (no personalization)
The website operator has a pool of items that can potentially be featured over the course of a day
Available items are known ahead of time
Qualities of items are also known ahead of time (perhaps through bucket testing)
Each item will be presented during only one contiguous interval
Example:
Available articles for day: {How to spot fake money, Top 10 Summer Movies, Britney Spears ...}
The Problem
[Figure: Traffic over One Day; x-axis: Minutes, y-axis: Visits]
Three components to our problem
Varying Traffic: traffic to any webpage varies over the course of a day, with peak traffic typically reached around midday
Staleness: utility decays over time when we leave an item in the featured spot
Item Variation: some items are inherently better (higher utility) than others
[Figure: Declining CTR over Time; x-axis: Minutes, y-axis: CTR]
[Figure: CTR Decay of Different Articles; x-axis: Minutes, y-axis: CTR]
Varying Traffic
Traffic is highly variable, with a significant difference between peak and off-peak times
Each day has a slightly different shape, but traffic is mostly consistent from one week to the next
This is important because it means we can accurately predict traffic ahead of time
Staleness
Items become less valuable the longer they appear on a site
A significant fraction of all visitors will be returning
[Figure: Fits of a single article; x-axis: Minutes, y-axis: Click-Through Rate; series: actual data, best-fit power law decay, best-fit linear decay, best-fit exponential decay]
If a visitor returns to the same featured item, typically one of two things will have already happened
He has already rejected the item, and will not ‘consume’ it
He has already consumed the item and will not do so again
Utility per impression decays with time
We denote the utility of an item $i$ after $t$ minutes by $f_i(t)$
Note that in the figure here, exponential decay seems to fit best
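As a rough illustration of what fitting an exponential decay to one article's CTR curve could look like, here is a minimal sketch. It assumes the model CTR ≈ a·exp(−λt) and fits it by ordinary least squares on log(CTR); the function name, the fitting method and the synthetic data are illustrative assumptions, not the procedure used in the study.

```python
import math

def fit_exponential_decay(minutes, ctr):
    """Fit ctr ~ a * exp(-lam * t) with a straight-line fit to log(ctr).

    Assumes at least two distinct time points and strictly positive CTRs.
    Returns (a, lam).
    """
    ys = [math.log(c) for c in ctr]
    n = len(minutes)
    mean_t = sum(minutes) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in minutes)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(minutes, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

# Synthetic (made-up) observations, one every 5 minutes.
minutes = list(range(0, 60, 5))
ctr = [0.08 * math.exp(-0.02 * t) for t in minutes]
a, lam = fit_exponential_decay(minutes, ctr)
```

Fitting in log space keeps the sketch dependency-free; a direct nonlinear fit would weight the early, high-CTR minutes differently.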
Item Variation
[Figure: Article Quality Distribution; x-axis: Quality, y-axis: Count]
The utility of items naturally varies according to the item
In most cases this variation can be observed ahead of time
For some things, like products, historical sales data can be used
In other cases we can use ‘bucket testing’ to discover this variation
For short intervals, divide all users into many ‘buckets’
Show users within each bucket a different item
Use gathered data to gauge item quality
The best items may be an order of magnitude better than average
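The bucket-testing steps can be sketched as follows. This is a minimal illustration that assumes each bucket's log has already been reduced to an (impressions, clicks) pair per item; the function name and all counts are hypothetical.

```python
def estimate_item_quality(bucket_logs):
    """Estimate per-item CTR from bucket-test counts.

    bucket_logs maps item name -> (impressions, clicks) gathered while
    that item was shown to its own bucket of users.
    """
    quality = {}
    for item, (impressions, clicks) in bucket_logs.items():
        if impressions > 0:  # skip buckets that received no traffic
            quality[item] = clicks / impressions
    return quality

# Made-up counts for illustration only.
logs = {"article_a": (2000, 160), "article_b": (2000, 20), "article_c": (0, 0)}
quality = estimate_item_quality(logs)
ranked = sorted(quality, key=quality.get, reverse=True)  # best item first
```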
The Media Scheduling Problem Formalized
Inputs
Number of visitors $\gamma_\tau$ during minute $\tau$
Set of $N$ items with associated value functions $f_i(t)$ giving expected utility per impression
Output
Non-overlapping intervals $[S_i, T_i]$ for each article $i$
Goal is to maximize the total utility:
$$\sum_{i} \sum_{\tau = S_i}^{T_i} \gamma_\tau \, f_i(\tau - S_i)$$
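To make the objective concrete, the sketch below evaluates the total utility of a given schedule. The function name, the dictionary-based schedule representation and the toy numbers are assumptions for illustration; intervals are treated as inclusive on both ends, matching the sum from $S_i$ to $T_i$.

```python
def total_utility(traffic, value_fns, schedule):
    """Sum of traffic[tau] * f_i(tau - S_i) over each item's interval.

    traffic[tau]  : number of visitors during minute tau
    value_fns[i]  : function f_i(t), utility per impression after t minutes
    schedule[i]   : (S_i, T_i), inclusive interval during which item i is shown
    """
    total = 0.0
    for i, (start, end) in schedule.items():
        for tau in range(start, end + 1):
            total += traffic[tau] * value_fns[i](tau - start)
    return total

# Toy instance (made-up numbers): item 0 halves in value each minute,
# item 1 has a constant value.
traffic = [10, 20, 30, 20]
value_fns = {0: lambda t: 0.5 * 0.5 ** t, 1: lambda t: 0.2}
schedule = {0: (0, 1), 1: (2, 3)}
utility = total_utility(traffic, value_fns, schedule)
# 10*0.5 + 20*0.25 + 30*0.2 + 20*0.2 = 20
```

The sketch does not check that the intervals are non-overlapping; a caller would need to enforce that constraint separately.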
Dataset
Items are ‘featured news’ articles; these are rarely breaking news
Comes from yahoo.com server logs
Recorded over three weeks in 2008
Captures page views and click rates
Our measure of utility here is clicks, so $f_i(\cdot)$ is the CTR
Utility Decay on Yahoo!
[Figure: CTR Decay of Different Articles; x-axis: Minutes, y-axis: CTR]
Item quality varies greatly between ‘best’ and ‘worst’ articles, perhaps by an order of magnitude
However, given initial quality $f_i(0)$, articles share similar decay functions
We find that all articles can be aligned to a single ‘universal’ decay function such that the average relative error is only 3.2%
Given a universal decay function $g(\cdot)$, $f_i(t) = g(t + \sigma_i)$ for an article-specific shift $\sigma_i$
Furthermore, the universal decay is quite similar to exponential decay
A single universal exponential parameter $\lambda$ gives an average relative error of 4.6%
All this suggests that if we know $f_i(0)$, we know $f_i(t)$ for all $t$
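The following sketch shows how a single shared decay curve plus an initial CTR determines the whole curve for an article. It assumes, purely for illustration, that the universal function is the exponential $g(t) = e^{-\lambda t}$; the value of `LAMBDA` and the function names are hypothetical and not taken from the data.

```python
import math

LAMBDA = 0.02  # hypothetical shared decay rate per minute

def g(t, lam=LAMBDA):
    """Assumed universal decay curve (an exponential stand-in)."""
    return math.exp(-lam * t)

def shift_from_initial_ctr(f0, lam=LAMBDA):
    """Shift sigma_i chosen so that g(sigma_i) equals the initial CTR f0."""
    return -math.log(f0) / lam

def f(t, f0, lam=LAMBDA):
    """CTR of an article with initial CTR f0 after t minutes: g(t + sigma_i)."""
    return g(t + shift_from_initial_ctr(f0, lam), lam)

ctr_now = f(0, 0.05)     # recovers the initial CTR, 0.05
ctr_later = f(10, 0.05)  # same curve, 10 minutes further along
```

Under this assumption a better article simply starts earlier on the shared curve (smaller shift), which is the property the ordering argument relies on.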
Algorithms
The general problem is NP-hard
Naively, examine all $N!$ permutations
A better optimal algorithm uses dynamic programming and takes $O(T^2 N 2^N)$ time, where $T$ is the number of discrete time units
To do better, we need to use structure from the problem observed in the data
Recall that, to a close approximation, we observed that $f_i(t) = g(t + \sigma_i)$ and that $g(\cdot)$ is monotonically decreasing
Consider the case where the traffic pattern $\gamma$ is monotonically increasing
The optimal ordering is then from worst to best
To prove this, we consider an inversion in this ordering
We will show that we can correct this inversion to get an ordering which is no worse
Proof for increasing traffic case
Claim: Ordering from worst to best is optimal
Consider an inversion where a better article was placed first
We can swap the two articles
We get lower CTR in the beginning
We get higher CTR later on
The area of gain is equal to the area of loss, but traffic is higher in the gain region
Solving the problem
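Decreasing traffic is similar: order from best to worst
To find exact interval lengths, use dynamic programming
Since the order is known, compute the best value up to time $t$ using the first $n$ of the $N$ items as
$$\mathrm{opt}(t, n) = \max_{t' \le t} \left[ \mathrm{opt}(t', n - 1) + \mathrm{value}(t', t, n) \right]$$
where $\mathrm{value}(t', t, n)$ is the utility of showing item $n$ during $[t', t)$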
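The recurrence for a known ordering can be sketched as below. This is an illustrative implementation under assumptions: time is discretized into minutes indexed from 0, the whole day is covered, an item may receive an empty interval, and the function and variable names are hypothetical. It recomputes interval values naively, so it is meant for small examples only.

```python
import math

def schedule_in_order(traffic, items):
    """Best interval boundaries when the display order is already fixed.

    traffic[t] is the visitor count in minute t; items is a list of value
    functions f(k) (utility per impression k minutes after first display),
    given in display order. Returns (best_total, bounds) where item n is
    shown during [bounds[n], bounds[n + 1]); an empty interval skips it.
    """
    T, N = len(traffic), len(items)

    def value(start, end, f):
        # Utility of showing one item during minutes [start, end).
        return sum(traffic[start + k] * f(k) for k in range(end - start))

    NEG = float("-inf")
    # opt[n][t]: best utility for minutes [0, t) using the first n items.
    opt = [[NEG] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    opt[0][0] = 0.0
    for n in range(1, N + 1):
        for t in range(T + 1):
            for t_prev in range(t + 1):
                if opt[n - 1][t_prev] == NEG:
                    continue  # no valid schedule ends at t_prev
                cand = opt[n - 1][t_prev] + value(t_prev, t, items[n - 1])
                if cand > opt[n][t]:
                    opt[n][t], back[n][t] = cand, t_prev

    # Walk the back-pointers from the end of the day to recover boundaries.
    bounds, t = [T], T
    for n in range(N, 0, -1):
        t = back[n][t]
        bounds.append(t)
    return opt[N][T], bounds[::-1]

# Decreasing traffic with items listed best to worst (made-up numbers).
traffic = [30, 25, 20, 14, 9, 5]
items = [lambda k, q=q: q * math.exp(-0.3 * k) for q in (0.10, 0.05, 0.02)]
best_total, bounds = schedule_in_order(traffic, items)
```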
When traffic is unimodal, items on each side of the peak are ordered from best to worst as they fall away from the peak
This allows a dynamic programming algorithm
Try all possibilities for the base-case article straddling the peak
Order the remaining articles from best to worst
Use dynamic programming to compute the optimal solution for the interval $[a, b)$ using the first $n$ items:
$$\mathrm{opt}(a, b, n) = \max\left( \max_t \left[ \mathrm{opt}(a, t, n - 1) + \mathrm{value}(t, b, n) \right],\ \max_t \left[ \mathrm{opt}(t, b, n - 1) + \mathrm{value}(a, t, n) \right] \right)$$
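A small sketch of the interval recurrence for the unimodal case follows. It is an illustration under assumptions rather than the authors' implementation: the peak minute is given as an index, the base article must cover the peak minute, other articles may receive empty intervals, and the names are hypothetical. Each added article is placed either at the right end or at the left end of the current interval.

```python
from functools import lru_cache

def best_unimodal_utility(traffic, items, peak):
    """Evaluate the interval recurrence for traffic that peaks at `peak`.

    items: list of value functions f(k), sorted from best to worst.
    Returns the best total utility found over all choices of base article.
    """
    T = len(traffic)

    def value(start, end, f):
        # Utility of showing one item during minutes [start, end).
        return sum(traffic[start + k] * f(k) for k in range(end - start))

    best = float("-inf")
    for base_idx, base in enumerate(items):
        rest = items[:base_idx] + items[base_idx + 1:]

        @lru_cache(maxsize=None)
        def opt(a, b, n):
            # Best utility on [a, b), which contains the peak, using the
            # base article plus the first n of the remaining articles.
            if n == 0:
                return value(a, b, base)
            f = rest[n - 1]
            best_here = float("-inf")
            for t in range(peak + 1, b + 1):  # article n on the right: [t, b)
                best_here = max(best_here, opt(a, t, n - 1) + value(t, b, f))
            for t in range(a, peak + 1):      # article n on the left: [a, t)
                best_here = max(best_here, opt(t, b, n - 1) + value(a, t, f))
            return best_here

        best = max(best, opt(0, T, len(rest)))
    return best
```

The result is always the value of a feasible schedule; whether it matches the true optimum depends on the conditions stated on the slides (unimodal traffic and a shared decay shape).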
Results
[Figure: x-axis: Minutes; y-axes: Views and Click-Through Rate; series: Views, Our Algorithm, Optimal Scheduling]
The figure shows traffic over one day in red; note that it is close to, but not quite, unimodal
Article CTRs are shown for two schedules: the optimal one and that of our algorithm
They are similar, but not quite the same, due to the lack of complete unimodality
Over the 21-day observation period, our algorithm is always within 0.1% of optimal
Compared to the actual schedule picked by human editors, this is a 26% improvement in total clicks
Generative Model
[Figure: Declining Click-Through Rate for a Typical Article and Simulated CTR; x-axis: Minutes Since First Display, y-axis: Click-Through Rate; series: Click-Through Rate, Simulation Results]
Declining CTR can be explained by a generative model taking a few user traits into account
Distribution of visit rates for different users
Given overall visit rate, the distribution of interarrival time gaps
The attenuation curve: the chance of clicking an article given that a user has returned to the page and seen the same article K times
Putting these factors together, we can simulate users and find that these three ingredients explain the declining click-through rates we observe
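A toy simulation combining the three ingredients might look like the sketch below. Every distribution and constant in it is an assumption chosen for illustration (three discrete visit rates, exponential gaps between visits, a 1/(1+K) attenuation curve); none of them come from the measured data.

```python
import random

def simulate_ctr(num_users, minutes, base_click_prob, attenuation, seed=0):
    """Simulate per-minute CTR for one article shown for `minutes` minutes."""
    rng = random.Random(seed)
    views = [0] * minutes
    clicks = [0] * minutes
    for _ in range(num_users):
        # (1) Each user gets a visit rate in visits per minute (assumed values).
        rate = rng.choice([0.01, 0.05, 0.2])
        # (2) Gaps between visits are exponential with that rate (assumed).
        t = rng.expovariate(rate)
        times_seen = 0
        clicked = False
        while t < minutes:
            minute = int(t)
            views[minute] += 1
            # (3) The click chance shrinks with the number of earlier exposures,
            # and a user who has already clicked does not click again.
            if not clicked and rng.random() < base_click_prob * attenuation(times_seen):
                clicks[minute] += 1
                clicked = True
            times_seen += 1
            t += rng.expovariate(rate)
    return [c / v if v else 0.0 for c, v in zip(clicks, views)]

ctr_by_minute = simulate_ctr(5000, 60, 0.1, lambda k: 1.0 / (1 + k))
```

Comparing `ctr_by_minute` against an observed curve would require replacing each assumed ingredient with its empirically measured counterpart.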
Further Directions
What can we do if new items appear during the day?
How can these results be combined with personalization?
What if there are multiple featured items?
To what extent do these results generalize to other datasets?
Are the conditions here approximately met elsewhere also?