Post on 20-Dec-2015
Web Search: From Information Retrieval to Microeconomic Modeling
Prabhakar Raghavan
Yahoo! Research
What is web search?
• Access to “heterogeneous”, distributed information
– Heterogeneous in creation
– Heterogeneous in accuracy
– Heterogeneous in motives
• Multi-billion dollar business
– Source of new opportunities in marketing
• Strains the boundaries of trademark and intellectual property laws
• A source of unending technical challenges
The coarse-level dynamics
[Diagram: content creators, content aggregators, and content consumers, with flows labeled Feeds, Crawls, Advertisement, Editorial, Subscription, and Transaction]
Brief (non-technical) history
• Early keyword-based engines
– Altavista, Excite, Infoseek, Inktomi, Lycos, ca. 1995-1997
• Paid placement ranking: Goto (morphed into Overture, then Yahoo!)
– Your search ranking depended on how much you paid
– Auction for keywords: casino was expensive!
• 1998+: Link-based ranking pioneered by Google
– Blew away all early engines except Inktomi
– Great user experience in search of a business model
– Meanwhile Goto/Overture’s annual revenues were nearing $1 billion
• Result: Google added “paid-placement” ads to the side, separate from search results
• 2003: Yahoo follows suit, acquiring Overture (for paid placement) and Inktomi (for search)
[Screenshot labels: Algorithmic results, CPC advertisements]
[Screenshot labels: Editorial, User reviews, Ads]
Types of content
• Editorial content: books, music, professionally-produced websites
• User-generated content: blogs, reviews, bulletin boards, groups, etc.
• Total web growth: 1-3M pages/day
– Not “real” growth
– Think text content…
Courtesy: Andrew Tomkins
Total text content
• 6B people type 4 hrs/day at 100 wpm
• Storage: 52 PB/yr; cost: $25M/yr
• In another 5 years, this will be about the cost of having 10 people on your payroll
• Conclusion 1: any company with tens of people can store every bit of text produced by every human on the planet
• Conclusion 2: no scale-based differentiation around text content
(of course, not all content is text…)
Courtesy: Andrew Tomkins
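The storage figure on this slide can be checked with a few lines of arithmetic. The sketch below uses the slide's inputs; the bytes-per-word constant is an assumption (the slide does not state it), chosen because roughly one byte per word is what reproduces the 52 PB/yr figure. The per-gigabyte cost is simply what the slide's two numbers imply, not a quoted price.

```python
# Back-of-the-envelope check of the slide's storage estimate.
PEOPLE = 6e9             # from the slide
HOURS_PER_DAY = 4        # from the slide
WORDS_PER_MINUTE = 100   # from the slide
DAYS_PER_YEAR = 365
BYTES_PER_WORD = 1       # ASSUMPTION: not on the slide; chosen to match 52 PB/yr

words_per_year = PEOPLE * HOURS_PER_DAY * 60 * WORDS_PER_MINUTE * DAYS_PER_YEAR
petabytes_per_year = words_per_year * BYTES_PER_WORD / 1e15

ANNUAL_COST_USD = 25e6   # from the slide
# 1 PB = 1e6 GB (decimal units)
implied_usd_per_gb = ANNUAL_COST_USD / (petabytes_per_year * 1e6)

print(f"{words_per_year:.3e} words per year")
print(f"{petabytes_per_year:.1f} PB per year")
print(f"${implied_usd_per_gb:.2f} per GB per year (implied)")
```

With a less aggressive encoding (say, five or six bytes per word) the volume grows by the same factor, which does not change the slide's qualitative conclusion.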
User generated content (UGC)
• New content
– 2 billion pages of editorial content
– Tiny number of songs, etc.
– 5-10 billion pages of UGC exist (already ~10% of consumption), growing
• Note: UGC did not exist as a category a couple of years ago!
– Rapidly becoming a key growth area of consumed web content – but we don’t know how to process it!
Courtesy: Andrew Tomkins
Tags: The simplest form of UGC
Is the Turing test always the right question?
The power of social tagging
• Flickr – community phenomenon
• Millions of users share and tag each other’s photographs (why???)
• The wisdom of the crowd can be used to search
• The principle is not new – anchor text used in “standard” search
• Don’t try to pass the Turing test?
Anchor text
• When indexing a document D, include anchor text from links pointing to D.
[Diagram: three pages link to www.ibm.com, with surrounding text:]
– “Armonk, NY-based computer giant IBM announced today”
– “Joe’s computer hardware links: Compaq, HP, IBM”
– “Big Blue today announced record profits for the quarter”
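A minimal sketch of the idea, assuming a toy inverted index with whitespace tokenization: anchor terms are credited to the page a link points to, not to the page containing the link. The source URLs, the body text, and the exact anchor strings ("IBM", "Big Blue") are illustrative assumptions, not taken from the slide.

```python
from collections import defaultdict


def build_index(pages, links):
    """pages: url -> body text; links: (source_url, target_url, anchor_text)."""
    index = defaultdict(set)
    for url, body in pages.items():
        for term in body.lower().split():
            index[term].add(url)
    # Credit each anchor term to the link's target, not its source.
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index


def search(index, query):
    """Return the URLs that contain every query term (simple AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical data: the body of www.ibm.com never mentions "big blue".
pages = {"www.ibm.com": "welcome to our corporate home page"}
links = [
    ("news.example/a", "www.ibm.com", "IBM"),
    ("joe.example/links", "www.ibm.com", "IBM"),
    ("news.example/b", "www.ibm.com", "Big Blue"),
]
index = build_index(pages, links)
print(search(index, "big blue"))
```

The query "big blue" finds www.ibm.com only because of the inbound anchor text, which is the same mechanism that lets community tags describe a photograph that carries no text of its own.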
Challenges in tag-based search
• How do we use these tags for better search?
• How do you cope with spam?
• What’s the ratings and reputation system?
• The bigger challenge: where else can you exploit the power of the people?
• What are the incentive mechanisms?
– Luis von Ahn (CMU): The ESP Game
More UGC: Social search
Indexing the knowledge in people’s heads
[Screenshot labels: Social content, Social capital]
Incentives
• What assignment of incentives leads to good user behavior?
– What’s “good” user behavior?
– Good questions, good answers, new questions?
• Whom do you trust and why?
– Propagation of trust and distrust.
Ratings and reputation
• Node reputation: Given a DAG with
– a subset of nodes called GOOD
– another subset called BAD
– Find a measure of goodness for all other nodes.
• Node pair reputation: Given a DAG with a real-valued trust on the edges
– Predict a real-valued trust for ordered node pairs not joined by an edge
Metric labelling
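To make the node reputation problem concrete, here is one naive heuristic. It is not the method from the talk, and it is not metric labelling: GOOD nodes score +1, BAD nodes score -1, and every other node takes the average score of the nodes it points to. The edge direction ("u endorses v") and the scoring rule are both assumptions made for illustration.

```python
from functools import lru_cache


def goodness(edges, good, bad):
    """edges: node -> list of nodes it points to (must form a DAG).

    Naive heuristic (an assumption, not the talk's method): a labeled node
    keeps its label; an unlabeled node averages its successors' scores.
    """
    @lru_cache(maxsize=None)
    def score(node):
        if node in good:
            return 1.0
        if node in bad:
            return -1.0
        successors = edges.get(node, ())
        if not successors:
            return 0.0  # no evidence either way
        return sum(score(s) for s in successors) / len(successors)

    nodes = set(edges) | {s for succ in edges.values() for s in succ}
    return {node: score(node) for node in nodes}


# Hypothetical graph: d -> a, d -> c, a -> g, a -> b, c -> g
edges = {"a": ["g", "b"], "c": ["g"], "d": ["a", "c"]}
scores = goodness(edges, good={"g"}, bad={"b"})
for node in sorted(scores):
    print(node, scores[node])
```

Because the graph is acyclic, the recursion terminates and each node is scored once. A real system would also have to decide how distrust propagates, which this sketch ignores.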
CPC advertisements
What pays the bills
Ads
Generic questions
• Of the various advertisers for a keyword, which one(s) get shown?
• What do they pay on a click through?
• The answers turn out to draw on insights from microeconomics
Ads go in slots like this one
and this one.
Advertisers generally prefer this slot
to this one
to this one
to this one.
Click through rate r1 = 200 per hour
r2 = 150 per hour
r3 = 100 per hour
etc.
Why did witbeckappliance win
over ristenbatt?
First-cut assumption
• Click-through rate depends only on the slot, not on the advertisement
• In fact not true; more on this later.
Advertiser’s value
• We assume that an advertiser j has a value vj per click through
– Some measure of downstream profit
• Say, click-through followed by
– 96% of the time, no purchase
– 0.7% buy Dishwasher, profit $500
– 1.2% buy Vacuum Cleaner, profit $200
– 2.1% buy Cleaning agents, profit $1
• Expected value per click: $5.921
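The value per click on this slide is an expected profit over the possible outcomes of a click. A short sketch of that arithmetic, using the slide's probabilities and profits (the variable names are mine):

```python
# (probability, profit in dollars) for each outcome after a click-through
outcomes = [
    (0.960, 0.0),    # no purchase
    (0.007, 500.0),  # dishwasher
    (0.012, 200.0),  # vacuum cleaner
    (0.021, 1.0),    # cleaning agents
]

# Sanity check: the outcomes should cover every case.
assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9

value_per_click = sum(p * profit for p, profit in outcomes)
print(f"${value_per_click:.3f}")  # prints $5.921
```

Note that almost all of the value comes from the two rare, high-profit purchases.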
Example
• For the keyword miele, say an advertiser has a value of $10 per click.
• How much should he bid?
• How much should he be charged?
The value of a slot for an advertiser,
what he bids and
what he is charged, may all be different.
Advertiser’s payoff in ad slot i
(Click-through rate) x (Value per click) – (Payment to search engine)
= ri vj – (Payment to engine)
= ri vj – pij
where pij is the payment of advertiser j in slot i, a function of all other bids.
Two auction pricing mechanisms
• First price: The winner of the auction is the highest bidder, and pays his bid.
• Second price: The winner is the highest bidder, but pays the second-highest bid.
• Engine decides and announces pricing.
• What should an advertiser bid?
Not truthful.
Second-price = Vickrey auction
• Consider first a single advertisement slot
• Winner pays the second-highest bid
• Vickrey: Truth-telling is a dominant strategy for each player (advertiser)
– No incentive to “game” or fake bids
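A small sketch of a single-slot second-price auction, with a spot check (not a proof) of the truth-telling claim. The helper names and the competing bids are hypothetical. Ties go to whichever bidder was listed first, which is an arbitrary choice of this sketch.

```python
def second_price_auction(bids):
    """bids: advertiser -> bid. Returns (winner, price) for a single slot."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    # With no competitor the price is taken as 0 (assumption: no reserve price).
    price = bids[ranked[1]] if len(ranked) > 1 else 0.0
    return winner, price


def payoff(my_bid, my_value, other_bids):
    """Payoff of a bidder called 'me' with the given true value."""
    bids = dict(other_bids, me=my_bid)
    winner, price = second_price_auction(bids)
    return my_value - price if winner == "me" else 0.0


others = {"C": 4, "B": 2}

# True value 10: underbidding loses the slot, overbidding gains nothing.
for bid in (3, 10, 20):
    print("value 10, bid", bid, "->", payoff(bid, 10, others))

# True value 3: overbidding wins the slot at a loss.
for bid in (3, 5):
    print("value 3, bid", bid, "->", payoff(bid, 3, others))
```

In both cases the truthful bid does at least as well as the alternatives tried, because the price paid never depends on the winner's own bid.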
Auctions and pricing: multiple slots
• Overture’s model:
– Ads displayed in order of decreasing bid
– E.g., if advertiser A bids 10, B bids 2, C bids 4 – order ACB
• How do you price slots? Generalized Vickrey?
– Generalized second-price (GSP)
– Vickrey-Clarke-Groves (VCG): each advertiser pays the externality he imposes on others
Generalized Second Price auction pricing
• Bidder A, $10: pays 4
• Bidder C, $4: pays 2
• Bidder B, $2
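The pricing rule in this example can be sketched as follows, assuming bid ordering and two slots. The function name is hypothetical, and charging 0 when there is no lower bid is a simplification (a real engine would presumably apply a reserve price).

```python
def gsp_prices(bids, num_slots):
    """bids: advertiser -> bid per click.

    Returns [(advertiser, price_per_click), ...] in slot order, where each
    winner pays the bid of the advertiser ranked immediately below.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position in range(min(num_slots, len(ranked))):
        advertiser, _own_bid = ranked[position]
        has_lower = position + 1 < len(ranked)
        next_bid = ranked[position + 1][1] if has_lower else 0  # assumption
        results.append((advertiser, next_bid))
    return results


print(gsp_prices({"A": 10, "B": 2, "C": 4}, num_slots=2))
# prints [('A', 4), ('C', 2)]
```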
VCG pricing
• Suppose click rates are 200 in the top slot, 100 in the second slot
• VCG payment of the second player (C) is 2 x 100 = 200
– This is the externality on the third player, B.
• For the first player (A), 4 x (200 – 100) + 200 = 600
– The first term is the externality on C; the second is the externality on B.
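The same numbers fall out of computing each externality directly: an advertiser's VCG payment is what the others would earn without him minus what they earn with him present. This sketch assumes bids equal true values and that click rates depend only on the slot; the function names are hypothetical.

```python
def allocate(bids, rates):
    """Assign slots in decreasing bid order. Returns advertiser -> click rate."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return dict(zip(ranked, rates))  # advertisers beyond the last slot get nothing


def vcg_payments(bids, rates):
    """Total payment per slot winner (per hour if rates are clicks per hour)."""
    with_all = allocate(bids, rates)
    payments = {}
    for j in with_all:
        others = {a: b for a, b in bids.items() if a != j}
        without_j = allocate(others, rates)
        # Value the others would get if j were absent ...
        others_without = sum(bids[a] * r for a, r in without_j.items())
        # ... versus the value they get with j present.
        others_with = sum(bids[a] * r for a, r in with_all.items() if a != j)
        payments[j] = others_without - others_with
    return payments


bids = {"A": 10, "B": 2, "C": 4}
rates = [200, 100]  # clicks in the top slot and the second slot
print(vcg_payments(bids, rates))
# prints {'A': 600, 'C': 200}
```

These are total payments, not per-click prices: A pays 600 for 200 clicks (3 per click) and C pays 200 for 100 clicks (2 per click).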
VCG and GSP
• Truth-telling is a dominant strategy under VCG …
• Truth-telling not dominant under GSP!
Edelman, Ostrovsky, Schwarz
• Static equilibrium of GSP is locally envy-free: no advertiser can improve his payoff by exchanging bids with the advertiser in the slot above.
• Depending on the mechanism, revenue varies: GSP ≥ VCG.
Edelman, Ostrovsky, Schwarz
Locally envy-free mechanisms correspond to Stable Marriage solutions.
GSP for bid-ordering
• What’s good about bid-ordering and GSP?
– Advertisers like transparency
• What’s wrong with bid-ordering?
Brand advertising?
Bid ordering (former Yahoo! order)
Revenue ordering
• Simplified version of Google’s ordering
– Each ad j has an expected click-through denoted CTRj
– Advertiser j’s bid is denoted bj
• Then, expected revenue from this advertiser is Rj = bj+1 x CTRj
• Order advertisers by Rj
– Payment by GSP
“Current” Yahoo! ordering
“Squashed” ordering
• Overture/Old Yahoo! scheme
– Order ads by bid
• Google (purportedly)
– Order by bid x click-through rate (CTR)
• Squashing (Lahaie/Pennock)
– Order by bid x (CTR)^s
– s = 0 gives the Overture ordering; s = 1 gives the Google (?) ordering
• Key – advertisers react to mechanism!
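A short sketch of how the exponent s moves the ranking between the two schemes. The bids and click-through rates are made-up illustrative numbers, and the function name is hypothetical; pricing and advertisers' strategic reactions are not modeled.

```python
def squashed_order(ads, s):
    """ads: name -> (bid, ctr). Rank by bid * ctr**s, highest first."""
    def rank_score(name):
        bid, ctr = ads[name]
        return bid * ctr ** s
    return sorted(ads, key=rank_score, reverse=True)


# Hypothetical ads: the high bidder has the lowest click-through rate.
ads = {"A": (10.0, 0.01), "C": (4.0, 0.05), "B": (2.0, 0.08)}

print(squashed_order(ads, s=0))    # pure bid ordering
print(squashed_order(ads, s=1))    # bid x CTR ordering
print(squashed_order(ads, s=0.5))  # an intermediate squashing
```

With these numbers s = 0 ranks A first and s = 1 ranks A last, while s = 0.5 still ranks by bid here; where the switch happens depends entirely on the data.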
Where do we go next?
• Premise:
– People don’t want to search
– People want to get tasks done
I want to book a vacation in Tuscany.
[Diagram labels: Start, Finish]
What is missing?
• Information integration
– Information extraction
– Schema normalization
• Mining social structure
– Tags, UGC
[Example: the search query “hotel near leicester square” alongside a page reading “Welcome to The Savoy. Located on The Strand in the heart of the West End theatre district, ...”]
Computational microeconomics
• Reputation and incentive mechanisms
• Matching marketplaces
– Jobs, dates, …
– Online matching everywhere
– Hardest part is estimating the payoffs, not the matching algorithm
• “Network effects”
– Are 500 million users 500 times as valuable as a million users? 5000 times?
A new convergence
• Monetization and economic value an intrinsic part of system design
– Not an afterthought
– Mistakes are costly!
• Computing meets humanities like never before – sociology, economics, anthropology …