recent developments at yahoo! in search & mobile, and future challenges
DESCRIPTION
WikiMania’08, July 18, 2008 Alexandria, Egypt. Recent Developments at Yahoo! in Search & Mobile, and Future Challenges. Research. Usama Fayyad Chief Data Officer & Executive VP Yahoo! Inc. [email protected]. Overview. About Yahoo! and its business Yahoo! Mobile Philosophy - PowerPoint PPT PresentationTRANSCRIPT
1
WikiMania’08, July 18, 2008Alexandria, Egypt
Recent Developments at Yahoo! in Search & Mobile, and Future Challenges
Research
Usama FayyadChief Data Officer & Executive VP
Yahoo! [email protected]
2
Research
Overview
• About Yahoo! and its business
• Yahoo! Mobile Philosophy
• OneSearch 2.0
• Challenges in Mobile Search
• Some words about search advertising
• Examples of Search Evolution at Yahoo!
• Concrete examples of the changes that are relevant to Social Web
• Concluding thoughts
3
Research
506.3
602.4
702.4787.5
868.3
974.3
1,076.7
0.0
200.0
400.0
600.0
800.0
1,000.0
1,200.0
2001 2002 2003 2004 2005 2006 2007
Worldwide Total Japan
Rest of World Western Europe
Asia/Pacific United States
Globally, Internet Users Number Over 1 Billion
Source: IDC, December 2003.
Internet Users in Millions:
4
Research
Yahoo! is the #1 Destination on the Web
73% of the U.S. Internet population uses Yahoo! – Over 500 million users per month globally!
• Global network of content, commerce, media, search and access products
• 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc.
• 25 terabytes of data collected each day… and growing
• Representing thousands of cataloged consumer behaviors
More people visited Yahoo! in the past month than:
• Use coupons• Vote• Recycle• Exercise regularly• Have children
living at home• Wear sunscreen
regularly
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
5
Research
Yahoo! Data – A league of its own…
Terrabytes of Warehoused Data
25 49 94 100500
1,000
5,000
Am
azo
n
Ko
rea
Te
leco
m
AT
&T
Y!
Liv
eS
tor
Y!
Pa
na
ma
Wa
reh
ou
se
Wa
lma
rt
Y!
Ma
in
wa
reh
ou
se
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING
TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
Y! Data Challenge Exceeds others by 2 orders of magnitude
Millions of Events Processed Per Day
50 120 2252,000
14,000
SABRE VISA NYSE YSM Y! Global
6
Research
What About Yahoo! Mobile?
• Fast growing initiative that is one of the companies priorities in the future
• Great success in distribution– signed deals with 29 carriers, and therefore
it’s accessible to 600 million subscribers, who are now under contract.
– OneSearch is Yahoo’s mobile search application that it launched 13 months ago. Just launched OneSearch 2.0
• Marco Boerries, EVP of Mobile at Yahoo!: “No one has never amassed that kind of distribution under that short period of time.”
7
Research
Mobile Device Internet Penetration Will Eclipse the PC
* U.N Telecommunications Agency, Sept 07 ** Informa, Nov 07
1 Billion people across the worlduse the Internet *
3.3 Billion people across the world are mobile service subscribers(that’s half the global population)**
8
Research
Yahoo!’s Global Mobile Reach
16.9 Million Unique Users Per Month In The U.S. Alone
Yahoo!
MSN
AOLUnique Users Per Month (mm)
16.9
12.1
8.9
8.6
16.9 Million
in USA
9
Research
=PC Search Mobile Search
Mobile Search Built for the Consumer
10
Research
The Mobile Use Case is Different
Give me Answers, Entertainment, Images…
11
Research
Y! oneSearch Changed the Game
Answers Instead of Web Links. Relevant, Complete Results
12
Research
Yahoo! Mobile Approach to Search
• OneSearch is a special federated search engine– Analyses Concept and Intent of the query against a
large collection of “vertical” backends• Web, News, Images, Finance, etc…• UGC such as Wikipedia and Yahoo! Answers
– Aggregates results from verticals and blends to optimize to user query and to device used for query
• Goal is to minimize clicks by taking user to results around tasks
• Query sources:– Browsers: WAP/XHTML– Java app interface for Yahoo! Go– SMS text messaging for Yahoo! Mobile SMS
13
Research
Approach
• Be as Open as possible on interfaces
• Fundamentally believe the mobile OS market will remain fragmented from a platforms perspective for quite a while– Windows Mobile only reached 30M users after more than 7 years
of effort
• Provide an environment to allow users to program to one target platform and let Yahoo! bear the effort of making it run on wide range of devices
• Focus on the highest value apps for users today involving access to on-line world (less on client apps)
• Return results and not links
14
Research
Yahoo! Mobile Products
* M:Metrics, October 2008**All Yahoo! Mobile services are free. Check with your wireless carrier about data plan charges that may apply.
Yahoo! Go 3.0 Yahoo! oneSearchYahoo! Home Page
Yahoo!onePlace
Yahoo!oneConnect
15
Research
OneSearch 2.0
• OneSearch is being opened up to all publishers and content owners so they can write rich metadata that will be returned as part of results, rather than just a link, – Similar to Yahoo’s Search Monkey service for the Internet.
• More about this later…
• Three new major upgrades– Search Assist: The search box will predict what you are typing.
– Voice input: Users can search by speaking into the device instead of typing (provided by Vlingo)
– The search box will be integrated into the home screen of the phones.
16
Research
Providing more relevant content
Unlocking the power of the semantic web
Turning web search results into answersBetter answers
OneSearch 2.0
17
Research
Contextual recommendations
Predictive text completion
Easier, faster inputEasier input
OneSearch 2.0
18
Research
Personalized to your voice
Search for anything
Speak your searchVoice input
OneSearch 2.0
19
Research
Supports text & voice
Gateway to the Internet
Persistent 1-click accessAlways there
OneSearch 2.0
20
Internet Use on Mobile vs. PC
Research
21
Research
Mobile Use
• Today, we believe Internet use on PC is about 10x that of Mobile
• Mobile is faster growing, in all regions
• There are > 3x mobiles today than Internet users globally– But most phones are not data capable yet
• The world today: – We are learning from the web, and attempting to figure out what
makes sense for mobile users
– Trying to work with the Smart Phones users as they represent the early adopters
22
Research
Classical web search user needs
• Informational (~25%) – want to learn about something
• Navigational (~40%) – want to go to that page
• Transactional (~35%) – want to do something (web-mediated)
– Access a service
– Downloads
– Shop
• Gray areas– Find a good hub– Exploratory search “see what’s there”
Low hemoglobin
United Airlines
Mendocino weather
Mars surface images
Nikon CoolPix
Car rental Finland
Broder 2002, A Taxomony of web search
23
Research
What about on Mobile
• No good classification
• Several studies that cover – Query frequency distribution
– Words per query
– Characters per query
• Categorization by query type into traditional categories:– Adult and Entertainment, Autos, Consumer Goods, Finance,
Government & politics, Sports, Technology, Travel, etc…
• Best known studies by– Kamvar and Baluja (2006 and 2007)
– Yi, Maghoul, and Pedersen (2008)
• Good quantitative statistics, little on qualitative purpose-driven analysis (early days still)
24
Research
What do We Believe about Mobile Queries
• We believe it is a different distribution than the query distribution for PC users– Bias towards shorter queries
• Data contradicts that: 2.6 words per query, same # chars as PC
– Difficulty of query entry is a significant hurdle
– Much higher location-based activity
– Much more task-oriented than exploration or research
• Notifications adds a whole new “push” dimension– Trigger alerts (stocks, news, auctions)
– Location-based (geo-driven)
– Event-based (calendar entires such as travel alerts, flight delays, etc.)
• Can learn much more about user intent and hence eventually more promising for advertising
25
Research
Implications and Challenges
• Task-orientation
• Specialized content packaging
• Locality Inference from queries
• Locality Inference from device (LBS)
• Minimize typing and round-trips: get results, not just links– Less room to display SERP + other accessories
• Monetization strategies to fund this model still not decided– Advertising
– Subscription to “premium services”
– Revenue share on “leads”
– Pay per usage of special high-value areas
In the meantime, the web, and Search are evolving…
26
Research
Even Larger Challenges
• Modeling Social Media and use of mobile in social settings on the go– Understanding UGC– Classifying, categorizing, organizing UGC and folksonomy
• A different problem of search -- Semantics of content are critical, especially if we are to target– Intent– Task-orientation– Motion dimension (distance to target of search)– Push and notifications– Understanding the physical world (common sense): what is close?
Business hours? Holidays?
• Web Content growing, changing, diversifying, fragmenting• Truly leveraging the notification abilities and finding new
everyday uses – far more versatile a space than PC• Long-term memory (state) for long-running tasks and queries
27
A Tale of Two Search Engines
Research
28
Research
Algorithmic results=Audience
Advertisements=Monetization
-$ +$
29
Research
Algorithmic vs. Ad Search
• Analogous to classical separation of editorial vs commercial content
• Technical underpinnings:– Some commonalities (IR, ML)
– Many differences (incentives, spam, mechanism design)
30
Research
The two engines
The Web
Ad indexes
Web Results 1 - 10 of about 7,310,000 for miele. (0.12 seconds)
Miele, Inc -- Anything else is a compromise At the heart of your home, Appliances by Miele. ... USA. to miele.com. Residential Appliances. Vacuum Cleaners. Dishwashers. Cooking Appliances. Steam Oven. Coffee System ... www.miele.com/ - 20k - Cached - Similar pages
Miele Welcome to Miele, the home of the very best appliances and kitchens in the world. www.miele.co.uk/ - 3k - Cached - Similar pages
Miele - Deutscher Hersteller von Einbaugeräten, Hausgeräten ... - [ Translate this page ] Das Portal zum Thema Essen & Geniessen online unter www.zu-tisch.de. Miele weltweit ...ein Leben lang. ... Wählen Sie die Miele Vertretung Ihres Landes. www.miele.de/ - 10k - Cached - Similar pages
Herzlich willkommen bei Miele Österreich - [ Translate this page ] Herzlich willkommen bei Miele Österreich Wenn Sie nicht automatisch weitergeleitet werden, klicken Sie bitte hier! HAUSHALTSGERÄTE ... www.miele.at/ - 3k - Cached - Similar pages
Sponsored Links
CG Appliance Express Discount Appliances (650) 756-3931 Same Day Certified Installation www.cgappliance.com San Francisco-Oakland-San Jose, CA Miele Vacuum Cleaners Miele Vacuums- Complete Selection Free Shipping! www.vacuums.com Miele Vacuum Cleaners Miele-Free Air shipping! All models. Helpful advice. www.best-vacuum.com
Web spider
Indexer
Indexes
Search
User
31
Research
1995: The Yahoo! Directory
• Apply human expertise and editorial to organize web sites
• What worked– Practical, Navigable
– Trustworthy, Authoritative
• What didn’t– Scalability
– Granularity
– Etc.
32
Research
1995 : Altavista (Inktomi, Lycos, etc.)
• Automate the process of acquiring pages; use “information retrieval” techniques to return pages that contain a particular term
• What worked– Scalable (query for “IBM” returns 40M pages)
– Simple
– Granular
• What didn’t– Scalability a double-edged sword
– Ranking and relevance poor
– Not authoritative (spam, irrelevance, etc.)
33
Research
c. 1999-2006: PageRank (Google, Yahoo)
• Use topology (link structure) of the web to confer authority
• What works– Relevance is greatly improved
– Navigational query is born (query for “IBM” gets me to ibm.com)
• What doesn’t– Homogeneity of results (no personalization) means no “subjective”
queries – webmasters vote by proxy for everyone – and their answer is the only answer
– System easily “gamed” by spammers – leads to arms race
34
Research
Meanwhile, On the Money Front…
• Sponsored search ranking: Goto.com (morphed into Overture.com Yahoo!)– Your search ranking depended on how much you paid
– Auction for keywords: casino was expensive!
• 1998+: Link-based ranking pioneered by Google– Blew away all early engines except Inktomi
– Great user experience in search of a business model
– Meanwhile Goto/Overture’s annual revenues were nearing $1 billion
• Result: Google added sponsored search “ads” to the side, independent of search results– 2003: Yahoo follows suit, acquiring Overture (for paid placement) and
Inktomi (for search)
• The Monetization Mechanisms… Conversion of marketplace machanisms in 2007
35
Research
Search query
Ad
36
Research
Questions for the audience
• Do you think an “average” user, knows the difference between sponsored search links and algorithmic search results?
• Do you think an “average” user knows there are sponsored links on the page?
• Do you think a user knows where a sponsored link would navigate to upon a click?
37
Research
How it works
Advertiser
Landing page
Sponsoredsearch engine
I want to bid $5 oncanon cameraI want to bid $2 oncannon camera
Engine decides when/where to show this ad.
Engine decides how much to charge advertiser on a click.
Ad Index
38
Research
Engine: Three sub-problems
1. Retrieve ads matching query
2. Order the ads
3. Pricing on a click-through
IR
Econ
39
Research
Ads go in slots
like these
40
Research
Higher slots get more clicks
41
Research
2. Order the ads
• Most generally, composite IR+Econ score … for today’s talk, focus on Econ
• Original GoTo/Overture scheme:– Order by bid
42
Research
Economic ordering
• Bid and revenue ordering: two forms of ordering by an econ score
• Does revenue ordering maximize revenue?
• No – advertisers react to ordering scheme, by changing their bid behavior!
• Lahaie+Pennock ACM EC 2007– Family of schemes bridging Bid and Revenue ordering
– Game-theoretic analysis
Edelman, Ostrovsky, Schwarz 2006
43
Research
A new convergence
• Monetization and economic value an intrinsic part of system design– Not an afterthought
– Mistakes are costly!
• Computing meets humanities like never before – sociology, economics, anthropology …
44
Towards Getting Things Done… vs. Searching
Research
45
Research
Example
I want to book a vacation in Tuscany.Start Finish
47
Research
Loved the vacation, want to make that sweet Italian coffee at home
48
Research
Trends in task complexity
• Dawn of search:
– Navigational queries
– Pockets of information
• Today:
– Increasing migration of content online
– New forms of media only available online
– Infrastructure for payments and reputation sufficient for many users
49
Research
Things to notice
• Long-running user goals
• Search as hub: – start there
– return for resource discovery and at task boundaries
– traverse the web broadly to complete task
• Web services integrated into task
50
Content Growth
Research
51
Research
Content trends
[Ramakrishnan and Tomkins 2007]
52
Research
Metadata trends
[Ramakrishnan and Tomkins 2007]
53
Content Complexity
Research
54
Research
Content ownership
• Content consumption is fragmenting – nobody owns more than 10% of WW PVs
• No single place will own all the content
• Best of breed processing will operate on the web version (?)
• Value transitions to ecosystem
55
Research
Content access is fragmenting
56
Research
Content itself is fragmenting
57
Research
Evolution of Social Media
• Although the “traditional notion” of portal and web content is still attracting growing audiences
• The original notion of “publishing content” to attract audiences is changing fast– As people discover the fact that the Internet is an
Interactive Medium
– The uses of the Internet enter areas we could not imagine a short time ago
• A new notion of “publishing” is fast emerging– The opportunity of user-generated content
58
Research
Challenges in social media
• How do we use these tags for better search?
• What’s the ratings and reputation system?
• How do you cope with spam?
• The bigger challenge: where else can you exploit the power of the people?
• What are the incentive mechanisms?
59
Evolution is startingThe Search Interface
Research
60
Research
What does this mean for search?
• Few changes through 2005
• Entering period of massive change to handle more complex content
• Rich media, aggregation, simple task analysis, etc
• Moving beyond the stateless query/response paradigm
• Personalization theory
61
Research
Rich media and search assistance
62
Research
Structured aggregation
63
Research
Simple task-focused queries
64
Research
Google Base
65
Open Ecosystems
66
Research
Structured data on the Web
• Structured databases power a vast majority of pages on the web
– Certainly ecommerce catalogs etc
– But also user generated content (eg blogs)
• Content owners open to exposing structure, but don’t see how and why
– Microformats adoption at an all-time high
– Yet, it’s produced much more than is consumed
• Experiments with “pure” structured data aggregation have met with mixed success
– Google Base, Freebase, even Co-op
67
Research
What have we announced?
• Yahoo! Search Monkey: API for publishers to push metadata and structure to search engine
• Wide-ranging support for semantic web standards
• Vocabulary to surface structure and semantics
• Community Tools to evolve standards and vocabulary
68
Research
Search as Killer App for Data Web
• Publishers and search engine collaborate
• Users see richer search experience
• Accomplish their tasks faster and more effectively
• Example: abstracts surfacing structured content
69
Research
babycenter
epicurious
Search results of the future
yelp.com
answers.com
webmd
Gawker
New York Times
70
Research
Search results of the future
yelp.com
babycenter
epicurious
answers.com
webmd
New York Times
Gawker
71
Research
Comprehensive support for emerging semantic web standards ++
• Microformats– hCard, hEvent, hReview, hAtom, XFN
– More as they get adopted
• RDFa and eRDF markup
• OpenSearch– +extensions to return structured data
• Atom/RSS Feeds– +extensions to embed structured data
markup
(crawl)
apis
(pull)
push
72
Research
Vocabulary to surface structure
• ‘dataRSS’ provides a common framework for embedding structured data
– Use with RDFa, eRDF or OpenSearch
– Preferred Vocabulary includes
• Atom, Dublin Core
• Creative Commons
• FOAF, GeoRSS, MediaRSS
• RDF, RDFS, RDF Review
• vCal, vCard
73
Research
Community Tools
• We’re seeding the Vocabulary and Standards Support
• We’ll evolve both of these with the help of the Web Community
• Yahoo! Groups: used to collect contributor and community suggestions, feedback, etc…
• Suggestions Board to vote on changes
74
Research
Implications for publishers?
• Yahoo! open search platform does not modify ranking
• Richer abstracts may provide more information to users and draw higher quality/quantity of clicks
• We want rich abstracts that give users a better experience– We don’t want misleading abstracts
75
Research
The whole story
• User needs becoming more complex
• Content growing, changing, diversifying, fragmenting
• Search responding by increase in sophistication
• Value migrating to ecosystem
• Unlock the value by enabling interoperability – expose semantics
76
Research
Subjective Queries
The kinds of queries that rely on domain expertise…
• “Do you know a reputable plumber in Atlanta?”
• “Where is the cool nightlife in Soho?”
• “What political blogs do you think I’d enjoy reading?”
• “Where can I buy a cool pair of boots?”
These kinds of queries are ill-served by today’s search engines, but are ironically the most valuable (i.e. transactional queries.)
77
Research
78
Research
79
Research
80
Research
No definitiveanswer
Unverifiableanswer
Community consensus
81
Research
Incentives
Legitimate?
82
Research
Where is the Science?
• Which questions are legitimate?
• What is the incentive system?
• How do we validate answers?
• What is the role of the community?
• What is the reputation system?
83
Research
What are the challenges?
• Community of users– Social system
• Incentives and reputations– Economic system
• Poorly phrased, grammatically limited queries– Language analysis
• Improving user experience from past data– Data mining
84
These are early days…Back to Business
Research
85
Research
Advertising: Brand and DR
Knowledge of users & their behavior throughout the purchase funnel can grow brand & direct response revenue
Awareness
Purchase
Consideration
> $200B Brand Advertising Market
> $200B Direct Response Market
Most time & activity is in consideration & engagement, but there are limited metrics & reach strategies
86
Why is search-related advertising so powerful?
A question for the Audience:
87
Research
It is all about Inferring User Intent
• User type 2.8 keywords– Note the non-sense use of average
– Average query returns > 600K matches!
• We get an idea of intent
• Coupled with immediacy (recency) an amazing matching engine
10x to 100x click through rate over banner ads
88
Research
Do I know this user’s intent?
89
Research
Brand Ads and Search Ads Interact!
• Is ad search strategy enough for a direct marketer?
• Do brand ads play a role in search advertising?
• Harris Direct Case Study
Awareness
Purchase
Consideration
90
Research
Case Study: Harris Direct
On:
Viewing These Ads: Had This Effect On:
• Brand Favorability– Up 32%
• Purchase Intent– Up 15%
• Aided Brand Awareness– Up 7%
91
Research
Case Study: Harris Direct
People who saw display ads were 61% more likely to search on related topics…
…and drove 139% more clicks on algorithmic and sponsored links…
…specifically driving 249% more sponsored search clicks …
…and driving 91% more activity on the HarrisDirect.com website.
92
Inventing the new sciences of the InternetYahoo! Research
Research
93
Research
New Science?
• The Internet touches all of our lives: personal, commercial, corporate, educational, government, etc…
• Yet many of the basic notions we talk about:– Search, Community, Personalization,
Engagement, Interactive Content, Information Navigation, Computational Advertising
– Are not at all understood, or well-defined
– These are not disciplines that academia or any industry research labs focus on…
94
Research
Areas of Research
• Information Navigation and Advanced Search– We are in the early days of search and retrieval– Inferring intent– New ways of extracting entities and objects
• Community: – How do you know what to believe on the Internet?– Trust models on-line and trust propagation– What makes communities thrive? Whither?– Social media, tagging, image and video sharing
• Microeconomics: a new generation of economics driven by massive interactions– Auction marketplaces– The web as a new LEI of activities and economies
• Computational Advertising– Targeting and matching sciences, Inferring user intent– Pricing models (CPM, CPC, CPA, CPL, etc…)– Large-scale optimization and yield management
95
Research
Concluding Thoughts (1)
• The notion of “corpus” and publishing is changing fundamentally
• We still do not have the basic sciences to understand what is happening and what needs to happen to combine the new capabilities
• The problem of mobile search is different, but poorly understood
• The web is changing, content sources are fragmenting and changing
– the source distribution is radically changing
– Publisher – consumer divide is becoming fuzzy
• Search engine interface is finally changing to adapt– Much of the change came from worrying about mobile search
96
Research
Concluding Thoughts (2)
• The view that Search is everything is LIMITED (at best)– Economics of publishing and advertising
– Users do not differentiate ad and content
– Behavioral data is the most powerful
– “Nothing predicts behavior like behavior”
• Monetization and economic value an intrinsic part of system design– Not an afterthought
– Mistakes are costly!
• Computing meets humanities like never before – sociology, economics, anthropology …
• A more holistic view of Search and Information Navigation is needed
98
Research
No time to cover today
• Micro-Economics of the Web– Auction marketplaces
– Marketplace and Exchange Design
– The economics of Engineering IT Decisions
• Computational Advertising– Targeting and matching sciences
– Inferring user intent
– Pricing models (CPM, CPC, CPA, CPL, etc…)
– Large-scale optimization and yield management