Search Analytics for Fun and Profit
Lou Rosenfeld's presentation on local site search analytics; An Event Apart Chicago, August 27, 2007.
An Event Apart, Chicago, Illinois, August 27, 2007
Lou Rosenfeld, www.rosenfeldmedia.com
Who I Am
Information architecture consultant to Fortune 500s
Publisher and founder, Rosenfeld Media
Blog at www.louisrosenfeld.com
Co-author, Information Architecture for the World Wide Web (3rd ed., 2006; O’Reilly)
New book: Search Analytics for Your Site: Conversations with your customers (2008; Rosenfeld Media): www.rosenfeldmedia.com/books/searchanalytics
Anatomy of a Search Log (from Google Search Appliance)
Critical elements: IP address, time/date stamp, query, and number of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17
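A minimal Python sketch of pulling those critical elements out of lines like the ones above. The field layout after the request (status, bytes, number of results, response time) is inferred from these three sample lines, not from Google's documentation, so check it against your own appliance's logs.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout, inferred from the sample lines:
# IP - - [timestamp] "GET /search?... HTTP/1.1" status bytes num_results seconds
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"GET (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_line(line):
    """Return (ip, timestamp, query, num_results), or None if the line doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    params = parse_qs(urlsplit(m["path"]).query)
    query = params.get("q", [""])[0]  # parse_qs turns '+' into spaces
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, query, int(m["results"])
```

Run on the first two sample lines, this shows the misspelled "lincense plate" returning 0 results and the corrected "license plate" returning 146, two seconds apart.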
The Zipf Curve: Short Head, Middle Torso, Long Tail
Keep It In Proportion
7218 campus map
5859 map
5184 im west
4320 library
3745 study abroad
3690 schedule of courses
3584 bookstore
3575 spartantrak
3229 angel
3204 cata
What’s the Sweet Spot?
Rank   Cumulative %   Count   Query
1       1.40          7218    campus map
14     10.53          2464    housing
42     20.18          1351    webenroll
98     30.01           650    computer center
221    40.05           295    msu union
500    50.02           124    hotels
7877   80.00             7    department of surgery
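A table like this can be computed directly from raw query counts. A minimal sketch, assuming you already have a `Counter` of normalized queries; the sample counts in the check are hypothetical, not MSU's data.

```python
from collections import Counter

def cumulative_table(counts):
    """Rank queries by frequency; each row is (rank, cumulative %, count, query)."""
    total = sum(counts.values())
    rows, running = [], 0
    for rank, (query, n) in enumerate(counts.most_common(), start=1):
        running += n
        rows.append((rank, 100 * running / total, n, query))
    return rows

def rank_reaching(rows, pct):
    """Smallest rank at which the cumulative share reaches pct percent, or None."""
    for rank, cumulative, _count, _query in rows:
        if cumulative >= pct:
            return rank
    return None
```

`rank_reaching(rows, 50)` answers the "sweet spot" question: how many distinct queries you would have to study to cover half of all searches.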
Topical Patterns and Seasonal Changes
Where will you Capture Search Queries?
1. The search logs that your search engine naturally captures and maintains as searches take place
2. Search keywords or phrases that your users execute and that you capture in your own local database
3. Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, Visual Sciences, Ultraseek, Google Appliance, etc.)
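For option 2, a minimal sketch using Python's built-in `sqlite3`. The table name, columns, and the lowercase/trim normalization are hypothetical choices; the idea is simply to record each search from inside your own search handler, alongside the data you will want to query later.

```python
import sqlite3
from datetime import datetime, timezone

def open_log(path=":memory:"):
    """Open (or create) a local search log; ':memory:' is handy for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS search_log (
               ts TEXT NOT NULL,
               query TEXT NOT NULL,
               num_results INTEGER NOT NULL,
               referrer TEXT
           )"""
    )
    return conn

def record_search(conn, query, num_results, referrer=None):
    """Store one search; call this after the results have been computed."""
    conn.execute(
        "INSERT INTO search_log (ts, query, num_results, referrer) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), query.strip().lower(), num_results, referrer),
    )
    conn.commit()
```

Normalizing case and whitespace at capture time means "Campus Map" and "campus map " count as the same query later; keep the raw string in another column if you also want to study syntax.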
Querying your Queries: Getting started
1. What are the most frequent unique queries?
2. Are frequent queries retrieving quality results?
3. What are the click-through rates per frequent query?
4. What is the most frequently clicked result per query?
5. Which frequent queries retrieve zero results?
6. What are the referrer pages for frequent queries?
7. Which queries retrieve popular documents?
8. What interesting patterns emerge in general?
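Questions 1, 5, and 6 can be answered in a single pass over the log. A minimal sketch, assuming each record has already been reduced to a hypothetical `(query, num_results, referrer)` tuple:

```python
from collections import Counter, defaultdict

def query_report(records, top_n=10):
    """records: iterable of (query, num_results, referrer) tuples."""
    freq = Counter()
    zero = Counter()
    referrers = defaultdict(Counter)
    for query, num_results, referrer in records:
        freq[query] += 1
        if num_results == 0:
            zero[query] += 1
        if referrer:
            referrers[query][referrer] += 1
    top = freq.most_common(top_n)                                  # question 1
    return {
        "top_queries": top,
        "zero_result": [(q, zero[q]) for q, _ in top if zero[q]],  # question 5
        "top_referrer": {q: referrers[q].most_common(1)[0][0]      # question 6
                         for q, _ in top if referrers[q]},
    }
```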
Tune your Questions: From generic to specific
Netflix asks:
1. Which movies are most frequently searched?
2. Which of them are most frequently clicked through?
3. Which of them are least frequently added to the queue?
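Netflix's three questions form a funnel. A minimal sketch, assuming per-title counts of searches, clicks, and queue additions have already been aggregated; the class, thresholds, and titles in the check are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TitleFunnel:
    title: str
    searches: int     # question 1: how often the title was searched for
    clicks: int       # question 2: how often those searches led to a click
    queue_adds: int   # question 3: how often it was then added to the queue

    @property
    def click_rate(self):
        return self.clicks / self.searches if self.searches else 0.0

    @property
    def add_rate(self):
        return self.queue_adds / self.searches if self.searches else 0.0

def weak_finishers(funnels, min_searches=100, limit=5):
    """Frequently searched titles with the lowest add-to-queue rate."""
    frequent = [f for f in funnels if f.searches >= min_searches]
    return sorted(frequent, key=lambda f: f.add_rate)[:limit]
```

A title that is searched and clicked often but rarely queued points at something on the results or detail page, not at findability.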
Diagnose This: Fixing and improving the UX
1. User Research
2. Content Development
3. Interface Design: search entry interface, search results
4. Retrieval Algorithm Modification
5. Navigation Design
6. Metadata Development
User Research: What do they want?…
Search analytics (SA) is a true expression of users’ information needs (often surprising: e.g., SKU numbers at a clothing retailer; URLs at IBM)
Provides context by displaying aspects of single search sessions
User Research: …what else do they want?…
BBC provides reports to determine other terms searched within the same session (tracked by cookies)
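A minimal sketch of that kind of report: group searches into sessions per visitor, then count which other queries co-occur with a given term. The 30-minute inactivity cutoff is a common convention assumed here, not the BBC's documented rule, and the visitor ID could be a cookie value or, less reliably, an IP address.

```python
from collections import Counter, defaultdict
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity cutoff

def sessions(events):
    """events: (visitor_id, timestamp, query) tuples -> list of per-session query lists."""
    by_visitor = defaultdict(list)
    for visitor, ts, query in events:
        by_visitor[visitor].append((ts, query))
    result = []
    for visits in by_visitor.values():
        visits.sort()
        current = [visits[0][1]]
        for (prev_ts, _), (ts, query) in zip(visits, visits[1:]):
            if ts - prev_ts > SESSION_GAP:  # long pause: start a new session
                result.append(current)
                current = []
            current.append(query)
        result.append(current)
    return result

def co_searched(session_list, term):
    """Count other queries that appear in the same session as `term`."""
    counts = Counter()
    for queries in session_list:
        if term in queries:
            counts.update(q for q in set(queries) if q != term)
    return counts
```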
User Research: …who wants it?…
Specific segments’ needs, as determined by:
Security clearance
IP address
Job function
Account information
Alternatively, you may be able to extrapolate segments directly from SA
Pages they initiate searches from
User Research: …who wants it?…
BBC’s top queries report from the children’s section of the site
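A section-level report like this can be approximated from referrer URLs. A minimal sketch, assuming the first path segment of the page where a search started identifies the site section; the URLs and the "kids" section in the check are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

def top_queries_by_section(records, n=3):
    """records: (referrer_url, query) pairs -> {section: [(query, count), ...]}."""
    by_section = defaultdict(Counter)
    for referrer, query in records:
        first = urlsplit(referrer).path.strip("/").split("/")[0]
        section = first or "(home)"  # searches started from the main page
        by_section[section][query] += 1
    return {s: c.most_common(n) for s, c in by_section.items()}
```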
User Research: …and when do they want it?
Time-based variation (and clustered queries) from MSU
By hour, by day, by season
Helps determine which “best bets” to develop
Also can help tune main page and other editorial content
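A minimal sketch of a time profile for a single query, assuming `(timestamp, query)` pairs with `datetime` timestamps. Weekdays use Python's `weekday()` numbering (0 is Monday) to avoid locale-dependent day names.

```python
from collections import Counter

def time_profile(events, query):
    """Count occurrences of `query` by hour of day, weekday, and month."""
    hits = [ts for ts, q in events if q == query]
    return {
        "by_hour": Counter(ts.hour for ts in hits),
        "by_weekday": Counter(ts.weekday() for ts in hits),
        "by_month": Counter(ts.month for ts in hits),
    }
```

Comparing `by_month` across queries such as "bookstore" or "schedule of courses" shows which best bets deserve attention at which point in the academic year.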
Content Development: Do we have the right content?
From www.behaviortracking.com
Analyze zero-result queries:
Does the content exist? If so, there are titling, wording, metadata, or indexing problems.
If not, why not?
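One rough way to start that triage is to compare each zero-result query against existing page titles with `difflib.get_close_matches`. This string-similarity heuristic is an assumption, not a method from the talk: a close match hints at a titling, wording, or spelling problem, while no match is a candidate content gap that still needs a human look.

```python
from difflib import get_close_matches

def triage_zero_results(zero_queries, page_titles, cutoff=0.6):
    """Split zero-result queries into (near matches to existing titles, possible gaps)."""
    titles = [t.lower() for t in page_titles]
    wording, gaps = {}, []
    for q in zero_queries:
        matches = get_close_matches(q.lower(), titles, n=3, cutoff=cutoff)
        if matches:
            wording[q] = matches  # content probably exists under other wording
        else:
            gaps.append(q)        # no similar title: check whether content exists
    return wording, gaps
```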
Content Development: Are we featuring the right stuff?
Track clickthroughs to determine which results should rise to the top (example: SLI Systems)
Also suggests which “best bets” to develop to address common queries
BBC removes navigation pages from search results
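A minimal sketch of using click-through data to decide which results should rise for a query. It assumes you log how often each (query, URL) pair was shown and clicked; the minimum-impressions threshold is a hypothetical guard against promoting results on too little data.

```python
def promote_by_clicks(impressions, clicks, min_impressions=20):
    """impressions/clicks: mappings keyed by (query, url).
    Returns {query: [url, ...]} ordered by click-through rate, highest first."""
    ranked = {}
    for (query, url), shown in impressions.items():
        if shown < min_impressions:
            continue
        ctr = clicks.get((query, url), 0) / shown
        ranked.setdefault(query, []).append((ctr, url))
    return {q: [url for _ctr, url in sorted(rows, reverse=True)] for q, rows in ranked.items()}
```

The top URL per frequent query is also a natural first candidate for a best bet.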
Search Entry Interface Design: “The Box” or something else?
Identify “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added
Query syntax helps you select search features to expose (e.g., use of Boolean operators)
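A minimal sketch of both checks: flagging dead-end result counts and measuring how many searchers use Boolean syntax. Treating only uppercase AND/OR/NOT as operators is an assumption, since lowercase "or" is usually just a word.

```python
import re

# Uppercase AND/OR/NOT treated as Boolean operators (an assumption).
BOOLEAN_OPS = re.compile(r"\b(?:AND|OR|NOT)\b")

def syntax_features(query):
    """Describe the syntax a searcher used in one query."""
    return {
        "boolean": bool(BOOLEAN_OPS.search(query)),
        "phrase": '"' in query,
        "terms": len(query.split()),
    }

def is_dead_end(num_results, too_many=2000):
    """True for zero hits or a result set too large to scan (the slide's example numbers)."""
    return num_results == 0 or num_results >= too_many

def boolean_share(queries):
    """Fraction of queries that use Boolean operators."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(syntax_features(q)["boolean"] for q in queries) / len(queries)
```

If `boolean_share` is tiny, an advanced-search form built around Boolean operators is probably not worth featuring.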
Search Results Interface Design: Which results where?
The #10 result is clicked through more often than results #6, 7, 8, and 9 (ten results per page)
From SLI Systems (www.sli-systems.com)
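A minimal sketch of the position analysis behind that finding, assuming you log the 1-based rank of each clicked result and the number of first result pages served; later pages are ignored here for simplicity.

```python
from collections import Counter

def ctr_by_position(clicked_ranks, pages_served, page_size=10):
    """Click-through rate for each position on the first results page."""
    clicks = Counter(r for r in clicked_ranks if 1 <= r <= page_size)
    return {pos: clicks[pos] / pages_served for pos in range(1, page_size + 1)}
```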
Search Results Interface Design: How to sort results?
The Financial Times has found that users often include dates in their queries
Obvious but effective improvement: allow users to sort by date
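To check whether your own users do the same, a rough date detector is enough. This regular expression is a heuristic assumed for illustration, not the Financial Times' method: it looks for four-digit years, English month names or abbreviations, and numeric dates such as 8/27 or 8/27/2007. Words like "may" will cause some false positives.

```python
import re

MONTHS = ("january february march april may june july august september october "
          "november december jan feb mar apr jun jul aug sep sept oct nov dec").split()
DATE_HINT = re.compile(
    r"\b(?:19|20)\d{2}\b"                      # years such as 2007
    r"|\b(?:" + "|".join(MONTHS) + r")\b"      # whole-word month names
    r"|\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b",      # numeric dates
    re.IGNORECASE,
)

def date_query_share(queries):
    """Fraction of queries that appear to contain a date."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(1 for q in queries if DATE_HINT.search(q)) / len(queries)
```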
Search System: What to change?
Add functionality: Financial Times added spell checking
Retrieval algorithm modifications:
Financial Times weights company names higher
Netflix determines better weighting for unique terms and phrases
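A toy illustration of weighting one kind of match higher, in the spirit of the Financial Times example. The document shape (a `body` string plus a hypothetical `companies` list), the simple term-count score, and the boost value are all assumptions; a production engine would apply field boosts inside its own ranking function.

```python
def score(query, doc, company_boost=3.0):
    """Term-count score plus a fixed bonus when a tagged company name appears in the query."""
    q = query.lower()
    body_terms = doc["body"].lower().split()
    s = float(sum(body_terms.count(t) for t in q.split()))
    for company in doc.get("companies", []):
        if company.lower() in q:  # naive substring match; fine for a sketch
            s += company_boost
    return s

def rank(query, docs, company_boost=3.0):
    return sorted(docs, key=lambda d: score(query, d, company_boost), reverse=True)
```

With the boost, an article tagged with the company outranks one that merely mentions the name more often.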
Deloitte, Barnes & Noble, and Vanguard demonstrate that basic improvements (e.g., Best Bets) are insufficient (and justify increased $$$)
Navigation: Any improvements?
Michigan State University builds its A-Z index automatically based on frequent queries
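The talk does not describe how MSU generates its index, so this is only a minimal sketch of the general idea: keep queries above a frequency threshold (a hypothetical cutoff) and file them under their first letter. Queries starting with digits or symbols are skipped here.

```python
from collections import defaultdict

def az_index(query_counts, min_count=50):
    """query_counts: {query: count} -> {letter: sorted list of frequent queries}."""
    index = defaultdict(list)
    for query, n in query_counts.items():
        if n >= min_count and query[:1].isalpha():
            index[query[0].upper()].append(query)
    return {letter: sorted(terms) for letter, terms in sorted(index.items())}
```

In practice an editor would still review the entries and map each one to a destination page.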
Navigation: Where does it fail?
Track and study pages (excluding the main page) where search is initiated:
What do they search for? (e.g., acronyms, jargon)
Are there other issues that would cause a “dead end”? (e.g., tagging and titling problems)
Are there user studies that could test/validate problems on these pages? (e.g., “Where did you want to go next?”)
Metadata Development: How do searchers express their needs?
Tone and jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms)
Syntax (e.g., Boolean, natural language, keyword)
Length (e.g., number of terms per query; Long Tail queries are longer and more complex than Short Head queries)
Everything we know from analyzing folksonomic tags applies here, and vice versa
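The length claim is easy to test against your own logs. A minimal sketch, assuming the Short Head is defined as the most frequent queries that together make up the first half of search volume (an arbitrary cutoff chosen for illustration); it returns the average number of terms per distinct query in each part.

```python
def head_tail_lengths(query_counts, head_share=0.5):
    """Return (avg terms per Short Head query, avg terms per Long Tail query)."""
    total = sum(query_counts.values())
    head, tail, running = [], [], 0
    for query, n in sorted(query_counts.items(), key=lambda kv: kv[1], reverse=True):
        # A query belongs to the head if the volume before it is still under the cutoff.
        (head if running < head_share * total else tail).append(query)
        running += n

    def avg_terms(qs):
        return sum(len(q.split()) for q in qs) / len(qs) if qs else 0.0

    return avg_terms(head), avg_terms(tail)
```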
Metadata Development: Which values and attributes?
Uncover hierarchy and identify:
Metadata values (e.g., mobile vs. cell)
Metadata attributes (e.g., genre, region)
Content types (e.g., spec, price sheet)
SA combines with AI tools for clustering, enabling concept searching and thesaurus development
Metadata Development: Leveraging differences in the curve
Variations in information needs emerge between the Short Head and the Long Tail
Example: on Deloitte’s intranet, “known-item” queries are common; research topics are infrequent
[Figure: query curve labeled “known-item queries” and “research queries”]
Organizational Impact: Educational opportunities
“Reverse engineer” performance problems
Vanguard:
Tests “best” results for common queries
Determines why these results aren’t retrieved or clicked through
Demonstrates the problem, its solutions, and their benefits to content owners/authors
Sandia Labs does the same, but with top results that are losing rank in search results pages
Organizational Impact: Reexamining assumptions
The Financial Times learns about breaking stories from its logs by monitoring spikes in searches for company and individual names and comparing them with its current coverage
Discrepancy = possible breaking story; reporter is assigned to follow up
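A minimal sketch of spike detection for one name's daily search counts. The trailing-window z-score rule, window length, and thresholds are assumptions, not the Financial Times' documented method; comparing flagged days against current coverage remains an editorial step.

```python
from statistics import mean, pstdev

def spikes(daily_counts, window=7, k=3.0, min_count=10):
    """Indices of days whose count exceeds the trailing-window mean by more than
    k standard deviations (with a floor of 1.0 so flat histories aren't over-sensitive)."""
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        today = daily_counts[i]
        if today >= min_count and today > mu + k * max(sigma, 1.0):
            flagged.append(i)
    return flagged
```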
Next step? Assign reporters to “beats” that emerge from SA
SA as User Research Method: Sleeper, but no panacea
Benefits:
Non-intrusive
Inexpensive and (usually) accessible
Large volume of “real” data
Represents actual usage patterns
Drawbacks:
Provides an incomplete picture of usage: was the user satisfied at the session’s end?
Difficult to analyze: where are the commercial tools?
Complements qualitative methods (e.g., persona development, task analysis, field studies)
SA Headaches: What gets in the way?
Problems*:
Lack of time
Few useful tools for parsing logs and generating reports
Tension between those who want to perform SA and those who “own” the data (chiefly IT)
Ignorance of the method
Hard work and/or boredom of doing analysis
Most of these are going away…
* From a summer 2006 survey (134 responses), available at the book site.
Please Share Your SA Knowledge: Visit our book-in-progress site
Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2008)
Site URL: www.rosenfeldmedia.com/books/searchanalytics/
Feed URL: feeds.rosenfeldmedia.com/searchanalytics/
Contact Information
Louis Rosenfeld
Rosenfeld Media, LLC
705 Carroll Street, #2L
Brooklyn, NY 11215 USA
+1.718.306.9396
www.louisrosenfeld.com
www.rosenfeldmedia.com