searchpad: explicit capture of search context to support web search

Computer Networks 33 (2000) 493–501www.elsevier.com/locate/comnet

SearchPad: explicit capture of search context to support Web search

Krishna Bharat *

Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA 94301, USA

Abstract

Experienced users who query search engines have a complex behavior. They explore many topics in parallel, experimentwith query variations, consult multiple search engines, and gather information over many sessions. In the process theyneed to keep track of search context — namely useful queries and promising result links, which can be hard. We presentan extension to search engines called SearchPad that makes it possible to keep track of ‘search context’ explicitly. Wedescribe an efficient implementation of this idea deployed on four search engines: AltaVista, Excite, Google and Hotbot.Our design of SearchPad has several desirable properties: (i) portability across all major platforms and browsers; (ii) instantstart requiring no code download or special actions on the part of the user; (iii) no server side storage; and (iv) no addedclient–server communication overhead. An added benefit is that it allows search services to collect valuable relevanceinformation about the results shown to the user. In the context of each query SearchPad can log the actions taken by theuser, and in particular record the links that were considered relevant by the user in the context of the query. The servicewas tested in a multi-platform environment with over 150 users for 4 months and found to be usable and helpful. Wediscovered that the ability to maintain search context explicitly seems to affect the way people search. Repeat SearchPadusers looked at more search results than is typical on the Web, suggesting that availability of search context may partiallycompensate for non-relevant pages in the ranking. 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: Search engines; Search context; Queries; Bookmarking; Data collection; Relevance information; JavaScript;Cookies

1. Introduction

As users gain expertise in searching on the WorldWide Web they begin to make use of the wide choicein search services available online. However, as theycast a wider net to locate the information they seek,they start to employ a more elaborate and complexsearch process. Experienced users searching on theWeb seem to have the following behavior.(1) They search on many unrelated topics in parallel,

often with many browser windows.

Ł Present address: Google Inc., 2400 Bayshore Parkway, Moun-tain View, CA 94043, USA. E-mail: [email protected]

(2) A given search for information may extend overmany sessions. They may terminate and restartthe browser between sessions.

(3) For each information need they use manyqueries, often by a process of query refinement.Power users may employ variants of queries thatworked well in other contexts.

(4) They may try the same query on many searchservices. (By a search service we mean searchengines such as AltaVista [1] and Google [4],meta-search engines such as AskJeeves [2] andMetaCrawler [6], and resource directories suchas Yahoo! [12] and Open Directory [8].)

(5) Some users may look at more than one searchresult page.

1389-1286/00/$ – see front matter 2000 Published by Elsevier Science B.V. All rights reserved.PII: S 1 3 8 9 - 1 2 8 6 ( 0 0 ) 0 0 0 4 7 - 5

494 K. Bharat / Computer Networks 33 (2000) 493–501

(6) When they do find a useful result, they are oftenunsure whether the information they have foundis the best available or they should search further.

The trouble with the above behavior is that theuser needs to carry around a lot of contextual in-formation over time and there is no convenient wayto record it or make it explicit. Specifically, theyneed to remember URLs of potentially useful resultsas they look for more results, and remember usefulqueries over time. Both of these can be hard to mem-orize. Saving information to the browser’s collectionof bookmarks is one potential solution. However,there are several reasons why this is not convenient,as outlined below.(1) Most users would be reluctant to contaminate

their bookmark list with tentative leads. The listof bookmarks is intended to store high-qualityWeb pages that they wish to remember for a longtime, and not intermediate results.

(2) To remember a query one would need to book-mark a search result page. However, this providesno way to run the same query on a differentsearch service.

(3) As result pages and tentative results from manyqueries get bookmarked they become interleavedand hard to distinguish. One solution to avoidclutter would be to create bookmark folders foreach information need in advance, and bookmarkeach result and query into the appropriate folder.However, this takes too much effort on the partof the user.

In this paper we describe an extension to thesearch result page called SearchPad, which helpsusers search more effectively by explicitly main-taining their ‘search context’. By search context,we mean queries recently deployed by the user,along with hyperlinks of result pages the user visitedand=or liked in the context of each query. SearchPadis very similar to a bookmarks window except thatit is search-specific and maintains a relationship be-tween queries and links the user would like to keeptrack of (which we call leads). As with bookmarks,clicking on a saved lead causes the correspondingpage to be loaded in the browser. Saved queries canbe replayed on other search engines.

To make SearchPad usable by a large audiencewe had two design goals which made the implemen-tation of the system challenging.

ž To appeal to the widest possible audience wewanted an implementation that was portable (i.e.,worked on all browsers and platforms) and did notimpose any overhead on users (loaded quickly andtransparently). The rationale for the latter condi-tion was the feeling that many users would beunwilling to use a search service which requiredthem to first explicitly download a modified clientor a plug-in. Indeed, on some platforms the delayof several tens of seconds in starting the Java Vir-tual Machine makes even Java a poor choice forimplementation.ž A second design goal was to not impose any extra

storage or communication overhead on the searchservice. This greatly simplifies the integrationof SearchPad support into new search services.However, this implies that all control and storagehappens at the user end.Our implementation of SearchPad uses cookies

and a subset of Javascript that is known to workon all platforms. It loads instantly with the firstresult page obtained from the search service, andcommunicates with the server on an as needed basis.To simulate the behavior of actual services providingsupport for SearchPad, we implemented a proxythat provides access to four major search engines:AltaVista, Excite, Google and HotBot. The role ofthe proxy was to transform result pages streamingthrough to make them appear as they would if theywere implementing a service such as SearchPad.Note that other pages such as result pages are notfetched through the proxy. They are fetched directlyfrom the World Wide Web. The role of the proxy isonly to simulate how search engines would behave ifthey supported SearchPad.

As we shall describe, our implementation pro-vides an additional service. It allows the searchservice to collect query-specific result relevance andusage data. Specifically, SearchPad can log for eachclient:(1) queries that were issued;(2) result pages viewed for each query;(3) result hyperlinks considered relevant for each

query;(4) the order in which result pages were viewed;(5) the time spent viewing the result;(6) whether a result hyperlink considered relevant

was actually viewed by the user.

K. Bharat / Computer Networks 33 (2000) 493–501 495

Such information is valuable to search providers.It can be used to statistically compare two rankingalgorithms and find out which one is better. Simi-larly, it can be used to compare two search services.It can also be used to discover the most relevantpages for popular queries, which in turn can be usedto improved results for those queries in the future.

Collection of usage data raises concerns aboutprivacy and the author strongly supports the privacyof users on the Web. However, the major vulnera-bility from the user’s point of view is having searchservices know about their interests. Unfortunately,this is already revealed by the query. The infor-mation we collect, namely the results they viewedand found useful, reveals more about the quality ofthe pages returned than about the user. Thus weargue that this is not a further breach of privacy.In any case there other search companies on theWeb, notably Direct Hit [3], with a business modelbased on collecting data on the pages that users lookat. They count ‘click-throughs’ (i.e., the number oftimes users click on a particular link) received byresult links for popular queries and reorder resultsbased on perceived popularity. We believe that theinformation we can collect is superior to Direct Hit’sdata, because we discover the results people actuallyliked — not just the results they clicked on. Evenwhen a query finds no useful results, users tend toclick on a few results per query to understand whathappened, which can contaminate the click-throughlog. With our scheme the data collected is purer.Also, collecting click-throughs as done previouslyimposes an overhead both on the server and the user.In Direct Hit’s scheme click-throughs are trapped byredirecting result accesses through a Web server thatlogs the data and then issues a redirect to the actualresult page. With our scheme all logging is done atthe client without an extra HTTP access.

2. Interaction with SearchPad

We present a walk-through to illustrate the user’sinteraction with SearchPad. In our implementation,the user accesses AltaVista, Excite, Google and Hot-bot through special URLs that route communicationwith the engines through the SearchPad proxy.

Fig. 1 shows an AltaVista page transformed by

the proxy. The ‘SearchPad’ button at the top leftbrings up the SearchPad agent (Fig. 3), allowing theuser access to any previously marked queries andleads. Each result on the AltaVista page has a blue‘Mark’ button associated with it. Clicking on thisbutton causes the corresponding link to be addedto SearchPad, along with the corresponding query.If the query already exists the link is merged intothe existing set of links. Links added to SearchPadare called ‘leads’. Note that marking is a cheapoperation, and involves only a local transfer of datafrom the result page to the SearchPad agent. Nonetwork communication occurs and hence no delay.This is illustrated in Figs. 2 and 3.

Fig. 2 shows the second AltaVista result for thequery: genetic engineering, which has just been vis-ited by the user. All visits to result page and timespent therein are logged by SearchPad as part ofits data collection process. Also, on return from aresult page, the blue ‘Mark’ button for the just-vis-ited result link turns red as in Fig. 2 (hard to see ingray-scale). The color change is an invitation for theuser to mark the link. Also, it makes the result easierto spot, increasing the likelihood of the user markingthe lead if they liked it.

Fig. 3 shows the SearchPad window with a list ofthree queries bookmarked, and under each query a listof leads. SearchPad is merely a Web page renderedby Javascript code, and appears within an independentbrowser window. Each query has an open=close tri-angular toggle to control the visibility of leads underit. E.g., clicking on an open toggle causes it to close.The last marked query (‘genetic engineering’) is atthe top of the SearchPad heap and hence most visi-ble. Queries in SearchPad are maintained in a most-recently accessed order to keep pace with the user’svarying interests. For each marked lead the title isshown, hyperlinked to the corresponding Web page.To conserve space only the hostname is shown af-ter it. SearchPad is designed to have the form factorof a small notepad — small yet useful for recordingessential information during a search.

Each query has a circular selector in front of it tosupport query selection. To send a query to a searchengine the user would first select the query and clickon the search engine. If they selected the most recentquery, ‘genetic engineering’, and clicked on Google,they would get the result set shown in Fig. 4.


Fig. 1. An AltaVista result page extended with SearchPad support.

Fig. 2. Result visited by the user and then marked.

In Fig. 5 the user has subsequently marked thelead labeled ‘MelissaVirus.com: The very latestMelissa Virus information’ for the (repeat) query‘melissa virus’. This moves ‘melissa virus’ to the topof the list of SearchPad queries and adds the newlead to the end of the list of leads for the query.

SearchPad also has an ‘Edit Mode’ (see Fig. 6) to

support changes to the stored data. This is because,although the browser may shutdown and the machineget rebooted, the information stored in SearchPad ispermanent. Hence, the user may periodically want todelete some leads or queries to free up space. Alsothey might want to merge the leads classified un-der various related queries into a single meaningful


Fig. 3. The SearchPad Helper with a new query: ‘genetic engi-neering’ and a new lead.

query. In ‘Edit Mode’, SearchPad is still fully func-tional, except that it provides extra buttons to edit itsstate. The cross (‘X’) marks are buttons to delete thequery or lead they are associated with. Queries canbe renamed by clicking on ‘Rename’, which bringsup a dialog to enter the new query. If the new querymatches another existing query the leads in the twoqueries are merged. The old query is discarded.

3. Implementation

In this section we describe the implementation ofSearchPad.

Since a design goal was to not require any extrastorage at the server or impose any communicationoverhead for marking, all storage and computation ismoved to the client. Consequently, we implementedSearchPad as an HTML document containing em-bedded code in Javascript [5] (VB Script [11] couldhave been used as well). Result pages are extended

by embedding code in Javascript as well. When theuser marks a link the associated scripting code com-municates the link’s URL and associated informa-tion to a corresponding piece of code in SearchPad.The code within SearchPad then updates its displayshowing the new link.

This approach is faced with the following problem.Embedded scripts are constrained by the browser bothin terms of access (i.e., limited access to other win-dows) and storage (no access to the file system) in thenormal mode of operation. In some Web browsers,the embedded scripts can request the user for moreaccess to the Web browser’s state. Nonetheless thisis not useful because many users will refuse such arequest, since it might represent a security risk. Thus,embedded scripts face many restrictions. We describenext how these may be overcome.

We use cookies (a mechanism for host-specificpersistent client-side storage) both for communica-tion between the result page and SearchPad, and forpersistent storage. (See RFC 2109 [9] and Netscape’sdocumentation on cookies [7].) A client such asNetscape’s Navigator which implements RFC 2109,supports a limited amount of client-side storage inthe form of cookies. Each cookie holds 4 Kb of textdata, and each host a user visits will be allowedat least 20 cookies. Such cookies are persistent andsave their state on the user’s hard disk. This allowsSearchPad to remember marked leads across Webbrowser sessions.

Javascript was ideal for our implementation be-cause it allowed cookies to be read and writtenfrom within the browser. A restricted form of cook-ies sharing is possible between Javascript instances.This allows code on the result page to pass messagesto code in SearchPad. Javascript is single-threadedacross the entire browser. This gives us mutual exclu-sion and simplifies the design. Also, Javascript hassupport for timer-driven callbacks which is neededto implement polling behavior within SearchPad.

The search service (or a proxy server throughwhich the search service is accessed) embeds a but-ton (or equivalent device) within each search resultto allow the user to ‘mark’ the result as a lead.The button links to embedded code in JavaScript.The code is invoked when the button is clicked, andcauses relevant information about the link and queryto be written to a log maintained in a set of cookies,


Fig. 4. Google results for the replayed query: ‘genetic engineering’.

associated with the Web site (known as the accesslog).

For example, the following are logged for each‘mark’ action:ž the query;ž the title, URL and rank of the result being marked;ž the time at which the event occurred.

Similarly, when a result’s hyperlink is clickedto view the result page, we log the same type ofinformation in association with the ‘view’ event.When the user returns to the page containing searchresults after viewing a result page, the ‘return’ eventis logged as well, with a time-stamp. When a ‘return’event follows a ‘view’ event, the time differenceprovides an estimate of the time spent viewing theresult page.

All the information collected above resides in aset of cookies associated with the originating Website, and is available to scripts executing within otherpages downloaded from the same site. In particu-lar it is visible to SearchPad, which is an HTMLdocument whose contents are dynamically generated

by embedded Javascript. SearchPad polls the accesslogs every few seconds in order to respond to events.

All the data needed by SearchPad to displaymarked queries and leads to the user are maintainedin a cookie access log. When the cookie access log isupdated due to a new event which requires a changein SearchPad’s display, the SearchPad code initiatesa ‘soft’ reload. The soft reload operation fetchesthe cached Web page corresponding to SearchPadfrom the browser cache and executes the code again.At this point the Javascript reads the cookies andredraws itself to reflect the new state. To initiate areload, either the code in the result page can signalSearchPad to notify that the state has changed, orSearchPad can periodically examine the cookie logto see if new leads have been added (as in ourimplementation). Changes to SearchPad’s displaydue to interaction with SearchPad (e.g., open=closeoperations and mode change operations) are handledsimilarly. The Javascript event handler updates thevisual state of SearchPad represented in the cookies,and initiates a soft reload.


Fig. 5. Updated SearchPad page.

Eventually, as events accumulate and leads areadded, the storage available in the cookie access logwill be exhausted. At this point either the user canbe prevented from marking any more leads (unlesssome are deleted), or SearchPad can compress thedata. We support a clever form of data compressionto free up more space.

To compress data in the cookie access log,SearchPad does a ‘hard’ reload of itself. This causesthat fresh copy of the SearchPad Web page is fetchedfrom the server ignoring the cache. The cookiescomprising the cookie access log are configured sothat they are transmitted to the Web server everytime SearchPad is reloaded over the net. Also, afresh set of cookies are transmitted back from theserver and overwrite the previous cookies. This ispart of the standard RFC 2109 cookie exchange pro-tocol. We use this to transfer activity log informationto the server and also to reduce the data stored inSearchPad as follows.(1) All data in the event log that the server needs

Fig. 6. Edit Mode SearchPad view after all Genetics relatedpages are merged under ‘Genetic Links’.

to keep for its data collection is logged at theserver. The remaining logged data is cleared inthe cookies.

(2) The verbose information for each newly book-marked lead is removed from the cookies. This isbecause the same information is already presentat the server. Each lead is replaced by anidentifier representing the URL (known as theURLID), based on the internal handle to theURL at the server.

(3) However, the URLID is not intelligible to theuser and unsuitable for presentation. Hence, theserver dynamically generates a new version ofthe SearchPad Web page in which the Javascriptcode is augmented with a lookup table mappingURLIDs to title and URL information, for allmarked leads. This allows the same presentationto be given to the user as before compression.However, the bulk of the data is moved from the


cookies to the SearchPad Javascript code. Sincethe mapping from URLIDs (server internal ids)to title and URL information is assumed to beavailable at the server, no extra storage is neededat the server to support the user base.

To ensure timely data collection at the server,SearchPad is configured to periodically hard reloaditself, thus logging the user’s activity periodically.Further, to avoid transmitting the cookies to theserver during other communications, the cookies areconfigured so that they will be transmitted only whenSearchPad is reloaded and not when result pagesare fetched. We do this by associating SearchPadwith a path that extends the path of result pages,as explained in RFC 2109. This has the effect ofallowing SearchPad to read cookies set by resultpages but not vice versa.

4. Experience

We conducted a trial of the SearchPad serviceat our research laboratory — Compaq, Systems Re-search Center — from May 6 to September 3, 1999.The service was available on the company intranet,but most of the usage was by the research staff ofthe Systems Research Center (about 50 people), andto a smaller extent by Compaq Research as a whole(about 150 people). Logs were collected in partiallyshrouded format so that queries themselves were un-recognizable, but hostnames and other details werepreserved. Our logs show that accesses outside theresearch community did not contribute significantlyto usage.

Table 1 summarizes the usage statistics for the4-month period. This does not include accesses bythe author for testing. The aim of the study wasto understand if people would find our service use-ful. Although users were invited to use the systemthrough internal advertising, no incentive was givento make them use it. Also, assurances were giventhat we would protect their privacy. Hence, we didnot attempt to keep track of the results that werebookmarked, since within a small community suchinformation might reveal more than it would on theInternet at large. Also, we would need a large userbase to collect a statistically significant sample of us-age information to make any relevance judgements.

Table 1Usage statistics from a 4-month trial

Number of result pages viewed:AltaVista 1352Excite 148Google 724HotBot 57

Total 2281

Number of distinct accessing hosts 178Number of distinct queries 1133Average number of result pages=query 2.01Average number of result pages=host 12.8Percentage accesses w SearchPad ‘docked’ 8

The high usage of AltaVista may be biased by thefact that AltaVista was created by Compaq Research.The usage of the other engines can be taken torepresent perceived value by our user base. In mostcases each host in the log corresponds to a distinctuser. We were curious to see if usage patterns wouldchange with the SearchPad model of searching. Forexample, we were curious if users look at morepages, since they now had the option of keepingtrack of temporary leads? The average number ofresult pages per query was 2.01, which is higherthan previously reported (e.g., 1.39 was reported in aprevious study by Silverstein et al. [10]). Our numberis somewhat diluted by the presence of casual userswho used SearchPad marginally, possibly for testqueries. Considering more seasoned users (users whoused SearchPad to view more than 50 result pages),the number of result pages viewed per query isslightly higher D 2.15. We noticed a large numberof single page views in the logs, even for seasonedusers. A single result page view is often evidenceof the fact that the users found the result they werelooking for immediately (i.e., the ranking was good),or that they were disappointed with the query andformulated a better query. If we consider only casesin which users looked at more than one result pagewe find that the average page views per query ishigher D 3.98. This suggests that having a toolto record search context may encourage users toexplore result sets more deeply, and compensate forsome non-relevant pages in the ranking.

The only interface design choice we tried to eval-uate was the option of attaching SearchPad to the leftof the results window, as an extra frame. This was


done by clicking on the ‘SearchPad’ button at thetop left of the result page (see Fig. 1). We call this‘docking’. Each result window could have a dockedversion of SearchPad potentially. Docking was hardto implement since it meant keeping several versionsof SearchPad synchronized. However, the user studyshows that only 8% of the users liked the dock-ing option. This actually reduced to 5% for userswith more than 50 result page views, suggesting thatembedding SearchPad in a frame is not convenient.

SearchPad was tested on Netscape versions 3and higher on Unix, MacOS, and Windows 95=NT,and on Internet Explorer versions 4 and higher onWindows 95=NT, and found to work reliably.

5. Conclusions

In this paper we describe an extension to searchengines to explicitly maintain user search contextas they look for information, on many topics, usingmany search engines, and over many sessions. Bysearch context we mean queries that were previouslydeployed and considered useful, and promising re-sult links associated with each query. SearchPad isan agent that works collaboratively with result pages,and allows users to remember queries and associatedleads in a convenient helper window. Unlike book-marks, which correspond to the user’s long-termmemory of information, the leads in SearchPad con-stitute the user’s short-term memory and representwork in progress. They tend be less valuable thanbookmarks and are maintained only as long as theuser’s information need is current. Hence we per-ceive SearchPad as a complement to the browser’sbookmarks facility.

SearchPad is implemented as a Javascript exten-sion to the search results page. We demonstratedthe generality of our design with an implementationthat works on four major search engines. Our imple-mentation is highly portable, requires no downloador start-up delay, needs no storage at the browserand does not increase the communication overheadwith the server. An added benefit is that SearchPadcan record user actions on the search result page, andalso discover which results are most valuable to usersin the context of specific queries. This imposes lessoverhead and is qualitatively more useful than the in-

formation collected using the click-through trackingstrategy of search engines such as Direct Hit.

The service was tested in a multi-platform envi-ronment with over 150 users for 4 months and foundto be usable and helpful. It is possible that the abil-ity to maintain search context explicitly affects theway people search. Repeat SearchPad users lookedat more search results than reported previously. Thissuggests that explicit availability of search contextmight partially compensate for non-relevant pages inthe ranking.

References

[1] AltaVista: http://www.altavista.com/[2] AskJeeves: http://www.ask.com/[3] Direct Hit: The Direct Hit Technology — A White Paper,

Direct Hit Inc., http://system.directhit.com/whitepaper.html[4] Google: http://www.google.com/[5] Javascript: Javascript Reference, Netscape, http://developer.

netscape.com/docs/manuals/communicator/jsref/contents.htm

[6] MetaCrawler: http://www.metacrawler.com/[7] NetscapeCookies: Persistent Client State — HTTP Cook-

ies, Netscape, http://www.netscape.com/newsref/std/cookie_spec.html

[8] Open Directory: http://www.dmoz.org/[9] RFC 2109: HTTP State Management Mechanism, http://

andrew2.andrew.cmu.edu/rfc/rfc2109.html[10] C. Silverstein, M. Henzinger, H. Marais and M. Moricz,

Analysis of a very large AltaVista query log, CompaqSRC, Technical Note, 1998-014, ftp://ftp.digital.com/pub/DEC/SRC/technical-notes/SRC-1998-014.pdf

[11] VB Script: http://msdn.microsoft.com/scripting/vbscript/default.htm

[12] Yahoo: http://www.yahoo.com/

Krishna Bharat is a member ofthe research staff at Google Inc.in Mountain View, California. For-merly he was at Compaq Com-puter Corporation’s Systems Re-search Center, which is where theresearch described here was done.His research interests include Webcontent discovery and retrieval, userinterface issues in Web search andtask automation, and relevance as-sessments on the Web. He received

his Ph.D. in Computer Science from Georgia Institute of Tech-nology in 1996, where he worked on tool and infrastructuresupport for building distributed user interface applications.

searchpad: explicit capture of search context to support web search

Documents