a community-based approach to personalizing web searchpeterb/2480-122/socialsearchreadings.pdf ·...

9
0018-9162/07/$25.00 © 2007 IEEE 42 Computer Published by the IEEE Computer Society C O V E R F E A T U R E through rates should be preferred during ranking. Unfortunately, Direct Hit’s so-called popularity engine did not play a central role on the modern search stage (although the technology does live on as part of the Teoma search engine) largely because the technology proved inept at identifying new sites or less well- traveled ones, even though they may have had more contextual relevance to a given search. Despite Direct Hit’s fate, the notion that searchers themselves could influence the ranking of results by virtue of their search activities remained a powerful one. It res- onates well with the ideas that underpin the social web, a phrase often used to highlight the importance of a suite of technologies and ideas that see users playing a more active role in Web content creation and management. From blogs to wikis, social networks to tagging, the social web emphasizes the importance of community, participation, and sharing when it comes to the creation, organization, and dissemination of Web content. These ideas have influenced research into how the search behavior of communities of like-minded users can be harnessed and shared to adapt the results of a conventional search engine according to the needs and preferences of a particular community. Ideally, this leads to an improved personalized search experience that can deliver more relevant result pages that reflect the experi- ences of a community of users, effectively forming a collective search wisdom. At its heart, this collaborative Web search (CWS) approach promotes the idea that community search Researchers can leverage the latent knowledge created within search communities by recording users’ search activities—the queries they submit and results they select—at the community level.They can use this data to build a relevance model that guides the promotion of community-relevant results during regular Web search. Barry Smyth University College Dublin O ver the past few years, current Web search engines have become the dominant tool for accessing information online. However, even today’s most successful search engines struggle to provide high-quality search results: Approx- imately 50 percent of Web search sessions fail to find any relevant results for the searcher. The earliest Web search engines adopted an informa- tion-retrieval view of search, using sophisticated term- based matching techniques to identify relevant docu- ments from repeated occurrences of salient query terms. Although such techniques proved useful for identifying a set of potentially relevant results, they offered little insight into how such results could be usefully ranked. How then should documents be ranked and ordered? Some researchers 1,2 solved this problem when they realized that ranking could be greatly improved by evaluating the importance or authoritativeness of a particular document. By analyzing the links in and out of a document, it became possible to evaluate its relative importance within the wider Web. For example, Google’s famous PageRank metric assigns a high page-rank score to a document if it is itself linked to by many other documents with a high page-rank score, and it iteratively evaluates the page-rank scores for every document in its index for use during results ranking. Other researchers began exploring alternative rank- ing options. One notable alternative, implemented in the Direct Hit search engine, argued that search results should be ranked by their popularity among searchers. All other things being equal, results with higher click- A Community-Based Approach to Personalizing Web Search

Upload: others

Post on 31-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

0018-9162/07/$25.00 © 2007 IEEE42 Computer P u b l i s h e d b y t h e I E E E C o m p u t e r S o c i e t y

C O V E R F E A T U R E

through rates should be preferred during ranking.Unfortunately, Direct Hit’s so-called popularity enginedid not play a central role on the modern search stage(although the technology does live on as part of theTeoma search engine) largely because the technologyproved inept at identifying new sites or less well-traveled ones, even though they may have had more contextual relevance to a given search.

Despite Direct Hit’s fate, the notion that searchersthemselves could influence the ranking of results by virtueof their search activities remained a powerful one. It res-onates well with the ideas that underpin the social web,a phrase often used to highlight the importance of a suiteof technologies and ideas that see users playing a moreactive role in Web content creation and management.From blogs to wikis, social networks to tagging, thesocial web emphasizes the importance of community,participation, and sharing when it comes to the creation,organization, and dissemination of Web content.

These ideas have influenced research into how thesearch behavior of communities of like-minded userscan be harnessed and shared to adapt the results of aconventional search engine according to the needs andpreferences of a particular community. Ideally, this leadsto an improved personalized search experience that candeliver more relevant result pages that reflect the experi-ences of a community of users, effectively forming a collective search wisdom.

At its heart, this collaborative Web search (CWS)approach promotes the idea that community search

Researchers can leverage the latent knowledge created within search communities by

recording users’ search activities—the queries they submit and results they select—at the

community level.They can use this data to build a relevance model that guides the

promotion of community-relevant results during regular Web search.

Barry SmythUniversity College Dublin

O ver the past few years, current Web searchengines have become the dominant tool foraccessing information online. However, eventoday’s most successful search engines struggleto provide high-quality search results: Approx-

imately 50 percent of Web search sessions fail to findany relevant results for the searcher.

The earliest Web search engines adopted an informa-tion-retrieval view of search, using sophisticated term-based matching techniques to identify relevant docu-ments from repeated occurrences of salient query terms.Although such techniques proved useful for identifyinga set of potentially relevant results, they offered littleinsight into how such results could be usefully ranked.

How then should documents be ranked and ordered?Some researchers1,2 solved this problem when they realizedthat ranking could be greatly improved by evaluating theimportance or authoritativeness of a particular document.By analyzing the links in and out of a document, it becamepossible to evaluate its relative importance within the widerWeb. For example, Google’s famous PageRank metricassigns a high page-rank score to a document if it is itselflinked to by many other documents with a high page-rankscore, and it iteratively evaluates the page-rank scores forevery document in its index for use during results ranking.

Other researchers began exploring alternative rank-ing options. One notable alternative, implemented inthe Direct Hit search engine, argued that search resultsshould be ranked by their popularity among searchers.All other things being equal, results with higher click-

A Community-BasedApproach to PersonalizingWeb Search

Page 2: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

August 2007 43

activities can provide a valuable form of search knowl-edge and that facilitating the sharing of this knowledgebetween individuals and communities makes it possibleto adapt traditional search-engine results according tothe community’s needs. Several natural community-based search scenarios motivate this work, and recentevaluation results speak to the potential of the collabo-rative Web search approach.

PERSONALIZING WEB SEARCHThe one-size-fits-all approach typified by conventional

search-engine technologies shows room for improvement.The vague queries that are commonplace in Web searchdo little to distinguish the searcher’s real informationneeds, while recent advances in areas such as user profil-ing and personalization suggestpotential solution strategies capableof delivering more relevant, person-alized search experiences. In this con-text there have been numerous recentdevelopments and practical applica-tions of personalized search using sev-eral different approaches.

For example, one common ap-proach seeks to leverage the searchhistories of individual users to per-sonalize future search sessions. Onestudy3 introduces a technique thatconstructs a client-side index fromall of the documents that a user cre-ates, copies, or employs on a client machine. This indexis treated as a type of user profile intended to disam-biguate the user’s search query terms and to improveresult relevance by reranking relevant documents withinsearch results. As users manipulate certain documents,this technique assesses the material as being of greateror lesser relevance to the user’s needs and assigns a rel-ative importance to terms used in the search.

A different approach leverages an individual user’ssearch history in combination with a general profilegleaned from the Open Directory Project (http://dmoz.org).4 The user’s search history is mapped to a set of con-tent categories drawn from the ODP. These categoriesthen serve as a source of preference or context terms forfuture queries.

Other researchers5,6 use search-selection histories tochoose a topic-sensitive PageRank value for eachreturned search result, which is then used to rank thoseresults. Previously selected search results serve as biasedindicators of user interest; each page in the engine’sindex has a number of PageRank values calculated forit, one for each top-level category in the ODP. At querytime, the search engine accesses the target user’s storedhistory to select an appropriate PageRank value for eachresult, depending on the user’s preferences. The enginethen uses these PageRank values to rank the returned

results so that they reflect any potential topic bias in theuser’s interests.

These exemplars show just a small subset of the ongo-ing research in the Web search personalization area.Indeed, leading commercial search engines Google(www.google.com/psearch) and Yahoo! (http://myweb2.search.yahoo.com) recently have undertaken related ini-tiatives, which have offered their own particular demon-strations of personalized search.

THE VAGUE QUERY PROBLEMWeb search represents a significant technological

challenge. The Web’s size and growth characteristics,and the sheer diversity of offered content types, repre-sent formidable information-retrieval challenges in their

own right. At the same time, as thedemographics of the Web’s userbase continue to expand, searchengines must be able to accommo-date an increasingly diverse range ofuser types and skill levels. In partic-ular, most users fail to live up to the expectations of the document-centric, term-based information-retrieval engines that lie at the heartof modern search technology. Theseengines, and the techniques they relyupon, largely assume well-formed,detailed search queries. But suchqueries are far from common in

Web search today. Instead, most Web search queries arevague or ambiguous with respect to the searcher’s trueinformation needs, and queries often contain terms notreflected in the target documents.

For example, consider a search query for a commonterm such as “jaguar.” Querying Google shows anemphasis on the importance of car-related meaningsfor this query, with nine of the top 10 results linking tocar-related products and services. Only one of the top10 results links to pages that relate to the wild cat.Other interpretations, from the NFL football team toApple’s OS X operating system, appear much furtherdown the list.

A query such as “jaguar” is inherently vague, offeringa search engine such as Google little insight into thesearcher’s intention. Nevertheless, such queries are com-monplace, with many researchers noting that a typicalWeb query contains only two or three terms.7,8 Certainly,if searchers need data on the NFL or the operating sys-tem, they will be disappointed with Google’s first pageof results and, at best, must continue their search tolocate relevant results further down the listing.

Typically, developers respond to examples such as thisby declaring that users must be taught to provide moremeaningful and detailed queries. Although this makesperfect sense, it is not the perfect solution. Many users

The collaborative Web search

approach promotes the idea

that community search

activities can provide valuable

search knowledge, and

sharing this knowledge makes

adapting traditional search-

engine results possible.

Page 3: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

44 Computer

will continue to provide vague queries. Recent evidencesuggests that even when users do provide additionalquery terms, they might not select the types of terms thatwill help a search engine understand their needs.Essentially, a vocabulary gap exists because users willsometimes select terms that are not even present in theirdesired results. For example, researchers recently sub-mitted just under 7,700 queries to the three leadingsearch engines—Google, Yahoo!, and MSN—in an effortto locate a particular target page for each query. Theyestimated the effectiveness of each search engine in termsof the average percentage of times users retrieved the tar-get page within the top 10 results returned.

Figure 1 shows these results as a graph of retrievaleffectiveness against query size. The search engines

performed poorly overall, at best retriev-ing the target results in their top 10 searchitems less than 14 percent of the time.However, this is a particularly tough mea-sure of relevance. It is also clear that bothGoogle and Yahoo! perform consistentlybetter than MSN across all query sizes. Allthree search engines perform best forqueries with three terms, suggesting thatmodern search-engine technology hasbeen optimized for typical query lengths.

Retrieval effectiveness increases rapidlyas query size increases from one to threeterms, which supports the idea that encour-aging users to provide more detailed queriescan improve search-engine performance.However, this is true only to a point. Forqueries longer than three terms, a gradualdecline in retrieval effectiveness occurs thatappears to relate to the additional termsusers add to their queries. More often thannot, these extra terms offer search engineslittle help when it comes to identifying theirtarget document, and users frequentlychoose very specialized terms that do noteven occur in the target document.

REPETITION AND REGULARITY IN SEARCH COMMUNITIES

It is common practice to think aboutsearch as an isolated single-user activity,one that relies on the services of a genericsearch engine. In reality, there are manyscenarios in which search can be viewedas a type of community activity. For exam-ple, consider a wildlife information por-tal designed to provide users with accessto a host of wildlife-related resources. Theportal pages also host several search boxesso that visitors can easily initiate standardWeb searches as they browse; this is com-

mon practice with all the main search engines. Visitorsto this portal constitute an ad hoc community with ashared interest in wildlife. All other things being equal,searches originating from this portal will more likely bewildlife-related—a fact that the search engines provid-ing these search boxes typically ignore—but that thisresearch seeks to exploit as a means of improving thequality of subsequent result lists.

Many other examples of naturally occurring searchcommunities exist. For example, the employees of asmall- or medium-size company, or a group in a largermultinational, or even a class of students, might eachconstitute a search community with individuals search-ing for similar information in similar ways. Indeed, withthe advent of social networking services, thousands of

2

4

6

8

12

10

14

01 2 3

Query size4 5 6+

Retri

eval

effe

ctiv

enes

s (p

erce

nt)

Yahoo!MSNGoogle

Figure 1. Retrieval effectiveness versus query size.The search engines performed

poorly overall, at best retrieving the target results in their top 10 search results

less than 14 percent of the time.

10

90

0 0

50

100

150

200

250

>0 >0.25 >0.50Query similarity threshold

>0.75 >1.0 Exact

Repe

atin

g qu

erie

s (p

erce

nt)

Aver

age

num

ber o

f sim

ilar q

uerie

sPercent repetition Average number of similar queries

20

30

40

50

60

70

80

10094

19390.1

65.6

49.3 46.6

26.4

3555

25

Figure 2. Search patterns. Results of a 17-week study of the search patterns for

a set of about 70 employees at a local software company, showing the percent-

age of query repetition at various similarity thresholds and the mean number

of similar queries.

Page 4: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

August 2007 45

community members have previously selected for simi-lar queries. These results will likely relate to the com-munity’s wildlife interests. So, without any expensiveprocessing of result content, the search results can bepersonalized according to the community’s learned pref-erences. This lets novice searchers benefit from theshared knowledge of more experienced searchers.

COLLABORATIVE WEB SEARCHThe latent search knowledge created by search com-

munities can be leveraged by recording the search activ-ities of users—the queries they submit and results theyselect—at the community level. This data can then beused as the basis for a relevance model to guide the pro-motion of community-relevant results during regularWeb search. A key objective here is to avoid replacing aconventional search engine, instead enhancing its defaultresult lists by highlighting particular results that areespecially relevant to the target community. For exam-ple, regarding the “jaguar” queries, it should be possi-ble to promote some of the wildlife community’s pagesthat relate to the wild cat ahead of those related to cars.Thus, the most relevant results from a community perspective can be promoted to the top of the defaultresult list, while other results might simply be labeled asrelevant to the community but left in place.

How it worksTo achieve a more granular relevance scale, the meta-

search architecture shown in Figure 3 operates in coop-eration with one or more underlying search engines. For

more structured communities of friends with sharedinterests emerge daily. These emergent search commu-nities are interesting because of the high likelihood thatsimilarities will exist among community members’search patterns. For example, Figure 2 shows the resultsof a 17-week study of the search patterns for a set ofabout 70 employees at a local software company. Thisstudy examined more than 20,000 individual searchqueries and almost 16,000 result selections.

Figure 2 looks at the average similarity between queriesduring the study. On average, just over 65 percent of sub-mitted queries shared at least 50 percent (0.5 similaritythreshold) of their query terms with at least five otherqueries; more than 90 percent of queries shared at least25 percent of their terms with at least 25 other queries.Thus, searchers within this ad hoc corporate search com-munity seemed to search in similar ways, much more sothan in generic search scenarios, which typically showlower repetition rates of about 10 percent at the 0.5 sim-ilarity threshold.

This result, supported by similar studies of other searchcommunities,9 shows that, in the context of communitiesof like-minded searchers, Web search is a repetitive andregular activity. As individuals search, their queries andresult selections constitute a type of community searchknowledge. This in turn suggests that it might be possibleto harness such search knowledge by facilitating the shar-ing of search experiences among community members.

As a simple example, when visitors to the wildlife portal search for “jaguar pictures,” the collaborativesearch engine can recommend search results that other

Figure 3. Metasearch architecture.The collaborative Web search architecture provides for a form of metasearch with the result of

one or more underlying search engines augmented by community-based result promotions.

G

Adapter Adapter Adapter

Metasearch

RT

qn

qT

Hit-Matrix, H

15

Rele

vanc

e en

gine

RC

RS

SnS2

pj

HC Tj

Page 5: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

the consideration of pages that have been selected forqueries very similar to qT. For example, Equation 2 pro-vides a straightforward way to calculate query similar-ity by counting the proportion of terms shared by qT andsome other query qi.

(2)

This query-similarity metric can then be used as the basisfor a modified relevance metric, as Equation 3 shows:

(3)

The relevance of a page pj, with respect to some targetquery qT, is computed by independently calculating theexact query relevance of pj with respect to a set of queries(q1, ..., qn) deemed to be sufficiently similar to qT; in prac-tice only queries that share 50 percent of their terms withthe target query need be considered. The overall relevanceof pj with respect to qT is then the weighted sum of theindividual exact query relevance values, with the rele-vance of pj with respect to some qi discounted by the sim-ilarity of qi to qT. In this way, pages frequently selectedfor queries very similar to qT are preferred over pages lessfrequently selected for less similar queries.

Sample sessionFigures 4 and 5 show an example of collaborative Web

search in action. Figure 4 shows the results of a standardGoogle search for the vague query “O2,” which refersto the European mobile operator. These results clearlytarget the average searcher by providing access to nearbystores, pricing plans, and various company informationsites. In contrast, the results shown in Figure 5 corre-spond to the results returned by a collaborative Websearch for a community made up of the employees of alocal mobile software company. This time, the top threeresults have been promoted for this community. Theytarget more specialized information that has proven to beof recent interest to community members for this andsimilar queries. These promoted results are annotatedwith several community icons to reflect their popularity,the number of related queries associated with the result,and the recency of the community history.

PRACTICAL BENEFITSThe CWS technique for adapting a conventional search

engine’s results to conform with the preferences of a par-ticular community of searchers reveals that these com-munities take many different forms. These range fromad hoc communities that arise from users visiting a

W Rel p q q q

Relevance p

Cj T n

Ci n j

( , , ,... )

(...

1

1

=

=∑ ,, ) ( , )

( , ) (...

q Sim q q

Exists p q Sim

i T iC

i n j i

•=∑ 1qq qT i, )

Sim q qq q

q qT iT i

T i( , ) =

∩∪

46 Computer

simplicity, assume Google is the single underlying con-ventional search engine.

For example, consider a user u (as a member of somecommunity C) submitting a query qT. In the first instanceqT is submitted to Google to obtain a standard set ofranked results, RS. In parallel, qT is used to query thecommunity’s search knowledge base to produce anotherset of ranked results, RC, judged to be especially relevantto members of C based on their past search behavior.

Next, RS and RC are combined to produce a finalresults list, RT, which is presented to the user. Theseresult lists can be combined in many different ways. Onestrategy has worked well in practice: The top threeresults in RC are promoted to the top positions in RT

with all other RC results retaining their default positionin RS but being labeled as community-relevant.

Capturing community search knowledgeCapturing a community’s search behavior means

recording the queries submitted and the results selectedfor these queries, as well as their selection frequency. Thiscan be conceptualized as populating the communitysearch matrix, HC, called a hit-matrix, such that HC

ij refersto the number of times that a result page, pj, has beenselected for a query, qi. Thus, each row of a community’shit-matrix corresponds to the result selections that havebeen made over multiple search sessions by members ofC for a specific query qi. In turn, the column of the hit-matrix related to pj refers to the number of times that thecommunity has selected pj for different queries.

Making relevant promotionsHow then can the current query, qT, be used to iden-

tify results from a community’s hit-matrix as potentialpromotion candidates? To begin with, any previouscommunity history with respect to qT must be deter-mined—have any pages been selected in the past for qT?Assuming such pages exist, the hit-matrix will containfrequency selection information with respect to qT, andthis information can be used to estimate the relevanceof each such page. For example, Equation 1 calculatesthe relevance of a result page pj with respect to the queryqT as the relative proportion of selections that pj hasreceived for this query:

(1)

As it stands, this exact query-relevance approach islimited because it restricts candidates considered for pro-motion to those pages previously selected for the spe-cific target query (qT). Certainly, the results shown inFigure 2 indicate that just over 25 percent of query sub-missions in the test community exactly match previoussubmissions. A more flexible approach would allow for

Relevance p qH

HC

j TTjC

TjC

j

( , ) =∀∑

Page 6: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

themed Web site to more structuredcommunities such as those formed bya company’s employees or a class ofstudents. Some of the results emerg-ing from a recent study of CWS in acorporate context show how it helpedemployees search more successfully asa result of sharing community searchknowledge.

The trial participants includedapproximately 70 employees from aDublin software company thatdeployed CWS for 10 weeks as theprimary search engine covering morethan 12,600 individual search ses-sions. During the trial all Googlerequests were directed to the CWSserver and the standard Google inter-face was adapted to accommodateCWS promotions and annotations, asFigure 5 shows.

During this initial 10-week trial,approximately 25 percent of searchsessions included CWS promotions,referred to as promoted sessions. Theremaining 75 percent carried thestandard Google result list, referredto as standard sessions.

While eliciting direct relevance feed-back from trial participants provedinfeasible, one useful indicator ofsearch performance looked at the fre-quency of successful sessions. A searchsession is successful if the searcherselects at least one result—an admit-tedly crude measure of performance.Result selections can be good indica-tors of at least partial relevance, butnot always. However, the lack of anyresult selections indicates that no rel-evant results have been noticed.

When researchers analyzed the suc-cess rates of trial search sessions, theyfound marked differences between thepromoted and standard sessions. Forexample, this analysis shows an aver-age success rate of just under 50 per-cent for standard Google searches,compared to a success rate of just over60 percent for promoted sessions—arelative advantage of approximately25 percent directly attributable toCWS promotions. Thus, communitypromotions made by collaborativeWeb search helped users to searchmore successfully.

August 2007 47

Figure 4. Standard vague query. Example of a one-size-fits-all search session for a

vague query,“O2,” which refers to the European mobile operator.The results

returned clearly target the average searcher by providing access to nearby stores,

pricing plans, and various company information sites.

Figure 5. Collaborative Web search. A search session personalized for the preferences

of a particular community of searchers who work for a software company involved

in developing mobile services and applications.The top three results have been pro-

moted for this community and target more specialized information proven to be of

interest to community members for this and similar queries.

Page 7: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

48 Computer

Sharing is an important theme in collaborative Websearch: Community members share past search experi-ences through result promotions. These promotions cancome from two different sources:

• The current searcher’s past history. One searchmight, for example, use a query similar to queriesused in the past, which will gather promotions basedon the user’s previous selection history. These self-promotions are useful when helping searchersrecover previously encountered results.

• A different community member’spast history. When users receive such peer promotions, they share the search experiences of other community members. These pro-motions are especially useful for helping users discover new results, and they potentially help draw on the experiences of more informed searchers within the community.

While CWS does not store infor-mation about the individual searcherby default, during the trial, recon-structed information about the originsof promotions was used to investigatedifferences between users’ behaviorwhen it came to sessions made up ofself- and peer promotions. This analy-sis generated revealing results.

For example, promoted sessionsmade up only of self-promotions havean average success rate of just under60 percent. By comparison, sessionsmade up of peer promotions have asuccess rate of about 66 percent, whilemixed sessions, made up of both self-and peer promotions, have an averagesuccess rate of more than 70 percent.This demonstrates that searchers dobenefit from the search experiences of others within their community.Further analysis looked at how fre-quently sessions containing promo-tions from a given source led to thosepromotions being selected. Sessionscontaining peer promotions havehigher click-through rates than ses-sions containing only self-promotions:a 60 to 70 percent click-through ratecompared to only 30 percent for self-promotions.

At this trial’s start it was not obvi-ous how well participants wouldserve as a coordinated search com-

munity. For example, would their search activities breakinto small clusters of related activity, or would manyindividuals search in ways markedly different from theirpeers and not participate in creating or consumingsearch knowledge?

The trial’s results showed that more than 85 percentof the participants became involved in the creation andconsumption of search knowledge. About 20 percent ofsearchers behaved primarily as search leaders in thesense that many of their searches corresponded to dis-covery tasks in which there was little or no community

Figure 6. Intercommunity collaboration facilitates the promotion of search results

from multiple communities.The example shown presents rugby-related results for

the host Rugby Union community.

Figure 7. Example of the results promoted to the Rugby Union community from the

related Irish rugby community.

Page 8: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

search knowledge to draw from. A similar percentageof searchers played the contrary role of search follow-ers: These users generally searched on topics alreadywell-known and thus benefited disproportionately frompeer promotions. The remaining 60 percent of users dis-played a mixture of roles, often producing new searchknowledge by selecting fresh results and consumingexisting search knowledge by selecting promotions.

BEYOND THE COMMUNITYGiven the benefits of creating and sharing search

knowledge within the community, the implications ofcooperation between related communities of searchersmust be considered. For example, a search communityservicing the needs of skiers in Europe might benefitfrom promotions derived from the community searchknowledge generated by a separate community of USskiers. A query for “late ski deals” by a member of theEuropean community would likely be answered by pro-motions for the latest deals offered by European skiresorts. At the same time, the searcher might benefitfrom hearing about the latest snow conditions and spe-cial deals in the US, knowledge that would be better rep-resented by the US ski community.

This idea has been explored in the context of the I-SPYsearch system,10 a separate implementation of collabo-rative Web search that lets users easily create and deploytheir own search communities. Figure 6 shows a screenshot of a result-list that has been generated for a mem-ber of one of several rugby-related communities, theRugby Union. The query submitted is for “6 nations,” apopular international rugby tournament, and the pro-moted results for the community appear ahead of othermatching results provided by the underlying searchengine. In addition, the screen shot also includes a set ofsearch tabs, each containing the promotions from a com-munity related to Rugby Union. Figure 7 presents thepromotions from the Irish rugby community, which pro-vide a different set of results for the “6 nations” query,results more appropriate for Irish rugby fans.

Related communities can be identified and their pro-motions ranked during search by, for example, rankingcommunities according to their similarity to the hostcommunity—the community where a particular searchoriginated.10 Intercommunity similarity can be calcu-lated based on the overlap between the results that havebeen selected between two different communities. Forexample, Rugby Union more closely resembles Irishrugby than a Manchester United community becausethe Irish rugby community will share many similarresults with Rugby Union, which is unlikely in the caseof Manchester United. In this way, a ranked set of sim-ilar communities can be produced, and those generat-ing the most relevant results can be recommended to thehost community as shown. The relevance of a resultfrom a related community can be scored in the usual

way, but further discounted by the related community’ssimilarity to the host.

This technique offers two potentially important ben-efits. First, the related communities can provide an alter-native source of interesting results, thereby improvingthe relevance and coverage of the results offered to theuser. Second, partitioning the results according to theircommunity provides a novel form of results clusteringthat does not rely on a detailed and computationallyexpensive analysis of a larger results set. Instead, eachrelated community forms a coherent cluster from aresults presentation perspective.

T he collaborative approach to Web search offers afurther advantage that many traditional approachesfail to provide: The vast majority of approaches to

personalized search focus on the individual’s needs andas such maintain individual user profiles. This representsa significant privacy issue because users’ search activitiescan be revealing, especially if a third party maintains theprofiles.11,12

In contrast, CWS avoids the need to maintain indi-vidual user profiles. The engine stores preferences at thecommunity level, thereby providing individual userswith access to an anonymous form of personalizedsearch. In an increasingly privacy-conscious world, CWScan provide an effective balance between the user’s pri-vacy on the one hand and the benefits of personaliza-tion on the other. ■

Acknowledgments

This material is based in part on works supported byChangingWorlds Ltd. and on works supported byEnterprise Ireland’s Informatics Initiative, and ScienceFoundation Ireland under grant no. 03/IN.3/I361.

References

1. S. Brin and L. Page, “The Anatomy of a Large-Scale WebSearch Engine,” Proc. 7th Int’l World Wide Web Conf.(WWW 01), ACM Press, 2001, pp. 101-117.

2. J.M. Kleinberg, “Authoritative Sources in a HyperlinkedEnvironment,” J. ACM, vol. 46, no. 5, 1999, pp. 604-632.

3. J. Teevan, S.T. Dumais, and E. Horvitz, “Personalizing SearchVia Automated Analysis of Interests and Activities,” Proc.28th Ann. Int’l ACM SIGIR Conf. Research and Develop-ment in Information Retrieval (SIGIR 05), ACM Press, 2005,pp. 449-456.

4. F. Liu, C. Yu, and W. Meng, “Personalized Web Search forImproving Retrieval Effectiveness,” IEEE Trans. Knowledgeand Data Engineering, vol. 16, no. 1, 2004, pp. 28-40.

5. F. Qiu and J. Cho, “Automatic Identification of User Interestfor Personalized Search, Proc. 15th Int’l World Wide WebConf. (WWW 06), ACM Press, 2006, pp. 727-736.

August 2007 49

Page 9: A Community-Based Approach to Personalizing Web Searchpeterb/2480-122/SocialSearchReadings.pdf · 2013. 12. 13. · PERSONALIZING WEB SEARCH The one-size-fits-all approach typified

50 Computer

6. T.H. Haveliwala, “Topic-Sensitive PageRank: A Context Sen-sitive Ranking Algorithm for Web Search,” IEEE Trans.Knowledge and Data Engineering, vol. 15, no. 4, 2003, pp.784-796.

7. S. Lawrence and C.L. Giles, “Accessibility of Information onthe Web,” Nature, vol. 400, no. 6740, 1999, pp. 107-109.

8. A. Spink et al., “Searching the Web: The Public and TheirQueries,” J. Am. Soc. Information Science, vol. 52, no. 3,2001, pp. 226-234.

9. B. Smyth et al., “Exploiting Query Repetition and Regularityin an Adaptive Community-Based Web Search Engine,” UserModeling and User-Adapted Interaction: The Journal of Per-sonalization Research, vol. 14, no. 5, 2004, pp. 383-423.

10. J. Freyne and B. Smyth, “Cooperating Search Communities,”Proc. 4th Int’l Conf. Adaptive Hypermedia and AdaptiveWeb-Based Systems, Springer-Verlag, 2006, pp. 101-110.

11. M. Teltzrow and A. Kobsa, “Impacts of User Privacy Prefer-ences on Personalized Systems: A Comparative Study,”Designing Personalized User Experiences in eCommerce,Kluwer Academic, 2004, pp. 315-332.

12. Y. Wang and A. Kobsa, “Impacts of Privacy Laws and Regu-lations on Personalized Systems,” Proc. CHI 2006 Workshopon Privacy-Enhanced Personalization, 2006, pp. 44-45.

Barry Smyth is the Digital Chair of Computer Science,School of Computer Science and Informatics, UniversityCollege Dublin. His research interests include case-basedreasoning, user modeling, and recommender systems withparticular focus on personalization techniques. Smythreceived a PhD in computer science from Trinity CollegeDublin. Contact him at [email protected].

By Susan K. LandNorthrop Grumman

Software process definition, documentation, and improvementare integral parts of a software engineering organization. This ReadyNote gives engineers practical support for such work by analyzing the specific documentation requirements that support the CMMI Project Planning process area. $19 www.computer.org/ReadyNotes

IEEEReadyNotes

IEEE Software Engineering Standards Support for the CMMI Project Planning Process Area