
A short-term bookmarking system for collecting user-interest data

Chu-Cheng Hsieh, Yoni Medoff, and Naren Chittar

{chsieh, ymedoff, nchittar}@ebay.com

eBay, Inc.

2065 Hamilton Ave, San Jose, CA 95125, USA

Abstract. During the shopping process, users typically narrow down their search to a small collection of products before making a final purchase. These data, consisting of products that users are considering purchasing, correlate strongly with user search intent and product desirability. By allowing users to bookmark products between browsing and purchasing, we collect user-interest information. We then propose a product recommendation algorithm based on these data. By considering both popular and long-tail queries, we shed light on the potential usage of the data.

Keywords: User Interface, Recommendation System, Collaborative Filtering

1 Introduction

Product recommendation systems have been a popular feature of e-commerce sites for helping buyers find what they want. A common practice is to recommend products based on user behavior, such as “people viewing/buying this are also viewing/buying that.” Developing such a recommendation system is very challenging for a site such as eBay. Applying “buy-together” is not feasible because many products are unique with respect to either price or condition (or both). On the other hand, “view-together” often requires further offline processes, for instance, to break down user sessions into meaningful segments of intent, or to parse logs for finding similar products (belonging to the same cluster [6]). Furthermore, data collected from user activity logs often contain a certain degree of noise.

One simple solution is to collect information directly from users, that is, to ask each user about his shopping intent and the products he is considering. However, requiring users to provide feedback is inconvenient for them and may harm the shopping experience. In this work, we propose a workaround: we collect “user-interest data” by providing a short-term bookmarking mechanism. Our approach gives buyers a better shopping experience by minimizing the inconvenience of switching between search result pages and product viewing pages. At the same time, the proposed prototype helps us collect the aforementioned data.

Note that data collected by our proposed system are very different from data acquired from wishlists or collections.¹ Of course, products that are collected together usually share some similar characteristics; consider, for example, collections entitled baby stuff and new home. Often the intent behind creating these lists is quite broad. For instance, a cradle could be assigned to baby stuff or new home. Although further processing such as topic modeling [3] could be applied to provide better recommendations by clustering, we would still face the well-known “cold start” problem: to reach a sound decision, one needs to wait until sufficient data have been gathered and analyzed. Since creating a wishlist or collection is an optional step in the shopping process at an e-commerce site, many popular or desirable deals might be gone by the time we collect enough data and complete our offline analysis.

Under the hood, our proposed solution is based on crowd wisdom and frequent itemset mining. We believe that, given the search result pages of a query, products being considered by many users are often active listings and “good deals.” That is to say, overpriced, outdated, or suspicious listings are naturally filtered out by users. Next, by applying association rule learning, we can make timely recommendations based on these high-quality data. Our work collects data that fill the gap between view-action and buy-action. It empowers C2C sites to provide in-session personalization: the user’s recommendations are affected in real time by the products he has bookmarked.

2 Collecting User-Interest Data

During the course of shopping, users often collect a small assortment of products that they are considering purchasing. They deliberate over this pool of products, and often choose to purchase one from among them. There are many ways that users can accomplish this behavior on current e-commerce sites, such as opening multiple tabs, saving links, or going back and forth between product pages and search result pages. However, these methods can be tedious for many users, and the logs from such activities are not always directly related to user-interest levels. Clicks and impressions are sometimes used to infer user-interest data, but it would be much more accurate to collect interest data explicitly from the user. There is the additional challenge that any such system must also present significant value to the user, so that he has some motivation to actively use the system when shopping on an e-commerce site.

We have developed a short-term bookmarking system which serves as one possible method to facilitate the collection of user-interest data. It lets users actively collect a small number of products on eBay. The core of the interface is an interactive strip at the bottom of the screen. Whenever a user is interested in a certain product on the site, he simply drags the image of that product into this strip, as shown in Figure 1. Our interface allows him to easily bookmark such products, which are always available for navigation from the strip of thumbnail images.

Fig. 1. Drag-and-drop bookmarking

¹ Collections enable users to bookmark products and organize them in one place, for example, http://www.ebay.com/cln or http://www.pinterest.com

This system also features an intuitive user interface to ensure significant usefulness while maintaining a minimal footprint. The container that houses these products is fairly intelligent in responding to user behavior on the page. When the user scrolls down, the container minimizes to get out of the user’s viewing area (toward the bottom of the screen). If the user scrolls up, or attempts to interact with the interface by dragging a product image, the container will shift back into view. In this way, the system presents itself only when necessary.

Our system limits the maximum number of products a user can bookmark. This limitation forces the user to keep a small pool of products for which he has expressed interest, and it maintains data quality by ensuring that each product selection is important in making purchasing decisions for a single session. This system is also designed to be persistent across browsing sessions and devices. All bookmarked products remain unless explicitly deleted by the user. This functionality has two main advantages: (1) the user can work from multiple devices or browsers while retaining the same bookmarks, and (2) we can observe deletion behavior, which can coincide with intent, such as “making room” for better product choices.
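The capped, persistent container described above can be sketched as a small data structure. This is only an illustration: the class name `BookmarkTray`, the cap of 8 items, and the in-memory event log are assumptions made for the example and are not taken from the prototype.

```python
class BookmarkTray:
    """Bounded bookmark container that records add/remove events."""

    def __init__(self, max_items=8):  # the cap value is hypothetical
        self.max_items = max_items
        self.items = []   # bookmarked product ids, oldest first
        self.log = []     # (action, product) events kept for later analysis

    def add(self, product):
        # Reject duplicates and refuse to grow past the cap, so the user
        # has to delete something to "make room" for a new product.
        if product in self.items or len(self.items) >= self.max_items:
            return False
        self.items.append(product)
        self.log.append(("add", product))
        return True

    def remove(self, product):
        # Bookmarks persist until explicitly deleted; deletions are logged.
        if product not in self.items:
            return False
        self.items.remove(product)
        self.log.append(("remove", product))
        return True
```

Keeping the deletions in the same log as the additions is what makes the “making room” behavior observable afterwards.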

Basic interaction with our system allows us to estimate user interest in real time. The system allows for many combinations of user actions, which we can interpret as varying levels of interest, especially in combination with other elements of the page and other actions on the site. For example, if a user adds a product and then quickly removes it, we can infer a low or superficial level of interest in that product. If a user removes the last product A and then quickly replaces it with another product B, we can infer that the interest level in B supersedes that of A. By aggregating this type of data over many users, and combining it with purchasing behavior, search queries, browsing patterns, etc., we can form very robust algorithms for product recommendations, and even for search relevance in general.
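The two inference rules in this paragraph can be written down as a small event scan. The sketch below is a hedged illustration only: the `Event` record, the signal names, and the 10-second threshold for “quickly” are assumptions, since the paper does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float    # seconds since the session started
    action: str    # "add" or "remove"
    product: str


QUICK_SECONDS = 10.0  # assumed meaning of "quickly"; not from the paper


def infer_signals(events, quick=QUICK_SECONDS):
    """Return (signal, product, other) tuples for one user's events."""
    signals = []
    added_at = {}        # product -> time it was added
    last_removed = None  # (product, time) of the most recent removal
    for ev in sorted(events, key=lambda e: e.time):
        if ev.action == "add":
            added_at[ev.product] = ev.time
            # A removal quickly followed by adding a different product
            # suggests the new product supersedes the removed one.
            if (last_removed is not None
                    and last_removed[0] != ev.product
                    and ev.time - last_removed[1] <= quick):
                signals.append(("supersedes", ev.product, last_removed[0]))
            last_removed = None
        elif ev.action == "remove":
            t_add = added_at.pop(ev.product, None)
            # Added and then quickly removed: low or superficial interest.
            if t_add is not None and ev.time - t_add <= quick:
                signals.append(("low_interest", ev.product, None))
            last_removed = (ev.product, ev.time)
    return signals
```

In practice such signals would be aggregated over many users before being used, as the paragraph notes.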

3 Product Recommendations

In this section, we discuss our product recommendation algorithm. The algorithm consists of the following steps:

1. Convert the bookmarks into transactions
2. Cluster transactions by query intent
3. Use association rule mining techniques to derive recommendation candidates
4. Rank candidates based on support, confidence, and other measurements

3.1 Algorithm

We now discuss our recommendation algorithm. In theory, any association rule mining technique [2, 12] could satisfy our need to generate recommendations. However, due to performance concerns for e-commerce sites like eBay, we always choose algorithms [7] that can be implemented using the map-reduce programming model.

Let $t_x = \{i_1, i_2, \ldots, i_m\}$ be an itemset where each item corresponds to one product (bookmarked by drag-and-drop), and let $D_k$ represent the collection of all bookmarks that share the same intent $k$. One simple method is to consider every $t_x$ as a transaction and every query as an intent cluster $D_k$. For every cluster, we run an association rule mining algorithm, and all rules are associated with corresponding Support and Confidence [1] measurements.
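As a minimal sketch of these definitions, the code below groups bookmarks into per-query clusters and derives rules with their support and confidence by brute-force enumeration. It is not the map-reduce Apriori variant cited as [7]; the function names and the restriction to single-item consequents are assumptions made to keep the example short. Brute force is tolerable here only because each bookmark set is capped at a few items.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_query(bookmarks):
    """bookmarks: iterable of (query, items) -> {query: [frozenset, ...]}."""
    clusters = defaultdict(list)
    for query, items in bookmarks:
        clusters[query].append(frozenset(items))
    return dict(clusters)


def mine_rules(transactions, min_support=0.0, min_confidence=0.0):
    """Return {(antecedent, item): (support, confidence)} for one cluster."""
    n = len(transactions)
    counts = defaultdict(int)
    # Count every non-empty subset of every transaction.
    for t in transactions:
        for size in range(1, len(t) + 1):
            for subset in combinations(sorted(t), size):
                counts[frozenset(subset)] += 1
    rules = {}
    for itemset, count in counts.items():
        if len(itemset) < 2:
            continue
        support = count / n  # fraction of transactions containing the itemset
        if support < min_support:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = count / counts[antecedent]
            if confidence >= min_confidence:
                rules[(antecedent, item)] = (support, confidence)
    return rules
```

For example, with the transactions {A,B,C}, {A,B}, and {A,C} in one cluster, the rule {A,B} ⇒ {C} has support 1/3 and confidence 1/2.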

Assuming a user bookmarks three items {A,B,C}, we seek association rules whose antecedent (left-hand side) contains all of these items, for example, {A,B,C} ⇒ {D}. Any item in the consequent (right-hand side) is then considered a legitimate candidate. Intuitively, when the antecedent of an association rule matches exactly the items in a bookmark, every item in the consequent becomes a member of the candidate set. We rank those candidates by

$$-\log\frac{C}{C+S} - \log\frac{\alpha \cdot S}{C+S} \tag{1}$$

This formula refers to the Shannon entropy [10], except that we introduce $\alpha$ as a tuning parameter for popularity (support, denoted by $S$) and accuracy (confidence, denoted by $C$).
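Equation 1 and the exact-match step can be illustrated as follows. This is a sketch under stated assumptions: rules are stored as `{(antecedent, item): (support, confidence)}` with single-item consequents, and candidates are sorted in descending order of score, although the text does not say whether a larger or a smaller score ranks first.

```python
import math


def score(support, confidence, alpha=1.0):
    """Equation 1: -log(C / (C + S)) - log(alpha * S / (C + S))."""
    total = confidence + support
    return -math.log(confidence / total) - math.log(alpha * support / total)


def exact_candidates(rules, bookmarked, alpha=1.0):
    """Score consequents of rules whose antecedent equals the bookmark set."""
    bookmarked = frozenset(bookmarked)
    scored = [
        (item, score(s, c, alpha))
        for (antecedent, item), (s, c) in rules.items()
        if antecedent == bookmarked
    ]
    # Descending order is an assumption; flip `reverse` for the opposite.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Both support and confidence must be strictly positive, otherwise the logarithms are undefined.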

We have adopted two main strategies to control the size of the candidate set. First, by adjusting the thresholds of support and confidence, we can increase or decrease the size of the candidate set. Second, when a user bookmarks many products, it may be problematic to keep lowering the thresholds of support and confidence in order to find candidates. In such a scenario, we conduct soft-matching: the antecedent matches only a subset of the bookmarked products. Namely, if we have 7 products bookmarked, we seek association rules that contain at least, say, 6 of those products in their antecedents. The score of a candidate (for ranking) becomes

$$-n \cdot \left[\log\frac{C}{C+S} + \log\frac{\alpha \cdot S}{C+S}\right] \tag{2}$$

where $n$ represents the number of matched rules for the same candidate. For example, if we have two association rules {A,B} ⇒ {E} and {B,C} ⇒ {E}, and a user bookmarks three items {A,B,C}, the score of the candidate item E is doubled ($n = 2$) in a soft-matching case. Note that in soft-matching cases, if there are better matches, for instance the rule {A,B,C} ⇒ {D}, we set $n$ to the number of possible combinations; here the rule matches {A,B}, {A,C}, and {B,C}, so $n = 3$.
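One way to read Equation 2 in code is shown below. Several details are assumptions rather than statements from the text: rules carry single-item consequents, each matched rule contributes its own bracket term (which equals $n$ times the bracket when the matched rules share the same $S$ and $C$), a rule longer than the matching size counts as the number of its subsets of that size, and larger scores are listed first.

```python
import math
from collections import defaultdict


def bracket(support, confidence, alpha=1.0):
    """Negated bracket of Equation 2 for a single rule."""
    total = confidence + support
    return -(math.log(confidence / total) + math.log(alpha * support / total))


def soft_candidates(rules, bookmarked, min_match, alpha=1.0):
    """Score candidates from rules matching at least min_match bookmarks."""
    bookmarked = frozenset(bookmarked)
    totals = defaultdict(float)
    for (antecedent, item), (s, c) in rules.items():
        if item in bookmarked:
            continue  # never recommend something already bookmarked
        if not antecedent <= bookmarked or len(antecedent) < min_match:
            continue
        # A longer antecedent covers several min_match-sized combinations,
        # e.g. {A,B,C} covers {A,B}, {A,C}, {B,C} when min_match is 2.
        n = math.comb(len(antecedent), min_match)
        totals[item] += n * bracket(s, c, alpha)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

With the rules from the example above and equal support and confidence on every rule, E receives twice and D three times the single-rule score.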

3.2 Extension

In this section, we discuss two extensions for increasing recommendation quality: one targets popular queries, and the other targets long-tail queries.

Within a short time window, the products returned by a search ranking algorithm for a given query $q$ are very similar, partly because there is often a delay in passing inventory data from transactional databases into inverted indices (a central component of a typical search engine). For popular queries like “iPhone,” the system probably collects hundreds or thousands of bookmarks within a short time window, say 30 minutes, which is enough to make recommendations. Therefore, constraining association rule mining to a time window for popular queries ensures that the products seen by users are mostly identical. It may also boost the chance that users bookmark the same candidates because, according to the study conducted by Granka et al. [5], the products with the highest rankings often draw the most attention. Products on the first page therefore become an indicator for learning user preferences.
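Restricting mining to a time window amounts to filtering the bookmarks before they are turned into transactions. The sketch below assumes each bookmark is stored as a `(query, timestamp, items)` tuple and uses the 30-minute window mentioned in the text as the default; the record layout and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def recent_transactions(bookmarks, query, now, window=timedelta(minutes=30)):
    """Return transactions for `query` bookmarked within `window` of `now`."""
    return [
        frozenset(items)
        for q, timestamp, items in bookmarks
        if q == query and now - window <= timestamp <= now
    ]
```

The resulting list can be passed directly to whichever association rule miner is used for the cluster.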

For e-commerce sites, long-tail queries are often difficult to handle because they require a long period of time to acquire enough data for complete analysis. So far, we have assumed that a user intent (and its corresponding cluster $D_k$) is represented by a user query; this assumption becomes problematic for long-tail queries.

We address long-tail queries by studying query transition (i.e., how users reformulate their queries). Query transition has been studied extensively in the past and has proven successful in helping query reformulation [4]. To address long-tail queries, we rely on their preceding queries. For example, we enlarge the cluster of the query “iPhone 5 Gold 64GB” by considering both “iPhone 5 Gold” and “iPhone 5 64GB.”

We apply the notion of probability matching to long-tail queries. Assume a query transition graph is given, where every vertex corresponds to a query and a directed link from query $a$ to query $q$ carries a probability $P_{aq}$, the probability that a user reformulates $a$ into $q$. We identify a set of preceding queries $A$ such that, for every $a \in A$, $P_{aq} \geq \theta_p$ and the query $a$ has an adequate number of reliable recommendations, i.e., recommendations obtained without applying soft-matching.

$\theta_p$ is a parameter that controls diversity: the smaller $\theta_p$ is, the larger the set of preceding queries $A$. We then randomly select a preceding query $x$ in proportion to its probability $P_{xq}$, i.e., with probability $P_{xq} / \sum_{a \in A} P_{aq}$. The first time a preceding query $x$ is selected, we draw the first recommendation based on Equation 2; the next time, we draw the second one, and so on. This probability matching ensures diversity in guessing user intention based on transition probability. A screenshot of our recommendations is shown in Figure 2.

Fig. 2. Displaying recommendations to the user
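The probability-matching procedure for long-tail queries can be sketched as follows. The data layout (`transitions` as `{(a, q): P_aq}`, ranked recommendations as lists per query) and the names `preceding_queries` and `ProbabilityMatcher` are assumptions for illustration; `random.choices` normalizes the weights, so each preceding query is picked with probability $P_{xq} / \sum_{a \in A} P_{aq}$.

```python
import random


def preceding_queries(transitions, q, theta_p, has_reliable_recs):
    """Select A: queries a with P_aq >= theta_p and reliable recommendations."""
    return {
        a: p
        for (a, target), p in transitions.items()
        if target == q and p >= theta_p and has_reliable_recs(a)
    }


class ProbabilityMatcher:
    """Draws recommendations from preceding queries in proportion to P_xq."""

    def __init__(self, candidates, ranked_recs, rng=None):
        self.queries = list(candidates)
        self.weights = [candidates[a] for a in self.queries]
        self.ranked = ranked_recs            # {query: [rec_1, rec_2, ...]}
        self.cursor = {a: 0 for a in self.queries}
        self.rng = rng or random.Random()

    def draw(self):
        # Pick a preceding query x with probability P_xq / sum of P_aq.
        x = self.rng.choices(self.queries, weights=self.weights, k=1)[0]
        i = self.cursor[x]
        self.cursor[x] += 1
        recs = self.ranked.get(x, [])
        # First selection yields the top-ranked item, the next the second, ...
        return x, (recs[i] if i < len(recs) else None)
```

A smaller `theta_p` admits more preceding queries into the candidate set and therefore spreads the draws over more intents.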

4 Related Work

Product recommendation has been studied extensively in the past. Linden et al. [8] focused on long-living products based on co-purchasing behavior. Katukuri et al. [6] used clustering algorithms in recommending similar products. Later, Xiao et al. [11] elicited the interests of individual customers, and Park et al. [9] used individual/group profiling to generate personal recommendations. Most of the work related to product recommendation or user preferences is based on analyzing data in logs or inventory. Our work differs in the methodology for both collection and analysis. First, we gather user-interest data through explicit user actions. While other works infer user interest from related data, we have devised a system which naturally encourages the user to directly produce these data. Second, we emphasize the collection of these data in real time and the potential applications that arise from real-time feedback.


5 Conclusion

In this work, we design an interactive bookmarking system that both improves the e-commerce shopping experience and provides valuable user-interest data for researchers. This system responds to a user’s natural inclination during shopping to collect a small number of products when making a purchasing decision. It facilitates the gathering of valuable data on user-interest levels in particular products relative to other aspects of shopping behavior, namely which products users are considering. These data, aggregated over many users, can be applied to many important problems. Moreover, compared to logs that require post-processing, the user-interest data are cleaner and can therefore be processed in real time.

Acknowledgement

We appreciate the help from Noah Batterson for his work in designing the user interface of this project. In addition, we thank Lan Wang for sharing her insights and help in building the first prototype. Our work is greatly influenced by their knowledge of improving user experience.

References

1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In 19 ACM SIGMOD Conf. on the Management of Data, Washington, DC, May 1993.

2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In International Conference On Very Large Data Bases (VLDB ’94), pages 487–499, San Francisco, Ca., USA, Sept. 1994. Morgan Kaufmann Publishers, Inc.

3. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003.

4. P. Boldi, F. Bonchi, C. Castillo, D. Donato, and S. Vigna. Query suggestions using query-flow graphs. In Proceedings of the 2009 Workshop on Web Search Click Data, WSCD ’09, pages 56–63, New York, NY, USA, 2009. ACM.

5. L. A. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in www search. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’04, pages 478–479, New York, NY, USA, 2004. ACM.

6. J. Katukuri, R. Mukherjee, and T. Konik. Large-scale recommendations in a dynamic marketplace. In Workshop on Large Scale Recommendation Systems at RecSys’13, 2013.

7. M.-Y. Lin, P.-Y. Lee, and S.-C. Hsueh. Apriori-based frequent itemset mining algorithms on mapreduce. In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, ICUIMC ’12, pages 76:1–76:8, New York, NY, USA, 2012. ACM.

8. G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.

9. Y.-J. Park and K.-N. Chang. Individual and group behavior-based customer profile model for personalized product recommendation. Expert Systems with Applications, 36(2, Part 1):1932–1939, 2009.

10. C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, July / Oct. 1948.

11. B. Xiao and I. Benbasat. E-commerce product recommendation agents: Use, characteristics, and impact. MIS Q., 31(1):137–209, Mar. 2007.

12. Zaki. Scalable algorithms for association mining. IEEETKDE: IEEE Transactions on Knowledge and Data Engineering, 12, 2000.
