
How to Interpret Implicit User Feedback?

Ladislav Peška, Department of Software Engineering, Charles University in Prague, Czech Republic
Peter Vojtáš, Department of Software Engineering, Charles University in Prague, Czech Republic

ABSTRACT

We focus on interpreting user preference from his/her implicit behavior. There are many Relevant Behavior Types (RBTs), e.g. dwell time, scrolling, clickstream etc. RBTs vary both in quality and occurrence, and thus we may need different approaches to process them. In this early work we focus on how to interpret each RBT separately. We selected a number of common RBTs and proposed several approaches to interpret RBT values as a user rating. We conducted a series of off-line experiments and an A/B test on real-world users of a Czech travel agency. The experiments, although preliminary, showed the importance of considering multiple RBTs and of various methods to treat them.

METHODS FOR INTERPRETING USER BEHAVIOR

• BINARY: for all visited objects set 𝑟 = 1 (baseline)

• LINEAR: user-based linear normalization (the more the better)

• COLLABORATIVE (did users with a similar RBT value purchase the product?)

Select user visits with a similar RBT value:
- KNN: use the k nearest neighbors according to the RBT
- Distance: use all records from an interval around the RBT value

Then compute the purchase ratio and apply a sigmoid function.

• COMBINED (single rating based on multiple RBTs; see the sketch after this list):

$r = \ln(\mathit{DwellTime} + 1) + \ln(\mathit{ScrollTime} + 1) + \ln(\mathit{MouseTime} + 1)$
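A minimal Python sketch of the four interpretation methods, reconstructed from the descriptions above; the function names, data layout, and the neutral fallback in the collaborative method are our own assumptions, not taken from the poster:

```python
import math

def binary_rating(visited_objects):
    """BINARY baseline: every visited object gets rating 1."""
    return {obj: 1.0 for obj in visited_objects}

def linear_rating(rbt_values):
    """LINEAR: normalize one RBT by this user's maximum value of that RBT."""
    max_value = max(rbt_values.values())
    if max_value == 0:
        return {obj: 0.0 for obj in rbt_values}
    return {obj: value / max_value for obj, value in rbt_values.items()}

def collaborative_rating(rbt_value, past_records, epsilon=0.2):
    """COLLABORATIVE, Distance variant: take all recorded visits whose RBT value
    lies within epsilon of the current value, compute the purchase ratio among
    them and squash it with a sigmoid."""
    similar = [rec for rec in past_records if abs(rec["rbt"] - rbt_value) <= epsilon]
    if not similar:
        return 0.5  # no evidence at all; neutral value (our choice, not from the poster)
    purchase_ratio = sum(rec["purchased"] for rec in similar) / len(similar)
    return 1.0 / (1.0 + math.exp(-purchase_ratio))  # sigmoid

def combined_rating(dwell_time, scroll_time, mouse_time):
    """COMBINED: r = ln(DwellTime + 1) + ln(ScrollTime + 1) + ln(MouseTime + 1)."""
    return math.log(dwell_time + 1) + math.log(scroll_time + 1) + math.log(mouse_time + 1)
```

For the KNN variant, `similar` would instead be the k records with RBT values closest to `rbt_value`.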

OFFLINE EVALUATION: Pairwise comparison of purchased and non-purchased objects visited by each user (8400 pairs of objects from 380 users with 450 purchases), counting pairs ordered correctly, incorrectly, and with the same value.

$Q_{pair} := \alpha \times \frac{\mathit{Correct}}{\mathit{All\ pairs}} + (1 - \alpha) \times \frac{\mathit{Correct} + \mathit{Equal}}{\mathit{All\ pairs}}$
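In code, with Correct, Incorrect and Equal denoting the pair counts defined above, the metric might be computed as follows (a sketch; the example counts are illustrative only, not the paper's data):

```python
def q_pair(correct, incorrect, equal, alpha=0.5):
    """Q_pair: alpha weights the strict score (correctly ordered pairs only)
    against the lenient score that also credits ties."""
    all_pairs = correct + incorrect + equal
    return alpha * correct / all_pairs + (1 - alpha) * (correct + equal) / all_pairs

# illustrative call (made-up counts summing to 8400 pairs):
# q_pair(6000, 1500, 900)  ->  ~0.768
```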

LESSONS LEARNED and POSSIBLE EXTENSIONS

It is important to consider more refined implicit user feedback than simple Binary visits.
- We proposed several methods to transform raw implicit feedback (RBTs) into a user rating.
- The methods succeeded in the off-line experiments; however, we are not yet able to confirm this in A/B testing (longer experiments are needed).

There are multiple options to combine the per-RBT ratings $r_i$, e.g. a weighting scheme, prioritization, or T-(co)norms; a small illustration follows below.
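As an illustration only (none of these variants is evaluated in this work), such combinations of per-RBT ratings could look like:

```python
def weighted_combination(ratings, weights):
    """Weighting scheme: fixed weights over the per-RBT ratings r_i."""
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)

def t_norm_min(ratings):
    """Goedel T-norm (minimum) as one example of a T-norm combination."""
    return min(ratings)
```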

Also, many RBTs were neglected in this study, and other recommending algorithms should also be considered in future work.

USED RELEVANT BEHAVIOR TYPES

RBT         Triggered event            Coverage
Pageview    JavaScript Load()          99%
Mouse       JavaScript MouseOver()     44%
ScrollTime  JavaScript Scroll()        49%
DwellTime   Total time spent on page   69%
Purchase    Object was purchased       0.5%

Offline evaluation results (Q_pair values, α = 0.5):

RBT         LINEAR   DIST, ε=0.2   DIST, 0.9   KNN, 0.01   KNN, 0.7
Pageview    0.797    0.695         0.850       0.753       0.825
Mouse       0.772    0.561         0.799       0.695       0.822
Scroll      0.569    0.555         0.578       0.582       0.573
DwellTime   0.791    0.502         0.589       0.632       0.649

RESULTS OF A/B TESTING

[Figure: click-through rate (0.2% to 1.2%) vs. minimal number of visited objects (1 to 10) for the BINARY, LINEAR (avg), BEST OFFLINE (avg) and COMBINED methods.]

17 days of real deployment (ongoing), 4 groups of users, in total 4700 users, 135K recommended objects (6 or 12 objects in a list), 1260 clicks. VSM recommender system, click-through rate as the metric.
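If the click-through rate is computed per recommended object, the overall rate is roughly 1260 / 135,000 ≈ 0.93%, consistent with the 0.2–1.2% range shown in the plot above.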

For users with more visited objects, the Best Offline method gets the best results.

EXAMPLE: User U visited objects O1, O2, O3:

User U   PAGEVIEW   DWELL TIME   SCROLL TIME
O1       1          10 sec       0 sec
O2       1          60 sec       0 sec
O3       2          200 sec      10 sec

Binary user rating r: r_O1 = 1, r_O2 = 1, r_O3 = 1

Linear local ratings r_i for each RBT, normalized to the user's maximum:

r_i   PAGEVIEW   DWELL TIME   SCROLL
O1    0.5        0.05         0
O2    0.5        0.3          0
O3    1          1            1

User rating r (AVG of the local ratings): r_O1 = 0.183, r_O2 = 0.267, r_O3 = 1
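A short sketch reproducing the Linear part of this example (our own code; it only re-derives the numbers shown above):

```python
# raw RBT values for user U, taken from the example table
rbt_values = {
    "pageview":   {"O1": 1,  "O2": 1,  "O3": 2},
    "dwelltime":  {"O1": 10, "O2": 60, "O3": 200},
    "scrolltime": {"O1": 0,  "O2": 0,  "O3": 10},
}

# LINEAR: normalize each RBT by the user's maximum value for that RBT ...
local = {rbt: {obj: val / max(vals.values()) for obj, val in vals.items()}
         for rbt, vals in rbt_values.items()}

# ... then average the local ratings across RBTs to get the user rating r
rating = {obj: sum(local[rbt][obj] for rbt in local) / len(local)
          for obj in ("O1", "O2", "O3")}
# rating -> {"O1": 0.183, "O2": 0.267, "O3": 1.0} (rounded)
```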

Based on the interpretation method used, the same recommender (VSM) should propose a different list of objects to the user.

[Figure: example recommendation lists for a regular user and for a new user.]

Best Offline in A/B testing is a combination (AVG) of the methods with the best offline results for each RBT.