
Page 1: Learning User Preferences

Learning User Preferences

Jason Rennie, MIT CSAIL

[email protected]

Advisor: Tommi Jaakkola

Page 2: Learning User Preferences

Information Extraction

• Informal Communication: e-mail, mailing lists, bulletin boards

• Issues:
– Context switching
– Abbreviations & shortened forms
– Variable punctuation, formatting, grammar

Page 3: Learning User Preferences

Thesis Advertisement: Outline

• Thesis is not end-to-end IE system

• We address some IE problems:

1. Identifying & Resolving Named Entities

2. Tracking Context

3. Learning User Preferences

Page 4: Learning User Preferences

Identifying Named Entities

• “Rialto is now open until 11pm”

• Facts/Opinions usually about a named entity

• Tools typically rely on punctuation, capitalization, formatting, grammar

• We developed a criterion to identify topic-oriented words using occurrence statistics

[Rennie & Jaakkola, SIGIR 2005]

Page 5: Learning User Preferences

Resolving Named Entities

• “They’re now open until 11pm”

• What does “they” refer to?

• Clustering
– Group noun phrases that co-refer

• McCallum & Wellner (2005)
– Excellent for proper nouns

• Our contribution: better modeling of non-proper nouns (incl. pronouns)

Page 6: Learning User Preferences

Tracking Context

• “The Swordfish was fabulous”
– Indirect comment on a restaurant.
– Restaurant identified by context.

• Use word statistics to find topic switches

• Contribution: new sentence clustering algorithm

Page 7: Learning User Preferences

Learning User Preferences

• Examples:
– “I loved Rialto last night.”
– “Overall, Oleana was worth the money”
– “Radius wasn’t bad, but wasn’t great”
– “Om was purely pretentious”

• Issues:
1. Translate text to a partial ordering or rating
2. Predict unobserved ratings

Page 8: Learning User Preferences

Preference Problems

• Single User w/ Item Features

• Multi-user, no features
– a.k.a. Collaborative Filtering

Page 9: Learning User Preferences

Single User, Item Features

[Slide figure: worked example. Restaurants (columns): 10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rndzvous.

Feature values:
  Capacity        30  90  60  80  40  80
  Price           30  60  50  30  20  40
  French?          1   0   1   0   0   0
  New American?    0   1   0   0   0   1
  Ethnic?          0   0   0   1   1   0
  Formality        2   4   3   1   0   2
  Location         2   3   1   2   0   2

User weights (one per feature): -0.1, -0.1, +10, +5, 0, 0, +2
Preference scores (one per restaurant): +8, -4, +1, -7, -6, -3
Rating thresholds: 4 = 6, 3 = 3, 2 = -2, 1 = -5; scores are thresholded into 1-5 ratings.]
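The mapping on this slide is just a linear model: preference score = user weights · item features, and thresholds then turn scores into ratings. Below is a minimal NumPy sketch that reproduces the numbers above (my own illustration, not the thesis's MATLAB code; the pairing of weights with feature rows is inferred from the figure).

```python
import numpy as np

# Feature values: rows = features, columns = restaurants
# (10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rndzvous)
X = np.array([
    [30, 90, 60, 80, 40, 80],   # Capacity
    [30, 60, 50, 30, 20, 40],   # Price
    [ 1,  0,  1,  0,  0,  0],   # French?
    [ 0,  1,  0,  0,  0,  1],   # New American?
    [ 0,  0,  0,  1,  1,  0],   # Ethnic?
    [ 2,  4,  3,  1,  0,  2],   # Formality
    [ 2,  3,  1,  2,  0,  2],   # Location
])

w = np.array([-0.1, -0.1, 10, 5, 0, 0, 2])    # user weights, one per feature
scores = w @ X                                # preference score per restaurant
print(scores)                                 # [ 8. -4.  1. -7. -6. -3.]

# Thresholds from the slide: a score above thetas[r-1] earns a rating above r
thetas = np.array([-5, -2, 3, 6])
ratings = 1 + np.sum(scores[:, None] > thetas[None, :], axis=1)
print(ratings)                                # [5 2 3 1 1 2]
```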

Page 10: Learning User Preferences

Single User, Item Features

[Slide figure: the same restaurants and feature values as the previous slide, but the user weights and preference scores are now unknown ("?"). Observed ratings: 5 2 3 1 ? ?; the task is to learn the weights from the observed ratings and predict the missing ones.]

Page 11: Learning User Preferences

Many Users, No Features

[Slide figure: per-user weight vectors times per-item feature vectors give a matrix of preference scores, which is thresholded into a users x items ratings matrix; the weights and features are unknown, and the figure shows example preference scores and ratings with missing entries marked "?".]

Page 12: Learning User Preferences

Collaborative Filtering

• Possible goals:
– Predict missing entries
– Cluster users or items

• Applications:
– Movies, Books
– Genetic Interaction
– Network routing
– Sports performance

[Slide figure: a users x items matrix of example ratings.]

Page 13: Learning User Preferences

Outline

• Single User, Features
– Loss functions, Convexity, Large Margin
– Loss function for Ratings

• Many Users, No Features
– Feature Selection, Rank, SVD
– Regularization: tie together multiple tasks
– Optimization: scale to large problems

• Extensions

Page 14: Learning User Preferences

This Talk: Contributions

• Implementation and systematic evaluation of loss functions for Single User prediction.

• Scaling multi-user regularization to large problems (thousands of users/items)
– Analysis of optimization

• Extensions
– Hybrid: features + multiple users
– Observation model & multiple ratings

Page 15: Learning User Preferences

Rating Classification

• n ordered classes

• Learn weight vector, thresholds

[Slide figure: items with ratings 1, 2, and 3 projected onto the weight vector w; the learned thresholds split the projection into the rating classes.]

Page 16: Learning User Preferences

Loss Functions

[Slide figure: loss as a function of margin agreement for the 0-1, Hinge, Logistic, Smooth Hinge, and Modified Least Squares losses.]
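For reference, here is a minimal NumPy sketch of the losses plotted above, written as functions of the margin agreement z (my own illustration; the exact scaling constants are an assumption, not taken from the slide).

```python
import numpy as np

def zero_one(z):
    return (z <= 0).astype(float)            # 1 if on the wrong side, else 0

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    return np.log1p(np.exp(-z))

def smooth_hinge(z):
    # Hinge with the corner smoothed: quadratic for 0 < z < 1, linear for z <= 0
    return np.where(z <= 0, 0.5 - z,
           np.where(z < 1, 0.5 * (1.0 - z) ** 2, 0.0))

def modified_least_squares(z):
    return np.maximum(0.0, 1.0 - z) ** 2

z = np.linspace(-2, 2, 9)
for f in (zero_one, hinge, logistic, smooth_hinge, modified_least_squares):
    print(f.__name__, np.round(f(z), 3))
```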

Page 17: Learning User Preferences

Convexity

• Convex function => no local minima

• A set is convex if all line segments between its points lie within the set

Page 18: Learning User Preferences

Convexity of Loss Functions

• 0-1 loss is not convex
– Local minima, sensitive to small changes

• Convex Bound
– Large margin solution with regularization
– Stronger guarantees

Page 19: Learning User Preferences

Proportional Odds

• McCullagh introduced the original rating model
– Linear interaction: weights & features
– Thresholds
– Maximum likelihood

[McCullagh, 1980]

[Slide figure: the same ratings-on-a-line illustration; items with ratings 1, 2, and 3 projected onto w, separated by thresholds.]
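For reference, the proportional-odds model can be written as follows (a standard statement of McCullagh's model, restated here because the slide's equations did not survive the transcript):

\[
\log \frac{P(y \le r \mid x)}{P(y > r \mid x)} \;=\; \theta_r - w^\top x,
\qquad r = 1, \dots, n-1 ,
\]

with weights w shared across all thresholds θ_1 ≤ … ≤ θ_{n-1}, and parameters fit by maximum likelihood.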

Page 20: Learning User Preferences

Immediate-Thresholds

[Slide figure: the immediate-thresholds construction on a 1-5 rating scale; the loss is built from the two thresholds immediately adjacent to the correct rating.]

[Shashua & Levin, 2003]

Page 21: Learning User Preferences

Some Errors are Better than Others

[Slide figure: a user's true ratings compared with the predictions of System 1 and System 2.]

Page 22: Learning User Preferences

Not a Bound on Absolute Diff.

[Slide figure: a 1-5 rating scale example showing that this loss need not bound the absolute difference between predicted and true ratings.]

Page 23: Learning User Preferences

All-Thresholds Loss

[Slide figure: the all-thresholds construction on a 1-5 rating scale; every threshold between the predicted score and the correct rating contributes to the loss.]

[Srebro, Rennie & Jaakkola, NIPS 2004]
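A minimal sketch of the all-thresholds construction (my own NumPy illustration; the choice of smooth hinge as the per-threshold loss and the threshold values are assumptions):

```python
import numpy as np

def smooth_hinge(z):
    return np.where(z <= 0, 0.5 - z,
           np.where(z < 1, 0.5 * (1.0 - z) ** 2, 0.0))

def all_thresholds_loss(score, rating, thetas):
    """Sum a per-threshold loss over every threshold, not just the ones
    adjacent to the true rating; thetas[r-1] separates rating r from r+1."""
    loss = 0.0
    for r, theta in enumerate(thetas, start=1):
        # Thresholds below the true rating should be exceeded,
        # thresholds at or above it should not be.
        direction = 1.0 if rating > r else -1.0
        loss += smooth_hinge(direction * (score - theta))
    return loss

thetas = [-5, -2, 3, 6]                       # four thresholds for a 1-5 scale
print(all_thresholds_loss(8.0, 5, thetas))    # right side of every threshold: 0.0
print(all_thresholds_loss(8.0, 1, thetas))    # off by four levels: large loss
```

Because every violated threshold contributes, the loss grows with the distance between the predicted and true rating, unlike the immediate-thresholds loss on the earlier slide.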

Page 24: Learning User Preferences

Experiments

Test error (lower is better) by per-threshold loss (rows) and construction (columns):

              Multi-Class   Imm-Thresh   All-Thresh   p-value
  MLS            .7486        .7491        .6700      1.7e-18
  Hinge          .7433        .7628        .6702      6.6e-17
  Logistic       .7490        .7248        .6623      7.3e-22

  Least Squares regression: 1.3368

[Rennie & Srebro, IJCAI 2005]

Page 25: Learning User Preferences

Many Users, No Features

[Slide figure: repeat of the earlier many-users illustration; a partially observed users x items ratings matrix modeled as unknown per-user weights times unknown per-item features, giving preference scores that are thresholded into ratings.]

Page 26: Learning User Preferences

Background: Lp-norms

• L0: number of non-zero entries: ||<0,2,0,3,4>||_0 = 3

• L1: sum of absolute values: ||<2,-2,1>||_1 = 5

• L2: Euclidean length: ||<1,-1>||_2 = √2

• General: ||v||_p = (Σ_i |v_i|^p)^(1/p)
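A quick NumPy check of these norms (my own illustration):

```python
import numpy as np

print(np.count_nonzero([0, 2, 0, 3, 4]))      # L0 "norm": 3
print(np.linalg.norm([2, -2, 1], ord=1))      # L1 norm: 5.0
print(np.linalg.norm([1, -1], ord=2))         # L2 norm: sqrt(2) = 1.414...

def lp_norm(v, p):
    # General Lp norm: (sum_i |v_i|^p)^(1/p)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(lp_norm(np.array([1, -1]), 2))          # 1.4142...
```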

Page 27: Learning User Preferences

Background: Feature Selection

• Objective: Loss + Regularization

[Slide figure: the squared L2 regularizer and the L1 regularizer; the L1 penalty pushes weights to exactly zero, performing feature selection.]

Page 28: Learning User Preferences

Singular Value Decomposition

• X = USV'
– U, V: orthogonal (rotation)
– S: diagonal, non-negative

• Eigenvalues of XX' = USV'VSU' = US²U' are the squared singular values of X

• Rank = ||s||_0 (number of non-zero singular values)

• SVD is used to obtain the least-squares low-rank approximation
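A small NumPy sketch of these facts (my own illustration, not the thesis code): the singular values determine the rank, and truncating them gives the least-squares low-rank approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # exactly rank 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.round(s, 6))                     # only two singular values are non-zero
print(np.linalg.matrix_rank(X))           # 2

# Eigenvalues of XX' are the squared singular values of X
eig = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
print(np.allclose(eig[:2], s[:2] ** 2))   # True

# Truncated SVD gives the best rank-k approximation in the least-squares sense
k = 1
Xk = (U[:, :k] * s[:k]) @ Vt[:k, :]
print(np.linalg.norm(X - Xk))             # Frobenius error = dropped singular value
```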

Page 29: Learning User Preferences

Low Rank Matrix Factorization

[Slide figure: X ≈ U x V', a rank-k factorization, shown next to an example ratings matrix Y.]

• Sum-squared loss, fully observed Y: use SVD to find the global optimum

• Classification error loss, partially observed Y: non-convex, no explicit solution

Page 30: Learning User Preferences

Low-Rank: Non-Convex Set

[Slide figure: the sum of two rank-1 matrices can have rank 2, so the set of low-rank matrices is not convex.]

Page 31: Learning User Preferences

Trace Norm Regularization

[Fazel et al., 2001]

Trace Norm: sum of singular values

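For concreteness, a short NumPy illustration of the trace norm (also called the nuclear norm); this is my own sketch, not from the slides.

```python
import numpy as np

def trace_norm(X):
    # Sum of the singular values of X
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
print(trace_norm(A))
print(np.linalg.norm(A, ord='nuc'))   # equivalent built-in (nuclear norm)
```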

Page 32: Learning User Preferences

Many Users, No Features

[Slide figure: the many-users illustration again, now labeled with the factorization X = UV': U holds the per-user weights, V' the per-item features, X the preference scores, and Y the partially observed ratings matrix.]

Page 33: Learning User Preferences

Max Margin Matrix Factorization

• The objective is a convex function of X (and the thresholds)

• The trace norm penalty encourages low rank in X

• Objective: All-Thresholds Loss + Trace Norm (regularization)

[Srebro, Rennie & Jaakkola, NIPS 2004]

Page 34: Learning User Preferences

Properties of the Trace Norm

• The factorization U√S, V√S (from the SVD X = USV') minimizes both quantities below
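The two quantities referred to are the standard variational characterizations of the trace norm (restated here because the slide's equations did not survive the transcript):

\[
\|X\|_{\Sigma}
\;=\; \min_{X = UV'} \|U\|_F \, \|V\|_F
\;=\; \min_{X = UV'} \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right).
\]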

Page 35: Learning User Preferences

Factorized Optimization

• Factorized Objective (tight bound):

• Gradient descent: O(n³) per round

• Stationary points, but no local minima

[Rennie & Srebro, ICML 2005]
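Below is a minimal sketch of a factorized objective and a gradient step (my own NumPy illustration under simplifying assumptions: squared error on observed entries instead of the all-thresholds loss, and plain gradient descent):

```python
import numpy as np

def factorized_objective(U, V, Y, mask, lam):
    # loss(UV') on observed entries + (lam/2)(||U||_F^2 + ||V||_F^2),
    # which upper-bounds lam * trace-norm(UV')
    X = U @ V.T
    loss = 0.5 * np.sum(mask * (X - Y) ** 2)
    reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return loss + reg

def gradient_step(U, V, Y, mask, lam, lr=0.01):
    R = mask * (U @ V.T - Y)          # residuals on observed entries only
    dU = R @ V + lam * U
    dV = R.T @ U + lam * V
    return U - lr * dU, V - lr * dV

rng = np.random.default_rng(0)
n_users, n_items, k = 30, 20, 5
Y = rng.integers(1, 6, size=(n_users, n_items)).astype(float)
mask = rng.random((n_users, n_items)) < 0.5     # half the ratings observed
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))

for _ in range(200):
    U, V = gradient_step(U, V, Y, mask, lam=0.1)
print(factorized_objective(U, V, Y, mask, lam=0.1))
```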

Page 36: Learning User Preferences

Collaborative Prediction Results

Dataset size and sparsity: EachMovie 36656 x 1648, 96%; MovieLens 6040 x 3952, 96%.

              EachMovie                 MovieLens
Algorithm     Weak Error  Strong Error  Weak Error  Strong Error
URP           .8596       .8859         .6946       .7104
Attitude      .8787       .8845         .6912       .7000
MMMF          .8548       .8439         .6650       .6725

[URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]

Page 37: Learning User Preferences

Extensions

• Multi-user + Features

• Observation model
– Predict which restaurants a user will rate, and
– the rating she will make

• Multiple ratings per user/restaurant
– E.g. Food, Service and Décor ratings

• SVD Parameterization

Page 38: Learning User Preferences

Multi-User + Features

• Feature parameters (V):
– Some are fixed
– Some are learned

• Learn weights (U) for all features

• Fixed part of V does not affect regularization

[Slide figure: the matrix V' partitioned into fixed-feature and learned-feature parts.]

Page 39: Learning User Preferences

Observation Model

• Common assumption: ratings observed at random

• Restaurant selection:
– Geography, popularity, price, food style

• Remove bias: model observation process

Page 40: Learning User Preferences

Observation Model

• Model as binary classification

• Add binary classification loss

• Tie together rating and observation models

• X = U_X V',  W = U_W V' (shared item features V', separate user weight matrices)
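One way to tie the two models together is to share the item features V between them, which the following sketch illustrates (my own illustration; the particular losses, squared error for ratings and logistic loss for the observe/not-observe indicator, are assumptions rather than the thesis's exact formulation):

```python
import numpy as np

def combined_objective(Ux, Uw, V, Y, observed, lam):
    """Rating model X = Ux V' and observation model W = Uw V' share the
    item features V; observed[i, j] = 1 if user i rated item j."""
    X = Ux @ V.T                          # rating scores
    W = Uw @ V.T                          # observation scores
    rating_loss = 0.5 * np.sum(observed * (X - Y) ** 2)
    labels = 2.0 * observed - 1.0         # +1 if rated, -1 if not
    obs_loss = np.sum(np.log1p(np.exp(-labels * W)))
    # Regularize the stacked user weights and the shared features together
    reg = 0.5 * lam * (np.sum(Ux ** 2) + np.sum(Uw ** 2) + np.sum(V ** 2))
    return rating_loss + obs_loss + reg
```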

Page 41: Learning User Preferences

Multiple Ratings

• Users may provide multiple ratings:
– Service, Décor, Food

• Add in loss functions

• Stack parameter matrices for regularization

Page 42: Learning User Preferences

SVD Parameterization

• Too many parameters: U A A^(-1) V' = X is another factorization of X for any invertible A

• Alternative: parameterize by U, S, V
– U, V orthogonal, S diagonal

• Advantages:
– Not over-parameterized
– Exact objective (not a bound)
– No stationary points

Page 43: Learning User Preferences

Summary

• Loss function for ratings

• Regularization for multiple users

• Scaled MMMF to large problems (e.g. > 1000x1000)

• Trace norm: widely applicable

• Extensions

Code: http://people.csail.mit.edu/jrennie/matlab

Page 44: Learning User Preferences

Thanks!

• Helen, for supporting me for 7.5 years!
• Tommi Jaakkola, for answering all my questions and directing me to the “end”!
• Mike Collins and Tommy Poggio for add’l guidance.
• Nati Srebro & John Barnett for endless valuable discussions and ideas.
• Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, & Patrycja Missiuro & all past members of Tommi’s reading group for paper discussions, conference trips and feedback on my talks.
• Many, many others who have helped me along the way!

Page 45: Learning User Preferences

Low-Rank Optimization

[Slide figure: an objective over matrices, marking the unconstrained objective minimum, the low-rank minimum, and a low-rank local minimum along the low-rank set.]