Steffen Rendle, Research Scientist, Google at MLconf SF


DESCRIPTION

Abstract: Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived, and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems through feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD, and MCMC inference with automatic hyperparameter selection. I will show on several tasks, including the Netflix Prize and KDDCup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs, these results can be obtained with simple data preprocessing and without any tuning of regularization parameters or learning rates.

TRANSCRIPT

Page 1: Title

Factorization Machines

Steffen Rendle (current affiliation: Google Inc.; the work was done at the University of Konstanz)

MLConf, November 14, 2014

Page 2: Outline

- Factorization Models & Polynomial Regression
  - Factorization Models
  - Linear/Polynomial Regression
  - Comparison
- Factorization Machines
- Applications
- Summary

Page 3: Matrix Factorization

Example data: a user x movie rating matrix (users A, B, C, ...; movies TI, NH, SW, ST, ...) with many unobserved entries ("?").

Matrix factorization:

  Y := W H^\top, \quad W \in \mathbb{R}^{|U| \times k}, \quad H \in \mathbb{R}^{|I| \times k}

  y(u, i) = y_{u,i} = \sum_{f=1}^{k} w_{u,f} h_{i,f} = \langle w_u, h_i \rangle

k is the rank of the reconstruction.


Page 5: Matrix Factorization & Extensions

Example data: the same user x movie rating matrix, optionally with a time dimension.

Examples for models:

  y^{MF}(u, i) := \sum_{f=1}^{k} v_{u,f} v_{i,f} = \langle v_u, v_i \rangle

  y^{SVD++}(u, i) := \Big\langle v_u + \sum_{j \in N(u)} v_j, \; v_i \Big\rangle

  y^{Fact-KNN}(u, i) := \frac{1}{|R(u)|} \sum_{j \in R(u)} r_{u,j} \langle v_i, v_j \rangle

  y^{timeSVD}(u, i, t) := \langle v_u + v_{u,t}, v_i \rangle

  y^{timeTF}(u, i, t) := \sum_{f=1}^{k} v_{u,f} v_{i,f} v_{t,f}

  ...


Page 8: Tensor Factorization

Example data: triples of (subject, predicate, object).

Examples for models:

  y^{PARAFAC}(s, p, o) := \sum_{f=1}^{k} v_{s,f} v_{p,f} v_{o,f}

  y^{PITF}(s, p, o) := \langle v_s, v_p \rangle + \langle v_s, v_o \rangle + \langle v_p, v_o \rangle

  ...

[illustration from Drumond et al. 2012]

Page 9: Sequential Factorization Models

Example data: per-user sequences of baskets B_{t-3}, B_{t-2}, B_{t-1}, B_t of items (a, b, c, d, e, ...), where the basket at time t is to be predicted.

Examples for models:

  y^{FMC}(u, i, t) := \sum_{l \in B_{t-1}} \langle v_i, v_l \rangle

  y^{FPMC}(u, i, t) := \langle v_u, v_i \rangle + \sum_{l \in B_{t-1}} \langle v_i, v_l \rangle

  ...

Page 10: Factorization Models: Discussion

Advantages:
- Can estimate interactions between two (or more) variables even if the cross is not observed.
- E.g. user x movie, current product x next product, user x query x url, ...

Downsides:
- Factorization models are usually built specifically for each problem.
- Learning algorithms and implementations are tailored to individual models.



Page 13: Data and Variable Representation

Many standard ML approaches work with real-valued feature vectors as input. This representation can encode, e.g.:

- any number of variables
- categorical domains, using dummy indicator variables
- numerical domains
- set-categorical domains, using dummy indicator variables

Using this representation, a wide variety of standard models (e.g. linear regression, SVMs) can be applied.

Page 14: Linear Regression

- Let x \in \mathbb{R}^p be an input vector with p predictor variables.
- Model equation:

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i

- Model parameters:

  w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^p

O(p) model parameters.

Page 15: Polynomial Regression

- Let x \in \mathbb{R}^p be an input vector with p predictor variables.
- Model equation (degree 2):

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j \ge i} w_{i,j} x_i x_j

- Model parameters:

  w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^p, \quad W \in \mathbb{R}^{p \times p}

O(p^2) model parameters.
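A minimal sketch (not from the slides; all names are illustrative) of evaluating this degree-2 polynomial model, which makes the O(p^2) parameter count concrete:

```python
import numpy as np

def poly2_predict(x, w0, w, W):
    """Degree-2 polynomial regression: y(x) = w0 + sum_i w_i x_i + sum_{i<=j} W_ij x_i x_j."""
    p = x.shape[0]
    y = w0 + w @ x
    for i in range(p):
        for j in range(i, p):          # includes j == i (squared terms), as on the slide
            y += W[i, j] * x[i] * x[j]
    return y

p = 5
rng = np.random.default_rng(0)
x = rng.normal(size=p)
w0, w, W = 0.1, rng.normal(size=p), rng.normal(size=(p, p))
print(poly2_predict(x, w0, w, W))      # w has p parameters, W has p^2: O(p^2) in total
```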


Pages 17-19: Representation: Matrix/Tensor vs. Feature Vectors

Matrix/tensor data can be represented by feature vectors:

  #   User     Movie         Rating
  1   Alice    Titanic       5
  2   Alice    Notting Hill  3
  3   Alice    Star Wars     1
  4   Bob      Star Wars     4
  5   Bob      Star Trek     5
  6   Charlie  Titanic       1
  7   Charlie  Star Wars     5
  ...

[Figure: each row becomes a sparse feature vector x^(n) consisting of a one-hot user block (A, B, C, ...) and a one-hot movie block (TI, NH, SW, ST, ...); the rating becomes the target y^(n). E.g. x^(1) has a 1 in the user-A and movie-TI columns, with target y^(1) = 5.]
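A small sketch (plain Python, illustrative helper names, not libFM code) of how such (user, movie, rating) rows could be one-hot encoded into the sparse feature vectors shown above:

```python
# Hypothetical helper: one-hot encode (user, movie, rating) rows into sparse feature vectors.
rows = [("Alice", "Titanic", 5), ("Alice", "Notting Hill", 3), ("Alice", "Star Wars", 1),
        ("Bob", "Star Wars", 4), ("Bob", "Star Trek", 5),
        ("Charlie", "Titanic", 1), ("Charlie", "Star Wars", 5)]

users  = sorted({u for u, _, _ in rows})
movies = sorted({m for _, m, _ in rows})
user_ix  = {u: i for i, u in enumerate(users)}                 # user block: indices 0 .. |U|-1
movie_ix = {m: len(users) + i for i, m in enumerate(movies)}   # movie block: indices |U| .. |U|+|I|-1

def encode(user, movie):
    """Return the sparse feature vector as {index: value}; two non-zeros per row."""
    return {user_ix[user]: 1.0, movie_ix[movie]: 1.0}

X = [encode(u, m) for u, m, _ in rows]
y = [float(r) for _, _, r in rows]
print(X[0], y[0])   # e.g. {0: 1.0, 6: 1.0} 5.0 (exact indices depend on the sorted order)
```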

Page 20: Application to Sparse Feature Vectors

[Figure: the sparse feature vectors x^(1)..x^(7) from the previous slide (one-hot user block, one-hot movie block) with targets y^(1)..y^(7).]

Applying regression models to this data leads to:

  Linear regression:      y(x) = w_0 + w_u + w_i
  Polynomial regression:  y(x) = w_0 + w_u + w_i + w_{u,i}
  Matrix factorization:   y(u, i) = \langle w_u, h_i \rangle


Page 24: Application to Sparse Feature Vectors

For the data of the example:

- Linear regression has no user-item interaction.
  => Linear regression is not expressive enough.
- Polynomial regression includes pairwise interactions but cannot estimate them from the data:
  - n \ll p^2: the number of cases is much smaller than the number of model parameters.
  - The maximum-likelihood estimator for a pairwise effect is

      w_{u,i} = y - w_0 - w_u - w_i   if (u, i, y) \in S,   and not defined otherwise.

  - Polynomial regression cannot generalize to any unobserved pairwise effect.


Page 30: Outline

- Factorization Models & Polynomial Regression
- Factorization Machines
  - Model
  - Examples
  - Properties
  - Learning
  - libFM Software
- Applications
- Summary

Page 31: Factorization Machine (FM)

- Let x \in \mathbb{R}^p be an input vector with p predictor variables.
- Model equation (degree 2):

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \langle v_i, v_j \rangle x_i x_j

- Model parameters:

  w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^p, \quad V \in \mathbb{R}^{p \times k}

[Rendle 2010, Rendle 2012]
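As a reference point, here is a direct, naive sketch of this degree-2 model equation (the double sum, O(p^2 k)); all names are illustrative, and the efficient O(p k) form appears later in the talk.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM: y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.  O(p^2 k)."""
    p = x.shape[0]
    y = w0 + w @ x
    for i in range(p):
        for j in range(i + 1, p):               # strictly j > i, unlike polynomial regression
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

p, k = 6, 3
rng = np.random.default_rng(0)
x = rng.normal(size=p)
w0, w, V = 0.0, rng.normal(size=p), 0.1 * rng.normal(size=(p, k))
print(fm_predict_naive(x, w0, w, V))
```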

Page 32: Factorization Machine (FM)

Factorization machine (degree 2):

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \langle v_i, v_j \rangle x_i x_j

  Model parameters: w_0 \in \mathbb{R}, w \in \mathbb{R}^p, V \in \mathbb{R}^{p \times k}

Compared to polynomial regression (degree 2):

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j \ge i} w_{i,j} x_i x_j

  Model parameters: w_0 \in \mathbb{R}, w \in \mathbb{R}^p, W \in \mathbb{R}^{p \times p}

[Rendle 2010, Rendle 2012]


Page 34: Factorization Machine (FM)

- Let x \in \mathbb{R}^p be an input vector with p predictor variables.
- Model equation (degree 3):

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \langle v_i, v_j \rangle x_i x_j
          + \sum_{i=1}^{p} \sum_{j > i} \sum_{l > j} \sum_{f=1}^{k} v^{(3)}_{i,f} v^{(3)}_{j,f} v^{(3)}_{l,f} \, x_i x_j x_l

- Model parameters:

  w_0 \in \mathbb{R}, \quad w \in \mathbb{R}^p, \quad V \in \mathbb{R}^{p \times k}, \quad V^{(3)} \in \mathbb{R}^{p \times k}

[Rendle 2010, Rendle 2012]

Page 35: Factorization Machines: Discussion

- FMs work with real-valued input.
- FMs include variable interactions like polynomial regression.
- Model parameters for interactions are factorized.
- The number of model parameters is O(k p) (instead of O(p^2) for polynomial regression).


Page 37: Matrix Factorization and Factorization Machines

Two categorical variables encoded with real-valued predictor variables:

[Figure: sparse feature vectors with a one-hot user block (A, B, C, ...) and a one-hot movie block (TI, NH, SW, ST, ...).]

With this data, the FM is identical to MF with biases (footnote 1):

  y(x) = w_0 + w_u + w_i + \langle v_u, v_i \rangle

where the last term is the MF part.

Footnote 1: libFM, k = 128, MCMC inference, Netflix RMSE = 0.8937.

Page 38: RDF-Triple Prediction with Factorization Machines

Three categorical variables encoded with real-valued predictor variables:

[Figure: sparse feature vectors with one-hot blocks for subject (S1, S2, S3, ...), predicate (P1, P2, P3, P4, ...), and object (O1, O2, O3, O4, ...).]

With this data, the FM is equivalent to the PITF model:

  y(x) := w_0 + w_s + w_p + w_o + \langle v_s, v_p \rangle + \langle v_s, v_o \rangle + \langle v_p, v_o \rangle

[PITF: Rendle et al. 2010, WSDM Best Student Paper, ECML 2009 Best DC Award]

Page 39: Time with Factorization Machines

Two categorical variables and time as a linear predictor:

[Figure: sparse feature vectors with one-hot user and movie blocks plus a single real-valued time column t (values such as 0.2, 0.6, 0.3, ...).]

The FM model then corresponds to:

  y(x) := w_0 + w_i + w_u + t \, w_{time} + \langle v_u, v_i \rangle + t \, \langle v_u, v_{time} \rangle + t \, \langle v_i, v_{time} \rangle

Page 40: Time with Factorization Machines

Two categorical variables and time discretized into bins b(t):

[Figure: sparse feature vectors with one-hot user and movie blocks plus a one-hot time-bin block (T1, T2, T3).]

The FM model then corresponds to (footnote 2):

  y(x) := w_0 + w_i + w_u + w_{b(t)} + \langle v_u, v_i \rangle + \langle v_u, v_{b(t)} \rangle + \langle v_i, v_{b(t)} \rangle

Footnote 2: libFM, k = 128, MCMC inference, Netflix RMSE = 0.8873.

Page 41: SVD++

[Figure: sparse feature vectors with one-hot user and movie blocks plus an "other movies rated" block whose non-zero entries are normalized over the movies the user has rated (e.g. 0.3 or 0.5).]

With this data, the FM (footnote 3) is identical to:

  y(x) = \underbrace{w_0 + w_u + w_i + \langle v_u, v_i \rangle + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \langle v_i, v_l \rangle}_{SVD++}
         + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \Big( w_l + \langle v_u, v_l \rangle + \frac{1}{\sqrt{|N_u|}} \sum_{l' \in N_u,\, l' > l} \langle v_l, v_{l'} \rangle \Big)

The first line is the SVD++ model; the FM adds the extra terms in the second line.

Footnote 3: libFM, k = 128, MCMC inference, Netflix RMSE = 0.8865.

[Koren, 2008]

Page 42: Factorizing Personalized Markov Chains (FPMC)

Two categorical variables (u, i) and one set-categorical variable (the last basket B_{t-1}):

[Figure: sparse feature vectors with one-hot user (u1, u2, u3, ...) and product (A, B, C, D, ...) blocks plus a "last basket" block with weights 1/|B_{t-1}|; example sequential baskets: u1: {A, B} -> C, u2: {C} -> D, u3: {A} -> C.]

The FM is equivalent to:

  y(x) := w_0 + w_u + w_i + \frac{1}{|B_{t-1}|} \sum_{j \in B_{t-1}} w_j + \langle v_u, v_i \rangle + \frac{1}{|B_{t-1}|} \sum_{j \in B_{t-1}} \langle v_i, v_j \rangle + ...

[Rendle et al. 2010, WWW Best Paper]


Page 44: Computation Complexity

Factorization machine model equation:

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \langle v_i, v_j \rangle x_i x_j

- Trivial computation: O(p^2 k).
- Efficient computation can be done in O(p k).
- Exploiting the many zeros in x, even in O(N_z(x) k), where N_z(x) is the number of non-zero elements of the vector x.


Page 47: Efficient Computation

The model equation of an FM can be computed in O(p k).

Proof:

  y(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j > i} \langle v_i, v_j \rangle x_i x_j
        = w_0 + \sum_{i=1}^{p} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \Big[ \Big( \sum_{i=1}^{p} x_i v_{i,f} \Big)^2 - \sum_{i=1}^{p} (x_i v_{i,f})^2 \Big]

- In the sums over i, only the non-zero x_i elements have to be summed up => O(N_z(x) k).
- (The complexity of polynomial regression is O(N_z(x)^2).)
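To make the O(N_z(x) k) evaluation concrete, here is a small sketch (illustrative names, not libFM code) that computes the reformulated model equation using only the non-zero entries of x.

```python
import numpy as np

def fm_predict_fast(x_idx, x_val, w0, w, V):
    """FM prediction via the reformulated pairwise term
    0.5 * sum_f [ (sum_i x_i v_{i,f})^2 - sum_i (x_i v_{i,f})^2 ],
    touching only the non-zero entries of x  =>  O(N_z(x) * k)."""
    x_val = np.asarray(x_val, dtype=float)
    y = w0 + float(w[x_idx] @ x_val)                # linear part over the non-zeros only
    Vx = V[x_idx] * x_val[:, None]                  # rows x_i * v_i for the non-zero features
    s = Vx.sum(axis=0)                              # per-factor sums: sum_i x_i v_{i,f}
    y += 0.5 * float(np.sum(s * s - (Vx * Vx).sum(axis=0)))
    return y

# A tiny sparse example: two active one-hot features (a user and a movie).
p, k = 6, 3
rng = np.random.default_rng(0)
w, V = rng.normal(size=p), 0.1 * rng.normal(size=(p, k))
print(fm_predict_fast(np.array([0, 4]), [1.0, 1.0], 0.0, w, V))
```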


Page 50: Multilinearity

FMs are multilinear:

  \forall \theta \in \Theta = \{ w_0, w_1, \ldots, w_p, v_{1,1}, \ldots, v_{p,k} \} : \quad y(x, \theta) = h_{(\theta)}(x) \, \theta + g_{(\theta)}(x)

where g_{(\theta)} and h_{(\theta)} do not depend on the value of \theta.

E.g. for second-order effects (\theta = v_{l,f}):

  y(x, v_{l,f}) = \underbrace{w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \sum_{\substack{f'=1 \\ (f' \ne f) \vee (l \notin \{i,j\})}}^{k} v_{i,f'} v_{j,f'} x_i x_j}_{g_{(v_{l,f})}(x)} + v_{l,f} \underbrace{x_l \sum_{i=1, i \ne l}^{p} v_{i,f} x_i}_{h_{(v_{l,f})}(x)}
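As a quick numerical illustration of this property (a sketch, not from the slides; names are illustrative), the snippet below checks that y(x, v_{l,f}) is affine in a single factor entry and that its slope matches the closed-form h_{(v_{l,f})}(x) above.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 FM via the O(p k) reformulation."""
    s = V.T @ x
    return w0 + w @ x + 0.5 * float(np.sum(s * s - (V * V).T @ (x * x)))

# Multilinearity check for theta = V[l, f]: y = h * theta + g, so the slope in theta
# equals h = x_l * sum_{i != l} v_{i,f} x_i and there is no curvature in theta.
p, k, l, f = 5, 3, 2, 1
rng = np.random.default_rng(0)
x, w0, w, V = rng.normal(size=p), 0.3, rng.normal(size=p), rng.normal(size=(p, k))

h = x[l] * (V[:, f] @ x - V[l, f] * x[l])          # closed-form h_(v_{l,f})(x) from the slide
def y_at(theta):
    V2 = V.copy(); V2[l, f] = theta
    return fm_predict(x, w0, w, V2)

print(np.isclose(y_at(1.0) - y_at(0.0), h))                     # slope equals h
print(np.isclose(y_at(2.0) - 2 * y_at(1.0) + y_at(0.0), 0.0))   # second difference is zero
```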



Page 53: Learning

Using these properties, learning algorithms can be developed:

- L2-regularized regression and classification:
  - Stochastic gradient descent [Rendle, 2010]
  - Alternating least squares / coordinate descent [Rendle et al., 2011; Rendle, 2012]
  - Markov Chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al., 2011; Rendle, 2012]
- L2-regularized ranking:
  - Stochastic gradient descent [Rendle, 2010]

All the proposed learning algorithms have a runtime of O(k N_z(X) i), where i is the number of iterations and N_z(X) is the number of non-zero elements in the design matrix X.

Page 54: Stochastic Gradient Descent (SGD)

- For each training case (x, y) \in S, SGD updates the FM model parameter \theta using:

  \theta' = \theta - \alpha \big( (y(x) - y) \, h_{(\theta)}(x) + \lambda_{(\theta)} \theta \big)

- \alpha is the learning rate / step size.
- \lambda_{(\theta)} is the regularization value of the parameter \theta.
- SGD can easily be applied to other loss functions.

[Rendle, 2010]
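Below is a minimal sketch of this update for squared loss (not libFM's implementation; names and hyperparameter values are illustrative), using the closed-form h_{(\theta)}(x) of a degree-2 FM: h = 1 for w_0, h = x_i for w_i, and h = x_i (sum_j v_{j,f} x_j - v_{i,f} x_i) for v_{i,f}.

```python
import numpy as np

def sgd_epoch(X, y, w0, w, V, alpha=0.01, lam=0.01):
    """One SGD pass with squared loss: theta <- theta - alpha*((y_hat - y)*h_theta(x) + lam*theta)."""
    for x, t in zip(X, y):
        s = V.T @ x                                   # per-factor sums, reused for all v gradients
        err = (w0 + w @ x + 0.5 * float(np.sum(s * s - (V * V).T @ (x * x)))) - t
        w0 -= alpha * err                             # bias left unregularized here (a choice)
        w  -= alpha * (err * x + lam * w)
        V  -= alpha * (err * (np.outer(x, s) - V * (x * x)[:, None]) + lam * V)
    return w0, w, V

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)); y_tr = rng.normal(size=8)
w0, w, V = sgd_epoch(X, y_tr, 0.0, np.zeros(4), 0.01 * rng.normal(size=(4, 3)))
```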

Page 55: Coordinate Descent (CD)

- CD updates each FM model parameter \theta using:

  \theta' = \frac{ \sum_{(x,y) \in S} \big( y - g_{(\theta)}(x) \big) h_{(\theta)}(x) }{ \sum_{(x,y) \in S} h^2_{(\theta)}(x) + \lambda_{(\theta)} }

- Using caches of intermediate results, the runtime for updating all model parameters is O(k N_z(X)).
- CD can be extended to classification [Rendle, 2012].

[Rendle et al., 2011]
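A small sketch of this closed-form coordinate update (not libFM code; names are illustrative), shown only for the linear weights w_i and without the caching of intermediate results that yields the full O(k N_z(X)) runtime:

```python
import numpy as np

def cd_update_linear(X, y, w0, w, V, lam=0.01):
    """Coordinate-descent update theta' = sum((y - g)*h) / (sum(h^2) + lam) for each w_i.
    For theta = w_i: h(x) = x_i and g(x) = y_hat(x) - w_i * x_i (prediction minus the w_i term)."""
    s = X @ V                                               # (n, k) per-factor sums for all rows
    y_hat = w0 + X @ w + 0.5 * (s**2 - (X**2) @ (V**2)).sum(axis=1)
    for i in range(len(w)):
        h = X[:, i]
        g = y_hat - w[i] * h                                # predictions with the w_i term removed
        w_new = float(((y - g) @ h) / (h @ h + lam))
        y_hat += (w_new - w[i]) * h                         # keep the cached predictions consistent
        w[i] = w_new
    return w, y_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)); y_tr = rng.normal(size=8)
w, preds = cd_update_linear(X, y_tr, 0.0, np.zeros(4), 0.1 * rng.normal(size=(4, 2)))
```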

Page 56: Gibbs Sampling (MCMC)

- Gibbs sampling with a block for each FM model parameter \theta:

  \theta \mid S, \Theta \setminus \{\theta\} \sim \mathcal{N}\!\left( \frac{ \alpha \sum_{(x,y) \in S} \big( y - g_{(\theta)}(x) \big) h_{(\theta)}(x) }{ \alpha \sum_{(x,y) \in S} h^2_{(\theta)}(x) + \lambda_{(\theta)} }, \; \frac{1}{ \alpha \sum_{(x,y) \in S} h^2_{(\theta)}(x) + \lambda_{(\theta)} } \right)

- The mean is the same as for CD => the computational complexity is also O(k N_z(X)).
- MCMC can be extended to classification using link functions.

[Freudenthaler et al. 2011, Rendle 2012]

Page 57: Learning Regularization Values

[Figure: two graphical models over the observations y_i, features x_ij, and parameters w_0, w_j, v_j. Left: a standard FM with fixed priors on the parameters. Right: a two-level FM with hyperpriors on the prior (regularization) parameters, so the regularization values are inferred along with the model parameters.]

Standard FM with priors. Two-level FM with hyperpriors.

[Freudenthaler et al., 2011]


Page 59: libFM Software

libFM is an implementation of FMs:

- Model: second-order FMs
- Learning/inference: SGD, ALS, MCMC
- Classification and regression
- Uses the same data format as LIBSVM, LIBLINEAR [Lin et al.] and SVMlight [Joachims]
- Supports variable grouping
- Open source: GPLv3

[http://www.libfm.org/]
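Since libFM reads LIBSVM-style sparse rows of the form "target index:value ...", here is a small sketch (illustrative file and variable names) that writes the one-hot encoded rating example from earlier in that format:

```python
# Write (user, movie, rating) rows as LIBSVM/libFM-style lines: "<target> <index>:<value> ...".
rows = [("Alice", "Titanic", 5), ("Alice", "Notting Hill", 3), ("Bob", "Star Wars", 4)]
users  = sorted({u for u, _, _ in rows})
movies = sorted({m for _, m, _ in rows})
user_ix  = {u: i for i, u in enumerate(users)}
movie_ix = {m: len(users) + i for i, m in enumerate(movies)}

with open("ratings.libfm", "w") as f:                 # illustrative file name
    for u, m, r in rows:
        f.write(f"{r} {user_ix[u]}:1 {movie_ix[m]}:1\n")
# First line written here: "5 0:1 4:1" (a rating of 5 with two active one-hot features).
```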

Page 60: Outline

- Factorization Models & Polynomial Regression
- Factorization Machines
- Applications
  - Recommender Systems
  - Link Prediction in Social Networks
  - Clickthrough Prediction
  - Personalized Ranking
  - Student Performance Prediction
  - Kaggle Competitions
- Summary

Page 61: (Context-aware) Rating Prediction

- Main variables:
  - User ID (categorical)
  - Item ID (categorical)
- Additional variables:
  - time
  - mood
  - user profile
  - item meta data
  - ...
- Examples: Netflix Prize, MovieLens, KDDCup 2011

Page 62: Netflix Prize

[Figure: public-leaderboard RMSE (y-axis roughly 0.86-0.90) for FMs with increasing sets of predictor variables: (user, movie); (user, movie, day); (user, movie, impl.); (user, movie, day, impl.); (user, movie, day, impl., freq., lin. day). Reference lines mark an SGD matrix factorization baseline and the $1M Prize target.]

- k = 128 factors, 512 MCMC samples (no burn-in phase, initialization from random).
- MCMC inference (no hyperparameters such as learning rate or regularization to specify).

Page 63: Netflix Prize

  Method (Name)                                        Ref.      Learning Method     k     Quiz RMSE

  Models using user ID and item ID
  Probabilistic Matrix Factorization                   [14, 13]  Batch GD            40    *0.9170
  Probabilistic Matrix Factorization                   [14, 13]  Batch GD            150   0.9211
  Matrix Factorization                                 [6]       Variational Bayes   30    *0.9141
  Matchbox                                             [15]      Variational Bayes   50    *0.9100
  ALS-MF                                               [7]       ALS                 100   0.9079
  ALS-MF                                               [7]       ALS                 1000  *0.9018
  SVD/MF                                               [3]       SGD                 100   0.9025
  SVD/MF                                               [3]       SGD                 200   *0.9009
  Bayesian Probabilistic Matrix Factorization (BPMF)   [13]      MCMC                150   0.8965
  Bayesian Probabilistic Matrix Factorization (BPMF)   [13]      MCMC                300   *0.8954
  FM, pred. var: user ID, movie ID                     -         MCMC                128   0.8937

  Models using implicit feedback
  Probabilistic Matrix Factorization with Constraints  [14]      Batch GD            30    *0.9016
  SVD++                                                [3]       SGD                 100   0.8924
  SVD++                                                [3]       SGD                 200   *0.8911
  BSRM/F                                               [18]      MCMC                100   0.8926
  BSRM/F                                               [18]      MCMC                400   *0.8874
  FM, pred. var: user ID, movie ID, impl.              -         MCMC                128   0.8865

Page 64: Netflix Prize

  Method (Name)                                                  Ref.   Learning Method   k     Quiz RMSE

  Models using time information
  Bayesian Probabilistic Tensor Factorization (BPTF)             [17]   MCMC              30    *0.9044
  FM, pred. var: user ID, movie ID, day                          -      MCMC              128   0.8873

  Models using time and implicit feedback
  timeSVD++                                                      [5]    SGD               100   0.8805
  timeSVD++                                                      [5]    SGD               200   *0.8799
  FM, pred. var: user ID, movie ID, day, impl.                   -      MCMC              128   0.8809
  FM, pred. var: user ID, movie ID, day, impl.                   -      MCMC              256   0.8794

  Assorted models
  BRISMF/UM NB corrected                                         [16]   SGD               1000  *0.8904
  BMFSI plus side information                                    [8]    MCMC              100   *0.8875
  timeSVD++ plus frequencies                                     [4]    SGD               200   0.8777
  timeSVD++ plus frequencies                                     [4]    SGD               2000  *0.8762
  FM, pred. var: user ID, movie ID, day, impl., freq., lin. day  -      MCMC              128   0.8779
  FM, pred. var: user ID, movie ID, day, impl., freq., lin. day  -      MCMC              256   0.8771


Page 66: Link Prediction in Social Networks

- Main variables:
  - Actor A ID
  - Actor B ID
- Additional variables:
  - profiles
  - actions
  - ...

Page 67: KDDCup 2012: Track 1

[Figure: mean average precision @3 (roughly 0.32-0.42) on the public and private leaderboards for FMs with different feature sets (none; gender, age, ...; keywords; friends; all), compared against the Top 1, Top 5, Top 10, and Top 100 leaderboard scores.]

- k = 22 factors, 512 MCMC samples (no burn-in phase, initialization from random).
- MCMC inference (no hyperparameters such as learning rate or regularization to specify).

[Awarded 2nd place (out of 658 teams)]


Page 69: Clickthrough Prediction

- Main variables:
  - User ID
  - Query ID
  - Ad/Link ID
- Additional variables:
  - query tokens
  - user profile
  - ...

Page 70: KDDCup 2012: Track 2

  Model                           Inference   wAUC (public)   wAUC (private)
  ID-based model (k = 0)          SGD         0.78050         0.78086
  Attribute-based model (k = 8)   MCMC        0.77409         0.77555
  Mixed model (k = 8)             SGD         0.79011         0.79321
  Final ensemble                  n/a         0.79857         0.80178

Ensemble:
- Rank positions (not predicted clickthrough rates) are used.
- The MCMC attribute-based model and different variations of the SGD models are included.

[Awarded 3rd place (out of 171 teams)]


Page 72: ECML/PKDD Discovery Challenge 2013

- Problem: recommend given names.
- Main variables:
  - User ID
  - Name ID
- Additional variables:
  - session info
  - string representation of each name
  - ...
- The FM approach won 1st place (online track) and 2nd place (offline track).


Page 74: Student Performance Prediction

- Main variables:
  - Student ID
  - Question ID
- Additional variables:
  - question hierarchy
  - sequence of questions
  - skills required
  - ...
- Examples: KDDCup 2010, Grockit Challenge (footnote 4; FM placed 1st out of 241 teams)

Footnote 4: http://www.kaggle.com/c/WhatDoYouKnow


Page 76: Kaggle Competitions

FMs have been successfully applied in several Kaggle competitions:

- Criteo Display Advertising Challenge: 1st place (team '3 idiots').
- Blue Book for Bulldozers: 1st place (team 'Leustagos & Titericz').
- EMI Music Data Science Hackathon: 2nd place (team 'lns').

Page 77: Summary

- Factorization machines combine linear/polynomial regression with factorization models.
- Feature interactions are learned with a low-rank representation.
- Estimation of unobserved interactions is possible.
- Factorization machines can be computed efficiently and have high prediction quality.

Pages 78-82: References

[1] L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting RDF triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing (SAC '12), pages 326-331, New York, NY, USA, 2012. ACM.

[2] C. Freudenthaler, L. Schmidt-Thieme, and S. Rendle. Bayesian factorization machines. In NIPS Workshop on Sparse Representation and Low-rank Approximation, 2011.

[3] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434, New York, NY, USA, 2008. ACM.

[4] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009.

[5] Y. Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 447-456, New York, NY, USA, 2009. ACM.

[6] Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.

[7] I. Pilaszy, D. Zibriczky, and D. Tikk. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In RecSys '10: Proceedings of the Fourth ACM Conference on Recommender Systems, pages 71-78, New York, NY, USA, 2010. ACM.

[8] I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010), pages 563-568, 2010.

[9] S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM '10), pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society.

[10] S. Rendle. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3):57:1-57:22, May 2012.

[11] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 811-820, New York, NY, USA, 2010. ACM.

[12] S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011.

[13] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 880-887, New York, NY, USA, 2008. ACM.

[14] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.

[15] D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web (WWW '09), pages 111-120, New York, NY, USA, 2009. ACM.

[16] G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 10:623-656, June 2009.

[17] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.

[18] S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.