Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Presentation slides by Takuya Makino. Saturday, March 23, 13

Upload: tma15

Post on 13-Aug-2015


Page 1: Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Presentation slides by Takuya Makino

Page 2:

Paper introduced

• Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems (ICDM 2012)

• Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit Dhillon

• Won the Best Paper Award.

Page 3:

Motivation

• Matrix factorization is an effective technique for recommender systems when the matrix has missing entries.

• To handle web-scale data, we need a matrix factorization method that is efficient and easy to parallelize and distribute.

Page 4:

The matrix factorization problem

Page 5:

The matrix factorization problem

The observed rating of user i for item j

Page 6:

The matrix factorization problem

The inner product of user i's feature vector and item j's feature vector in a k-dimensional feature space (a rank-k factorization, with k < m, k < n)

The observed rating of user i for item j

Page 7:

The matrix factorization problem

L2 regularization

The inner product of user i's feature vector and item j's feature vector in a k-dimensional feature space (a rank-k factorization, with k < m, k < n)

||・||_{F} is the Frobenius norm; its square is the sum of the squares of all matrix entries.

The observed rating of user i for item j
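The regularized objective described on these slides can be written out as a short sketch. This is my own illustration (the names `mf_objective`, `mask`, and `lam` are assumptions, not from the paper): `mask` plays the role of the observed set Ω, and λ penalizes the squared Frobenius norms of W and H.

```python
import numpy as np

def mf_objective(A, mask, W, H, lam):
    """Regularized squared error over the observed entries only.
    A: m x n ratings, mask: m x n boolean (True = observed),
    W: m x k user features, H: n x k item features, lam: L2 weight."""
    R = A - W @ H.T                              # residuals for all entries
    loss = np.sum((R * mask) ** 2)               # only observed entries count
    reg = lam * (np.sum(W ** 2) + np.sum(H ** 2))  # squared Frobenius norms
    return loss + reg
```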

Page 8:

In other words

• Find W and H that minimize the error so that A, including its unobserved entries, can be approximated by WH^T (while the weights of features that do not help the estimate are driven to 0).

• It is an unconstrained convex program (in each factor, with the other held fixed), so W and H can be found with numerical methods such as Stochastic Gradient Descent (SGD).

• The proof that (1) is a convex program is omitted (see T村本).

Page 9:

Coordinate Descent

• When updating one (or more) of the variables, treat all the other variables as constants.

• What does the objective look like as a function of a single variable?

• In what order should the variables be updated?

Page 10:

Coordinate Descent

• When updating one (or more) of the variables, treat all the other variables as constants.

• What does the objective look like as a function of a single variable?

• In what order should the variables be updated? Choosing this order carefully is what reduces the computational cost!

Page 11:

What does the objective look like as a function of a single variable?

(4) is the objective when w_{it} is treated as the variable z.

Page 12:

What does the objective look like as a function of a single variable?

(4) is the objective when w_{it} is treated as the variable z.

It is simply (1) with the w_{it} term inside the inner product replaced by z.

Page 13:

Solving equation (4)

Page 14:

Solving equation (4)

Setting f'(z) = 0 gives the closed-form minimizer

z* = ( Σ_{j∈Ω_i} (A_ij − w_i^T h_j + w_{it} h_{jt}) h_{jt} ) / ( λ + Σ_{j∈Ω_i} h_{jt}² )

where w_i^T h_j = Σ_{t=1}^{k} w_{it} h_{jt}.

Computing z* naively in this form costs O(|Ω_i| k).
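The closed-form single-variable update can be sketched directly from that formula. This is a naive O(|Ω_i| k) version (the function name and arguments are my own, not the paper's pseudocode):

```python
import numpy as np

def update_wit(A, mask, W, H, i, t, lam):
    """One-variable CCD update for w_{it}: the closed form from f'(z) = 0.
    Naive version: recomputes each inner product, costing O(|Omega_i| k)."""
    omega_i = np.nonzero(mask[i])[0]       # items observed for user i
    num, den = 0.0, lam
    for j in omega_i:
        # residual with w_{it}'s own contribution added back in
        r = A[i, j] - W[i] @ H[j] + W[i, t] * H[j, t]
        num += r * H[j, t]
        den += H[j, t] ** 2
    return num / den
```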

Page 15:

Residual matrix R

To avoid recomputing Σ_{t=1}^{k} w_{it} h_{jt} every time, maintain the residual R_ij = A_ij − Σ_{t=1}^{k} w_{it} h_{jt}.

Page 16:

Parameter update

The inner product Σ_{t=1}^{k} w_{it} h_{jt} is already held in R at this point.

h_{jt} can be updated in the same way.

The cost drops from O(|Ω_i| k) to O(|Ω_i|).
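Maintaining R makes each single-variable update O(|Ω_i|). A sketch of one sweep over W under this scheme (my own illustration, with dense arrays for clarity; the function name is an assumption):

```python
import numpy as np

def update_with_residual(A, mask, W, H, R, lam):
    """One CCD sweep over W using a maintained residual R = A - W H^T.
    Each single-variable update costs O(|Omega_i|) instead of O(|Omega_i| k)."""
    m, k = W.shape
    for i in range(m):
        omega_i = np.nonzero(mask[i])[0]
        for t in range(k):
            h = H[omega_i, t]
            # add back w_{it}'s contribution via R: no O(k) inner product needed
            num = np.sum((R[i, omega_i] + W[i, t] * h) * h)
            den = lam + np.sum(h ** 2)
            z = num / den
            R[i, omega_i] -= (z - W[i, t]) * h   # keep R consistent in place
            W[i, t] = z
```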

Page 17:

Speeding up the updates

• Maintaining the residual matrix R reduces the total cost from O(|Ω| k) to O(|Ω|).

• This part is not the proposed method (it is prior work).

Page 18:

In what order should the variables be updated?

• Item/User-wise Update

• Feature-wise Update

[Figure: the two update orderings shown over the variable grid, rows i (or j) = 1…m (or n), columns t = 1…k]

Page 19:

Item/User-wise Update


Page 20:

Feature-wise Update

Changing viewpoint: regard A as the sum of k rank-one matrix products.

The m×n matrix contributed by the t-th feature: the product of an m×1 matrix and a 1×n matrix is an m×n matrix.

The proposed method computes these rank-one factors.
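This rank-one view can be checked numerically. A tiny sketch (the data values are mine, not from the slides) verifying that W H^T equals the sum of the k outer products w^t (h^t)^T:

```python
import numpy as np

# W H^T as a sum of k rank-one matrices: each column pair (w^t, h^t)
# contributes the m x n outer product w^t (h^t)^T.
W = np.array([[1., 2.], [3., 4.]])   # m=2 users, k=2 features
H = np.array([[5., 6.], [7., 8.]])   # n=2 items, k=2 features

full = W @ H.T
rank_ones = sum(np.outer(W[:, t], H[:, t]) for t in range(W.shape[1]))
assert np.allclose(full, rank_ones)
```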

Page 21:

The subproblem for finding u and v

With this substitution, (15) can be rewritten accordingly.

Page 22:

What makes the feature-wise view attractive?

R̂_ij = R_ij + w_{ti} h_{tj} = A_ij − Σ_{t'=1}^{k} w_{it'} h_{jt'} + w_{ti} h_{tj}

Since w_{it} = w_{ti} and h_{jt} = h_{tj}, the terms for the current feature t cancel (the underlined part).

In other words, R̂ does not need to be recomputed every time u_i and v_j are updated.
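Putting the pieces together, here is a dense-matrix sketch of the feature-wise procedure (my own illustration of CCD++-style updates; the function name `ccd_pp`, the random initialization, and the dense residual are assumptions, and the paper works with sparse structures): for each feature t, form R̂ = R + w^t (h^t)^T once, run T inner CCD passes on the rank-one subproblem, then fold the result back into R.

```python
import numpy as np

def ccd_pp(A, mask, k, lam, outer_iters=5, T=3):
    """Sketch of the feature-wise (CCD++-style) update described above."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    W = np.zeros((m, k))
    H = rng.standard_normal((n, k)) * 0.1
    R = (A - W @ H.T) * mask                      # residual on observed entries
    for _ in range(outer_iters):
        for t in range(k):
            u, v = W[:, t].copy(), H[:, t].copy()
            Rhat = R + np.outer(u, v)             # add feature t's share back
            for _ in range(T):                    # T inner CCD iterations
                for i in range(m):
                    jj = np.nonzero(mask[i])[0]
                    u[i] = Rhat[i, jj] @ v[jj] / (lam + v[jj] @ v[jj])
                for j in range(n):
                    ii = np.nonzero(mask[:, j])[0]
                    v[j] = Rhat[ii, j] @ u[ii] / (lam + u[ii] @ u[ii])
            R = Rhat - np.outer(u, v)             # fold the updated factor in
            W[:, t], H[:, t] = u, v
    return W, H
```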

Page 23:

Feature-wise Update

For a single subproblem, the cost of computing R is O(1/T) of the cost of the variable updates over the T CCD iterations.

Page 24:

Feature-wise Update

For a single subproblem, the cost of computing R is O(1/T) of the cost of the variable updates over the T CCD iterations.

Running T CCD iterations per subproblem is therefore

(1 + 1) / (1 + 1/T) = 2T / (T + 1)

times faster than running CCD only once per subproblem.

Page 25:

Page 26:


Page 27:

Split into p smaller vectors

Page 28:

Split into p smaller vectors and update them in parallel.

In (16), each u_i is independent of the other u's.
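Because each u_i in (16) depends only on row i of R̂ and on v, the rows can be split into p blocks and updated independently. A thread-based sketch (the function name and p-way row split are my own; the paper targets multicore and distributed settings):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_u_update(Rhat, mask, v, lam, p=4):
    """Update all u_i in parallel: rows are split into p independent blocks."""
    m = Rhat.shape[0]
    u = np.empty(m)

    def update_block(rows):
        for i in rows:                           # each i touches only u[i]
            jj = np.nonzero(mask[i])[0]
            u[i] = Rhat[i, jj] @ v[jj] / (lam + v[jj] @ v[jj])

    blocks = np.array_split(np.arange(m), p)
    with ThreadPoolExecutor(max_workers=p) as ex:
        list(ex.map(update_block, blocks))       # force execution, surface errors
    return u
```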

Page 29:

Page 30:

Related work

• Alternating Least Squares (ALS)

Repeat: fix H and solve for W, then fix W and solve for H.

Easy to parallelize, but computationally expensive.

• Stochastic Gradient Descent (SGD)

Computationally cheap, but hard to parallelize.

Convergence depends on the learning rate, and performance depends on the order in which the variables are updated.
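For comparison, the ALS half-step mentioned above can be sketched as follows (my own illustration, assuming the usual ridge-regression closed form; the per-user O(k³) linear solve is what makes ALS more expensive per update than coordinate descent):

```python
import numpy as np

def als_step(A, mask, W, H, lam):
    """One ALS half-step: with H fixed, each row w_i has a closed-form
    ridge-regression solution over user i's observed items."""
    k = H.shape[1]
    for i in range(A.shape[0]):
        jj = np.nonzero(mask[i])[0]
        Hj = H[jj]                                   # |Omega_i| x k
        G = Hj.T @ Hj + lam * np.eye(k)              # k x k normal matrix
        W[i] = np.linalg.solve(G, Hj.T @ A[i, jj])   # O(|Omega_i| k^2 + k^3)
    return W
```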

Page 31:

Page 32:

Page 33:

Page 34:

Conclusions

For a matrix A with missing entries, CCD++ (the feature-wise update) has lower computational cost than existing methods, and is easy to parallelize in both multicore and distributed environments.