
Page 1: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Higher Order Fused Regularization for Supervised Learning with Grouped Parameters Koh Takeuchi1, Yoshinobu Kawahara2, Tomoharu Iwata1 1NTT Communication Science Laboratories, Kyoto, Japan 2Osaka University, Osaka, Japan

Page 2: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


This talk: Contributions

- A new regularizer that incorporates smoothness into overlapping groups of parameters for supervised learning.

- An efficient network flow algorithm for minimizing our regularizer.

- Empirical improvements in predictive performance over existing regularizers.

Page 3: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Task: Regularized Supervised Learning

The goal of regularized supervised learning is

β* = argmin_β L(β),

where L(β) is the sum of a convex empirical loss, whose gradient is Lipschitz continuous, and a non-smooth regularizer. The target values, observed vectors, and parameters are y_n ∈ Y, x_n ∈ R^d (n = 1, …, N), and β ∈ R^d.
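To make the setup concrete, the following is a minimal sketch (not from the slides) of such an objective with a squared loss in NumPy; the names `objective`, `penalty`, and `lam` are illustrative assumptions, and `penalty` stands in for any of the regularizers discussed below.

```python
import numpy as np

def objective(beta, X, y, penalty, lam=1.0):
    """L(beta): smooth empirical loss plus a non-smooth regularizer.

    X: (N, d) observed vectors, y: (N,) targets, beta: (d,) parameters.
    `penalty` is a callable Omega(beta); `lam` trades loss against penalty.
    """
    loss = 0.5 * np.sum((X @ beta - y) ** 2) / len(y)  # convex, Lipschitz-continuous gradient
    return loss + lam * penalty(beta)                  # non-smooth part
```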

Page 4: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Problem: Overfitting in supervised learning

When the number of samples (N) is small, parameter estimates result in poor predictive performance. Regularizers prevent overfitting by incorporating prior knowledge to impose sparseness or smoothness on the estimates.

[Figure: training and test error versus # of iterations, with a sparse or smooth regularizer.]

Page 5: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Existing sparsity-inducing regularizers (with l1-norm)

Lasso (Tibshirani+, 1996)

Ω(β) = Σ_{i=1}^d ‖β_i‖_1

+ Ineffective parameters are set to zero
− Cannot utilize structures

Group Lasso (Yuan+, 2006)

Ω(β) = Σ_{k=1}^K ‖β_{(g_k)}‖_2, where β_{(g_k)} is the k-th group of parameters

+ Ineffective groups of parameters are set to zero
− Cannot impose smoothness
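A small sketch of these two penalties in NumPy (the list-of-index-arrays input `groups` is a hypothetical representation, not from the slides):

```python
import numpy as np

def lasso_penalty(beta):
    """Omega(beta) = sum_i |beta_i| (the l1 norm)."""
    return np.sum(np.abs(beta))

def group_lasso_penalty(beta, groups):
    """Omega(beta) = sum_k ||beta_{(g_k)}||_2 for index arrays `groups`."""
    return sum(np.linalg.norm(beta[np.asarray(g)]) for g in groups)

# Example: group_lasso_penalty(np.array([1.0, -2.0, 0.0, 3.0]), [[0, 1], [2, 3]])
```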

Page 6: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Existing smoothness-inducing regularizers (with l1-norm)

Generalized Fused Lasso (Tibshirani+, 2005)

Ω(β) = Σ_{(i,j)∈E} w_{i,j} ‖β_i − β_j‖_1

+ Pairs of parameters take the same value
− Cannot utilize groups

[Figure: a small graph over β1, β2, β3 with edges E = {(1,2), (1,3)} and weights w_{1,2}, w_{1,3}.]

Existing regularizers cannot utilize group structures to impose smoothness on parameters.
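A corresponding sketch of the pairwise penalty over an edge list (the `edges`/`weights` representation is an assumption for illustration):

```python
import numpy as np

def generalized_fused_lasso_penalty(beta, edges, weights):
    """Omega(beta) = sum over (i, j) in E of w_ij * |beta_i - beta_j|."""
    return sum(w * abs(beta[i] - beta[j]) for (i, j), w in zip(edges, weights))

# The toy graph from the slide, using 0-based indices for beta_1, beta_2, beta_3:
beta = np.array([0.5, 0.4, -0.3])
print(generalized_fused_lasso_penalty(beta, [(0, 1), (0, 2)], [1.0, 1.0]))
```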

Page 7: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Lasso and Laplace distribution

The penalty term of the Lasso corresponds to a Laplace prior (Park, T. and Casella, G., "The Bayesian Lasso," Journal of the American Statistical Association, 103(482): 681–686, 2008).

p(β | σ²) = Π_{i=1}^d (1 / (2√σ²)) exp( −‖β_i‖_1 / √σ² )
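The correspondence can be checked directly: up to an additive constant, the negative log of this prior is the l1 penalty scaled by 1/√σ². A minimal sketch, assuming the density above:

```python
import numpy as np

def laplace_log_prior(beta, sigma2):
    """log p(beta | sigma^2) for the i.i.d. Laplace prior above."""
    scale = np.sqrt(sigma2)
    return -np.sum(np.abs(beta)) / scale - len(beta) * np.log(2.0 * scale)

# Maximizing this log prior is minimizing the l1 penalty (up to scale and a constant),
# which is why MAP estimation under this prior matches the Lasso penalty term.
```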

Page 8: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Idea: Group structures in parameters

・Words expressing the same meaning (e.g., Tesco, Sainsbury's)
・Music pieces in the same genre
・Books released in the same year
・Parameters of linear or quadratic terms in polynomial regression:
  Σ_i β_i x_i,   Σ_{i,j} β_{i,j} x_i x_j

Page 9: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Idea: Overlapping groups of parameters

Suppose there are five parameters corresponding to five objects.

[Figure: the five objects, covered by an overlapping "Fruit" group and "Orange Color" group.]

A group of parameters would provide similar functionality.

Page 10: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Idea: Assumption on overlapping groups of parameters

[Figure: parameters β1, β2, β3, β4, β5 covered by two overlapping groups, GROUP 1 and GROUP 2.]

A group of parameters would take similar values (not equal to 0) in a supervised learning problem.

Page 11: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Idea: Make the regularizer robust to incomplete groups

Groups do not always match the problem. Utilize only the effective part of a group.

[Figure: β1, β2, β3 in Group 1, with the effective part of the group for the problem highlighted.]

Page 12: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Recap:

- Given overlapping groups of parameters
- Groups may not perfectly fit a problem
- Demands for our regularizer:
  ・Utilize overlapping groups to impose smoothness on parameter estimations
  ・Utilize the effective part of a group that matches the problem (robustness)

Page 13: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Robust Pn Potts model (Kohli+, 2005)

A cut function for image segmentation that enforces label consistency on overlapping groups.
+ Controls consistency within a group (robust)
− Discrete function

f_ho(S) = Σ_{k=1}^K min{ θ^k_0 + c^k_0(V ∖ S),  θ^k_1 + c^k_1(S),  θ^k_max },

where c^k_0 and c^k_1 are weights for the parameters in the k-th group and θ^k_max is a hyperparameter for controlling consistency.
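A direct transcription of this cut function as a set-function evaluator; the dictionary-based group representation (`idx`, `c0`, `c1`, `theta0`, `theta1`, `theta_max`, with `idx`, `c0`, `c1` as aligned NumPy arrays) is an assumption made for illustration:

```python
import numpy as np

def robust_pn_potts(S, groups):
    """f_ho(S) = sum_k min(theta0_k + c0_k(V \\ S), theta1_k + c1_k(S), theta_max_k)."""
    total = 0.0
    for g in groups:
        in_S = np.array([i in S for i in g["idx"]])
        c1_S = float(np.sum(g["c1"][in_S]))      # weight of the group's elements inside S
        c0_out = float(np.sum(g["c0"][~in_S]))   # weight of the group's elements outside S
        total += min(g["theta0"] + c0_out, g["theta1"] + c1_S, g["theta_max"])
    return total
```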

Page 14: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


An example of Robust Pn Potts model (Kohli+, 2005)

Potential with K = 1.

[Figure: penalty (y-axis) versus pixels (x-axis, indices 1, …, s, …, t, …, d); the penalty follows θ_1 + c_1(U_i) on one side and θ_0 + c_0(V ∖ U_i) on the other, truncated at θ_max, with θ_1 = θ_0 in this example.]

Page 15: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Continuous relaxation via the Lovász extension

- A continuous relaxation can be obtained via the Lovász extension (Lovász, 1983).
- The Robust Pn Potts model is a submodular function (Edmonds, 1970).
- A Lovász extension is convex if and only if its set function is submodular.

Submodular ⇒ Convex

Page 16: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Higher Order Fused regularizer [Proposition 1]

✓ Parameters are sorted as β_{j_1} ≥ β_{j_2} ≥ ⋯ ≥ β_{j_d}.
✓ This regularizer consists of three parts.
✓ Check the example of estimated parameters with K = 1 on the next slide.

Ω_ho(β) = Σ_{k=1}^K ( Σ_{i ∈ {j_1,…,j_{s−1}}} (β_i − β_{j_s}) c^k_{1,i} + β_{j_s}(θ^k_max − θ^k_1)
          + β_{j_t}(θ^k_0 − θ^k_max) + Σ_{i ∈ {j_{t+1},…,j_d}} (β_{j_t} − β_i) c^k_{0,i} ),

where β_{j_1} ≥ β_{j_2} ≥ ⋯ ≥ β_{j_d} are the sorted parameters of the k-th group.
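A sketch of evaluating one group's term of Ω_ho, taking the breakpoint positions s and t as given (the slides note they are set by the hyperparameters; that selection step is omitted here). The weight arrays `c1` and `c0` are assumed to be aligned with the descending sort order:

```python
import numpy as np

def hof_group_penalty(beta_g, c1, c0, theta0, theta1, theta_max, s, t):
    """One group's term of Omega_ho(beta), following the formula above.

    beta_g: the group's parameters; s, t: 1-based breakpoint positions (s <= t).
    Omega_ho(beta) would sum this quantity over the K groups.
    """
    b = np.sort(beta_g)[::-1]                            # beta_{j_1} >= ... >= beta_{j_d}
    top = np.sum((b[: s - 1] - b[s - 1]) * c1[: s - 1])  # i in {j_1, ..., j_{s-1}}
    mid = b[s - 1] * (theta_max - theta1) + b[t - 1] * (theta0 - theta_max)
    bot = np.sum((b[t - 1] - b[t:]) * c0[t:])            # i in {j_{t+1}, ..., j_d}
    return float(top + mid + bot)
```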

Page 17: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Higher Order Fused regularizer

This property allows parameters whose values are larger or smaller than the thresholds to take different values.

[Figure: sorted parameter values β_{j_i} plotted against index i = 1, …, s, …, t, …, d (values roughly −1.0 to 3.0), with the thresholds β_{j_s} and β_{j_t} marked. Annotations: if i < s then β_i = β_{j_s}; if i > t then β_i = β_{j_t}; parameters with i between s and t take the same value; s and t are set by the hyperparameters.]

Page 18: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Optimization: Minimization of the HOF regularizer

Ω_ho(β) is a non-smooth convex function, and thus L(β) is difficult to minimize directly. We utilize the proximity operator (Moreau, 1962) of Ω_ho(β):

prox_{γΩ_ho}(β̂) = argmin_β Ω_ho(β) + (1/(2γ)) ‖β − β̂‖²_2

Remark: with a sufficiently small γ (e.g., γ < 0.001), the minimizer obtained through the proximity operator matches that of the original function.
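For intuition about the operator itself, the proximity operator of the plain l1 norm has a well-known closed form (soft-thresholding). This is only an illustration of the definition above, not the prox of Ω_ho, which is computed with the flow algorithm on the next slide:

```python
import numpy as np

def prox_l1(beta_hat, gamma):
    """prox of gamma * ||.||_1: the closed-form minimizer of
    ||beta||_1 + (1 / (2 * gamma)) * ||beta - beta_hat||_2^2 (soft-thresholding)."""
    return np.sign(beta_hat) * np.maximum(np.abs(beta_hat) - gamma, 0.0)
```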

Page 19: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Solve the proximity operator via a maximum network flow algorithm

The proximity operator can be solved by the minimum-norm-point algorithm (Fujishige, 06).
− The fastest such algorithm runs in O(d^5·EO + d^6).
We propose a maximum flow algorithm that runs in O(d|E| log(d²/|E|)) [Theorem 1].

Page 20: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Minimize the loss function L(β)

The forward-backward splitting algorithm attains the optimum of L(β) with an O(1/n²) convergence rate (Nesterov's acceleration).
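A minimal sketch of accelerated forward-backward splitting (FISTA-style), assuming a gradient oracle `grad` for the smooth loss, a step size 1/Lip from its Lipschitz constant, and a proximity operator `prox(v, gamma)` for the regularizer (for the HOF regularizer this would be the flow-based prox from the previous slide):

```python
import numpy as np

def forward_backward(grad, prox, beta0, Lip, n_iter=200):
    """Accelerated proximal gradient (Nesterov momentum), O(1/n^2) on L(beta)."""
    beta = np.array(beta0, dtype=float)
    z = beta.copy()                       # extrapolated point
    tau = 1.0
    for _ in range(n_iter):
        beta_next = prox(z - grad(z) / Lip, 1.0 / Lip)   # forward (gradient) then backward (prox)
        tau_next = (1.0 + np.sqrt(1.0 + 4.0 * tau ** 2)) / 2.0
        z = beta_next + ((tau - 1.0) / tau_next) * (beta_next - beta)  # momentum
        beta, tau = beta_next, tau_next
    return beta

# Example with the l1 prox from the previous slide:
#   forward_backward(lambda b: X.T @ (X @ b - y) / len(y), prox_l1, np.zeros(d), Lip)
```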

Page 21: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Experiments: Linear regression problem

- Compare predictive performance by Root Mean Squared Error (RMSE) on the test data set.

- Linear regression experiments with
  ・Synthetic data sets (d = 100): non-overlapping and overlapping groups
  ・Real-world data sets

RMSE = √( (1/N) Σ_{n=1}^N ‖y_n − ŷ_n‖²_2 )
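The metric in code, assuming arrays of true and predicted targets:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error over the N test samples."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))
```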

Page 22: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Synthetic data experiment: Linear regression with five groups

a)  non-overlapping groups setting (d=100)

b)  overlapping groups setting (d=100)

Page 23: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Estimated parameters with non-overlapping groups (d = 100, N = 30)

[Figure: estimated parameters over indices 0–100 for Ordinary Least Squares, Lasso, Sparse Group Lasso, Generalized Fused Lasso, and Proposed.]
・Blue lines are true parameters ・Circles are estimated parameters

Page 24: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Estimated parameters with overlapping groups (d=100, N=30)

[Figure: estimated parameters over indices 0–100 for Ordinary Least Squares, Lasso, Sparse Group Lasso, Generalized Fused Lasso, and Proposed, with overlapping groups.]
・Blue lines and circles are true and estimated parameters, respectively.

Page 25: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Real-world data experiment: MovieLens100k

Average rating prediction for each movie from the users who watched it (d = 942). 31 user groups (8 age, 2 gender, and 21 occupation groups).


Page 26: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Real-world data experiment: Yelp

Rating value prediction from review text (bag-of-words) with d = 1,000. We employed 52 word groups (50 semantic groups plus a positive group and a negative group).

Page 27: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Estimated parameters of four groups

[Figure: blue = positive, red = negative, size = value.]
Robustness enables the regularizer to utilize effective parts of groups to impose smoothness.

Page 28: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Conclusion

- An l1-norm-based regularizer that incorporates smoothness into overlapping groups of parameters.

- An efficient flow algorithm for calculating the exact minimizer of the regularizer.

- Empirical improvements in predictive performance over existing regularizers in supervised learning.

Page 29: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Future work:

- Non-linear regression and classification with the HOF regularizer

- Apply the HOF regularizer to matrix/tensor completion problems
  ・Recommendation (Netflix)
  ・Spatio-temporal data prediction

- Extend the HOF regularizer to
  ・Hierarchical groups
  ・Hypergraphs

Page 30: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Higher Order Fused Regularization for Supervised Learning with Grouped Parameters Koh Takeuchi1, Yoshinobu Kawahara2, Tomoharu Iwata1 1NTT Communication Science Laboratories, Kyoto, Japan 2Osaka University, Osaka, Japan

Page 31: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Proof of the Lovász extension

Ω_ho(β) coincides with the Lovász extension of the robust higher-order energy function (proof omitted).
→ The proximal step can be converted into a submodular function minimization problem (proof omitted).
The robust higher-order energy function is a pixel-clustering technique proposed in Computer Vision:

f_ho(S) = Σ_{k=1}^K min{ θ^k_0 + c^k_0(V ∖ S),  θ^k_1 + c^k_1(S),  θ^k_max }

[Figure: the same penalty-versus-pixels plot as on page 14, with segments θ_1 + c_1(U_i), θ_max, and θ_0 + c_0(V ∖ U_i) over indices 1, …, s, …, t, …, d.]

Page 32: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Sort the parameters within a group in descending order and evaluate how closely their values agree.

1st term: parameters larger than β_{j_s} are fused to β_{j_s}:

Σ_{i ∈ {j_1,…,j_{s−1}}} (β_i − β_{j_s}) c^k_{1,i}

2nd and 3rd terms: values between β_{j_s} and β_{j_t} are fused:

(β_{j_s} − β_{j_t}) θ^k_max − β_{j_s} θ^k_1 + β_{j_t} θ^k_0

4th term: parameters smaller than β_{j_t} are fused to β_{j_t}.

θ^k_0, θ^k_1, θ^k_max are hyperparameters.

Page 33: Higher Order Fused Regularization for Supervised Learning with Grouped Parameters


Ω_ho(β) coincides with the Lovász extension of the Robust Higher Order Potential.

The Robust Higher Order Potential is a pixel-clustering technique from Computer Vision:

f_ho(S) = Σ_{k=1}^K min{ θ^k_0 + c^k_0(V ∖ S),  θ^k_1 + c^k_1(S),  θ^k_max }

cf. the Lovász extension: for β_{j_1} ≥ β_{j_2} ≥ ⋯ ≥ β_{j_d},

f̂(β) = Σ_{i=1}^d β_{j_i} ( f({j_1, …, j_i}) − f({j_1, …, j_{i−1}}) )
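The Lovász extension can be evaluated directly from any set-function oracle. A minimal sketch, assuming `f` accepts a Python set and f(∅) = 0:

```python
import numpy as np

def lovasz_extension(f, beta):
    """hat{f}(beta) = sum_i beta_{j_i} * (f({j_1..j_i}) - f({j_1..j_{i-1}})),
    where j_1..j_d sorts beta in descending order."""
    beta = np.asarray(beta, dtype=float)
    order = np.argsort(-beta)
    value, prev, chain = 0.0, f(set()), set()
    for j in order:
        chain = chain | {int(j)}          # grow the chain of the largest entries
        cur = f(chain)
        value += beta[j] * (cur - prev)
        prev = cur
    return value

# With f(S) = robust_pn_potts(S, groups) from the earlier sketch, this evaluates
# the continuous relaxation that the slides identify with Omega_ho(beta).
```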