Online Learning in Dynamic Environment

Lijun Zhang

Dept. of Computer Science and Technology, Nanjing University, China

September 29, 2016

http://cs.nju.edu.cn/zlj

Learning And Mining from DatA (LAMDA), Nanjing University


Page 2

Outline

1 Introduction: Online Learning; Regret

2 Online Learning in Dynamic Environment: Dynamic Regret; Online Multiple Gradient Descent; Online Multiple Newton Update

3 Conclusion and Future Work

Page 3

Big Data

http://www.ibmbigdatahub.com/infographic/four-vs-big-data


Papers about Online Learning in 2015

ICML 17+, NIPS 25+, COLT 13+

Page 6

Online Learning

Online Learning [Shalev-Shwartz, 2011]

Online learning is the process of answering a sequence of questions given (maybe partial) knowledge of answers to previous questions and possibly additional information.

Running example: multi-class classification. The learner outputs a predicted label each round. In full-information online learning it then observes the ground truth; in bandit online learning it only observes whether the prediction was correct or not.

Page 10

Formal Definitions

Online Learning

1: for t = 1, 2, . . . , T do
2:     Learner picks a decision x_t \in \mathcal{X}; Adversary chooses a function f_t(\cdot)
3:     Learner suffers loss f_t(x_t) and updates x_t
4: end for

(In online classification: the learner plays a classifier, the adversary supplies an example, and the learner suffers a loss and updates.)

Cumulative Loss

\text{Cumulative Loss} = \sum_{t=1}^{T} f_t(x_t)

Full-Information Online Learning: the learner observes the function f_t(\cdot) (online first-order optimization).

Bandit Online Learning: the learner only observes the value f_t(x_t) (online zero-order optimization).
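The protocol can be sketched in a few lines of code. This is a minimal simulation under illustrative assumptions: the quadratic losses and the naive follow-the-last-minimizer learner are invented for the example, not taken from the slides.

```python
# Minimal sketch of the online learning protocol (full-information setting).
# The quadratic losses and the naive "follow the last minimizer" learner are
# illustrative assumptions, not part of the original slides.

def online_learning(T, pick, reveal):
    """Run the protocol: pick x_t, observe f_t, suffer f_t(x_t)."""
    cumulative_loss = 0.0
    history = []
    for t in range(1, T + 1):
        x_t = pick(history)          # learner picks a decision
        f_t = reveal(t)              # adversary chooses a function
        cumulative_loss += f_t(x_t)  # learner suffers the loss
        history.append((x_t, f_t))   # full information: f_t itself is observed
    return cumulative_loss

# Adversary: f_t(x) = (x - c)^2 with a fixed target c = 1.
reveal = lambda t: (lambda x, c=1.0: (x - c) ** 2)

# Learner: play the minimizer of the last observed loss (0 on round 1),
# found here by scanning a grid so the sketch needs no calculus.
def pick(history):
    if not history:
        return 0.0
    _, f_last = history[-1]
    return min((x / 100.0 for x in range(-200, 201)), key=f_last)

loss = online_learning(10, pick, reveal)
print(loss)  # the first round costs (0 - 1)^2 = 1, later rounds cost ~0
```

Because the loss function is fixed, a learner that imitates the last round is already near-optimal; the interesting regimes in the rest of the talk are exactly those where f_t changes over time.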

Page 16

Regret

Cumulative Loss

\text{Cumulative Loss} = \sum_{t=1}^{T} f_t(x_t)

Regret

\text{Regret} = \underbrace{\sum_{t=1}^{T} f_t(x_t)}_{\text{Cumulative Loss of Online Learner}} - \underbrace{\min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)}_{\text{Minimal Loss of Batch Learner}}

Hannan Consistency

\limsup_{T \to \infty} \frac{1}{T} \left( \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \right) = 0, \quad \text{with probability } 1

or equivalently

\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) = o(T), \quad \text{with probability } 1

Page 21

State-of-the-Art in Full-Information Setting (I)

Online Gradient Descent

1: for t = 1, 2, . . . , T do
2:     Learner picks a decision x_t \in \mathcal{X}; Adversary chooses a function f_t(\cdot)
3:     Learner suffers loss f_t(x_t) and updates
           x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta_t \nabla f_t(x_t))
4: end for

At round t+1 the ideal update would be x_{t+1} = \operatorname{argmin}_{x \in \mathcal{X}} f_{t+1}(x), or at least a gradient step on the coming loss, x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta_t \nabla f_{t+1}(x_t)). Since f_{t+1} is unknown when x_{t+1} must be chosen, online gradient descent instead takes the gradient step on the last observed function f_t.
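A concrete sketch of the update, under illustrative assumptions: 1-D quadratic losses, the domain X = [-1, 1] (so the projection is a clamp), and the decreasing step size η_t = 1/√t used in the convex-case analysis.

```python
import math

# Projected online gradient descent sketch:
#   x_{t+1} = Pi_X(x_t - eta_t * grad f_t(x_t)).
# The interval domain and the quadratic losses are illustrative assumptions.

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval X = [lo, hi] (a clamp in 1-D)."""
    return max(lo, min(hi, x))

def ogd(grads, x0=0.0):
    """Run OGD with eta_t = 1/sqrt(t); return the iterates x_1, ..., x_T."""
    xs, x = [], x0
    for t, grad in enumerate(grads, start=1):
        xs.append(x)                      # x_t is committed before f_t is seen
        eta = 1.0 / math.sqrt(t)          # decreasing step size
        x = project(x - eta * grad(x))    # gradient step on the *observed* f_t
    return xs

# Losses f_t(x) = (x - 0.5)^2 every round; gradient is 2 * (x - 0.5).
T = 200
grads = [(lambda x: 2.0 * (x - 0.5)) for _ in range(T)]
xs = ogd(grads)

regret = sum((x - 0.5) ** 2 for x in xs)  # comparator x* = 0.5 has zero loss
print(regret / T)  # average regret shrinks as T grows
```

With a stationary loss the iterates settle at the comparator and the average regret decays, matching the O(√T) total-regret guarantee for convex functions.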

Page 27

State-of-the-Art in Full-Information Setting (II)

Convex Functions [Zinkevich, 2003]

Online Gradient Descent with \eta_t = 1/\sqrt{t}:

\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \le \frac{D^2}{2} \sqrt{T} + \left( \sqrt{T} - \frac{1}{2} \right) G^2 = O(\sqrt{T})

Strongly Convex Functions [Hazan et al., 2007]

Online Gradient Descent with \eta_t = 1/(\lambda t):

\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \le \frac{G^2}{2\lambda} (1 + \log T) = O(\log T)

Further settings: exponentially concave functions [Hazan et al., 2007]; convex and smooth functions [Srebro et al., 2010]; sparse online learning [Langford et al., 2009]; online classification [Freund and Schapire, 1999].

Two Characteristics

Updating only once per round; a decreasing step size.

Page 32

A Paradox

The Ideal Algorithm

x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta_t \nabla f_{t+1}(x_t))

Online Gradient Descent is Imperfect

x_{t+1} = \Pi_{\mathcal{X}}(x_t - \eta_t \nabla f_t(x_t))

It may fail if f_t and f_{t+1} differ much. Yet its theoretical guarantees are strong:

\sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x) \le \begin{cases} O(\sqrt{T}), & \text{convex functions} \\ O(\log T), & \text{strongly convex functions} \end{cases}

Page 34

Dynamic Regret

Regret → Static Regret

\text{Regret} = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)

The Rationale of Regret: a single fixed decision is reasonably good during all T rounds.

Dynamic Environment: different decisions will be good in different periods, e.g., online tracking, online advertising, portfolio investment.

Dynamic Regret

\text{Dynamic Regret} = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} \min_{x \in \mathcal{X}} f_t(x)
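The difference between the two comparators can be seen numerically. In this hypothetical drifting example, f_t(x) = (x − c_t)^2 with a target that alternates between −1 and +1, so the best fixed decision is x = 0 while the per-round minimizers move every round.

```python
# Static vs. dynamic regret on a drifting loss sequence. Illustrative
# assumption: f_t(x) = (x - c_t)^2 with a moving target c_t.

def static_vs_dynamic(xs, cs):
    """Return (static regret, dynamic regret) for plays xs against targets cs."""
    losses = sum((x - c) ** 2 for x, c in zip(xs, cs))
    # static comparator: the single best fixed x is the mean of the targets
    x_bar = sum(cs) / len(cs)
    static = losses - sum((x_bar - c) ** 2 for c in cs)
    # dynamic comparator: min_x f_t(x) = 0 at x = c_t in every round
    dynamic = losses - 0.0
    return static, dynamic

cs = [(-1.0) ** t for t in range(1, 11)]   # target alternates between -1 and +1
xs = [0.0] * 10                            # learner always plays 0
static, dynamic = static_vs_dynamic(xs, cs)
print(static, dynamic)  # → 0.0 10.0
```

Here the static regret of always playing 0 is exactly zero, yet the dynamic regret equals T: a small static regret says nothing about tracking a moving minimizer.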

Page 39

The Challenge

Sublinear Dynamic Regret is Impossible in General!

An Example: A Random Adversary

1: for t = 1, 2, . . . , T do
2:     Learner picks a decision x_t \in \mathbb{R}
3:     Adversary chooses f_t(x) = (x - 1)^2 or f_t(x) = (x + 1)^2 uniformly at random
4: end for

The expected dynamic regret at the t-th round is

\frac{1}{2} \left( (x_t - 1)^2 + (x_t + 1)^2 \right) - \frac{1}{2} \left( \min_{x} (x - 1)^2 + \min_{x} (x + 1)^2 \right) = x_t^2 + 1 \ge 1

and summing over the T rounds gives

\mathbb{E} \left[ \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} \min_{x} f_t(x) \right] \ge T
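The lower bound is easy to check by simulation. The learner below, which replays the previous round's minimizer, is an illustrative choice; by the slide's calculation any strategy suffers expected per-round dynamic regret x_t^2 + 1 ≥ 1.

```python
import random

# Simulate the random adversary: f_t(x) = (x - 1)^2 or (x + 1)^2 with prob. 1/2.
# Whatever x_t the learner plays, the conditional expected dynamic regret of
# round t is ((x_t - 1)^2 + (x_t + 1)^2) / 2 - 0 = x_t^2 + 1 >= 1.

def simulate(T, seed=0):
    rng = random.Random(seed)
    dynamic_regret = 0.0
    x = 0.0                                # learner's current decision
    for _ in range(T):
        c = rng.choice([1.0, -1.0])        # adversary picks the minimizer +-1
        dynamic_regret += (x - c) ** 2     # min_x (x - c)^2 = 0 each round
        x = c                              # replay the previous minimizer
    return dynamic_regret

T = 1000
print(simulate(T) / T)  # per-round dynamic regret stays bounded away from 0
```

For this particular learner the per-round cost is 0 when the coin repeats and 4 when it flips, so the average hovers around 2, comfortably above the lower bound of 1.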

Page 43

Make the Impossible Possible

Dynamic Regret

\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} \min_{x \in \mathcal{X}} f_t(x) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^*)

where x_t^* \in \operatorname{argmin}_{x \in \mathcal{X}} f_t(x).

Assumptions

Functional Variation [Besbes et al., 2015]

F_T := \sum_{t=2}^{T} \| f_t(\cdot) - f_{t-1}(\cdot) \|_\infty = \sum_{t=2}^{T} \max_{x \in \mathcal{X}} | f_t(x) - f_{t-1}(x) |

Path-length [Mokhtari et al., 2016]

P_T^* := \sum_{t=2}^{T} \| x_t^* - x_{t-1}^* \|
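Both measures are straightforward to compute for a toy sequence. Illustrative assumptions: f_t(x) = (x − c_t)^2 on X = [−1, 1] with a slowly drifting target c_t, the sup-norm in F_T approximated on a grid, and the closed-form minimizers x_t^* = c_t used for P_T^*.

```python
# Compute the functional variation F_T (grid approximation of the sup-norm)
# and the path-length P_T^* for a drifting quadratic sequence. Illustrative
# assumptions: f_t(x) = (x - c_t)^2, domain X = [-1, 1], minimizers x_t^* = c_t.

def functional_variation(cs, grid_size=201):
    """F_T = sum_t max_{x in X} |f_t(x) - f_{t-1}(x)|, maximized over a grid."""
    grid = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    F = 0.0
    for c_prev, c in zip(cs, cs[1:]):
        F += max(abs((x - c) ** 2 - (x - c_prev) ** 2) for x in grid)
    return F

def path_length(cs):
    """P_T^* = sum_t |x_t^* - x_{t-1}^*|, using x_t^* = c_t in closed form."""
    return sum(abs(c - c_prev) for c_prev, c in zip(cs, cs[1:]))

T = 100
cs = [0.5 * t / T for t in range(1, T + 1)]   # target drifts from 0.005 to 0.5
print(functional_variation(cs), path_length(cs))
```

Note the two measures need not agree: F_T compares whole functions in the sup-norm, while P_T^* only tracks how far the minimizers move.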

Page 45

Existing Results (I)

Functional Variation [Besbes et al., 2015]

For any f_1, . . . , f_T such that

F_T := \sum_{t=2}^{T} \max_{x \in \mathcal{X}} | f_t(x) - f_{t-1}(x) | \le V_T

the restarted online gradient descent (with input V_T) satisfies

\text{Dynamic Regret} \le \begin{cases} O(T^{2/3} V_T^{1/3}), & \text{convex functions} \\ O(\log T \sqrt{T V_T}), & \text{strongly convex functions} \end{cases}

The dynamic regret is sublinear whenever V_T is sublinear. However, the algorithm is not adaptive: it requires the bound V_T as input.

Page 47

Existing Results (II)

Path-length [Zinkevich, 2003, Mokhtari et al., 2016, Yang et al., 2016]

Online gradient descent satisfies

\text{Dynamic Regret} \le \begin{cases} O(\sqrt{T} \, P_T^*), & \text{convex functions} \\ O(P_T^*), & \text{strongly convex and smooth functions} \\ O(P_T^*), & \text{convex and smooth functions} \end{cases}

For any f_1, . . . , f_T such that

P_T^* := \sum_{t=2}^{T} \| x_t^* - x_{t-1}^* \| \le B_T

online gradient descent (with input B_T) satisfies

\text{Dynamic Regret} \le O(\sqrt{T B_T}), \quad \text{convex functions}

Page 48

Our Contribution [Zhang et al., 2016]

Squared Path-length

S_T^* := \sum_{t=2}^{T} \| x_t^* - x_{t-1}^* \|^2

Example 1 (Path-length vs Squared Path-length)

Suppose \| x_t^* - x_{t-1}^* \| = T^{-\alpha} for all t \ge 1 and \alpha > 0. Then

S_{T+1}^* = T^{1-2\alpha} \ll P_{T+1}^* = T^{1-\alpha}.

In particular, when \alpha = 1/2, we have S_{T+1}^* = 1 \ll P_{T+1}^* = \sqrt{T}.

Online Multiple Gradient Descent

For strongly convex and smooth functions, and for semi-strongly convex and smooth functions:

\text{Dynamic Regret} \le O(\min(P_T^*, S_T^*))

Online Multiple Newton Update

For self-concordant functions:

\text{Dynamic Regret} \le O(\min(P_T^*, S_T^*))
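A minimal sketch of the idea behind Online Multiple Gradient Descent: after observing f_t, take several projected gradient steps rather than one, so the iterate tracks the moving minimizer. The losses, constant step size, and number of inner steps below are illustrative assumptions, not the tuned constants of [Zhang et al., 2016].

```python
# Sketch of the "multiple updates per round" idea behind OMGD: after observing
# f_t, take K projected gradient steps instead of one, so x_{t+1} stays close
# to the moving minimizer x_t^*. All constants here are illustrative.

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval X = [lo, hi]."""
    return max(lo, min(hi, x))

def omgd(cs, K=5, eta=0.25):
    """Losses f_t(x) = (x - c_t)^2 (2-strongly convex, 2-smooth); grad = 2(x - c_t)."""
    x, plays = 0.0, []
    for c in cs:
        plays.append(x)                        # x_t is committed before f_t is seen
        for _ in range(K):                     # K inner gradient steps on f_t
            x = project(x - eta * 2.0 * (x - c))
    return plays

cs = [0.5 + 0.001 * t for t in range(100)]     # slowly moving minimizers
plays = omgd(cs)
dynamic_regret = sum((x - c) ** 2 for x, c in zip(plays, cs))
print(dynamic_regret)  # small: dominated by the first round's miss
```

Each inner step contracts the distance to c_t by a constant factor (here 1/2), so after K steps the iterate is within roughly 2^-K of the current minimizer; the dynamic regret is then governed by how far the minimizers move between rounds.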



Strongly Convex and Smooth Functions

Definition 1
A function f : X ↦ R is λ-strongly convex if
f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (λ/2)‖y − x‖², ∀x, y ∈ X.

Definition 2
A function f : X ↦ R is L-smooth if
f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2)‖y − x‖², ∀x, y ∈ X.

Example 2 (Strongly convex and smooth functions)
A quadratic form f(x) = x⊤Ax − 2b⊤x + c, where αI ⪯ A ⪯ βI, α > 0, and β < ∞;
The regularized logistic loss f(x) = log(1 + exp(b⊤x)) + (λ/2)‖x‖², where λ > 0.
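Since ∇²f(x) = 2A for the quadratic in Example 2, it is 2α-strongly convex and 2β-smooth. The quick NumPy check below (an illustration, not part of the slides) verifies the inequalities of Definitions 1 and 2 at random points:

```python
# Check Definitions 1 and 2 for f(x) = x^T A x - 2 b^T x + c with
# alpha*I <= A <= beta*I: here lambda = 2*alpha and L = 2*beta.
import numpy as np

rng = np.random.default_rng(0)
d = 5
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]   # random orthogonal matrix
eigs = rng.uniform(0.5, 2.0, d)                    # eigenvalues in [alpha, beta]
A = Q @ np.diag(eigs) @ Q.T
b = rng.standard_normal(d)
lam, L = 2 * eigs.min(), 2 * eigs.max()            # strong convexity / smoothness

f = lambda x: x @ A @ x - 2 * b @ x + 1.0
grad = lambda x: 2 * A @ x - 2 * b

for _ in range(100):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    gap = f(y) - f(x) - grad(x) @ (y - x)          # equals (y - x)^T A (y - x)
    assert gap >= lam / 2 * np.sum((y - x) ** 2) - 1e-9   # Definition 1
    assert gap <= L / 2 * np.sum((y - x) ** 2) + 1e-9     # Definition 2
```

For a quadratic, the first-order remainder f(y) − f(x) − ⟨∇f(x), y − x⟩ equals (y − x)⊤A(y − x) exactly, so both inequalities reduce to the eigenvalue bounds on A.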



Online Multiple Gradient Descent

The Algorithm
1: Let x₁ be any point in X
2: for t = 1, …, T do
3:   z¹_t = x_t
4:   for j = 1, …, K do
5:     z^{j+1}_t = Π_X( z^j_t − η∇f_t(z^j_t) )
6:   end for
7:   x_{t+1} = z^{K+1}_t
8: end for

Multiple updates in each round
A constant step size
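The loop above can be sketched in a few lines of NumPy (an illustration, not the authors' code). The sketch assumes X = R^d, so the projection Π_X is the identity, and runs on hypothetical drifting quadratics f_t(x) = ‖x − m_t‖² whose minimizer m_t moves by 1/T per coordinate per round:

```python
# Online Multiple Gradient Descent, unconstrained case (Pi_X = identity).
import numpy as np

def omgd(grad, T, d, K, eta):
    """Run OMGD; grad(t, z) returns the gradient of f_t at z."""
    x = np.zeros(d)                    # x_1: any point in X
    played = []
    for t in range(T):
        played.append(x.copy())
        z = x.copy()
        for _ in range(K):             # K inner gradient steps on f_t
            z = z - eta * grad(t, z)   # projection omitted: X = R^d here
        x = z                          # x_{t+1} = z_t^{K+1}
    return played

T, d = 200, 2
minimizers = np.cumsum(np.full((T, d), 1.0 / T), axis=0)  # drifting m_t
grad = lambda t, z: 2 * (z - minimizers[t])
xs = omgd(grad, T, d, K=5, eta=0.5)
regret = sum(np.sum((x - m) ** 2) for x, m in zip(xs, minimizers))
print(regret)   # ~ 2/T: on the order of S*_T, far below P*_T ~ sqrt(2)
```

Each f_t here is 2-strongly convex and 2-smooth, so η = 0.5 = 1/L matches the step-size condition of Theorem 1, and the realized dynamic regret tracks the squared path-length rather than the path-length.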



Theoretical Guarantees

Theorem 1
By setting η ≤ 1/L and K = ⌈((1/η + λ)/(2λ)) ln 4⌉, we have

∑_{t=1}^T f_t(x_t) − f_t(x*_t)
≤ min( 2G P*_T + 2G‖x₁ − x*₁‖,
       (1/(2α)) ∑_{t=1}^T ‖∇f_t(x*_t)‖² + 2(L + α) S*_T + (L + α)‖x₁ − x*₁‖² )

Corollary 1
Suppose ∑_{t=1}^T ‖∇f_t(x*_t)‖² ≤ O(S*_T); then

∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ O(min(P*_T, S*_T)).

Corollary 1
In particular, if x*_t belongs to the relative interior of X for all t (so that ∇f_t(x*_t) = 0),

∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ min( 2G P*_T + 2G‖x₁ − x*₁‖, 2L S*_T + L‖x₁ − x*₁‖² ).


Semi-strongly Convex and Smooth Functions

Definition 3 ([Gong and Ye, 2014])
A function f : X ↦ R is semi-strongly convex over X if
f(x) − min_{x∈X} f(x) ≥ (β/2)‖x − Π_{X*}(x)‖², ∀x ∈ X,
where X* = {x ∈ X : f(x) ≤ min_{x′∈X} f(x′)} is the set of minimizers.

Example 3 (Semi-strongly convex and smooth functions)
min_{x∈X⊆R^d} f(x) = g(Ex) + b⊤x,
where g(·) is strongly convex and smooth, and X is either R^d or a polyhedral set [Luo and Tseng, 1993].
Semi-strong convexity and the error bound property are equivalent up to constant factors [Necoara et al., 2015].
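A concrete instance in the spirit of Example 3, with a rank-deficient E (a sketch, not from the slides): f(x) = ‖Ex‖² is not strongly convex, because its minimizers form the whole null space of E, yet it is semi-strongly convex with β = 2σ², where σ is the smallest nonzero singular value of E:

```python
# f(x) = ||E x||^2 with rank(E) < d: X* = null(E) is a subspace, so f is not
# strongly convex, but f(x) - 0 >= (beta/2) * dist(x, X*)^2 with
# beta = 2 * sigma_min+^2 (smallest nonzero singular value squared).
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 3
E = rng.standard_normal((r, d))          # rank 3 < d = 6 (generically)
f = lambda x: np.sum((E @ x) ** 2)       # min over R^d is 0, attained on null(E)

U, s, Vt = np.linalg.svd(E, full_matrices=False)
beta = 2 * s.min() ** 2
row_proj = Vt.T @ Vt                     # projector onto row(E) = null(E)^perp

for _ in range(100):
    x = rng.standard_normal(d)
    dist2 = np.sum((row_proj @ x) ** 2)  # ||x - Pi_{X*}(x)||^2
    assert f(x) >= beta / 2 * dist2 - 1e-9   # semi-strong convexity
```

The inequality is tight along the singular direction of E with smallest nonzero singular value, so β here cannot be improved.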


Theoretical Guarantees

Theorem 2
By setting η ≤ 1/L and K = ⌈((1/η + β)/β) ln 4⌉, we have

∑_{t=1}^T f_t(x_t) − ∑_{t=1}^T min_{x∈X} f_t(x)
≤ min( 2G P*_T + 2G‖x₁ − x̄₁‖,
       (1/(2α)) G*_T + 2(L + α) S*_T + (L + α)‖x₁ − x̄₁‖² ),

where G*_T = max_{{x*_t ∈ X*_t}_{t=1}^T} ∑_{t=1}^T ‖∇f_t(x*_t)‖² and x̄₁ = Π_{X*₁}(x₁).

Corollary 2
Suppose G*_T ≤ O(S*_T); then

∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ O(min(P*_T, S*_T)).


Self-concordant Functions

Definition 4 ([Nemirovski, 2004])
A function f : X ↦ R is called self-concordant on X if
f(x_i) → ∞ along every sequence {x_i ∈ X} converging, as i → ∞, to a boundary point of X;
f satisfies the differential inequality
|D³f(x)[h, h, h]| ≤ 2( h⊤∇²f(x)h )^{3/2}
for all x ∈ X and all h ∈ R^d.

Example 4 (Self-concordant Functions)
The function f(x) = −log x;
A convex quadratic form f(x) = x⊤Ax − 2b⊤x + c, where A ∈ R^{d×d}, b ∈ R^d, and c ∈ R;
If f : R^d ↦ R is self-concordant, and A ∈ R^{d×k}, b ∈ R^d, then f(Ax + b) is self-concordant.


Online Multiple Newton Update

The Algorithm
1: Let x₁ be any point in X₁
2: for t = 1, …, T do
3:   z¹_t = x_t
4:   for j = 1, …, K do
5:     z^{j+1}_t = z^j_t − (1/(1 + λ_t(z^j_t))) [∇²f_t(z^j_t)]^{−1} ∇f_t(z^j_t),
     where the Newton decrement λ_t(z^j_t) = ( ∇f_t(z^j_t)⊤ [∇²f_t(z^j_t)]^{−1} ∇f_t(z^j_t) )^{1/2}
6:   end for
7:   x_{t+1} = z^{K+1}_t
8: end for
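The damped Newton loop above can be sketched as follows (an illustration, not the authors' code). It assumes unconstrained, hypothetical self-concordant losses f_t(x) = ⟨a_t, x⟩ − ∑_i log x_i on x > 0, with a slowly drifting a_t whose minimizer is x*_t = 1/a_t:

```python
# Online Multiple Newton Update with the damped step 1/(1 + lambda_t).
import numpy as np

def omnu(grad, hess, T, x1, K):
    """grad(t, z), hess(t, z) give the gradient/Hessian of f_t at z."""
    x, played = x1.astype(float), []
    for t in range(T):
        played.append(x.copy())
        z = x.copy()
        for _ in range(K):
            g, H = grad(t, z), hess(t, z)
            step = np.linalg.solve(H, g)      # [hess f_t]^{-1} grad f_t
            lam = np.sqrt(g @ step)           # Newton decrement lambda_t(z)
            z = z - step / (1.0 + lam)        # damped Newton update
        x = z
    return played

T, d = 50, 3
rng = np.random.default_rng(2)
a = 1.0 + 0.01 * rng.standard_normal((T, d))  # slowly drifting a_t
grad = lambda t, z: a[t] - 1.0 / z
hess = lambda t, z: np.diag(1.0 / z**2)
xs = omnu(grad, hess, T, x1=np.ones(d), K=5)
err = np.max(np.abs(xs[-1] - 1.0 / a[T - 2]))
print(err)   # small: the iterates track the moving minimizer x*_t = 1/a_t
```

With the comparator drifting by only ~0.01 per round, K = 5 damped Newton steps suffice for the iterate to lock onto each x*_t, matching the small-drift assumption of Theorem 3.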


Theoretical Guarantees

Theorem 3
Assume ‖x*_{t−1} − x*_t‖²_t ≤ 1/144, ∀t ≥ 2, where ‖·‖_t denotes the local norm associated with f_t.
When t = 1, we choose K = O(1)·(f₁(x₁) − f₁(x*₁) + log log μ) such that ‖x₂ − x*₁‖²₁ ≤ 1/(144μ).
For t ≥ 2, we set K = ⌈log₄(16μ)⌉; then

∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ min( 16 P*_T, 4 S*_T + 1/36 ) + f₁(x₁) − f₁(x*₁).

As a result,

∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ O(min(P*_T, S*_T)).

Conclusion and Future Work

Conclusion
Squared path-length: S*_T := ∑_{t=2}^T ‖x*_t − x*_{t−1}‖²
Online multiple gradient descent (strongly/semi-strongly convex and smooth functions):
∑_{t=1}^T f_t(x_t) − f_t(x*_t) ≤ O(min(P*_T, S*_T))
Online multiple Newton update (self-concordant functions)

Future Work
A deeper understanding of functional variation, path-length, and squared path-length
Beyond dynamic regret
Static regret of semi-strongly convex functions

Reference I

Besbes, O., Gur, Y., and Zeevi, A. (2015). Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244.

Freund, Y. and Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296.

Gong, P. and Ye, J. (2014). Linear convergence of variance-reduced stochastic gradient without strong convexity. ArXiv e-prints, arXiv:1406.1102.

Hazan, E., Agarwal, A., and Kale, S. (2007). Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192.

Langford, J., Li, L., and Zhang, T. (2009). Sparse online learning via truncated gradient. Journal of Machine Learning Research, 10:777–801.

Luo, Z.-Q. and Tseng, P. (1993). Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research, 46-47(1):157–178.

Mokhtari, A., Shahrampour, S., Jadbabaie, A., and Ribeiro, A. (2016). Online optimization in dynamic environments: Improved regret rates for strongly convex problems. ArXiv e-prints, arXiv:1603.04954.

Thanks!

Reference II

Necoara, I., Nesterov, Y., and Glineur, F. (2015). Linear convergence of first order methods for non-strongly convex optimization. ArXiv e-prints, arXiv:1504.06298.

Nemirovski, A. (2004). Interior point polynomial time methods in convex programming. Lecture notes, Technion – Israel Institute of Technology.

Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194.

Srebro, N., Sridharan, K., and Tewari, A. (2010). Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems 23, pages 2199–2207.

Yang, T., Zhang, L., Jin, R., and Yi, J. (2016). Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. In Proceedings of the 33rd International Conference on Machine Learning.

Zhang, L., Yang, T., Yi, J., Jin, R., and Zhou, Z.-H. (2016). Improved dynamic regret for non-degenerate functions. ArXiv e-prints, arXiv:1608.03933.

Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928–936.