Lecture 11: Mirror Descent and Variable Metric Methods

DAMC Lecture 11, May 8 - 13, 2020
math.xmu.edu.cn/group/nona/damc/Lecture11.pdf


Definition: Let h be a convex function, differentiable on C. The Bregman divergence associated with h is defined as

    D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩.

D_h(x, y) is always non-negative and convex in its first argument x.
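As a quick numerical illustration (not part of the original slides), the divergence is straightforward to evaluate for any differentiable h; the sketch below, in Python/NumPy, checks the squared-Euclidean case that reappears in the gradient-descent example later.

```python
import numpy as np

def bregman(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad_h(y), x - y>."""
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

# h(x) = 0.5 * ||x||_2^2 gives D_h(x, y) = 0.5 * ||x - y||_2^2.
h = lambda x: 0.5 * np.dot(x, x)
grad_h = lambda x: x
x, y = np.random.randn(4), np.random.randn(4)
assert np.isclose(bregman(h, grad_h, x, y), 0.5 * np.sum((x - y) ** 2))
```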

h is λ-strongly convex over C with respect to the norm ‖·‖ if

    h(x) ≥ h(y) + ⟨∇h(y), x − y⟩ + (λ/2)‖x − y‖²   for all x, y ∈ C.

Note that strong convexity of h is equivalent to

    D_h(x, y) ≥ (λ/2)‖x − y‖²   for all x, y ∈ C.

2. Mirror Descent method

At iteration k:

Compute a subgradient g^k ∈ ∂f(x^k).

Update

    x^{k+1} = argmin_{x ∈ C} { f(x^k) + ⟨g^k, x − x^k⟩ + (1/α_k) D_h(x, x^k) }
            = argmin_{x ∈ C} { ⟨g^k, x⟩ + (1/α_k) D_h(x, x^k) }.

We often attempt to choose h to better match the geometry of the underlying constraint set C. The goal is, roughly, to choose a strongly convex function h so that the diameter of C is small in the norm ‖·‖ (which need not be the ℓ2 or Euclidean norm) with respect to which h is strongly convex.

Example: Gradient descent is mirror descent. Consider

    h(x) = (1/2)‖x‖_2²,   D_h(x, y) = (1/2)‖x − y‖_2².

The update of mirror descent is

    x^{k+1} = argmin_{x ∈ C} { f(x^k) + ⟨g^k, x − x^k⟩ + (1/α_k) D_h(x, x^k) }
            = argmin_{x ∈ C} { ⟨g^k, x⟩ + (1/α_k) D_h(x, x^k) }
            = argmin_{x ∈ C} { ⟨g^k, x⟩ + (1/(2α_k))‖x − x^k‖_2² },

i.e. exactly the projected subgradient step x^{k+1} = Proj_C(x^k − α_k g^k).

Example: Solving problems on the probability simplex with exponentiated gradient methods.

Consider the negative entropy (h is convex)

    h(x) = ∑_{j=1}^n x_j log x_j,

restricting x to the probability simplex {x ≥ 0, ∑_{j=1}^n x_j = 1}. Then

    D_h(x, y) = ∑_{j=1}^n [x_j(log x_j − log y_j) + (y_j − x_j)] = ∑_{j=1}^n x_j log(x_j / y_j),

the entropic or Kullback-Leibler divergence.

The subproblem in mirror descent is

    min_x   ⟨g, x⟩ + ∑_{j=1}^n x_j log(x_j / y_j)
    s.t.    x ≥ 0,  ∑_{j=1}^n x_j = 1.

Introducing Lagrange multipliers τ ∈ R for the equality constraint and λ ∈ R^n_+ for the inequality constraint, we obtain the Lagrangian

    L(x, τ, λ) = ⟨g, x⟩ + ∑_{j=1}^n [ x_j log(x_j / y_j) + τ x_j − λ_j x_j ] − τ.

Taking derivatives with respect to x and setting them to zero yields

    x_j = y_j exp(−g_j − 1 − τ + λ_j).

Taking λ_j = 0, by ∑_{j=1}^n x_j = 1 we have (a special solution)

    x_i = y_i exp(−g_i) / ∑_{j=1}^n y_j exp(−g_j).

The update in mirror descent is therefore

    x^{k+1}_i = x^k_i exp(−α_k g^k_i) / ∑_{j=1}^n x^k_j exp(−α_k g^k_j).

This is the so-called exponentiated gradient update, also known as entropic mirror descent.
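A minimal NumPy sketch of this update (the names x, g, alpha are mine; for large α_k‖g^k‖_∞ one would evaluate the exponent in a numerically safer way, e.g. by subtracting its maximum, which leaves the normalized result unchanged):

```python
import numpy as np

def exponentiated_gradient_step(x, g, alpha):
    """Entropic mirror descent on the simplex: x_i <- x_i * exp(-alpha * g_i), renormalized."""
    w = x * np.exp(-alpha * g)
    return w / np.sum(w)
```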

Example using ℓp norms (p ∈ (1, 2]): Consider

    h(x) = (1/(2(p − 1)))‖x‖_p².

We have

    ∇h(x) = (1/(p − 1))‖x‖_p^{2−p} [sign(x_1)|x_1|^{p−1}, ···, sign(x_n)|x_n|^{p−1}]^T.

Define

    φ(x) = (p − 1)∇h(x)

and

    ψ(y) = ‖y‖_q^{2−q} [sign(y_1)|y_1|^{q−1}, ···, sign(y_n)|y_n|^{q−1}]^T,

where q = p/(p − 1). We have ψ(φ(x)) = x, and hence ψ = φ^{−1} and φ = ψ^{−1}.
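The pair (φ, ψ) is easy to check numerically. The sketch below assumes p ∈ (1, 2] and NumPy; the guard returning 0 at the origin is my own convention (consistent with the limit), and these two functions are reused in the later sketches.

```python
import numpy as np

def phi(x, p):
    # phi(x) = (p - 1) * grad h(x) = ||x||_p^{2-p} * sign(x) * |x|^{p-1}
    nrm = np.linalg.norm(x, ord=p)
    if nrm == 0.0:
        return np.zeros_like(x)
    return nrm ** (2 - p) * np.sign(x) * np.abs(x) ** (p - 1)

def psi(y, p):
    # psi = phi^{-1}, written with the conjugate exponent q = p / (p - 1)
    q = p / (p - 1)
    nrm = np.linalg.norm(y, ord=q)
    if nrm == 0.0:
        return np.zeros_like(y)
    return nrm ** (2 - q) * np.sign(y) * np.abs(y) ** (q - 1)

# Check psi(phi(x)) == x at a random point.
p = 1.5
x = np.random.randn(6)
assert np.allclose(psi(phi(x, p), p), x)
```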

The mirror descent update for C = R^n is

    x^{k+1} = argmin_{x ∈ R^n} { f(x^k) + ⟨g^k, x − x^k⟩ + (1/α_k) D_h(x, x^k) }
            = argmin_{x ∈ R^n} { ⟨g^k, x⟩ + (1/α_k) D_h(x, x^k) }
            = (∇h)^{−1}(∇h(x^k) − α_k g^k)
            = ψ(φ(x^k) − α_k(p − 1) g^k).

The update involving the inverse of the gradient mapping (∇h)^{−1} holds more generally, that is, for any differentiable and strictly convex h.

This is the original form of the mirror descent update, and it justifies the name "mirror descent": the gradient is "mirrored" through the distance-generating function h and back again.
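With φ and ψ as sketched above, the unconstrained update is a single line of code (x, g, alpha assumed given):

```python
def lp_mirror_descent_step(x, g, alpha, p):
    """x^{k+1} = psi(phi(x^k) - alpha_k * (p - 1) * g^k), the C = R^n case."""
    return psi(phi(x, p) - alpha * (p - 1) * g, p)
```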

The case C = {x ∈ R^n_+ : ∑_{j=1}^n x_j = 1}: the subproblem in mirror descent is

    min_{x ∈ C}  ⟨v, x⟩ + (1/2)‖x‖_p²,

where

    v = α_k(p − 1) g^k − φ(x^k).

Exercise: Show by the KKT conditions that the solution is given by x = ψ([τe − v]_+), where τ ∈ R satisfies

    ∑_{j=1}^n ψ_j([τe − v]_+) = 1.

Binary search can be used to find τ.
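A possible bisection sketch for τ, reusing ψ from above (the bracketing strategy and tolerance are my own choices; the map τ ↦ ∑_j ψ_j([τe − v]_+) is nondecreasing, which is what makes bisection valid):

```python
import numpy as np

def solve_simplex_subproblem(v, p, tol=1e-10):
    """Return x = psi([tau*e - v]_+) with tau chosen so that sum_j psi_j([tau*e - v]_+) = 1."""
    s = lambda tau: np.sum(psi(np.maximum(tau - v, 0.0), p))
    lo, hi = np.min(v), np.max(v) + 1.0   # s(lo) = 0 < 1; enlarge hi until s(hi) >= 1
    while s(hi) < 1.0:
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return psi(np.maximum(tau - v, 0.0), p)
```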

Theorem 1

Let α_k > 0 be any sequence of non-increasing stepsizes. Assume that h is strongly convex with respect to ‖·‖, and let ‖·‖_* denote the dual norm. Let x^k be generated by the mirror descent iteration. If D_h(x, x_⋆) ≤ R² for all x ∈ C, then for all K ≥ 1,

    ∑_{k=1}^K [f(x^k) − f(x_⋆)] ≤ R²/α_K + ∑_{k=1}^K (α_k/2)‖g^k‖_*².

If α_k = α is constant, then for all K ≥ 1,

    ∑_{k=1}^K [f(x^k) − f(x_⋆)] ≤ (1/α) D_h(x_⋆, x^1) + (α/2) ∑_{k=1}^K ‖g^k‖_*².

If x̄^K = (1/K) ∑_{k=1}^K x^k or x̄^K = argmin_{x^k} f(x^k), and ‖g‖_* ≤ M for all g ∈ ∂f(x), x ∈ C, then

    f(x̄^K) − f(x_⋆) ≤ (1/(Kα)) D_h(x_⋆, x^1) + (α/2) M².

Exercise: Prove that the negative entropy

    h(x) = ∑_{j=1}^n x_j log x_j

is 1-strongly convex with respect to the ℓ1-norm.

Example: Let C = {x ∈ R^n_+ : ∑_{j=1}^n x_j = 1} and let x^1 = (1/n)e. The exponentiated gradient method with fixed stepsize α guarantees

    f(x̄^K) − f(x_⋆) ≤ log n/(Kα) + (α/(2K)) ∑_{k=1}^K ‖g^k‖_∞²,

where x̄^K = (1/K) ∑_{k=1}^K x^k.

Example: Let C ⊂ {x ∈ R^n : ‖x‖_1 ≤ R_1} and let h(x) = (1/(2(p − 1)))‖x‖_p² with p = 1 + 1/log(2n). Then

    ∑_{k=1}^K [f(x^k) − f(x_⋆)] ≤ 2R_1² log(2n)/α_K + (e²/2) ∑_{k=1}^K α_k‖g^k‖_∞².

In particular, taking α_k = (R_1/e)√(log(2n)/k) and x̄^K = (1/K) ∑_{k=1}^K x^k gives

    f(x̄^K) − f(x_⋆) ≤ 3eR_1√(log(2n)) / √K.

A simulated mirror-descent example: Let

    A = [a_1 ··· a_m]^T ∈ R^{m×n}

have entries drawn i.i.d. N(0, 1). Let

    b_i = (1/2)(a_{i1} + a_{i2}) + ε_i,

where the ε_i are drawn i.i.d. N(0, 10^{−2}), and n = 3000, m = 20. Define

    f(x) = ‖Ax − b‖_1 = ∑_{i=1}^m |⟨a_i, x⟩ − b_i|,

which has subgradients A^T sign(Ax − b). Consider

    min_x f(x)   s.t.   x ∈ C = {x ∈ R^n_+ : ∑_{j=1}^n x_j = 1}.
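This experiment can be set up along the following lines (a sketch; the random seed and exact conventions are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3000, 20
A = rng.standard_normal((m, n))                                # rows a_i ~ N(0, I)
b = 0.5 * (A[:, 0] + A[:, 1]) + 0.1 * rng.standard_normal(m)   # eps_i ~ N(0, 1e-2)

def f(x):
    return np.sum(np.abs(A @ x - b))        # f(x) = ||Ax - b||_1

def subgradient(x):
    return A.T @ np.sign(A @ x - b)         # an element of the subdifferential of f at x
```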

We compare the subgradient method to exponentiated gradient descent for this problem, noting that the Euclidean projection of a vector v ∈ R^n onto the set C has coordinates

    x_j = [v_j − τ]_+,

where τ ∈ R is chosen so that

    ∑_{j=1}^n x_j = ∑_{j=1}^n [v_j − τ]_+ = 1.

We use stepsize α_k = α_0/√k, where the initial stepsize α_0 is chosen to optimize the convergence guarantee for each of the methods.
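A standard sort-based routine computes this projection; combined with the exponentiated-gradient step sketched earlier, the comparison loop might look as follows (a sketch building on the setup above; the initial stepsizes are placeholders rather than the theoretically tuned values):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x >= 0, sum_j x_j = 1}: x_j = [v_j - tau]_+."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1.0) / np.arange(1, v.size + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

x_sg = np.ones(n) / n          # projected subgradient iterate
x_eg = np.ones(n) / n          # exponentiated gradient iterate
alpha0_sg = alpha0_eg = 1.0    # placeholder initial stepsizes
for k in range(1, 2001):
    step = 1.0 / np.sqrt(k)
    x_sg = project_simplex(x_sg - alpha0_sg * step * subgradient(x_sg))
    w = x_eg * np.exp(-alpha0_eg * step * subgradient(x_eg))
    x_eg = w / np.sum(w)
```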

[Figure: f^k_best − f_⋆ versus k.]

2.1 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold, except that instead of receiving a vector g^k ∈ ∂f(x^k) at iteration k, the vector g^k is a stochastic subgradient satisfying

    E[g^k | x^k] ∈ ∂f(x^k).

Then, for any non-increasing stepsize sequence α_k (where α_k may be chosen dependent on g^1, ···, g^k),

    E[ ∑_{k=1}^K (f(x^k) − f(x_⋆)) ] ≤ E[ R²/α_K + ∑_{k=1}^K (α_k/2)‖g^k‖_*² ].

2.2 Adaptive stepsizes

Let x̄^K = (1/K) ∑_{k=1}^K x^k. If D_h(x, x_⋆) ≤ R² for all x ∈ C, then

    E[f(x̄^K) − f(x_⋆)] ≤ E[ R²/(Kα_K) + (1/K) ∑_{k=1}^K (α_k/2)‖g^k‖_*² ].

If α_k = α for all k, we see that the stepsize minimizing

    R²/(Kα) + (α/(2K)) ∑_{k=1}^K ‖g^k‖_*²

is

    α_⋆ = √2 R ( ∑_{k=1}^K ‖g^k‖_*² )^{−1/2},

which yields

    E[f(x̄^K) − f(x_⋆)] ≤ (√2 R/K) E[ ( ∑_{k=1}^K ‖g^k‖_*² )^{1/2} ].

But this α_⋆ is not known in advance, because it depends on all K subgradients g^1, ···, g^K.

Corollary 3

Let the conditions of Theorem 2 hold, and let

    α_k = R / ( ∑_{i=1}^k ‖g^i‖_*² )^{1/2},

which is the "up to now" optimal choice. Then

    E[f(x̄^K) − f(x_⋆)] ≤ (3R/K) E[ ( ∑_{k=1}^K ‖g^k‖_*² )^{1/2} ],

where x̄^K = (1/K) ∑_{k=1}^K x^k.
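In code this stepsize is just a running sum of squared dual norms. A small sketch (the helper dual_norm_sq is hypothetical, standing for ‖·‖_*², e.g. the squared ℓ∞ norm in the simplex/entropy setting):

```python
import numpy as np

def adaptive_stepsize(g, state, R, dual_norm_sq=lambda g: np.max(np.abs(g)) ** 2):
    """alpha_k = R / sqrt(sum_{i<=k} ||g^i||_*^2); `state` accumulates the running sum."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + dual_norm_sq(g)
    return R / np.sqrt(state["sum_sq"])
```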

3. Variable metric methods

The idea is to adjust the metric with which one constructs updates to better reflect problem structure. At iteration k:

1. Choose a subgradient g^k ∈ ∂f(x^k), or a stochastic subgradient g^k satisfying E[g^k | x^k] ∈ ∂f(x^k).

2. Update a positive semidefinite matrix H_k ∈ R^{n×n}.

3. Compute the update

    x^{k+1} = argmin_{x ∈ C} { ⟨g^k, x⟩ + (1/2)⟨x − x^k, H_k(x − x^k)⟩ }.

Special cases: the subgradient method takes H_k = (1/α_k) I, and Newton's method takes H_k = ∇²f(x^k).
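For C = R^n the update has the closed form x^{k+1} = x^k − H_k^{−1} g^k, which in code is a single linear solve (a sketch; a constrained C would additionally require a projection in the H_k-norm):

```python
import numpy as np

def variable_metric_step(x, g, H):
    """Unconstrained variable-metric step: x^{k+1} = x^k - H_k^{-1} g^k."""
    return x - np.linalg.solve(H, g)
```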

Theorem 4

Let H_k be a sequence of positive definite matrices, where H_k is a function of g^1, ···, g^k (and potentially some additional randomness). Let the g^k be (stochastic) subgradients with E[g^k | x^k] ∈ ∂f(x^k). Then

    E[ ∑_{k=1}^K (f(x^k) − f(x_⋆)) ]
      ≤ (1/2) E[ ∑_{k=2}^K ( ‖x^k − x_⋆‖²_{H_k} − ‖x^k − x_⋆‖²_{H_{k−1}} ) + ‖x^1 − x_⋆‖²_{H_1} ]
        + (1/2) E[ ∑_{k=1}^K ‖g^k‖²_{H_k^{−1}} ].

3.1 Adaptive gradient variable metric method

Theorem 5

Let

    R_∞ = sup_{x ∈ C} ‖x − x_⋆‖_∞

be the ℓ∞ radius of the set C, and let the conditions of Theorem 4 hold. Let

    H_k = (1/α) ( diag( ∑_{i=1}^k g^i (g^i)^T ) )^{1/2},

where α > 0 is a pre-specified constant. Then we have

    E[ ∑_{k=1}^K (f(x^k) − f(x_⋆)) ] ≤ ( R_∞²/2 + α² ) E[tr(H_K)].
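Because H_k is diagonal, the update only needs the running sum of squared gradient coordinates. A minimal sketch (the class name, the eps guard, and the optional ℓ∞-ball clipping used in the next example are my additions):

```python
import numpy as np

class DiagonalAdaGrad:
    """H_k = (1/alpha) * diag(sum_{i<=k} g^i (g^i)^T)^{1/2}, stored as a vector."""

    def __init__(self, dim, alpha, eps=1e-12):
        self.alpha = alpha
        self.sq_sum = np.zeros(dim)     # running sum of squared gradient coordinates
        self.eps = eps                  # guards against division by zero

    def step(self, x, g, box_radius=None):
        self.sq_sum += g ** 2
        x_new = x - self.alpha * g / (np.sqrt(self.sq_sum) + self.eps)
        if box_radius is not None:      # for C = {||x||_inf <= r} the H_k-projection is a coordinatewise clip
            x_new = np.clip(x_new, -box_radius, box_radius)
        return x_new
```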

Example: Support vector machine classification problem. Consider

    min_x  f(x) = (1/m) ∑_{i=1}^m [1 − b_i⟨a_i, x⟩]_+   s.t.  ‖x‖_∞ ≤ 1,

where the vectors a_i ∈ {−1, 0, 1}^n, with m = 5000 and n = 1000.

For each coordinate j ∈ [n], we set a_{ij} ∈ {±1} with a random sign with probability 1/j, and a_{ij} = 0 otherwise.

Let u ∈ {−1, 1}^n be drawn uniformly at random. We set b_i = sign(⟨a_i, u⟩) with probability 0.95 and b_i = −sign(⟨a_i, u⟩) otherwise.

We choose a stochastic gradient by selecting i ∈ [m] uniformly at random, then setting

    g^k ∈ ∂[1 − b_i⟨a_i, x^k⟩]_+,

and use α_k = α/√k.
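The data and the stochastic subgradient oracle can be generated roughly as follows (a sketch; the seed and minor conventions are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 1000
j = np.arange(1, n + 1)
nonzero = rng.random((m, n)) < 1.0 / j                      # coordinate j is nonzero w.p. 1/j
A = np.where(nonzero, rng.choice([-1.0, 1.0], size=(m, n)), 0.0)
u = rng.choice([-1.0, 1.0], size=n)
correct = rng.random(m) < 0.95                              # label agrees with sign(<a_i, u>) w.p. 0.95
b = np.where(correct, np.sign(A @ u), -np.sign(A @ u))

def stochastic_subgradient(x):
    """Subgradient of [1 - b_i <a_i, x>]_+ at x, for a uniformly random index i."""
    i = rng.integers(m)
    return -b[i] * A[i] if 1.0 - b[i] * (A[i] @ x) > 0 else np.zeros(n)
```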

[Figure: f(x^k) − f_⋆ versus k/100, for various initial stepsizes.]

[Figure: f(x^k) − f_⋆ versus k/100, for the best initial stepsize.]

Page 2: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 2 26

Definition Let h be a differentiable convex function differentiableon C The Bregman divergence associated with h is defined as

Dh(xy) = h(x)minus h(y)minus 〈nablah(y)xminus y〉

Dh(xy) is always non-negative and convex in its first argument x

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 3 26

h is λ-strongly convex over C with respect to the norm 983042 middot 983042 if

h(x) ge h(y) + 〈nablah(y)xminus y〉+ λ

2983042xminus y9830422 forall xy isin C

Note that strong convexity of h is equivalent to

Dh(xy) geλ

2983042xminus y9830422 forall xy isin C

2 Mirror Descent method

Compute subgradient gk isin partf(xk)

Update

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 4 26

We often attempt to choose h to better match the geometry of theunderlying constraint set C The goal is roughly to choose astrongly convex function h so that the diameter of C is small in thenorm 983042 middot 983042 (need not be ℓ2 or Euclidean norm) with respect towhich h is strongly convex

Example Gradient descent is mirror descent Consider

h(x) =1

2983042x98304222 Dh(xy) =

1

2983042xminus y98304222

The update of mirror descent is

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

2αk983042xminus xk98304222

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 5 26

Example Solving problems on the probability simplex withexponentiated gradient methods

Consider the negative entropy (h is convex)

h(x) =

n983131

j=1

xj log xj

restricting x to the probability simplex x ge 0

n983131

j=1

xj = 1 Then

Dh(xy) =

n983131

j=1

[xj(log xj minus log yj) + (yj minus xj)] =

n983131

j=1

xj logxjyj

the entropic or Kullback-Leibler divergence

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 6 26

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 3: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Definition Let h be a differentiable convex function differentiableon C The Bregman divergence associated with h is defined as

Dh(xy) = h(x)minus h(y)minus 〈nablah(y)xminus y〉

Dh(xy) is always non-negative and convex in its first argument x

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 3 26

h is λ-strongly convex over C with respect to the norm 983042 middot 983042 if

h(x) ge h(y) + 〈nablah(y)xminus y〉+ λ

2983042xminus y9830422 forall xy isin C

Note that strong convexity of h is equivalent to

Dh(xy) geλ

2983042xminus y9830422 forall xy isin C

2 Mirror Descent method

Compute subgradient gk isin partf(xk)

Update

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 4 26

We often attempt to choose h to better match the geometry of theunderlying constraint set C The goal is roughly to choose astrongly convex function h so that the diameter of C is small in thenorm 983042 middot 983042 (need not be ℓ2 or Euclidean norm) with respect towhich h is strongly convex

Example Gradient descent is mirror descent Consider

h(x) =1

2983042x98304222 Dh(xy) =

1

2983042xminus y98304222

The update of mirror descent is

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

2αk983042xminus xk98304222

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 5 26

Example Solving problems on the probability simplex withexponentiated gradient methods

Consider the negative entropy (h is convex)

h(x) =

n983131

j=1

xj log xj

restricting x to the probability simplex x ge 0

n983131

j=1

xj = 1 Then

Dh(xy) =

n983131

j=1

[xj(log xj minus log yj) + (yj minus xj)] =

n983131

j=1

xj logxjyj

the entropic or Kullback-Leibler divergence

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 6 26

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 4: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

h is λ-strongly convex over C with respect to the norm 983042 middot 983042 if

h(x) ge h(y) + 〈nablah(y)xminus y〉+ λ

2983042xminus y9830422 forall xy isin C

Note that strong convexity of h is equivalent to

Dh(xy) geλ

2983042xminus y9830422 forall xy isin C

2 Mirror Descent method

Compute subgradient gk isin partf(xk)

Update

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 4 26

We often attempt to choose h to better match the geometry of theunderlying constraint set C The goal is roughly to choose astrongly convex function h so that the diameter of C is small in thenorm 983042 middot 983042 (need not be ℓ2 or Euclidean norm) with respect towhich h is strongly convex

Example Gradient descent is mirror descent Consider

h(x) =1

2983042x98304222 Dh(xy) =

1

2983042xminus y98304222

The update of mirror descent is

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

2αk983042xminus xk98304222

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 5 26

Example Solving problems on the probability simplex withexponentiated gradient methods

Consider the negative entropy (h is convex)

h(x) =

n983131

j=1

xj log xj

restricting x to the probability simplex x ge 0

n983131

j=1

xj = 1 Then

Dh(xy) =

n983131

j=1

[xj(log xj minus log yj) + (yj minus xj)] =

n983131

j=1

xj logxjyj

the entropic or Kullback-Leibler divergence

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 6 26

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 5: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

We often attempt to choose h to better match the geometry of theunderlying constraint set C The goal is roughly to choose astrongly convex function h so that the diameter of C is small in thenorm 983042 middot 983042 (need not be ℓ2 or Euclidean norm) with respect towhich h is strongly convex

Example Gradient descent is mirror descent Consider

h(x) =1

2983042x98304222 Dh(xy) =

1

2983042xminus y98304222

The update of mirror descent is

xk+1 = argminxisinC

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= argminxisinC

983069〈gkx〉+ 1

2αk983042xminus xk98304222

983070

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 5 26

Example Solving problems on the probability simplex withexponentiated gradient methods

Consider the negative entropy (h is convex)

h(x) =

n983131

j=1

xj log xj

restricting x to the probability simplex x ge 0

n983131

j=1

xj = 1 Then

Dh(xy) =

n983131

j=1

[xj(log xj minus log yj) + (yj minus xj)] =

n983131

j=1

xj logxjyj

the entropic or Kullback-Leibler divergence

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 6 26

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example: Support vector machine classification problem.

Consider

    min_x  f(x) = \frac{1}{m} \sum_{i=1}^m [1 - b_i \langle a_i, x \rangle]_+   s.t.  \|x\|_\infty \le 1,

where the vectors a_i \in \{-1, 0, 1\}^n, with m = 5000 and n = 1000.

For each coordinate j \in [n], we set a_{ij} \in \{\pm 1\} to have a random sign with probability 1/j, and a_{ij} = 0 otherwise.

Let u \in \{-1, 1\}^n be uniform at random. We set b_i = sign(\langle a_i, u \rangle) with probability 0.95 and b_i = -sign(\langle a_i, u \rangle) otherwise.

We choose a stochastic gradient by selecting i \in [m] uniformly at random, then setting

    g^k \in \partial [1 - b_i \langle a_i, x^k \rangle]_+

and \alpha_k = \alpha / \sqrt{k}.

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26
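A sketch of one way to generate this synthetic data and form the stochastic subgradient; the function names and the random seed are illustrative assumptions, and the label noise is written as a 5% flip to match the 0.95 probability above:

    import numpy as np

    def make_svm_data(m=5000, n=1000, flip=0.05, seed=0):
        # a_ij is +/-1 with probability 1/j (for column j = 1, ..., n) and 0
        # otherwise; b_i agrees with sign(<a_i, u>) except for a `flip` fraction.
        rng = np.random.default_rng(seed)
        j = np.arange(1, n + 1)
        active = rng.random((m, n)) < 1.0 / j
        signs = rng.choice([-1.0, 1.0], size=(m, n))
        A = active * signs
        u = rng.choice([-1.0, 1.0], size=n)
        b = np.sign(A @ u)
        b[b == 0] = 1.0                        # break the rare zero inner products
        b[rng.random(m) < flip] *= -1.0        # mislabel about 5% of the examples
        return A, b

    def hinge_stochastic_subgrad(x, A, b, rng):
        # Pick i uniformly at random; -b_i * a_i is a subgradient of
        # [1 - b_i <a_i, x>]_+ when the hinge is active, and 0 is valid otherwise.
        i = rng.integers(A.shape[0])
        if 1.0 - b[i] * (A[i] @ x) > 0:
            return -b[i] * A[i]
        return np.zeros_like(x)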

f(x^k) - f_\star versus k/100, various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(x^k) - f_\star versus k/100, best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 6: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Example Solving problems on the probability simplex withexponentiated gradient methods

Consider the negative entropy (h is convex)

h(x) =

n983131

j=1

xj log xj

restricting x to the probability simplex x ge 0

n983131

j=1

xj = 1 Then

Dh(xy) =

n983131

j=1

[xj(log xj minus log yj) + (yj minus xj)] =

n983131

j=1

xj logxjyj

the entropic or Kullback-Leibler divergence

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 6 26

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 7: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

The subproblem in mirror descent is

minx

〈gx〉+n983131

j=1

xj logxjyj

st x ge 0

n983131

j=1

xj = 1

By introducing Lagrange multipliers τ isin R for the equalityconstraint and λ isin Rn

+ for the inequality constraint Then weobtain Lagrangian

L(x τλ) = 〈gx〉+n983131

j=1

983063xj log

xjyj

+ τxj minus λjxj

983064minus τ

Taking derivatives with respect to x and setting them to zero yield

xj = yj exp(minusgj minus 1minus τ + λj)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 7 26

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 8: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Taking λj = 0 by

n983131

j=1

xj = 1 we have (a special solution)

xi =yi exp(minusgi)

n983131

j=1

yj exp(minusgj)

The update in mirror descent is

xk+1i =

xki exp(minusαkgki )

n983131

j=1

xkj exp(minusαkgkj )

This is the so-called exponentiated gradient update also known asentropic mirror descent

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 8 26

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 9: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Example using ℓp norms (p isin (1 2]) Consider

h(x) =1

2(pminus 1)983042x9830422p

We have

nablah(x) =1

pminus 1983042x9830422minusp

p

983045sign(x1)|x1|pminus1 middot middot middot sign(xn)|xn|pminus1

983046T

Defineφ(x) = (pminus 1)nablah(x)

and

ψ(y) = 983042y9830422minusqq

983045sign(y1)|y1|qminus1 middot middot middot sign(yn)|yn|qminus1

983046T

where q =p

pminus 1 We have ψ(φ(x)) = x and hence ψ = φminus1

and φ = ψminus1

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 9 26

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 10: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

The mirror descent for C = Rn is

xk+1 = argminxisinRn

983069f(xk) + 〈gkxminus xk〉+ 1

αkDh(xx

k)

983070

= argminxisinRn

983069〈gkx〉+ 1

αkDh(xx

k)

983070

= (nablah)minus1(nablah(xk)minus αkgk)

= ψ(φ(xk)minus αk(pminus 1)gk)

The update involving the inverse of the gradient mapping (nablah)minus1holds more generally that is for any differentiable and strictlyconvex h

This is the original form of the mirror descent update and itjustifies the name mirror descent as the gradient is ldquomirroredrdquothrough the distance-generating function h and back again

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 10 26

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 11: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

The case C = x isin Rn+

n983131

j=1

xj = 1

The subproblem in mirror descent is

minxisinC

〈vx〉+ 1

2983042x9830422p

wherev = αk(pminus 1)gk minus φ(xk)

Exercise Show by KKT conditions that the solution is given byx = ψ([τeminus v]+) where τ isin R satisfies

n983131

j=1

ψj([τeminus v]+) = 1

Binary search can be used to find τ

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 11 26

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

22 Adaptive stepsizes

Let xK =1

K

K983131

k=1

xk If Dh(xx983183) le R2 for all x isin C then

E[f(xK)minus f(x983183)] le E

983077R2

KαK+

1

K

K983131

k=1

αk

2983042gk9830422lowast

983078

If αk = α for all k we see that the choice of stepsize minimizing

R2

Kα+

α

2K

K983131

k=1

983042gk9830422lowast is α983183 =radic2R

983075K983131

k=1

983042gk9830422lowast

983076minus 12

which yields

E[f(xK)minus f(x983183)] leradic2R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

But this α983183 is not known in advance because

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26

Corollary 3

Let the conditions of Theorem 2 hold Let

αk =R983161983160983160983159

k983131

i=1

983042gi9830422lowast

which is the ldquoup to nowrdquo optimal choice Then

E[f(xK)minus f(x983183)] le3R

KE

983093

983095983075

K983131

k=1

983042gk9830422lowast

983076 12

983094

983096

where

xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26

3 Variable metric methods

The idea is to adjust the metric with which one constructs updatesto better reflect problem structure

1 Choose subgradient gk isin partf(xk) or stochastic subgradient gk

satisfyingE[gk|xk] isin partf(xk)

2 Update positive semidefinite matrix Hk isin Rntimesn

3 Compute update

xk+1 = argminxisinC

983069〈gkx〉+ 1

2〈xminus xkHk(xminus xk)〉

983070

Special cases Subgradient method Hk =1

αkI

Newton method Hk = nabla2f(xk)

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26

Theorem 4

Let Hk be a sequence of positive definite matrices where Hk is afunction of g1 middot middot middot gk (and potentially some additional randomness)Let gk be (stochastic) subgradients with

E[gk|xk] isin partf(xk)

Then

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078

le 1

2E

983077K983131

k=2

983059983042xk minus x9831839830422Hk

minus 983042xk minus x9831839830422Hkminus1

983060+ 983042x1 minus x9831839830422H1

983078

+1

2E

983077K983131

k=1

983042gk9830422Hminus1

k

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

31 Adaptive gradient variable metric method

Theorem 5

LetRinfin = sup

xisinC983042xminus x983183983042infin

be the ℓinfin radius of the set C and the conditions of Theorem 4 hold Let

Hk =1

α

983075diag

983075k983131

i=1

gigiT

98307698307612

where α gt 0 is a pre-specified constant Then we have

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le

983061R2

infin2

+ α2

983062E[tr(HK)]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26

Example Support vector machine classification problem

Consider

minx

f(x) =1

m

m983131

i=1

[1minus bi〈aix〉]+ st 983042x983042infin le 1

where the vectors ai isin minus1 0 1n with m = 5000 and n = 1000

For each coordinate j isin [n] we set aij isin plusmn1 to have a randomsign with probability 1j and aij = 0 otherwise

Let u isin minus1 1n uniformly at random We set bi = sign(〈aiu〉)with probability 095 and bi = minussign(〈aiu〉) otherwiseWe choose a stochastic gradient by selecting i isin [m] uniformly atrandom then set

gk isin part[1minus bi〈aixk〉]+

and αk = αradick

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26

f(xk)minus f983183 versus k100 various initial stepsizes

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

f(xk)minus f983183 versus k100 best initial stepsize

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26

Page 12: Lecture 11: Mirror Descent and Variable Metric Methodsmath.xmu.edu.cn/group/nona/damc/Lecture11.pdf · Definition: Let h be a differentiable convex function, differentiable on

Theorem 1

Let αk gt 0 be any sequence of non-increasing stepsizes Assume that his strongly convex with respect to 983042 middot 983042 Let 983042 middot 983042lowast denote the dual normLet xk be generated by the mirror descent iteration If Dh(xx983183) le R2

for all x isin C then for all K ge 1

K983131

k=1

[f(xk)minus f(x983183)] leR2

αK+

K983131

k=1

αk

2983042gk9830422lowast

If αk = α is constant then for all K ge 1K983131

k=1

[f(xk)minus f(x983183)] le1

αDh(x983183x

1) +α

2

K983131

k=1

983042gk9830422lowast

If xK =1

K

K983131

k=1

xk or xK = argminxk

f(xk) 983042g983042lowast le M for all

g isin partf(x) for x isin C then f(xK)minus f(x983183) le1

KαDh(x983183x

1) +α

2M2

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 12 26

Exercise Prove that the negative entropy

h(x) =

n983131

j=1

xj log xj

is 1-strongly convex with respect to the ℓ1-norm

Example Let C = x isin Rn+

983123nj=1 xj = 1 Let x1 =

1

ne The

exponentiated gradient method with fixed stepsize α guarantees

f(xK)minus f(x983183) lelog n

Kα+

α

2K

K983131

k=1

983042gk9830422infin

where xK =1

K

K983131

k=1

xk

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 13 26

Example Let C sub x isin Rn 983042x9830421 le R1 Let h(x) =1

2(pminus 1)983042x9830422p

with p = 1 +1

log(2n) Then

K983131

k=1

[f(xk)minus f(x983183)] le2R2

1 log(2n)

αK+

e2

2

K983131

k=1

αk983042gk9830422infin

In particular taking αk =R1

e

983157log(2n)

kand xK =

1

K

K983131

k=1

xk gives

f(xK)minus f(x983183) le3eR1

983155log(2n)radicK

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 14 26

A simulated mirror-descent example Let

A =983045a1 middot middot middot am

983046T isin Rmtimesn

have entries drawn iid N (0 1) Let

bi =1

2(ai1 + ai2) + εi

where εi drawn iid N (0 10minus2) and n = 3000 m = 20 Define

f(x) = 983042Axminus b9830421 =m983131

i=1

|〈aix〉 minus bi|

which has subgradients ATsign(Axminus b) Consider

minx

f(x) st x isin C =

983099983103

983101x isin Rn+

n983131

j=1

xj = 1

983100983104

983102

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 15 26

We compare the subgradient method to exponentiated gradientdescent for this problem noting that the Euclidean projection of avector v isin Rn to the set C has coordinates

xj = [vj minus τ ]+

where τ isin R is chosen so that

n983131

j=1

xj =

n983131

j=1

[vj minus τ ]+ = 1

We use stepsize αk = α0radick where the initial stepsize α0 is chosen

to optimize the convergence guarantee for each of the methods

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 16 26

fkbest minus f983183 versus k

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 17 26

21 Stochastic version

Theorem 2

Let the conditions of Theorem 1 hold except that instead of receiving avector

gk isin partf(xk)

at iteration k the vector gk is a stochastic subgradient satisfying

E[gk|xk] isin partf(xk)

Then for any non-increasing stepsize sequence αk (where αk may bechosen dependent on g1 middot middot middot gk)

E

983077K983131

k=1

(f(xk)minus f(x983183))

983078le E

983077R2

αK+

K983131

k=1

αk

2983042gk9830422lowast

983078

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 18 26

2.2 Adaptive stepsizes

Let \bar{x}^K = \frac{1}{K} \sum_{k=1}^{K} x^k. If D_h(x, x_⋆) ≤ R² for all x ∈ C, then

\mathbb{E}\bigl[ f(\bar{x}^K) - f(x_\star) \bigr] \le \mathbb{E}\Bigl[ \frac{R^2}{K \alpha_K} + \frac{1}{K} \sum_{k=1}^{K} \frac{\alpha_k}{2} \| g^k \|_*^2 \Bigr] .

If α_k = α for all k, we see that the choice of stepsize minimizing

\frac{R^2}{K\alpha} + \frac{\alpha}{2K} \sum_{k=1}^{K} \| g^k \|_*^2
\qquad \text{is} \qquad
\alpha_\star = \sqrt{2}\, R \Bigl( \sum_{k=1}^{K} \| g^k \|_*^2 \Bigr)^{-1/2} ,

which yields

\mathbb{E}\bigl[ f(\bar{x}^K) - f(x_\star) \bigr] \le \frac{\sqrt{2}\, R}{K} \, \mathbb{E}\Bigl[ \Bigl( \sum_{k=1}^{K} \| g^k \|_*^2 \Bigr)^{1/2} \Bigr] .

But this α_⋆ is not known in advance, because it depends on the subgradient norms ‖g^1‖_*, …, ‖g^K‖_*, which are only revealed as the method runs.

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 19 26
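A quick worked step (added for completeness) behind the optimal constant stepsize: writing S = \sum_{k=1}^{K} \|g^k\|_*^2 and minimizing \phi(\alpha) = \frac{R^2}{K\alpha} + \frac{\alpha S}{2K} over α > 0,

\phi'(\alpha) = -\frac{R^2}{K \alpha^2} + \frac{S}{2K} = 0
\;\Longrightarrow\;
\alpha_\star = \sqrt{\frac{2 R^2}{S}} = \sqrt{2}\, R \, S^{-1/2} ,
\qquad
\phi(\alpha_\star) = \frac{\sqrt{2}\, R \sqrt{S}}{K} .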

Corollary 3

Let the conditions of Theorem 2 hold. Let

\alpha_k = \frac{R}{\sqrt{\sum_{i=1}^{k} \| g^i \|_*^2}} ,

which is the "up to now" optimal choice. Then

\mathbb{E}\bigl[ f(\bar{x}^K) - f(x_\star) \bigr] \le \frac{3R}{K} \, \mathbb{E}\Bigl[ \Bigl( \sum_{k=1}^{K} \| g^k \|_*^2 \Bigr)^{1/2} \Bigr] ,
\qquad \text{where} \quad
\bar{x}^K = \frac{1}{K} \sum_{k=1}^{K} x^k .

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 20 26
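A minimal sketch of the adaptive stepsize of Corollary 3 inside an optimization loop, under the assumption of Euclidean geometry (so the dual norm is ℓ2 and the mirror step is a projected subgradient step); subgradient and project are placeholders such as the oracle and simplex projection sketched above:

import numpy as np

def adaptive_mirror_descent(subgradient, project, x1, R, K):
    """Projected subgradient method with alpha_k = R / sqrt(sum_{i<=k} ||g^i||_2^2)."""
    x = x1.copy()
    xbar = np.zeros_like(x1)
    sumsq = 0.0
    for k in range(1, K + 1):
        g = subgradient(x)
        sumsq += float(g @ g)                  # running sum of squared (dual) norms
        alpha_k = R / np.sqrt(sumsq + 1e-12)   # "up to now" optimal stepsize (guarded)
        x = project(x - alpha_k * g)
        xbar += x
    return xbar / K

# Hypothetical usage with the pieces sketched earlier:
# xbar = adaptive_mirror_descent(subgradient, project_simplex, np.full(n, 1.0 / n), R=1.0, K=2000)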

3 Variable metric methods

The idea is to adjust the metric with which one constructs updates to better reflect problem structure.

1. Choose a subgradient g^k ∈ ∂f(x^k), or a stochastic subgradient g^k satisfying \mathbb{E}[g^k \mid x^k] \in \partial f(x^k).

2. Update a positive semidefinite matrix H_k ∈ \mathbb{R}^{n \times n}.

3. Compute the update (a closed-form sketch of the unconstrained case follows this slide)

x^{k+1} = \mathop{\operatorname{argmin}}_{x \in C} \Bigl\{ \langle g^k, x \rangle + \frac{1}{2} \langle x - x^k, H_k ( x - x^k ) \rangle \Bigr\} .

Special cases: the subgradient method, H_k = \frac{1}{\alpha_k} I; Newton's method, H_k = \nabla^2 f(x^k).

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 21 26
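An unconstrained sketch of step 3 (an illustration under the assumption C = \mathbb{R}^n, in which case the argmin has the closed form below, recovering the subgradient step for H_k = I/α_k and the Newton step for H_k = ∇²f(x^k)):

import numpy as np

def variable_metric_step(x, g, H):
    """Unconstrained variable-metric update: argmin_y <g, y> + 0.5 (y - x)^T H (y - x).

    For positive definite H, setting the gradient g + H (y - x) to zero gives
    y = x - H^{-1} g.
    """
    return x - np.linalg.solve(H, g)

# Special cases (hypothetical):
#   H = (1.0 / alpha_k) * np.eye(len(x))  ->  x - alpha_k * g   (subgradient step)
#   H = hessian_f(x)                      ->  Newton step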

Theorem 4

Let H_k be a sequence of positive definite matrices, where H_k is a function of g^1, …, g^k (and potentially some additional randomness). Let g^k be (stochastic) subgradients with

\mathbb{E}[ g^k \mid x^k ] \in \partial f(x^k) .

Then

\mathbb{E}\Bigl[ \sum_{k=1}^{K} \bigl( f(x^k) - f(x_\star) \bigr) \Bigr]
\le \frac{1}{2} \, \mathbb{E}\Bigl[ \sum_{k=2}^{K} \bigl( \| x^k - x_\star \|_{H_k}^2 - \| x^k - x_\star \|_{H_{k-1}}^2 \bigr) + \| x^1 - x_\star \|_{H_1}^2 \Bigr]
+ \frac{1}{2} \, \mathbb{E}\Bigl[ \sum_{k=1}^{K} \| g^k \|_{H_k^{-1}}^2 \Bigr] .

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 22 26

3.1 Adaptive gradient variable metric method

Theorem 5

Let

R_\infty = \sup_{x \in C} \| x - x_\star \|_\infty

be the ℓ∞ radius of the set C, and let the conditions of Theorem 4 hold. Let

H_k = \frac{1}{\alpha} \Bigl( \operatorname{diag}\Bigl( \sum_{i=1}^{k} g^i (g^i)^T \Bigr) \Bigr)^{1/2} ,

where α > 0 is a pre-specified constant. Then we have

\mathbb{E}\Bigl[ \sum_{k=1}^{K} \bigl( f(x^k) - f(x_\star) \bigr) \Bigr] \le \Bigl( \frac{R_\infty^2}{2} + \alpha^2 \Bigr) \, \mathbb{E}[ \operatorname{tr}(H_K) ] .

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 23 26
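A minimal sketch of the adaptive-gradient metric of Theorem 5, under the assumption of the box constraint ‖x‖_∞ ≤ 1 used in the SVM example below; with a diagonal H_k and a box constraint, the update of step 3 decouples coordinate-wise into a clipped scaled step. stochastic_grad is a placeholder oracle:

import numpy as np

def adagrad_box(stochastic_grad, x1, alpha, K):
    """Diagonal metric H_k = diag(sum_i g^i (g^i)^T)^{1/2} / alpha, with the
    variable-metric update computed in closed form for the box ||x||_inf <= 1."""
    x = x1.copy()
    G = np.zeros_like(x1)                    # running sum of squared gradient coordinates
    for k in range(1, K + 1):
        g = stochastic_grad(x)
        G += g * g
        H_diag = np.sqrt(G) / alpha          # diagonal of H_k
        step = np.divide(g, H_diag, out=np.zeros_like(g), where=H_diag > 0)
        x = np.clip(x - step, -1.0, 1.0)     # coordinate-wise argmin over the box
    return x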

Example: Support vector machine classification problem

Consider

\min_x \; f(x) = \frac{1}{m} \sum_{i=1}^{m} [ 1 - b_i \langle a_i, x \rangle ]_+ \quad \text{s.t.} \quad \| x \|_\infty \le 1 ,

where the vectors a_i ∈ \{-1, 0, 1\}^n, with m = 5000 and n = 1000.

For each coordinate j ∈ [n], we set a_{ij} ∈ \{\pm 1\} to have a random sign with probability 1/j, and a_{ij} = 0 otherwise.

Let u ∈ \{-1, 1\}^n be drawn uniformly at random. We set b_i = \operatorname{sign}(\langle a_i, u \rangle) with probability 0.95 and b_i = -\operatorname{sign}(\langle a_i, u \rangle) otherwise.

We choose a stochastic gradient by selecting i ∈ [m] uniformly at random, then setting

g^k \in \partial [ 1 - b_i \langle a_i, x^k \rangle ]_+ ,

and α_k = α/\sqrt{k}.

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 24 26
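A hedged sketch of this experimental setup in Python/NumPy (the slide specifies the distributions; the concrete sampling code, tie-breaking convention, and names are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 1000

# a_ij is a random sign with probability 1/j (j = 1, ..., n) and 0 otherwise.
j = np.arange(1, n + 1)
nonzero = rng.random((m, n)) < 1.0 / j
signs = rng.choice([-1.0, 1.0], size=(m, n))
A = np.where(nonzero, signs, 0.0)

# Labels: b_i = sign(<a_i, u>), flipped with probability 0.05.
u = rng.choice([-1.0, 1.0], size=n)
b = np.sign(A @ u)
b[b == 0] = 1.0                                  # break exact ties (a convention assumed here)
flip = rng.random(m) < 0.05
b[flip] *= -1.0

def stochastic_subgradient(x):
    """g^k in the subdifferential of [1 - b_i <a_i, x>]_+ for a uniformly random i."""
    i = rng.integers(m)
    margin = 1.0 - b[i] * (A[i] @ x)
    return -b[i] * A[i] if margin > 0 else np.zeros(n)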

[Figure: f(x^k) − f_⋆ versus k/100, various initial stepsizes]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 25 26

[Figure: f(x^k) − f_⋆ versus k/100, best initial stepsize]

Mirror Descent and Variable Metrics DAMC Lecture 11 May 8 - 13 2020 26 26
