
An Online Learning View via a Projections' Path in the Sparse-land

Sergios Theodoridis¹

¹ Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece.

Workshop on Sparse Signal Processing, Friday, Sep. 16, 2016

Joint work with P. Bouboulis, S. Chouvardas, Y. Kopsinis, G. Papageorgiou, K. Slavakis

Sparsity

Sparse Modeling

• Sparse modeling has been a major focus of research effort over the last decade or so.

• Sparsity-promoting regularization of cost functions copes with:

  Ill conditioning and overfitting when solving inverse problems; learning from data is an instance of an inverse problem.

  Promoting zeros when the underlying models have many near-to-zero values.


Sparse Modeling

The need for sparse models: two examples

• Compression

• Echo Cancellation

Sparse Modeling

The Generic Model

OUTPUT = INPUT × SPARSE MODEL + NOISE

Sparse Modeling

The Regression Model

• A generic model that covers a large class of problems (filtering, prediction):

  y_n = u_n^T a_* + v_n

  a_* ∈ R^L is the unknown vector.
  u_n ∈ R^L is the incoming signal (the sensing vectors).
  y_n ∈ R is the observed signal (the measurements).
  v_n is the additive noise process.

• a_* is assumed to be sparse; that is, only a few, K << L, of its components are nonzero:

  a_* = [0, 0, *, 0, ..., 0, *, 0, 0, ..., 0, *, 0, ..., 0]^T   (K nonzero entries marked by *)

• In its simplest formulation the task comprises the estimation of a_*, based on a set of measurements (y_n, u_n), n = 1, ..., N (see the sketch below).
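A minimal NumPy sketch of this data model; the dimensions L, K, N and the noise level are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 100, 5, 300                    # K << L nonzero components

# Sparse unknown vector a_* with K nonzero entries at random positions.
a_star = np.zeros(L)
support = rng.choice(L, size=K, replace=False)
a_star[support] = rng.standard_normal(K)

# Measurements y_n = u_n^T a_* + v_n, n = 1, ..., N.
U = rng.standard_normal((N, L))          # rows are the sensing vectors u_n
v = 0.01 * rng.standard_normal(N)        # additive noise
y = U @ a_star + v
```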


Sparse Modeling

Dictionary Learning

• This is a powerful tool for analysing signals in terms of overcomplete basis vectors:

  [y_1, ..., y_N] = [u_1, ..., u_m] [a_1, ..., a_N],   m > L,
      (L×N)            (L×m)           (m×N)

  i.e., Y = U A.

  y_n ∈ R^L, n = 1, 2, ..., N, are the observation vectors.
  u_i ∈ R^L, i = 1, 2, ..., m, are the unknown atoms of the dictionary.
  a_n ∈ R^m, n = 1, 2, ..., N, are the vectors of the unknown weights, corresponding to the expansion of the nth input vector:

  y_n = Σ_{i=1}^{m} u_i a_{n,i},

  where the a_n, n = 1, 2, ..., N, are sparse vectors (a minimal construction of the model is sketched below).
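A minimal sketch of the synthesis model Y = U A with an overcomplete dictionary (m > L) and K-sparse coefficient columns; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
L, m, N, K = 64, 128, 50, 4              # m > L: overcomplete dictionary

# Dictionary of m atoms u_i in R^L (columns), here random and normalized.
U = rng.standard_normal((L, m))
U /= np.linalg.norm(U, axis=0)

# Sparse coefficient vectors a_n in R^m, stacked as the columns of A.
A = np.zeros((m, N))
for n in range(N):
    support = rng.choice(m, size=K, replace=False)
    A[support, n] = rng.standard_normal(K)

Y = U @ A                                # each y_n = sum_i u_i * a_{n,i}
```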


Sparse Modeling

Low Rank Matrix Factorization

• This task is at the heart of dimensionality reduction:

  Y = U A = Σ_{i=1}^{r} u_i a_i^T,

  [y_1, ..., y_N] = [u_1, ..., u_r] [a_1^T; ...; a_r^T],
      (L×N)            (L×r)            (r×N)

• r < N.

• PCA performs low rank matrix factorization by imposing sparsity on the singular values as well as orthogonality on U (a truncated-SVD sketch is given below).
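A minimal sketch of obtaining a rank-r factorization Y ≈ U A via the truncated SVD; this is one standard construction, and the rank r and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
L, N, r = 40, 200, 3

Y = rng.standard_normal((L, r)) @ rng.standard_normal((r, N))   # exactly rank-r data

# Truncated SVD: keep the r leading singular triplets.
Uy, s, Vt = np.linalg.svd(Y, full_matrices=False)
U_fac = Uy[:, :r]                    # L x r factor (orthonormal columns)
A_fac = s[:r, None] * Vt[:r, :]      # r x N factor, so Y ≈ U_fac @ A_fac

print(np.allclose(Y, U_fac @ A_fac)) # True (up to numerical precision)
```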


Sparse Modeling

Low Rank Matrix Factorization

• Matrix Completion is a special constrained version of low rank matrix factorization.

• Y has missing elements, and the low rank matrix factorization is constrained to provide the non-missing elements at the respective positions:

  Y = [∗ ∗ ··· ∗ ; ... ; ∗ ∗ ··· ∗] = Σ_{i=1}^{r} u_i a_i^T,

  where ∗ marks the available entries. (A minimal sketch of the observation mask is given below.)
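A minimal sketch of the matrix-completion setting: a low-rank matrix observed only on a mask of positions, with the constraint that any completion must agree with the observed values. Sizes and the observation ratio are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
L, N, r = 30, 40, 2

Y_full = rng.standard_normal((L, r)) @ rng.standard_normal((r, N))  # rank-r ground truth
mask = rng.random((L, N)) < 0.4          # True where an entry is observed
Y_obs = np.where(mask, Y_full, np.nan)   # missing entries marked as NaN

def agrees_on_observed(Y_hat, Y_obs, mask, tol=1e-8):
    """Constraint a low rank completion must satisfy at the observed positions."""
    return bool(np.all(np.abs(Y_hat[mask] - Y_obs[mask]) <= tol))
```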


Sparse Modeling

Low Rank Matrix Factorization

• Robust PCA is another special constrained version of low rank matrix factorization:

  Y = L + V,

  where L is a low rank matrix and V is a sparse matrix. The latter models OUTLIER NOISE; being an outlier is a sparse event.

• The goal of the task is to obtain estimates of L and V by imposing sparsity on the singular values of the low rank component as well as on the elements of V, constrained so that Y = L + V (the model is sketched below).
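A minimal sketch of the Y = L + V model: a low-rank component plus a sparse outlier component. The sizes, rank, and outlier ratio are illustrative, and no recovery algorithm is implied:

```python
import numpy as np

rng = np.random.default_rng(4)
d1, d2, r = 50, 60, 3

L_mat = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))  # low rank part

V = np.zeros((d1, d2))                       # sparse outlier part
outliers = rng.random((d1, d2)) < 0.02       # only a few entries are outliers
V[outliers] = 10.0 * rng.standard_normal(np.count_nonzero(outliers))

Y = L_mat + V     # observed matrix; the task is to recover L_mat and V from Y
```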


Sparse Modeling

Robust Regression

• Robust regression is an old problem, with a major impact coming from the works of Huber. The revival of interest is due to a new look via sparsity-aware learning techniques. For example, the noise may comprise a few large values (outliers) on top of the Gaussian component; since the large values are only a few, they can be treated via sparse modeling arguments.

Sparse Regression Modeling

There are two paths that lead to the "truth", i.e., to obtaining an estimate a of the unknown a_*.

Batch Learning Problem

Linear regression model: y_n = u_n^T a_* + v_n

• U := [u_1, u_2, ..., u_N]^T ∈ R^{N×L}

• y := [y_1, y_2, ..., y_N]^T ∈ R^N, and v := [v_1, v_2, ..., v_N]^T ∈ R^N.

Batch formulation: y = U a_* + v


Estimating the Unknown

There are two paths that lead to the "truth", i.e., to obtaining an estimate a of the unknown a_*.

Batch vs Online Learning

Batch formulation: y = U a_* + v

Online formulation: y_n = u_n^T a_* + v_n;

obtain an estimate, a_n, after (y_n, u_n) has been received.


Sparse Vs Online Learning

Sparsity-promoting Batch algorithms (Compressed Sensing)

• Are mobilized after a finite number of data, {(u_n, y_n)}_{n=0}^{N−1}, has been collected.

• For any new datum, the estimation of a_* is repeated from scratch.

• Computational complexity might become prohibitive.

• Excessive storage demands.

• It is a "mature" research field with a diverse number of techniques and applications.

Sparsity-promoting Online algorithms

• Infinite number of data.

• For any new datum, the estimate of a_* is updated dynamically.

• Cases of time-varying a_* are "naturally" handled.

• Low complexity is required for streaming applications.

• Fast convergence / tracking.

• Large potential in Big Data applications.


Sparsity-Promoting Methods

ℓ0-norm constrained minimization

• ℓ0 (pseudo-)norm minimization: an NP-hard, nonconvex task.

• a : min_{a ∈ R^L} ||a||_0, s.t. ||y − U a||_2^2 ≤ ε.

• The above is carried out via greedy-type algorithmic arguments.

Constrained Least Squares Estimation: three equivalent formulations

• a := argmin_{a ∈ R^L} { ||y − U a||_2^2 + λ ||a||_1 }

• a : min_{a ∈ R^L} ||y − U a||_2^2, s.t. ||a||_1 ≤ ρ

• a : min_{a ∈ R^L} ||a||_1, s.t. ||y − U a||_2^2 ≤ ε

• Why the ℓ1 norm: it is the "closest" to the ℓ0 "norm" (number of nonzero elements) that retains a convex nature (a standard solver for the penalized form is sketched below).
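A minimal sketch of one standard batch solver for the penalized formulation min_a { (1/2)||y − U a||_2^2 + λ||a||_1 }, namely proximal gradient descent (ISTA). This is an illustration of how such problems are commonly solved, not an algorithm presented in the talk:

```python
import numpy as np

def soft_threshold(x, t):
    """Soft thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(U, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - U a||_2^2 + lam * ||a||_1 by proximal gradient descent."""
    step = 1.0 / np.linalg.norm(U, 2) ** 2    # 1 / Lipschitz constant of the gradient
    a = np.zeros(U.shape[1])
    for _ in range(n_iter):
        grad = U.T @ (U @ a - y)
        a = soft_threshold(a - step * grad, lam * step)
    return a
```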


Sparsity-Promoting Methods

Hard and Soft Thresholding

• The ℓ1 norm is associated with a soft thresholding operation on the respective coefficients. This is a continuous operation, but it adds bias even to the large values. Hard thresholding, on the other hand, is a discontinuous operation (both rules are sketched below).
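A minimal sketch of the two scalar rules; hard thresholding leaves the surviving values untouched, while soft thresholding shrinks every surviving value by λ (hence the bias):

```python
import numpy as np

def soft_threshold(x, lam):
    """Continuous rule; shrinks even the large values by lam (adds bias)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    """Discontinuous rule; values above the threshold pass through unchanged."""
    return np.where(np.abs(x) > lam, x, 0.0)

x = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(soft_threshold(x, 1.0))   # [-2. -0.  0.  0.5  3.]
print(hard_threshold(x, 1.0))   # [-3.  0.  0.  1.5  4.]
```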


Batch Penalized Least-Squares Estimator

Penalized Least-Squares - General Case

min_{a ∈ R^L} { (1/2) ||y − U a||_2^2 + λ Σ_{i=1}^{L} p(|a_i|) }

• p(·): a sparsity-promoting penalty function.

• λ: the regularization parameter.

Examples: Penalty functions

• p(|a_i|) := |a_i|^γ, ∀ a_i ∈ R

• p(|a_i|) := λ (1 − e^{−β|a_i|})

• p(|a_i|) := (λ / log(γ + 1)) log(γ|a_i| + 1), ∀ a_i ∈ R

(These example penalties are written out in code below.)
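A minimal sketch of the three example penalties exactly as written above; the parameter values used as defaults are illustrative:

```python
import numpy as np

def p_power(a, gamma=0.5):
    """p(|a|) = |a|^gamma."""
    return np.abs(a) ** gamma

def p_exponential(a, lam=1.0, beta=5.0):
    """p(|a|) = lam * (1 - exp(-beta * |a|))."""
    return lam * (1.0 - np.exp(-beta * np.abs(a)))

def p_log(a, lam=1.0, gamma=10.0):
    """p(|a|) = (lam / log(gamma + 1)) * log(gamma * |a| + 1)."""
    return lam / np.log(gamma + 1.0) * np.log(gamma * np.abs(a) + 1.0)
```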


Online Sparsity-Promoting Methods

Penalized Recursive LS

min_{a ∈ R^L} { (1/2) Σ_{n=1}^{N} β^{N−n} e_n^2 + λ Σ_{i=1}^{L} p(|a_i|) },

with

r_N := Σ_{n=1}^{N} β^{N−n} y_n u_n,   R_N := Σ_{n=1}^{N} β^{N−n} u_n u_n^T,

updated recursively as

r_{n+1} = β r_n + y_{n+1} u_{n+1},   R_{n+1} = β R_n + u_{n+1} u_{n+1}^T,

a_{n+1} = f(r_{n+1}, R_{n+1}).

• It works!

• Complexity O(L^2).

• The regularization parameter needs fine tuning.

• [Angelosante, Bazerque and Giannakis, 2010]

• [Eksioglu and Tanc, 2011]

(The recursions are sketched in code below.)
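A minimal sketch of the exponentially weighted recursions for r_n and R_n. The map a_{n+1} = f(r_{n+1}, R_{n+1}) depends on the chosen penalty and is specified in the cited papers; a plain ridge-regularized LS solve is used below only as a placeholder, not as the actual sparsity-promoting step:

```python
import numpy as np

def prls_statistics_update(r, R, u_new, y_new, beta=0.99):
    """One step of the exponentially weighted recursions for r_n and R_n."""
    r = beta * r + y_new * u_new
    R = beta * R + np.outer(u_new, u_new)
    return r, R

def f_placeholder(r, R, ridge=1e-3):
    """Placeholder for a_{n+1} = f(r, R): a ridge-regularized LS solve.
    The sparsity-promoting estimators cited above use a different f."""
    return np.linalg.solve(R + ridge * np.eye(R.shape[0]), r)
```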


Online Sparsity-Promoting Methods

Penalized stochastic gradient descent: LMS type

min_{a ∈ R^L} { (1/2) e_n^2 + λ Σ_{i=1}^{L} p(|a_i|) }

a_{n+1} = a_n + µ e_n(a) u_n − µ λ f(a_n),

f(a_n) = [ ∂p(|a_{n,1}|)/∂a_{n,1}, ∂p(|a_{n,2}|)/∂a_{n,2}, ..., ∂p(|a_{n,L}|)/∂a_{n,L} ]^T

• Complexity O(L).
• It works! (when compared to the standard LMS)
• Slow convergence.
• The regularization parameter needs fine tuning.

• [Chen, Gu and Hero, 2009]
• [Mileounis, Babadi, Kalouptsidis and Tarokh, 2010]
• [Wang and Gu, 2012]

(A minimal sketch of the update is given below.)
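A minimal sketch of the penalized LMS step. Taking p(|a|) = |a| makes f(a) the entry-wise sign, which gives the zero-attracting flavour associated with [Chen, Gu and Hero, 2009]; the step sizes are illustrative:

```python
import numpy as np

def sparse_lms_step(a, u, y, mu=0.01, lam=0.01, grad_p=np.sign):
    """a_{n+1} = a_n + mu * e_n * u_n - mu * lam * f(a_n).

    grad_p maps the current estimate to the entry-wise (sub)gradients of the
    penalty; np.sign corresponds to the l1 penalty p(|a|) = |a|."""
    e = y - u @ a                        # a-priori error e_n(a)
    return a + mu * e * u - mu * lam * grad_p(a)
```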


The Set-Theoretic Estimation Approach

The main concept: a descendant of POCS (Projections Onto Convex Sets)

Projection onto a Closed Convex Set

Let C be a closed convex set in R^L. Then, for each a ∈ R^L, there exists a unique a_* ∈ C such that

||a − a_*|| = min_{g ∈ C} ||a − g||.

Metric Projection Mapping

The metric projection is the mapping P_C : R^L → C : a ↦ P_C(a) := a_*.

Relaxed Projection Mapping

The relaxed projection is the mapping T_C(a) := a + µ(P_C(a) − a), µ ∈ (0, 2), ∀ a ∈ R^L.

(Both mappings are sketched in code below.)
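A minimal sketch with C taken to be a closed ℓ2 ball (an illustrative choice of convex set with a closed-form projection), showing the metric projection and its relaxed version:

```python
import numpy as np

def project_l2_ball(a, center, radius):
    """Metric projection P_C(a) onto C = {x : ||x - center||_2 <= radius}."""
    d = a - center
    nd = np.linalg.norm(d)
    return a.copy() if nd <= radius else center + radius * d / nd

def relaxed_projection(a, center, radius, mu=1.5):
    """T_C(a) = a + mu * (P_C(a) - a), with mu in (0, 2)."""
    return a + mu * (project_l2_ball(a, center, radius) - a)
```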


The Set-Theoretic Estimation Approach

The POCS: Finite Number of Convex Sets [Von Neumann '33], [Bregman '65], [Gubin, Polyak, Raik '67]

Given a finite number of closed convex sets C_1, ..., C_q, with ⋂_{i=1}^{q} C_i ≠ ∅, let their associated projection mappings be P_{C_1}, ..., P_{C_q}. For any a ∈ R^L, define the sequence obtained by projecting cyclically onto the sets:

P_{C_1}(a), P_{C_2}P_{C_1}(a), ..., P_{C_q} ··· P_{C_2}P_{C_1}(a), P_{C_1}P_{C_q} ··· P_{C_2}P_{C_1}(a), ...

This sequence converges to a point in the intersection ⋂_{i=1}^{q} C_i (a two-set numeric illustration is sketched below).
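A minimal numeric sketch of the cyclic projections with q = 2 sets, both taken as halfspaces (an illustrative choice with closed-form projections):

```python
import numpy as np

def project_halfspace(a, c, b):
    """Projection onto the halfspace {x : c^T x <= b}."""
    viol = c @ a - b
    return a if viol <= 0 else a - viol * c / (c @ c)

# Two halfspaces with a nonempty intersection.
c1, b1 = np.array([1.0, 0.0]), 1.0      # x <= 1
c2, b2 = np.array([0.0, 1.0]), -1.0     # y <= -1

a = np.array([5.0, 5.0])
for _ in range(10):                     # ... P_C2 P_C1 P_C2 P_C1 (a)
    a = project_halfspace(a, c1, b1)
    a = project_halfspace(a, c2, b2)
print(a)                                # a point in C1 ∩ C2, here [ 1. -1.]
```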


The Set-Theoretic Estimation Approach

Convex Combination of Projection Mappings [Pierra '84]

Given a finite number of closed convex sets C_1, ..., C_q, with ⋂_{i=1}^{q} C_i ≠ ∅, let their associated projection mappings be P_{C_1}, ..., P_{C_q}. Let also a set of positive constants w_1, ..., w_q be such that Σ_{i=1}^{q} w_i = 1. Then, for any a_0, the sequence

a_{n+1} = a_n + µ_n ( Σ_{i=1}^{q} w_i P_{C_i}(a_n) − a_n ),  ∀ n,

where the sum is a convex combination of projections, converges weakly to a point a_* in ⋂_{i=1}^{q} C_i, where µ_n ∈ (ε, M_n), for ε ∈ (0, 1), and

M_n := Σ_{i=1}^{q} w_i ||P_{C_i}(a_n) − a_n||^2 / || Σ_{i=1}^{q} w_i P_{C_i}(a_n) − a_n ||^2.

(A minimal numeric sketch follows.)
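A minimal numeric sketch of the weighted parallel-projections iteration, again using halfspaces as the convex sets, with M_n computed as above; the particular choice µ_n = 0.9·M_n is an illustrative value inside the allowed range:

```python
import numpy as np

def project_halfspace(a, c, b):
    viol = c @ a - b
    return a if viol <= 0 else a - viol * c / (c @ c)

sets = [(np.array([1.0, 0.0]), 1.0),    # x <= 1
        (np.array([0.0, 1.0]), -1.0),   # y <= -1
        (np.array([1.0, 1.0]), 0.5)]    # x + y <= 0.5
w = np.array([0.5, 0.3, 0.2])           # positive weights summing to 1

a = np.array([5.0, 5.0])
for _ in range(100):
    projs = np.array([project_halfspace(a, c, b) for c, b in sets])
    step = w @ projs - a                             # convex combination minus a_n
    if np.linalg.norm(step) < 1e-12:                 # a_n already lies in all sets
        break
    M = (w @ np.sum((projs - a) ** 2, axis=1)) / (step @ step)
    a = a + 0.9 * M * step                           # mu_n chosen inside (eps, M_n)
print(a)                                             # approaches a point in the intersection
```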


Set-Theoretic Estimation: The Online Case

Constructing the Convex Sets

For each received set of measurements (training pairs) (u_n, y_n), construct a hyperslab:

S_n[ε] := { a ∈ R^L : |u_n^T a − y_n| ≤ ε }.

Solution

Find a point in the intersection of all the hyperslabs.

[Yamada 2001], [Yamada, Slavakis, Yamada 2002], [Yamada, Ogura 2004], [Slavakis, Yamada, Ogura 2006].

[Chouvardas, Slavakis, Theodoridis, Yamada, 2013]: under the assumption of bounded noise, the scheme converges with probability 1 arbitrarily close to the true model.


Adaptive Projection Subgradient Method (APSM)

The Algorithm

a_{n+1} := a_n + µ_n Σ_{i=n−q+1}^{n} ω_i^{(n)} ( P_{S_i[ε]}(a_n) − a_n )

Projection onto a Hyperslab

P_{S_n[ε]}(a) =
  a + ((y_n − ε − u_n^T a) / ||u_n||^2) u_n,   if y_n − ε > u_n^T a,
  a,                                            if |u_n^T a − y_n| ≤ ε,
  a + ((y_n + ε − u_n^T a) / ||u_n||^2) u_n,   if y_n + ε < u_n^T a.

(Both the projection and the update are sketched in code below.)
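A minimal sketch of the hyperslab projection and of one APSM update over the q most recent hyperslabs; the uniform weights ω_i = 1/q and the step size µ_n = 1 are illustrative choices:

```python
import numpy as np

def project_hyperslab(a, u, y, eps):
    """P_{S[eps]}(a) for S[eps] = {a : |u^T a - y| <= eps}."""
    r = u @ a
    if y - eps > r:
        return a + (y - eps - r) / (u @ u) * u
    if y + eps < r:
        return a + (y + eps - r) / (u @ u) * u
    return a                                   # already inside the hyperslab

def apsm_step(a, recent_pairs, eps, mu=1.0):
    """a_{n+1} = a_n + mu * sum_i w_i (P_{S_i[eps]}(a_n) - a_n), with w_i = 1/q."""
    q = len(recent_pairs)
    combo = np.zeros_like(a, dtype=float)
    for u, y in recent_pairs:                  # the q most recent pairs (u_i, y_i)
        combo += (project_hyperslab(a, u, y, eps) - a) / q
    return a + mu * combo
```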

Adaptive Projection Subgradient Method (APSM)

Geometric illustration example (figure not reproduced in this transcript).

APSM under the ℓ1 ball constraint

The ℓ1-ball case

Given (u_n, y_n), n = 0, 1, 2, ..., find a such that

|a^T u_n − y_n| ≤ ε,  n = 0, 1, 2, ...,
||a||_1 ≤ δ.

The recursion

a_{n+1} := P_{B_ℓ1[δ]} ( a_n + µ_n ( Σ_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ε]}(a_n) − a_n ) )

converges to

a_* ∈ B_ℓ1[δ] ∩ ( ⋂_{n ≥ n_0} S_n[ε] ).

(A standard ℓ1-ball projection is sketched in code below.)
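The recursion needs the projection P_{B_ℓ1[δ]} onto the ℓ1 ball. A minimal sketch of the standard sort-based algorithm is given below; the talk does not prescribe a particular implementation, so this is only one common way to compute it:

```python
import numpy as np

def project_l1_ball(a, delta):
    """Euclidean projection of a onto the l1 ball {x : ||x||_1 <= delta}."""
    if np.sum(np.abs(a)) <= delta:
        return a.copy()
    u = np.sort(np.abs(a))[::-1]                      # magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, a.size + 1)
    rho = np.nonzero(u - (css - delta) / ks > 0)[0][-1]
    theta = (css[rho] - delta) / (rho + 1.0)          # soft-threshold level
    return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)
```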


APSM under the ℓ1 ball constraint

Geometric illustration example (figure not reproduced in this transcript).

APSM under the weighted ℓ1 ball constraint

The weighted ℓ1-ball case

• Convergence can be significantly sped up if the ℓ1 ball is replaced by the weighted ℓ1 ball.

• Definition:

  ||a||_{1,w} := Σ_{i=1}^{L} w_i |a_i|.

• Time-adaptive weights:

  w_{n,i} := 1 / (|a_{n,i}| + ε'_n).

• This is a time-varying constraint case.

• The recursion:

  a_{n+1} := P_{B_ℓ1[w_n, δ]} ( a_n + µ_n ( Σ_{j=n−q+1}^{n} ω_j^{(n)} P_{S_j[ε]}(a_n) − a_n ) ).

(The weight update is sketched in code below.)
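A minimal sketch of the time-adaptive weights and of the weighted ℓ1 norm they induce; ε' is a small illustrative constant:

```python
import numpy as np

def adaptive_weights(a_n, eps_prime=1e-3):
    """w_{n,i} = 1 / (|a_{n,i}| + eps'): small coefficients receive large weights."""
    return 1.0 / (np.abs(a_n) + eps_prime)

def weighted_l1_norm(a, w):
    """||a||_{1,w} = sum_i w_i |a_i|."""
    return float(np.sum(w * np.abs(a)))
```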


APSM under the weighted ℓ1 ball constraint

Geometric illustration example (figure not reproduced in this transcript).

APSM under the weighted ℓ1 ball constraint

Convergence of the Scheme

• Does this scheme converge? Note that our constraint, i.e., the weighted ℓ1 ball, is a time-varying constraint. Remark: this case was not covered by the existing theory.

Simulation Examples

Example: a time-invariant signal, sparse in the wavelet domain

L := 1024, ||a_*||_0 := 100 wavelet coefficients. The radius of the ℓ1 ball is set to δ := 101.

Simulation Examples

Example: a time-varying signal, compressible in the wavelet domain

L := 4096. The signal is the sum of two chirp signals.

Simulation Examples

Example: a time-varying signal, compressible in the wavelet domain

L := 4096. The radius of the ℓ1 ball is set to δ := 40.

Movies of the OCCD and the APWL1sub (not reproduced in this transcript).

Generalized Thresholding Rules

Thresholding rules associated with non-convex penalty functions

• Penalized LS thresholding operators:

  min_a { (1/2)(a − â)^2 + λ p(|a|) }

• p(·): a nonnegative, nondecreasing and differentiable function on (0, ∞).

• Under some general conditions, the problem has a unique solution [Antoniadis 2007].

• The penalized LS thresholding operator (PLSTO) basically defines the mapping

  â ↦ argmin_a { (1/2)(a − â)^2 + λ p(|a|) },

  which corresponds to a shrinkage operator.


Generalized Thresholding Rules

Examples: Penalty functions

• p(|a|) := |a|^γ, ∀ a ∈ R

• p(|a|) := λ (1 − e^{−β|a|})

• p(|a|) := (λ / log(γ + 1)) log(γ|a| + 1), ∀ a ∈ R

Examples: Penalized Least-Squares Thresholding Operators (figure not reproduced in this transcript).


Generalized Thresholding Rules

Generalized Thresholding (GT) operator: Definition

For any a ∈ R^L, z := T_GT^{(K)}(a) is obtained coordinate-wise: for all l ∈ {1, ..., L},

  z_l := a_l,        if a_l is one of the largest K components,
  z_l := shr(a_l),   otherwise.

Shrinkage Function (shr)

• τ · shr(τ) ≥ 0, ∀ τ ∈ R.

• shr acts as a strict shrinkage operator over all intervals which do not include 0.

• Any arbitrary function in line with the properties above can be used.

• All the penalized Least-Squares thresholding operators are included.

In words

• Choose the largest K components of the estimate.

• The rest are shrunk according to the shrinkage rule.

Examples: Generalized Thresholding (GT) operator (figure not reproduced in this transcript).

(A minimal GT operator is sketched in code below.)
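A minimal sketch of the GT operator. Soft shrinkage is used as an illustrative choice of shr, and "largest K components" is taken in magnitude, an assumption consistent with hard-thresholding practice:

```python
import numpy as np

def generalized_thresholding(a, K, lam, shr=None):
    """T_GT^{(K)}: keep the K largest-magnitude entries, shrink the rest."""
    a = np.asarray(a, dtype=float)
    if shr is None:
        shr = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # soft shrinkage
    z = shr(a)
    keep = np.argsort(np.abs(a))[-K:]   # indices of the K largest-magnitude entries
    z[keep] = a[keep]                   # these components pass through unchanged
    return z
```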


Adaptive Projection-Based Algorithm With Generalized Thresholding (APGT)

The Algorithm

a_{n+1} := T_n ( a_n + μ_n Σ_{i=n−q+1}^{n} ω_i^{(n)} ( P_i(a_n) − a_n ) )

• Each piece of a priori information is also represented by a set.

[Figure: geometric illustration of the update, with projections P(·) onto the property sets followed by the thresholding operator (shr(·) drives small components to 0).]

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 36/58
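A minimal sketch of one APGT iteration, assuming the property sets are hyperslabs S_i = {x : |y_i − u_iᵀx| ≤ ε}, uniform weights ω_i^{(n)} = 1/q, and the GT operator of the previous sketch playing the role of T_n; the step size, ε and all numerical values are illustrative assumptions of mine:

```python
import numpy as np

def project_hyperslab(a, u, y, eps):
    """Metric projection of a onto the hyperslab {x : |y - u^T x| <= eps}."""
    r = y - u @ a
    if r > eps:
        return a + (r - eps) / (u @ u) * u
    if r < -eps:
        return a + (r + eps) / (u @ u) * u
    return a                                             # a already lies in the hyperslab

def gt_operator(a, K, lam):
    """GT operator with soft thresholding as shr (same as the previous sketch)."""
    z = np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)
    keep = np.argsort(np.abs(a))[-K:]
    z[keep] = a[keep]
    return z

def apgt_update(a, U, Y, K, lam, eps=0.1, mu=1.0):
    """One APGT iteration over the q most recent pairs (u_i, y_i):
    weighted combination of hyperslab projections, step mu, then generalized thresholding."""
    q = len(Y)
    w = np.full(q, 1.0 / q)                              # convex weights omega_i^{(n)}
    combo = sum(w[i] * (project_hyperslab(a, U[i], Y[i], eps) - a) for i in range(q))
    return gt_operator(a + mu * combo, K, lam)

# Toy call (illustration only): one update from the zero vector on random data.
rng = np.random.default_rng(0)
L, K, q = 20, 2, 8
U, Y = rng.standard_normal((q, L)), rng.standard_normal(q)
print(apgt_update(np.zeros(L), U, Y, K=K, lam=0.05)[:5])
```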


Adaptive Projection-Based Algorithm With GT (APGT)

Convergence of APGT

• Partially quasi-nonexpansive mapping: ∀x ∈ R^L, ∃ Y_x ⊂ Fix(T) : ∀y ∈ Y_x, ‖T(x) − y‖ ≤ ‖x − y‖.

• The fixed point set of GT is a union of subspaces (non-convex).

Examples: Union of Subspaces for s = 2 [figure]

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 37/58
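As a quick numerical illustration of this fixed point set (my own check, not from the talk): any vector with at most K nonzero components is left unchanged by the GT operator sketched earlier, provided shr(0) = 0, as holds for soft thresholding; such vectors form a union of K-dimensional coordinate subspaces.

```python
import numpy as np

def gt_operator(a, K, lam):
    """GT operator with soft thresholding as shr (same as the earlier sketch)."""
    z = np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)
    keep = np.argsort(np.abs(a))[-K:]
    z[keep] = a[keep]
    return z

a = np.zeros(10)
a[[2, 7]] = [0.4, -1.3]                              # a 2-sparse vector
print(np.allclose(gt_operator(a, K=2, lam=0.5), a))  # True: a is a fixed point of T_GT^{(2)}
```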


Adaptive Projection-Based Algorithm With GT (APGT)

Convergence of APGT

• Partially quasi-nonexpansive mapping.

• The fixed point set of GT is a union of subspaces (non-convex).

• It has been shown [Slavakis, Kopsinis, Theodoridis, McLaughlin, 2013] that:

  - the algorithm leads to a sequence of estimates (a_n)_{n∈Z≥0} whose set of cluster points is nonempty,

  - each one of the cluster points is guaranteed to be, at most, s-sparse,

  - the solution is located arbitrarily close to an intersection of an infinite number of hyperslabs.

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 38/58

Simulation Examples

Example: Time-varying case exhibiting an abrupt change

L := 1024, s = 100 (up to n = 1500, and s = 110 afterwards)

APGT: O(qL + qK), OSCD: O(L²), IPAPA: O(q³)

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 39/58

Simulation Examples

Example: Sparse system identification with colored input

L := 600, s = 60, AR input (cond ≃ 100).

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 40/58

Bibliography

• I. Yamada and N. Ogura, “Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions,” Numerical Functional Analysis and Optimization, vol. 25, no. 7&8, 2004.

• K. Slavakis, I. Yamada, and N. Ogura, “The adaptive projected subgradient method over the fixed point set of strongly attracting nonexpansive mappings,” Numerical Functional Analysis and Optimization, vol. 27, no. 7&8, Nov. 2006.

• S. Theodoridis, K. Slavakis, and I. Yamada, “Adaptive learning in a world of projections: a unifying framework for linear and nonlinear classification and regression tasks,” IEEE Signal Processing Magazine, vol. 28, Jan. 2011.

• K. Slavakis and I. Yamada, “The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning,” SIAM Journal on Optimization, vol. 23, no. 1, 2013.

• Y. Kopsinis, K. Slavakis, and S. Theodoridis, “Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls,” IEEE Trans. Signal Processing, vol. 59, Mar. 2011.

• S. Chouvardas, K. Slavakis, and S. Theodoridis, “Adaptive robust distributed learning in diffusion sensor networks,” IEEE Trans. Signal Processing, vol. 59, Oct. 2011.

• S. Chouvardas, K. Slavakis, Y. Kopsinis, and S. Theodoridis, “A sparsity promoting adaptive algorithm for distributed learning,” IEEE Trans. Signal Processing, vol. 60, Oct. 2012.

• K. Slavakis, Y. Kopsinis, S. Theodoridis, and S. McLaughlin, “Generalized thresholding and online sparsity-aware learning in a union of subspaces,” accepted for publication in IEEE Trans. Signal Processing, 2013.

• S. Chouvardas, K. Slavakis, S. Theodoridis, and I. Yamada, “Stochastic analysis of hyperslab-based adaptive projected subgradient method under bounded noise,” IEEE Signal Processing Letters, vol. 20, 2013.

• Y. Kopsinis, K. Slavakis, and S. Theodoridis, “Thresholding-based online algorithms of complexity comparable to sparse LMS methods,” under preparation (a part submitted to ISCAS 2013).

• D. Angelosante, J. A. Bazerque, and G. B. Giannakis, “Online adaptive estimation of sparse signals: where RLS meets the ℓ1-norm,” IEEE Trans. Signal Processing, vol. 58, Jul. 2010.

• E. M. Eksioglu and A. K. Tanc, “RLS algorithm with convex regularization,” IEEE Signal Processing Letters, vol. 18, Aug. 2011.

• Y. Chen, Y. Gu, and A. O. Hero, “Sparse LMS for system identification,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.

• X. Wang and Y. Gu, “Proof of convergence and performance analysis for sparse recovery via zero-point attracting projection,” IEEE Trans. Signal Processing, vol. 60, Aug. 2012.

• G. Mileounis, B. Babadi, N. Kalouptsidis, and V. Tarokh, “An adaptive greedy algorithm with application to nonlinear communications,” IEEE Trans. Signal Processing, vol. 58, Jun. 2010.

• P. Di Lorenzo and A. H. Sayed, “Sparse distributed learning based on diffusion adaptation,” IEEE Trans. Signal Processing, vol. 61, Mar. 2013.

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 41/58

“Machine Learning: A Bayesian and Optimization Perspective”

by

Sergios Theodoridis

Academic Press, 2015

1050 pages

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 42/58

Thank you for your patience...

I hope that there are NO QUESTIONS !!!!!!!!!

Sergios Theodoridis, University of Athens. An Online Learning View via a Projections’ Path in the Sparse-land, 43/58

