Signal Processing Course: Theory for Sparse Recovery

DESCRIPTION
Slides for a course on signal and image processing.

TRANSCRIPT

Sparse Recovery
Gabriel Peyré
www.numerical-tours.com
Example: Regularization

Inverse problem: recover f0 from the noisy measurements
    y = K f0 + w ∈ R^P,   where K : R^{N0} → R^P,  P ≪ N0,
and w is the observation noise.

Model: f0 = Ψ x0 is sparse in a dictionary Ψ ∈ R^{N0×N}, N ≥ N0:
    coefficients x0 ∈ R^N  →(Ψ)→  image f0 = Ψ x0 ∈ R^{N0}  →(K)→  observations y = K f0 + w ∈ R^P.
Setting Φ = K Ψ ∈ R^{P×N}.

Sparse recovery: f⋆ = Ψ x⋆ where x⋆ solves
    min_{x∈R^N}  (1/2)||y − Φx||²  +  λ||x||₁
                   (fidelity)        (regularization)
Variations and Stability

Data:          f0 = Ψ x0
Observations:  y = Φ x0 + w
Recovery:      x⋆ ∈ argmin_{x∈R^N} (1/2)||Φx − y||² + λ||x||₁        (P_λ(y))
λ → 0⁺ (no noise):  x⋆ ∈ argmin_{Φx=y} ||x||₁                        (P_0(y))

Questions:
– Behavior of x⋆ with respect to y and λ.
– Criterion to ensure ||x⋆ − x0|| = O(||w||).
– Criterion to ensure x⋆ = x0 when w = 0 and λ → 0⁺.

→ The mapping λ → x⋆ looks polygonal.
→ If x0 is sparse and λ is well chosen, sign(x⋆) = sign(x0).
Numerical Illustration

[Figure: recovered coefficients x⋆ for sparsity levels s = 3, 6, 13, 25.]

y = Φ x0 + w,   ||x0||₀ = s,   Φ ∈ R^{50×200} Gaussian.
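The experiment above is easy to reproduce. The sketch below is a minimal NumPy version, assuming a random Gaussian Φ ∈ R^{50×200} and solving P_λ(y) with iterative soft thresholding (ISTA, an algorithm not covered in these slides); the seed, noise level, and choice of λ are illustrative.

```python
import numpy as np

# Recover a sparse x0 from y = Phi x0 + w, Phi a 50x200 Gaussian matrix,
# by solving  min_x 1/2 ||Phi x - y||^2 + lam ||x||_1  with ISTA.
rng = np.random.default_rng(0)
P, N, s = 50, 200, 3
Phi = rng.standard_normal((P, N)) / np.sqrt(P)   # roughly unit-norm columns

x0 = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x0[support] = rng.choice([-1.0, 1.0], size=s)    # random signs, ||x0||_0 = s
w = 0.01 * rng.standard_normal(P)
y = Phi @ x0 + w

lam = 0.1 * np.max(np.abs(Phi.T @ y))            # heuristic lambda ~ ||w||
L = np.linalg.norm(Phi, 2) ** 2                  # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(2000):
    x = x - Phi.T @ (Phi @ x - y) / L            # gradient step on the fidelity
    x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft thresholding

err = np.linalg.norm(x - x0)
```

At this sparsity level the support and signs of x0 are recovered, with a small λ-induced bias on the nonzero entries.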
Overview
• Polytope Noiseless Recovery
• Local Behavior of Sparse Regularization
• Robustness to Small Noise
• Robustness to Bounded Noise
• Compressed Sensing RIP Theory
Polytopes Approach

Φ = (φ_i)_i ∈ R^{2×3},   B_α = {x : ||x||₁ ≤ α},   α = ||x0||₁.

    min_{Φx=y} ||x||₁                                        (P_0(y))

x0 solution of P_0(Φx0)  ⟺  Φx0 ∈ ∂Φ(B_α)

[Figure: the projected ℓ¹ ball Φ(B_α), a polytope with vertices ±φ_1, ±φ_2, ±φ_3, and the mapping y → x⋆(y).]
Proof

x0 solution of P_0(Φx0)  ⟺  Φx0 ∈ ∂Φ(B_α)

(⟸) Suppose x0 is not a solution; we show Φ(x0) ∈ int(Φ(B_α)).
There exists z such that Φx0 = Φz and ||z||₁ = (1−ε)||x0||₁ for some ε > 0.
For any h = Φδ ∈ Im(Φ) with ||h||₁ < ε||x0||₁ / ||Φ⁺||_{1,1}, taking δ = Φ⁺h,
    ||z + δ||₁ ≤ ||z||₁ + ||Φ⁺h||₁ ≤ (1−ε)||x0||₁ + ||Φ⁺||_{1,1}||h||₁ < ||x0||₁,
so Φ(x0) + h = Φ(z + δ) ∈ Φ(B_α). Hence Φx0 ∈ int(Φ(B_α)), i.e. Φx0 ∉ ∂Φ(B_α).

(⟹) Suppose Φ(x0) ∈ int(Φ(B_α)). Then there exists z with Φx0 = (1−ε)Φz and
||z||₁ ≤ ||x0||₁, so ||(1−ε)z||₁ < ||x0||₁ and x0 is not a solution.
Basis-Pursuit Mapping in 2-D

Φ = (φ_i)_i ∈ R^{2×3},   y → x⋆(y).

For a sign pattern s, the 2-D quadrant K_s = {(λ_i s_i)_i ∈ R^3 : λ_i ≥ 0}
maps to the 2-D cone C_s = ΦK_s.
[Figure: the quadrant K_{(0,1,1)} and its image cone C_{(0,1,1)}.]

Basis-Pursuit Mapping in 3-D

Φ = (φ_i)_i ∈ R^{3×N},   y → x⋆(y).
→ Empty spherical caps property.
Delaunay paving of the sphere with spherical triangles C_s.
[Figure: cones C_s spanned by triples (φ_i, φ_j, φ_k) paving the sphere.]
Polytope Noiseless Recovery

Counting faces of random polytopes: [Donoho]
All x0 such that ||x0||₀ ≤ C_all(P/N)·P are identifiable.
Most x0 such that ||x0||₀ ≤ C_most(P/N)·P are identifiable.
    C_all(1/4) ≈ 0.065,   C_most(1/4) ≈ 0.25.
→ Sharp constants.
→ No noise robustness.
[Figure: phase-transition curves "All", "Most", "RIP".]
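Identifiability can be probed empirically: P_0(Φx0) is a linear program in the splitting x = x⁺ − x⁻. The sketch below (assuming SciPy's `linprog` solver and illustrative dimensions, none of which appear in the slides) tests whether a random x0 with s = 3 is recovered.

```python
import numpy as np
from scipy.optimize import linprog

# Noiseless identifiability: solve  min ||x||_1  s.t.  Phi x = Phi x0
# as an LP over (x+, x-) >= 0, and test whether x0 is recovered.
rng = np.random.default_rng(1)
P, N, s = 50, 200, 3
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

x0 = np.zeros(N)
I = rng.choice(N, size=s, replace=False)
x0[I] = rng.standard_normal(s)
y = Phi @ x0

c = np.ones(2 * N)                      # ||x||_1 = sum(x+) + sum(x-)
A_eq = np.hstack([Phi, -Phi])           # Phi (x+ - x-) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_rec = res.x[:N] - res.x[N:]

identifiable = np.linalg.norm(x_rec - x0) < 1e-5
```

Repeating this over many random draws of (Φ, x0) at each sparsity estimates the "most" phase-transition curve.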
Overview
• Polytope Noiseless Recovery
• Local Behavior of Sparse Regularization
• Robustness to Small Noise
• Robustness to Bounded Noise
• Compressed Sensing RIP Theory
First Order Necessary and Sufficient Condition

    x⋆ ∈ argmin_{x∈R^N} E(x) = (1/2)||Φx − y||² + λ||x||₁

First order condition: x⋆ solution of P_λ(y)  ⟺  0 ∈ ∂E(x⋆)
  ⟺  Φ*(Φx⋆ − y) + λs = 0,  where  s_I = sign(x⋆_I)  and  ||s_{I^c}||_∞ ≤ 1.

Support of the solution: I = {i ∈ {0, …, N−1} : x⋆_i ≠ 0}.

Note: s_{I^c} = (1/λ) Φ*_{I^c}(y − Φx⋆), hence
    x⋆ solution of P_λ(y)  ⟺  ||Φ*_{I^c}(Φx⋆ − y)||_∞ ≤ λ.
Local Parameterization

Implicit equation:  Φ*(Φx⋆ − y) + λs = 0.  If Φ_I has full rank:
    x⋆_I = Φ⁺_I y − λ(Φ*_IΦ_I)^{-1}s_I,   where  Φ⁺_I = (Φ*_IΦ_I)^{-1}Φ*_I.

Given y: compute x⋆, then compute (s, I).
Define  x̄(ȳ)_I = Φ⁺_I ȳ − λ̄(Φ*_IΦ_I)^{-1}s_I  and  x̄(ȳ)_{I^c} = 0.
By construction, x̄(y) = x⋆.

Theorem: For (y, λ) ∉ H, let x⋆ be a solution of P_λ(y) with I = supp(x⋆)
such that Φ_I has full rank. Then, for (λ̄, ȳ) close to (λ, y), x̄(ȳ) is a
solution of P_λ̄(ȳ).

Remark: the theorem holds outside H, a union of hyperplanes.
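The closed-form parameterization is easy to check numerically. The sketch below (illustrative dimensions, seed, and λ; the sign/support pair is assumed known, which holds here because the noise is small and λ well chosen) builds the candidate on the support and verifies both first-order conditions.

```python
import numpy as np

# Check the closed form  x_I = Phi_I^+ y - lam (Phi_I^* Phi_I)^{-1} s_I
# against the first-order conditions of P_lam(y).
rng = np.random.default_rng(2)
P, N, s, lam = 100, 200, 3, 0.1
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

I = np.sort(rng.choice(N, size=s, replace=False))
sI = rng.choice([-1.0, 1.0], size=s)
x0 = np.zeros(N)
x0[I] = 2.0 * sI                                  # entries well above the noise level
w = 0.005 * rng.standard_normal(P)
y = Phi @ x0 + w

PhiI = Phi[:, I]
G = PhiI.T @ PhiI                                 # Gram matrix Phi_I^* Phi_I (full rank)
xI = np.linalg.solve(G, PhiI.T @ y - lam * sI)    # closed form on the support
x = np.zeros(N)
x[I] = xI

r = Phi @ x - y                                   # residual Phi x - y
on_support = PhiI.T @ r + lam * sI                # Phi_I^*(Phi x - y) + lam s_I = 0
sign_ok = np.array_equal(np.sign(xI), sI)         # sign consistency: sign(x_I) = s_I
cert_ok = np.max(np.abs(np.delete(Phi, I, axis=1).T @ r)) <= lam  # off-support bound
```

When both checks pass, the candidate is a solution of P_λ(y), as the first-order condition slide states.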
Full Rank Condition

Lemma: There exists a solution x⋆ such that ker(Φ_I) = {0}.
→ If ker(Φ_I) ≠ {0}, x⋆ is not unique.

Proof:
If ker(Φ_I) ≠ {0}, let η_I ∈ ker(Φ_I), η ≠ 0.
Define, for all t ∈ R, x_t = x⋆ + tη.
Let t_0 be the smallest |t| such that sign(x_t) ≠ sign(x⋆).
Φx_t = Φx⋆ and same sign: for all |t| < t_0, x_t is a solution.
By continuity, x_{t_0} is a solution, and |supp(x_{t_0})| < |supp(x⋆)|.
[Figure: the path t → x_t hits a coordinate axis at t_0.]
Iterating this over supports of decreasing size yields a solution with Φ_I full rank.
Proof

    x̄(ȳ)_I = Φ⁺_I ȳ − λ̄(Φ*_IΦ_I)^{-1}s_I,   I = supp(s).

To show: for all j ∉ I,   d_{s,j}(ȳ, λ̄) = |⟨φ_j, ȳ − Φ_I x̄(ȳ)_I⟩| ≤ λ̄.

Case 1: d_{s,j}(y, λ) < λ   → ok, by continuity.
Case 2: d_{s,j}(y, λ) = λ and φ_j ∈ Im(Φ_I):
        then d_{s,j}(ȳ, λ̄) = λ̄   → ok.
Case 3: d_{s,j}(y, λ) = λ and φ_j ∉ Im(Φ_I)   → exclude this case.

Exclude hyperplanes:
    H = ∪ {H_{s,j} : φ_j ∉ Im(Φ_I)},   H_{s,j} = {(y, λ) : d_{s,j}(y, λ) = λ}.
[Figure: hyperplanes H_{∅,j} (where x⋆ = 0) and H_{I,j} in the (y, λ) domain.]
Local Affine Maps

Local parameterization:  x̄(ȳ)_I = Φ⁺_I ȳ − λ̄(Φ*_IΦ_I)^{-1}s_I.

Under the uniqueness assumption,
    y → x⋆   and   λ → x⋆
are piecewise affine functions.

[Figure: the path λ → x⋆_λ in the (x_1, x_2) plane, from the BP solution
x⋆_{λ_0} at λ_0 = 0 to x⋆_{λ_k} = 0; the breaking points correspond to
changes of the support of x⋆_λ.]
Projector

    E_λ(x) = (1/2)||Φx − y||² + λ||x||₁

Proposition: If x_1 and x_2 minimize E_λ, then Φx_1 = Φx_2.

Proof: x_3 = (x_1 + x_2)/2 is a solution, and if Φx_1 ≠ Φx_2,
    2||x_3||₁ ≤ ||x_1||₁ + ||x_2||₁,
    2||Φx_3 − y||² < ||Φx_1 − y||² + ||Φx_2 − y||²   (strict convexity of ||·||²),
so E_λ(x_3) < E_λ(x_1) = E_λ(x_2)  ⟹  contradiction.

Corollary: µ(y) = Φx_1 = Φx_2 is uniquely defined.

For (ȳ, λ̄) close to (y, λ) ∉ H:
    µ(ȳ) = P_I(ȳ) − λ̄ d_I,
where P_I = Φ_I Φ⁺_I is the orthogonal projector on {Φx : supp(x) = I}
and d_I = Φ⁺*_I s_I.
Overview
• Polytope Noiseless Recovery
• Local Behavior of Sparse Regularization
• Robustness to Small Noise
• Robustness to Bounded Noise
• Compressed Sensing RIP Theory
Uniqueness Sufficient Condition

    E_λ(x) = (1/2)||Φx − y||² + λ||x||₁

Theorem: If Φ_I has full rank and ||Φ*_{I^c}(Φx⋆ − y)||_∞ < λ,
then x⋆ is the unique minimizer of E_λ.

Proof: Let x̃ be a minimizer. Then Φx̃ = Φx⋆, so
    ||Φ*_{I^c}(Φx̃ − y)||_∞ = ||Φ*_{I^c}(Φx⋆ − y)||_∞ < λ   ⟹  supp(x̃) ⊂ I.
Then x̃_I − x⋆_I ∈ ker(Φ_I) = {0}  ⟹  x̃ = x⋆.
Identifiability crition: [Fuchs]
(�I is assumed to have full rank)
For s ⇥ {�1, 0,+1}N , let I = supp(s)
�+I = (��I�I)�1��I satisfies �+
I �I = IdI
Robustness to Small Noise
F(s) = ||�IsI ||� where ⇥I = ��Ic�+,�
I
Identifiability crition: [Fuchs]
(�I is assumed to have full rank)
�⇥ If ||w|| small enough, ||x� � x0|| = O(||w||).
is the unique solution of P�(y).
If ||w||/T is small enough and � � ||w||, then
If F (sign(x0)) < 1,
x0 + �+I w � �(��I�I)�1 sign(x0,I)
T = mini�I
|x0,i|
For s ⇥ {�1, 0,+1}N , let I = supp(s)
�+I = (��I�I)�1��I satisfies �+
I �I = IdI
Robustness to Small Noise
Theorem:
F(s) = ||�IsI ||� = maxj /�I
|�dI , �j⇥|
where dI defined by:� i � I, �dI , �i� = si
dI = �I(��I�I)�1sI
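The Fuchs criterion is a direct computation. The sketch below (illustrative Gaussian dimensions and seed) builds the dual certificate direction d_I and evaluates F(s) = max_{j∉I} |⟨d_I, φ_j⟩|.

```python
import numpy as np

# Fuchs criterion F(s) = ||Psi_I s_I||_inf = max_{j not in I} |<d_I, phi_j>|,
# with the certificate direction d_I = Phi_I (Phi_I^* Phi_I)^{-1} s_I.
rng = np.random.default_rng(3)
P, N, s = 100, 300, 4
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

I = np.sort(rng.choice(N, size=s, replace=False))
sI = rng.choice([-1.0, 1.0], size=s)
PhiI = Phi[:, I]

dI = PhiI @ np.linalg.solve(PhiI.T @ PhiI, sI)    # interpolates the signs on I
interp = PhiI.T @ dI                              # = s_I by construction
Ic = np.setdiff1d(np.arange(N), I)
F = np.max(np.abs(Phi[:, Ic].T @ dI))             # F(s)
```

At this mild sparsity level the criterion is comfortably below 1, so the theorem above applies for small enough noise.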
Geometric Interpretation

    F(s) = ||Ψ_I s_I||_∞ = max_{j∉I} |⟨d_I, φ_j⟩|,

where d_I is defined by  ∀ i ∈ I, ⟨d_I, φ_i⟩ = s_i,  i.e.
    d_I = Φ_I(Φ*_IΦ_I)^{-1}s_I = Φ⁺*_I s_I.

Condition F(s) < 1: no vector φ_j, j ∉ I, inside the cap C_s where
|⟨d_I, φ⟩| ≥ 1.
[Figure: atoms φ_i, φ_j, φ_k on the sphere, the direction d_I, and the
cap C_s delimited by |⟨d_I, φ⟩| < 1.]
Sketch of Proof

→ To prove: x̂ = x(sign(x0)) is the unique solution of P_λ(y).

Local candidate: x⋆ = x(sign(x⋆)) (implicit equation), where
    x(s)_I = Φ⁺_I y − λ(Φ*_IΦ_I)^{-1}s_I,   I = supp(s).

Sign consistency (C1): sign(x̂) = sign(x0).
    y = Φx0 + w  ⟹  x̂ = x0 + Φ⁺_I w − λ(Φ*_IΦ_I)^{-1}s_I,
    ||Φ⁺_I||_{∞,2}||w|| + ||(Φ*_IΦ_I)^{-1}||_{∞,∞} λ < T  ⟹  (C1).

First order conditions (C2): ||Φ*_{I^c}(Φx̂ − y)||_∞ < λ.
    ||Φ*_{I^c}(Φ_IΦ⁺_I − Id)||_{2,∞}||w|| − (1 − F(s))λ < 0  ⟹  (C2).

(C1) and (C2)  ⟹  x̂ is the solution.
Sketch of Proof (cont.)

    ||Φ⁺_I||_{∞,2}||w|| + ||(Φ*_IΦ_I)^{-1}||_{∞,∞} λ < T
    ||Φ*_{I^c}(Φ_IΦ⁺_I − Id)||_{2,∞}||w|| − (1 − F(s))λ < 0
⟹  x̂ is the solution.

For ||w||/T < γ_max, one can choose λ ~ ||w||/T such that x̂ is the
solution of P_λ(y).
[Figure: in the (||w||, λ) plane, the two constraints delimit a feasible
region, non-empty as long as ||w||/T < γ_max.]

    ||x̂ − x0|| ≤ ||Φ⁺_I w|| + λ ||(Φ*_IΦ_I)^{-1}||_{∞,2} = O(||w||)
⟹  ||x̂ − x0|| = O(||w||).
Overview
• Polytope Noiseless Recovery
• Local Behavior of Sparse Regularization
• Robustness to Small Noise
• Robustness to Bounded Noise
• Compressed Sensing RIP Theory
Robustness to Bounded Noise

Exact Recovery Criterion (ERC): [Tropp]
For a support I ⊂ {0, …, N−1} with Φ_I full rank,
    ERC(I) = ||Ψ_I||_{∞,∞} = ||Φ⁺_I Φ_{I^c}||_{1,1} = max_{j∈I^c} ||Φ⁺_I φ_j||₁,
where Ψ_I = Φ*_{I^c} Φ⁺*_I   (use ||(a_j)_j||_{1,1} = max_j ||a_j||₁).

Relation with the F criterion:  ERC(I) = max_{s, supp(s)⊂I} F(s).
Theorem: If ERC(supp(x0)) < 1 and λ ~ ||w||, then x⋆ is unique,
satisfies supp(x⋆) ⊂ supp(x0), and ||x0 − x⋆|| = O(||w||).
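The relation ERC(I) = max_s F(s) can be verified by brute force over the 2^|I| sign patterns, since |⟨Φ⁺_I φ_j, s_I⟩| is maximized by matching s_I to the signs of Φ⁺_I φ_j. A sketch with illustrative dimensions:

```python
import numpy as np
from itertools import product

# ERC(I) = max_{j in I^c} ||Phi_I^+ phi_j||_1, and its relation to the
# Fuchs criterion: ERC(I) = max over sign patterns s supported on I of F(s).
rng = np.random.default_rng(4)
P, N, k = 100, 300, 4
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
I = np.sort(rng.choice(N, size=k, replace=False))
Ic = np.setdiff1d(np.arange(N), I)

PhiI = Phi[:, I]
PhiI_pinv = np.linalg.pinv(PhiI)          # Phi_I^+ = (Phi_I^* Phi_I)^{-1} Phi_I^*
corr = PhiI_pinv @ Phi[:, Ic]             # columns are Phi_I^+ phi_j
erc = np.max(np.abs(corr).sum(axis=0))    # max_j ||Phi_I^+ phi_j||_1

# F(s) for every sign pattern supported on I
G = PhiI.T @ PhiI
Fmax = 0.0
for signs in product([-1.0, 1.0], repeat=k):
    dI = PhiI @ np.linalg.solve(G, np.array(signs))   # certificate for this s
    Fmax = max(Fmax, np.max(np.abs(Phi[:, Ic].T @ dI)))
```

Both quantities coincide up to numerical precision, as the slide states.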
Sketch of Proof

→ To prove: x̂ is the unique solution of P_λ(y).

Restricted recovery:
    x̂ ∈ argmin_{supp(x)⊂I} (1/2)||Φx − y||² + λ||x||₁.

Implicit equation: x̂_I = Φ⁺_I y − λ(Φ*_IΦ_I)^{-1}s_I.
Important: s = sign(x̂) is not necessarily equal to sign(x0).

First order condition (C2): ||Φ*_{I^c}(Φx̂ − y)||_∞ < λ,
    ||Φ*_{I^c}(Φ_IΦ⁺_I − Id)||_{2,∞}||w|| − (1 − F(s))λ < 0  ⟹  (C2).

Since s is arbitrary:  ERC(I) < 1  ⟹  F(s) < 1.
Hence choosing λ ~ ||w|| implies (C2).
Weak ERC

Denoting Φ = (φ_i)_{i=0}^{N−1} with φ_i ∈ R^P, and for A = (a_i)_i,
B = (b_i)_i with a_i, b_i ∈ R^P:
    µ(A, B) = max_j Σ_i |⟨a_i, b_j⟩|,    µ(A) = max_j Σ_{i≠j} |⟨a_i, a_j⟩|.

Weak Exact Recovery Criterion: [Gribonval, Dossal]
    w-ERC(I) = µ(Φ_I, Φ_{I^c}) / (1 − µ(Φ_I))   if µ(Φ_I) < 1,
               +∞ otherwise.

Theorem:  F(s) ≤ ERC(I) ≤ w-ERC(I)   (for I = supp(s)).
Theorem:
ERC(I) = maxj /�I
||�+I �j ||1 � ||(��I�I)�1||1,1max
j /�I||��I�j ||1
maxj /�I
||��I⇥j ||1 = max
j /�I
�
i�m
|�⇥i, ⇥j⇥| = �(�I ,�Ic)
Proof
(for I = supp(s))F(s) � ERC(I) � w-ERC(I)Theorem:
ERC(I) = maxj /�I
||�+I �j ||1 � ||(��I�I)�1||1,1max
j /�I||��I�j ||1
maxj /�I
||��I⇥j ||1 = max
j /�I
�
i�m
|�⇥i, ⇥j⇥| = �(�I ,�Ic)
One has ��I�I = Id�H, if ||H||1,1 < 1,
(��I�I)�1 = (Id�H)�1 =�
k�0
Hk
||(��I�I)�1||1,1 ��
k�0
||H||k1,1 =1
1� ||H||1,1
||H||1,1 = maxi�I
�
j �=i
|�⇥i, ⇥j⇥| = �(�I)
Proof
(for I = supp(s))F(s) � ERC(I) � w-ERC(I)Theorem:
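The chain F(s) ≤ ERC(I) ≤ w-ERC(I) can be checked numerically. A sketch, assuming a Gaussian matrix with columns normalized to unit norm (so that Φ*_IΦ_I = Id − H with zero diagonal for H) and illustrative dimensions:

```python
import numpy as np

# Numerical check of F(s) <= ERC(I) <= w-ERC(I) on a random dictionary.
rng = np.random.default_rng(5)
P, N, k = 100, 300, 4
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm atoms

I = np.sort(rng.choice(N, size=k, replace=False))
Ic = np.setdiff1d(np.arange(N), I)
sI = rng.choice([-1.0, 1.0], size=k)
PhiI, PhiIc = Phi[:, I], Phi[:, Ic]

# F(s) = max_{j not in I} |<d_I, phi_j>|
dI = PhiI @ np.linalg.solve(PhiI.T @ PhiI, sI)
F = np.max(np.abs(PhiIc.T @ dI))

# ERC(I) = max_j ||Phi_I^+ phi_j||_1
erc = np.max(np.abs(np.linalg.pinv(PhiI) @ PhiIc).sum(axis=0))

# w-ERC(I) = mu(Phi_I, Phi_Ic) / (1 - mu(Phi_I))
mu_cross = np.max(np.abs(PhiI.T @ PhiIc).sum(axis=0))   # max_j sum_{i in I} |<phi_i, phi_j>|
mu_in = np.max((np.abs(PhiI.T @ PhiI) - np.eye(k)).sum(axis=0))  # mu(Phi_I)
werc = mu_cross / (1 - mu_in) if mu_in < 1 else np.inf
```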
Example: Random Matrix

P = 200, N = 1000.
[Figure: empirical probability, as a function of the sparsity, that
w-ERC < 1, ERC < 1, F < 1, and that x⋆ = x0.]
Example: Deconvolution

    Φx = Σ_i x_i φ(· − iΔ)

Increasing Δ:  → reduces correlation;  → reduces resolution.
[Figure: a spike train x0, its image Φx0, and the criteria F(s), ERC(I),
w-ERC(I) as functions of Δ.]
Coherence Bounds

Mutual coherence:  µ(Φ) = max_{i≠j} |⟨φ_i, φ_j⟩|.

Theorem:  F(s) ≤ ERC(I) ≤ w-ERC(I) ≤ |I|µ(Φ) / (1 − (|I|−1)µ(Φ)).

Theorem: If  ||x0||₀ < (1/2)(1 + 1/µ(Φ))  and λ ~ ||w||,
one has supp(x⋆) ⊂ I and ||x0 − x⋆|| = O(||w||).

One has:  µ(Φ) ≥ √((N − P)/(P(N − 1))).
Optimistic setting:  ||x0||₀ ≤ O(√P).
For Gaussian matrices:  µ(Φ) ~ √(log(PN)/P).
For convolution matrices: useless criterion.
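The coherence of a dictionary is one line of linear algebra, and the lower bound above holds for any collection of unit-norm atoms. A sketch on an illustrative Gaussian matrix:

```python
import numpy as np

# Mutual coherence mu(Phi) = max_{i != j} |<phi_i, phi_j>|, compared with
# the bound mu(Phi) >= sqrt((N-P)/(P(N-1))) stated above.
rng = np.random.default_rng(6)
P, N = 50, 200
Phi = rng.standard_normal((P, N))
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm columns

G = np.abs(Phi.T @ Phi)                   # |<phi_i, phi_j>| for all pairs
np.fill_diagonal(G, 0.0)                  # discard the i = j terms
mu = G.max()

welch = np.sqrt((N - P) / (P * (N - 1)))  # lower bound on the coherence
```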
Coherence – Examples

Incoherent pair of orthobases (Diracs/Fourier):
    Ψ₁ = {k → δ[k − m]}_m,   Ψ₂ = {k → N^{-1/2} e^{2iπmk/N}}_m,
    Ψ = [Ψ₁, Ψ₂] ∈ R^{N×2N}.

    min_{x∈R^{2N}} (1/2)||y − Ψx||² + λ||x||₁
⟺  min_{x₁,x₂∈R^N} (1/2)||y − Ψ₁x₁ − Ψ₂x₂||² + λ||x₁||₁ + λ||x₂||₁.

    µ(Ψ) = 1/√N   ⟹   separates up to √N/2 Diracs + sines.
[Figure: a signal decomposed as a sum of spikes and sinusoids.]
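The value µ(Ψ) = 1/√N is exact for this pair: every Dirac has inner product of modulus N^{-1/2} with every Fourier atom. A sketch (N = 64 is an arbitrary choice):

```python
import numpy as np

# Coherence of the Diracs/Fourier pair: |<delta_m, f_k>| = N^{-1/2} exactly.
N = 64
Psi1 = np.eye(N)                                              # Dirac basis
k = np.arange(N)
Psi2 = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)   # Fourier basis (unitary DFT)

cross = np.abs(Psi1.conj().T @ Psi2)   # cross inner products between the two bases
mu = cross.max()                       # = 1/sqrt(N)
```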
Overview
• Polytope Noiseless Recovery
• Local Behavior of Sparse Regularization
• Robustness to Small Noise
• Robustness to Bounded Noise
• Compressed Sensing RIP Theory
CS with RIP

Restricted Isometry Constants:
    ∀ ||x||₀ ≤ k,   (1 − δ_k)||x||² ≤ ||Φx||² ≤ (1 + δ_k)||x||².

ℓ¹ recovery:
    x⋆ ∈ argmin_{||Φx−y||≤ε} ||x||₁,   where  y = Φx0 + w,  ||w|| ≤ ε
(the constrained counterpart of argmin_x (1/2)||Φx − y||² + λ||x||₁,
with λ ↔ ε).
Theorem: [Candès 2009] If δ_{2k} ≤ √2 − 1, then
    ||x0 − x⋆|| ≤ (C0/√k) ||x0 − x_k||₁ + C1 ε,
where x_k is the best k-term approximation of x0.

Explicit constants:
    C0 = 2(1 + ρ)/(1 − ρ),   C1 = 2α/(1 − ρ),
    α = 2√(1 + δ_{2k})/(1 − δ_{2k}),   ρ = √2 δ_{2k}/(1 − δ_{2k}).

Elements of Proof

h = x⋆ − x0. Partition {0, …, N−1} = T_0 ∪ T_1 ∪ … ∪ T_m, where T_0
indexes the k largest elements of x0 (so x_k = x_{T_0}) and T_1, T_2, …
index k elements of h_{T_0^c} each, in order of decreasing magnitude.

Optimality conditions:  ||h_{T_0^c}||₁ ≤ ||h_{T_0}||₁ + 2||x_{T_0^c}||₁.

Reference: E. J. Candès, CRAS, 2006.
Singular Values Distributions

Eigenvalues of Φ*_IΦ_I with |I| = k are essentially in [a, b],
    a = (1 − √β)²  and  b = (1 + √β)²,   where  β = k/P.

When k = βP → +∞, the eigenvalue distribution tends to
    f_β(λ) = (1/(2πβλ)) √((b − λ)₊ (λ − a)₊).        [Marcenko-Pastur]

Large deviation inequality. [Ledoux]

[Figure: empirical eigenvalue histograms against f_β(λ) for P = 200 and
k = 10, 30, 50.]
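The concentration of the spectrum in [a, b] is easy to observe. A sketch for one of the panels above (P = 200, k = 30; the seed is arbitrary):

```python
import numpy as np

# Spectrum of Phi_I^* Phi_I for Gaussian Phi with entries of variance 1/P:
# eigenvalues concentrate in [a, b] = [(1-sqrt(beta))^2, (1+sqrt(beta))^2].
rng = np.random.default_rng(7)
P, k = 200, 30
PhiI = rng.standard_normal((P, k)) / np.sqrt(P)
eigs = np.linalg.eigvalsh(PhiI.T @ PhiI)   # eigenvalues of the Gram matrix

beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
```

At this finite size the extreme eigenvalues sit close to the Marcenko-Pastur edges a and b.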
RIP for Gaussian Matrices

Link with coherence:  δ_k ≤ (k − 1)µ(Φ),   δ_2 = µ(Φ),
    µ(Φ) = max_{i≠j} |⟨φ_i, φ_j⟩|.
For Gaussian matrices:  µ(Φ) ~ √(log(PN)/P).

Stronger result:
Theorem: If  k ≤ C P / log(N/P),  then δ_{2k} ≤ √2 − 1 with high
probability.
Numerics with RIP

Stability constants of A:
    (1 − δ_1(A))||α||² ≤ ||Aα||² ≤ (1 + δ_2(A))||α||²
(smallest / largest eigenvalues of A*A).

Upper/lower restricted isometry constants:
    δ^i_k = max_{|I|=k} δ_i(Φ_I),   δ_k = max(δ¹_k, δ²_k).

Monte-Carlo estimation:  δ̃_k ≤ δ_k.
[Figure: estimated δ_{2k} as a function of k, against the level √2 − 1.]
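The Monte-Carlo estimate samples random supports instead of maximizing over all of them, so it can only underestimate δ_k. A sketch with illustrative dimensions and trial count:

```python
import numpy as np

# Monte-Carlo lower estimate of the restricted isometry constant delta_k:
# draw random supports I of size k and record the largest deviation of the
# eigenvalues of Phi_I^* Phi_I from 1 (tilde delta_k <= delta_k).
rng = np.random.default_rng(8)
P, N = 200, 1000
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

def ric_estimate(k, trials=100):
    d = 0.0
    for _ in range(trials):
        I = rng.choice(N, size=k, replace=False)
        eigs = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        d = max(d, max(1 - eigs.min(), eigs.max() - 1))
    return d

delta10 = ric_estimate(10)
delta30 = ric_estimate(30)
```

As expected from the Marcenko-Pastur edges, the estimate grows with k.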
Conclusion

Local behavior:
    λ → x⋆ is polygonal;   y → x⋆ is piecewise affine.

Noiseless recovery:  ⟺  geometry of polytopes.

Small noise:    → sign stability.
Bounded noise:  → support inclusion.
RIP-based:      → no support stability, ℓ¹ bounds.

[Figure: the recovered coefficients for s = 3 and the polytope Φ(B_α).]