A DICTIONARY BASED GENERALIZATION OF ROBUST PCA

Sirisha Rambhatla, Xingguo Li and Jarvis Haupt
Department of Electrical and Computer Engineering,
University of Minnesota-Twin Cities, Minneapolis, MN-55455
{rambh002, lixx1661, jdhaupt}@umn.edu

ABSTRACT
We analyze the decomposition of a data matrix, assumed to be a superposition of a low-rank component and a component which is sparse in a known dictionary, using a convex demixing method. We provide a unified analysis, encompassing both undercomplete and overcomplete dictionary cases, and show that the constituent components can be successfully recovered under some relatively mild assumptions up to a certain global sparsity level. Further, we corroborate our theoretical results by presenting empirical evaluations in terms of phase transitions in rank and sparsity for various dictionary sizes.

Index Terms— Low-rank, dictionary sparse, Robust PCA.

1. INTRODUCTION
Exploiting the inherent structure of data for the recovery of relevant information is at the heart of data analysis. In this paper, we analyze a scenario where a data matrix $Y \in \mathbb{R}^{n \times m}$ arises as a superposition of a rank-$r$ component $X \in \mathbb{R}^{n \times m}$ and a dictionary sparse component, expressed here as $RA$. Here, $R \in \mathbb{R}^{n \times d}$ is an a priori known dictionary with normalized columns, and $A \in \mathbb{R}^{d \times m}$ is the unknown sparse coefficient matrix with at most $s$ total non-zeros. Specifically, we will study the following model,

$Y = X + RA,$  (1)

and identify the conditions under which the components $X$ and $A$ can be successfully recovered, given $Y$ and $R$.

A wide range of problems can be expressed in the form described above. Perhaps the most celebrated of these is principal component analysis (PCA) [1], which can be viewed as a special case of eq.(1) with the matrix $A$ set to zero. In the absence of the component $X$, the problem reduces to that of sparse recovery [2-4]; see [5] and references therein for an overview of related works. The popular framework of Robust PCA tackles the case when the dictionary $R$ is an identity matrix [6, 7]; variants include [8-11]. In addition, other variants of Robust PCA, such as Outlier Pursuit [12], where $R = I$ and the sparse component is column sparse, and randomized adaptive sensing approaches [13-17], have also been explored.

Our work is most closely related to [18], which explores the application of the model shown in eq.(1) to detect traffic anomalies, and focuses on the case where the dictionary $R$ is overcomplete, i.e., fat. The model described therein is applicable to the case where the rows of $R$ are orthogonal, i.e., $RR' = I$, and the coefficient matrix $A$ has at most $k$ non-zero elements per row and column. In this paper, we analyze a more general case, where we relax some of the aforementioned assumptions for the fat case, and develop an analogous analysis for the thin case. Specifically, this paper makes the following contributions towards guaranteeing the recovery of $X$ and $A$ in eq.(1). First, we analyze the thin case, where we assume $R$ to be a frame [19], with a global sparsity of at most $s$; see [20] for a brief overview of frames. Next, for the fat case, we extend the analysis presented in [18], and assume that the dictionary $R$ satisfies the restricted isometry property (RIP) of order $k$, with a global sparsity of at most $s$ and a column sparsity of at most $k$. Consequently, we eliminate the sparsity constraint on the rows of the coefficient matrix $A$ and the orthogonality constraint on the rows of the dictionary $R$ required by [18].

(The authors graciously acknowledge support from NSF Award CCF-1217751 and the DARPA Young Faculty Award, Grant N66001-14-1-4047.)

The model shown in eq.(1) is useful in a number of applications. For example, it can be used for target identification in hyperspectral imaging, and in topic modeling applications to identify documents with certain properties. Further, in source separation tasks, a variant of this model was used for singing voice separation in [21, 22]. We can also envision source separation tasks where $X$ is not low-rank, but can in turn be modeled as being sparse in a known [23] or unknown [24] dictionary.

The rest of the paper is organized as follows. We formulate the problem, introduce the notation, and describe various considerations on the structure of the component matrices in section 2. In section 3, we present our main result and a proof sketch, followed by numerical simulations in section 4. Finally, we conclude in section 5 with some insights on future work.

2. PROBLEM FORMULATION
Our aim is to recover the low-rank component $X$ and the sparse coefficient matrix $A$, given the dictionary $R$ and samples $Y$ generated according to the model described in eq.(1). Utilizing the assumed structure of the components $X$ and $A$, we consider the following convex problem for $\lambda \geq 0$:

$\underset{X,A}{\text{minimize}} \;\; \|X\|_* + \lambda \|A\|_1 \;\; \text{s.t.} \;\; Y = X + RA,$  (2)

where $\|\cdot\|_*$ denotes the nuclear norm and $\|\cdot\|_1$ refers to the entry-wise $\ell_1$-norm, which serve as convex relaxations of rank and sparsity (i.e., the $\ell_0$-norm), respectively. Depending upon the number of dictionary elements $d$ in $R$, we analyze the problem described above for two cases: a) when $d \leq n$, the thin case, and b) when $d > n$, the fat case.

For the thin case, we assume that the rows of the dictionary $R$ comprise a frame, i.e., for any vector $v \in \mathbb{R}^d$, we have

$F_L \|v\|_2^2 \leq \|Rv\|_2^2 \leq F_U \|v\|_2^2,$  (3)

where $F_L$ and $F_U$ are the lower and upper frame bounds, respectively, with $0 < F_L \leq F_U$. Next, for $d > n$, the fat case, we assume that $R$ obeys the restricted isometry property (RIP) of order $k$, i.e., for any $k$-sparse vector $v \in \mathbb{R}^d$, we have

$(1 - \delta)\|v\|_2^2 \leq \|Rv\|_2^2 \leq (1 + \delta)\|v\|_2^2,$  (4)

where $\delta \in [0, 1]$ is the restricted isometry constant (RIC).
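Numerically, the frame bounds in eq.(3) for a thin dictionary are simply the extreme squared singular values of $R$. The sketch below, under that assumption and with hypothetical sizes, computes $F_L$ and $F_U$, and spot-checks eq.(4) on random $k$-sparse vectors; certifying the true RIC $\delta$ is combinatorial, so the sampled ratios only bound it from below.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 5, 3                          # hypothetical sizes (thin case)
R = rng.standard_normal((n, d))
R /= np.linalg.norm(R, axis=0)               # normalized columns, as in eq.(1)

# Frame bounds of eq.(3): extreme squared singular values of R.
svals = np.linalg.svd(R, compute_uv=False)
F_U, F_L = svals[0]**2, svals[-1]**2

# Spot-check of eq.(4) for a fat dictionary on random k-sparse vectors.
d_fat = 150
R_fat = rng.standard_normal((n, d_fat))
R_fat /= np.linalg.norm(R_fat, axis=0)
ratios = []
for _ in range(1000):
    v = np.zeros(d_fat)
    idx = rng.choice(d_fat, size=k, replace=False)
    v[idx] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(R_fat @ v)**2 / np.linalg.norm(v)**2)
# max(ratios) - 1 and 1 - min(ratios) are lower bounds on the RIC delta.
print(F_L, F_U, min(ratios), max(ratios))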

The aim of this paper is to answer the following question: given $R$, under what conditions can we recover $X$ and $A$ from the mixture $Y$? We observe that there are a few ways we can run into trouble right away, namely, a) the dictionary sparse part $RA$ is low-rank, or b) the low-rank part $X$ is sparse in the dictionary $R$. Indeed, these relationships take center stage in our analysis. We begin by defining a few relevant subspaces, similar to those used in [18], which will help us formalize the said relationships.

Let the pair $\{X_0, A_0\}$ be the solution to the problem shown in eq.(2). We define $\Phi$ as the linear space of matrices spanning the row and column spaces of the low-rank component $X_0$. Specifically, let $U \Sigma V'$ denote the singular value decomposition of $X_0$; then the space $\Phi$ is defined as

$\Phi := \{ U W_1 + W_2 V' : W_1 \in \mathbb{R}^{r \times m}, \, W_2 \in \mathbb{R}^{n \times r} \}.$

Next, let $\Omega$ be the space spanned by $d \times m$ matrices that have the same support (location of non-zero elements) as $A_0$, and let $\Omega_R$ be defined as

$\Omega_R := \{ Z = RH : H \in \Omega \}.$

In addition, we denote the corresponding complements of the spaces described above by appending '$\perp$'. Next, let the orthogonal projection operators onto the spaces defined above be $\mathcal{P}_\Phi(\cdot)$, $\mathcal{P}_\Omega(\cdot)$ and $\mathcal{P}_{\Omega_R}(\cdot)$, respectively. Further, we will use $P_U$ and $P_V$ to denote the projection matrices corresponding to the column and row spaces of $X_0$, respectively, implying the following for any matrix $X$:

$\mathcal{P}_\Phi(X) = P_U X + X P_V - P_U X P_V,$
$\mathcal{P}_{\Phi^\perp}(X) = (I - P_U) X (I - P_V).$
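In numpy, these projection operators are one-liners. A minimal sketch with random placeholder bases, checking that $\mathcal{P}_\Phi$ and $\mathcal{P}_{\Phi^\perp}$ are complementary and idempotent:

import numpy as np

rng = np.random.default_rng(2)
n, m, r = 10, 8, 3

U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal column-space basis
V, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal row-space basis
P_U, P_V = U @ U.T, V @ V.T

def proj_Phi(X):
    # P_Phi(X) = P_U X + X P_V - P_U X P_V
    return P_U @ X + X @ P_V - P_U @ X @ P_V

def proj_Phi_perp(X):
    # P_Phi_perp(X) = (I - P_U) X (I - P_V)
    return (np.eye(n) - P_U) @ X @ (np.eye(m) - P_V)

X = rng.standard_normal((n, m))
assert np.allclose(proj_Phi(X) + proj_Phi_perp(X), X)    # complementary split
assert np.allclose(proj_Phi(proj_Phi(X)), proj_Phi(X))   # idempotent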

As alluded to previously, there are indeed situations under which we cannot hope to recover the matrices $X$ and $A$. To identify these scenarios, we will employ various notions of incoherence. We define the incoherence between the low-rank part $X_0$ and the dictionary sparse part $RA_0$ as

$\mu := \max_{Z \in \Omega_R \setminus \{0\}} \frac{\|\mathcal{P}_\Phi(Z)\|_F}{\|Z\|_F},$

where $\mu \in [0, 1]$ is small when these components are incoherent (good for recovery). The next two measures of incoherence can be interpreted as a way to avoid the cases where, for $X_0 = U \Sigma V'$, (a) $U$ resembles the dictionary $R$, and (b) $V$ resembles the sparse coefficient matrix $A_0$; in either case, the low-rank part essentially mimics the dictionary sparse component. To this end, similar to [18], we define respectively the following to measure these properties,

$\gamma_{UR} := \max_i \frac{\|P_U R e_i\|_2}{\|R e_i\|_2} \quad \text{and} \quad \gamma_V := \max_i \|P_V e_i\|_2,$

where $\gamma_V \in [r/m, 1]$. Also, we define $\xi := \|R' U V'\|_\infty$.
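These quantities are directly computable for a given instance. Below is a minimal numpy sketch, assuming $U$ and $V$ come from a thin SVD of $X_0$; the exact value of $\mu$ requires a maximization over the subspace $\Omega_R$, so the sketch reports only the easily computed measures $\gamma_{UR}$, $\gamma_V$ and $\xi$.

import numpy as np

def incoherence_measures(X0, R):
    # Compute gamma_UR, gamma_V and xi for a given low-rank X0 and dictionary R.
    U, svals, Vt = np.linalg.svd(X0, full_matrices=False)
    r = np.sum(svals > 1e-10 * svals[0])           # numerical rank
    U, Vt = U[:, :r], Vt[:r, :]
    P_U = U @ U.T                                  # projector onto column space
    P_V = Vt.T @ Vt                                # projector onto row space
    # gamma_UR: worst-case alignment of dictionary atoms with the column space of X0.
    gamma_UR = np.max(np.linalg.norm(P_U @ R, axis=0) / np.linalg.norm(R, axis=0))
    # gamma_V: worst-case alignment of coordinate axes with the row space of X0.
    gamma_V = np.max(np.linalg.norm(P_V, axis=0))
    # xi: entry-wise sup-norm of R'UV'.
    xi = np.max(np.abs(R.T @ U @ Vt))
    return gamma_UR, gamma_V, xi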

3. MAIN RESULT
In this section, we present the conditions under which solving the problem stated in eq.(2) will successfully recover the true matrices $X$ and $A$. As discussed in the previous section, the structure of the dictionary $R$ plays a crucial role in the analysis of the two paradigms, i.e., the thin case and the fat case. Consequently, we provide results corresponding to these cases separately. We begin by introducing a few definitions and assumptions applicable to both cases. To simplify the analysis, we assume that $d < m$; specifically, we will assume that $d \leq \frac{m}{\alpha r}$, where $\alpha > 1$ is a constant. In addition, our analysis is applicable to the case when $s < m$.

Definition D.1. We define

$c := \begin{cases} c_t, & \text{for } d \leq n \\ c_f, & \text{for } d > n, \end{cases}$

where $c_t$ and $c_f$ are defined as

$c_t := \frac{F_U}{2}\left[(1+2\gamma_{UR})(\min(s,d)+s\gamma_V)+2s\gamma_V\right] - \frac{F_L}{2}\left[\min(s,d)+s\gamma_V\right]$ and
$c_f := \frac{1+\delta}{2}\left[(1+2\gamma_{UR})(\min(s,k)+s\gamma_V)+2s\gamma_V\right] - \frac{1-\delta}{2}\left(\min(s,k)+s\gamma_V\right).$

Further, we define $C$ and $\lambda_{\min}$ as

$C := \begin{cases} \dfrac{c_t}{F_L(1-\mu)^2 - c_t}, & \text{for } d \leq n \text{ and } F_L \leq \frac{1}{(1-\mu)^2} \\[2mm] \dfrac{c_f}{(1-\delta)(1-\mu)^2 - c_f}, & \text{for } d > n \end{cases}$

and $\lambda_{\min} := \frac{1+C}{1-C}\,\xi$.

Definition D.2.

$\lambda_{\max} := \begin{cases} \dfrac{1}{\sqrt{s}}\left(\sqrt{F_L}\,(1-\mu) - \sqrt{r F_U}\,\mu\right), & \text{if } d \leq n \\[2mm] \dfrac{1}{\sqrt{s}}\left(\sqrt{1-\delta}\,(1-\mu) - \sqrt{r(1+\delta)}\,\mu\right), & \text{if } d > n. \end{cases}$

Assumption A.1. $\lambda_{\max} \geq \lambda_{\min}$.

Assumption A.2. Let $s_{\max} := \frac{(1-\mu)^2}{2}\frac{m}{r}$; then

$\gamma_{UR} \leq \begin{cases} \dfrac{(1-\mu)^2 - 2s\gamma_V}{2s(1+\gamma_V)}, & \text{for } s \leq \min(d, s_{\max}) \\[2mm] \dfrac{(1-\mu)^2 - 2s\gamma_V}{2(d + s\gamma_V)}, & \text{for } d < s \leq s_{\max}. \end{cases}$

Assumption A.3. For $s_{\max}$ as above,

$\gamma_{UR} \leq \begin{cases} \dfrac{(1-\mu)^2 - 2s\gamma_V}{2s(1+\gamma_V)}, & \text{for } s \leq \min(k, s_{\max}) \\[2mm] \dfrac{(1-\mu)^2 - 2s\gamma_V}{2(k + s\gamma_V)}, & \text{for } k < s \leq s_{\max}. \end{cases}$

Theorem 1. Consider a superposition of a low-rank matrix $X_0 \in \mathbb{R}^{n \times m}$ of rank $r$ and a dictionary sparse component $RA_0$, wherein the sparse coefficient matrix $A_0$ has at most $s$ non-zeros, i.e., $\|A_0\|_0 \leq s$, and $Y = X_0 + RA_0$, with parameters $\gamma_{UR}$, $\xi$, $\gamma_V \in [r/m, 1]$ and $\mu \in [0, 1]$. Then, solving the formulation shown in eq.(2) will recover the matrices $X_0$ and $A_0$ if the following conditions hold for any $\lambda \in [\lambda_{\min}, \lambda_{\max}]$, as defined in D.1 and D.2, respectively.

• For $d \leq n$, the dictionary $R \in \mathbb{R}^{n \times d}$ obeys the frame condition with frame bounds $[F_L, F_U]$, and assumptions A.1 and A.2 hold.

• For $d > n > C_1 k \log(d)$, the dictionary $R \in \mathbb{R}^{n \times d}$ obeys the RIP of order $k$ with RIC $\delta$ for a constant $C_1$, and assumptions A.1 and A.3 hold.
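To make these definitions concrete, the following sketch evaluates $c_t$, $C$, $\lambda_{\min}$ and $\lambda_{\max}$ for the thin case, following the reconstruction above, and checks whether the interval of admissible regularization weights $[\lambda_{\min}, \lambda_{\max}]$ is non-empty (assumption A.1); all parameter values are hypothetical placeholders.

import numpy as np

def lambda_interval_thin(s, r, d, F_L, F_U, mu, gamma_UR, gamma_V, xi):
    # c_t from Definition D.1 (thin case, d <= n).
    c_t = (F_U / 2) * ((1 + 2 * gamma_UR) * (min(s, d) + s * gamma_V) + 2 * s * gamma_V) \
        - (F_L / 2) * (min(s, d) + s * gamma_V)
    C = c_t / (F_L * (1 - mu) ** 2 - c_t)          # meaningful only when 0 < C < 1
    lam_min = (1 + C) / (1 - C) * xi               # lower bound, from condition C4
    lam_max = (np.sqrt(F_L) * (1 - mu) - np.sqrt(r * F_U) * mu) / np.sqrt(s)  # from C3
    return lam_min, lam_max, (0 < C < 1) and lam_max >= lam_min

# Hypothetical parameter values, for illustration only:
print(lambda_interval_thin(s=2, r=5, d=5, F_L=0.9, F_U=1.1,
                           mu=0.05, gamma_UR=0.001, gamma_V=0.01, xi=0.01))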

Thm. 1 establishes sufficient conditions for the existence of a $\lambda$ that guarantees recovery of $\{X_0, A_0\}$ for both the thin and the fat case. For both cases, we see that the conditions are closely related to the various incoherence measures $\gamma_{UR}$, $\gamma_V$ and $\mu$ between the low-rank part $X$, the dictionary $R$, and the sparse component $A$. Further, we observe that the theorem imposes an upper bound on the global sparsity, i.e., $s \leq s_{\max}$. This is similar to what was reported in [12], and seems a result of the deterministic analysis presented here. Further, the condition shown in assumption A.1, i.e., $\lambda_{\min} \leq \lambda_{\max}$, translates to a relationship between the rank $r$ and the sparsity $s$, namely

$r \leq \left( \sqrt{\tfrac{F_L}{F_U}}\,\tfrac{1-\mu}{\mu} - \tfrac{\xi}{\sqrt{F_U}\,\mu}\,\tfrac{1+C}{1-C}\,\sqrt{s} \right)^2,$  (5)

for all $s \geq 0$ such that $\sqrt{s} \leq \frac{\sqrt{F_L}(1-\mu)}{\xi}\,\frac{1-C}{1+C}$, for the thin case, and

$r \leq \left( \sqrt{\tfrac{1-\delta}{1+\delta}}\,\tfrac{1-\mu}{\mu} - \tfrac{\xi}{\sqrt{1+\delta}\,\mu}\,\tfrac{1+C}{1-C}\,\sqrt{s} \right)^2,$  (6)

for all $s \geq 0$ such that $\sqrt{s} \leq \frac{\sqrt{1-\delta}(1-\mu)}{\xi}\,\frac{1-C}{1+C}$, for the fat case.

These relationships are indeed what we observe in empirical evaluations; this will be revisited in the next section. We now present a brief proof sketch of the results presented in this section.

3.1. Proof Sketch
We use a dual certificate construction procedure to prove the main result in Thm. 1 [25]. To this end, we start by constructing a dual certificate for the convex problem shown in eq.(2). In our analysis, we use $\|M\| := \sigma_{\max}(M)$ for the spectral norm, where $\sigma_{\max}(M)$ denotes the maximum singular value of the matrix $M$, $\|M\|_\infty := \max_{i,j} |M_{ij}|$, and $\|M\|_{\infty,\infty} := \max_i \|e_i' M\|_1$. The following lemma states the conditions the dual certificate needs to satisfy.

Lemma 2 (from Lemma 2 in [18] and Thm. 3 in [12]): If there exists a dual certificate $\Gamma \in \mathbb{R}^{n \times m}$ satisfying

C1: $\mathcal{P}_\Phi(\Gamma) = UV'$    C2: $\mathcal{P}_\Omega(R'\Gamma) = \lambda\,\mathrm{sign}(A_0)$
C3: $\|\mathcal{P}_{\Phi^\perp}(\Gamma)\| < 1$    C4: $\|\mathcal{P}_{\Omega^\perp}(R'\Gamma)\|_\infty < \lambda$

then the pair $\{X_0, A_0\}$ is the unique solution of eq.(2).

We will now proceed with the construction of a dual certificate that satisfies conditions C1-C4 of Lemma 2. Following an analysis similar to [18] (Section V-B), we construct the dual certificate as

$\Gamma = UV' + (I - P_U)\,X\,(I - P_V),$

for an arbitrary $X \in \mathbb{R}^{n \times m}$. The condition C2 then translates to

$\mathcal{P}_\Omega(R'UV') + \mathcal{P}_\Omega\big(R'(I-P_U)X(I-P_V)\big) = \lambda\,\mathrm{sign}(A_0).$

Let $Z := R'(I-P_U)X(I-P_V)$ and $B_\Omega := \lambda\,\mathrm{sign}(A_0) - \mathcal{P}_\Omega(R'UV')$; then we can write the equation above as

$\mathcal{P}_\Omega(Z) = B_\Omega.$

Note that $\mathrm{vec}(Z) = \left[(I-P_V) \otimes R'(I-P_U)\right] \mathrm{vec}(X)$. Now, let $\mathcal{A} := (I-P_V) \otimes R'(I-P_U)$, let $\mathcal{A}_\Omega \in \mathbb{R}^{s \times nm}$ denote the rows of $\mathcal{A}$ that correspond to the support of $A_0$, and let $\mathcal{A}_{\Omega^\perp}$ denote the remaining rows of $\mathcal{A}$. Further, let $b_\Omega$ be a length-$s$ vector containing the elements of $B_\Omega$ corresponding to the support of $A_0$. Using these definitions, we conclude that

$\mathcal{A}_\Omega \mathrm{vec}(X) = b_\Omega,$

which implies that $\mathrm{vec}(X) = \mathcal{A}_\Omega'(\mathcal{A}_\Omega \mathcal{A}_\Omega')^{-1} b_\Omega$. Now, we look at condition C3, i.e., $\|\mathcal{P}_{\Phi^\perp}(\Gamma)\|$; this is where our analysis departs from [18]. We write

$\|\mathcal{P}_{\Phi^\perp}(\Gamma)\| = \|(I-P_U)X(I-P_V)\| \leq \|I-P_U\|\,\|X\|\,\|I-P_V\| \leq \|X\| \leq \|X\|_F \leq \|\mathcal{A}_\Omega'(\mathcal{A}_\Omega\mathcal{A}_\Omega')^{-1}\|\,\|b_\Omega\|_2,$

where we have used the facts that $\|I-P_U\| \leq 1$ and $\|I-P_V\| \leq 1$. Now, since $\mathcal{A}_\Omega'(\mathcal{A}_\Omega\mathcal{A}_\Omega')^{-1}$ is the pseudo-inverse of $\mathcal{A}_\Omega$, i.e., $\mathcal{A}_\Omega \mathcal{A}_\Omega'(\mathcal{A}_\Omega\mathcal{A}_\Omega')^{-1} = I$, we have that $\|\mathcal{A}_\Omega'(\mathcal{A}_\Omega\mathcal{A}_\Omega')^{-1}\| = 1/\sigma_{\min}(\mathcal{A}_\Omega)$, where $\sigma_{\min}(\mathcal{A}_\Omega)$ is the smallest singular value of $\mathcal{A}_\Omega$. Therefore, we have

$\|\mathcal{P}_{\Phi^\perp}(\Gamma)\| \leq \frac{\|b_\Omega\|_2}{\sigma_{\min}(\mathcal{A}_\Omega)}.$
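The vectorization step above uses the identity $\mathrm{vec}(MXN) = (N' \otimes M)\,\mathrm{vec}(X)$ with column-major stacking (here $N = I - P_V$ is symmetric). A minimal numpy sanity check with random placeholder matrices:

import numpy as np

rng = np.random.default_rng(1)
n, m, d, r = 8, 6, 4, 2

# Random orthonormal bases for the column/row spaces and a random dictionary.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((m, r)))
R = rng.standard_normal((n, d))
X = rng.standard_normal((n, m))

P_U, P_V = U @ U.T, V @ V.T
Z = R.T @ (np.eye(n) - P_U) @ X @ (np.eye(m) - P_V)

# vec(Z) = [(I - P_V) kron R'(I - P_U)] vec(X), with Fortran-order (column-major) vec.
K = np.kron(np.eye(m) - P_V, R.T @ (np.eye(n) - P_U))
assert np.allclose(Z.ravel(order="F"), K @ X.ravel(order="F"))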

To obtain an upper bound on $\|\mathcal{P}_{\Phi^\perp}(\Gamma)\|$, we will now present the following lemmata.

Lemma 3: The lower bound on $\sigma_{\min}(\mathcal{A}_\Omega)$ is given by

$\sigma_{\min}(\mathcal{A}_\Omega) \geq \begin{cases} \sqrt{F_L}\,(1-\mu), & \text{for } d \leq n \\ \sqrt{1-\delta}\,(1-\mu), & \text{for } d > n. \end{cases}$

Lemma 4: The upper bound on $\|b_\Omega\|_2$ is given by

$\|b_\Omega\|_2 \leq \begin{cases} \lambda\sqrt{s} + \sqrt{r F_U}\,\mu, & \text{for } d \leq n \\ \lambda\sqrt{s} + \sqrt{r(1+\delta)}\,\mu, & \text{for } d > n. \end{cases}$

Assembling the results of the lemmata to obtain the upper bound on $\|\mathcal{P}_{\Phi^\perp}(\Gamma)\|$, and consequently to satisfy C3, we arrive at the expression for $\lambda_{\max}$ defined in D.2. Next, we move on to finding conditions under which C4 is satisfied by our dual certificate. For this, we bound $\|\mathcal{P}_{\Omega^\perp}(R'\Gamma)\|_\infty$. From eq.(16) in [18], we have

$\|\mathcal{P}_{\Omega^\perp}(R'\Gamma)\|_\infty \leq \|Q\|_{\infty,\infty}\|b_\Omega\|_\infty + \|\mathcal{P}_{\Omega^\perp}(R'UV')\|_\infty,$  (7)

where $Q := \mathcal{A}_{\Omega^\perp}\mathcal{A}_\Omega'(\mathcal{A}_\Omega\mathcal{A}_\Omega')^{-1}$. Our aim now is to bound $\|Q\|_{\infty,\infty}$ and $\|b_\Omega\|_\infty$ for our case. For this, we present the following lemmata.

Lemma 5 (from eq.(17) in [18]): The upper bound on $\|b_\Omega\|_\infty$ is given by $\lambda + \|\mathcal{P}_\Omega(R'UV')\|_\infty$.

Lemma 6: The upper bound on $\|Q\|_{\infty,\infty}$ is given by $C$, where $C$ is as defined in D.1.

Substituting these in eq.(7) and C4, we have

$\|\mathcal{P}_{\Omega^\perp}(R'\Gamma)\|_\infty \leq C\left(\lambda + \|\mathcal{P}_\Omega(R'UV')\|_\infty\right) + \|\mathcal{P}_{\Omega^\perp}(R'UV')\|_\infty.$

By C4, the expression above must be bounded by $\lambda$; here, $C$ and $c$ are as defined in D.1. Hence, we arrive at the following lower bound for $\lambda$:

$\lambda_{\min} := \frac{1+C}{1-C}\,\xi.$
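This lower bound follows by noting that both $\|\mathcal{P}_\Omega(R'UV')\|_\infty$ and $\|\mathcal{P}_{\Omega^\perp}(R'UV')\|_\infty$ are at most $\xi = \|R'UV'\|_\infty$ (projecting onto a support pattern can only shrink the entry-wise sup-norm) and solving the resulting inequality for $\lambda$:

$C(\lambda + \xi) + \xi \leq \lambda \;\Longleftrightarrow\; (1+C)\,\xi \leq (1-C)\,\lambda \;\Longleftrightarrow\; \lambda \geq \frac{1+C}{1-C}\,\xi,$

where $0 < C < 1$ ensures the direction of the inequality is preserved.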

Gleaning from the expressions for $\lambda_{\max}$ and $\lambda_{\min}$, we observe that the following conditions need to be satisfied for the existence of a $\lambda$ that can recover the desired matrices: a) $\lambda_{\max} \geq \lambda_{\min} > 0$, and b) $0 < C < 1$. These conditions are satisfied by assumptions A.1-A.3. This completes the proof.

4. SIMULATIONS
Our analysis in the previous section shows that, depending upon the size of the dictionary $R$, if the conditions of Thm. 1 are met, a convex program which solves eq.(2) will recover the components $X$ and $A$. In this section, we empirically evaluate the claims of Thm. 1. To this end, we employ the accelerated proximal gradient algorithm outlined in Algorithm 1 of [18] to analyze the phase transition in rank and sparsity for different sizes of the dictionary $R$. For our analysis, we consider the case where $n = m = 100$. Here, we generate the low-rank part $X$ by the outer product of two random matrices of sizes $n \times r$ and $m \times r$, with entries drawn from the standard normal distribution. In addition, the $s$ non-zero entries of the sparse component $A$ are drawn from the Rademacher distribution; the dictionary $R$ is also drawn from the standard normal distribution, and its columns are then normalized. Phase transitions in rank and sparsity over 10 trials for dictionaries of sizes $d = 5$ (thin) and $d = 150$ (fat), corresponding to our theoretical results, and for all admissible levels of sparsity, are shown in Fig. 1 and Fig. 2, respectively.

[Fig. 1: Recovery for varying rank of $X$, sparsity of $A$, and number of dictionary elements in $R$, as per Thm. 1. Each plot shows average recovery across 10 trials for varying rank (y-axis) and sparsity (x-axis) up to $s \leq m$, with the white region representing correct recovery, for $n = m = 100$. We declare success if $\|X - \widehat{X}\|_F/\|X\|_F \leq 0.02$ and $\|A - \widehat{A}\|_F/\|A\|_F \leq 0.02$, where $\widehat{X}$ and $\widehat{A}$ are the recovered $X$ and $A$, respectively. Panels (a)-(b) show the recovery of the low-rank part $X$, and (c)-(d) show the recovery of the sparse part, for dictionary sizes $d = 5$ and $150$, respectively. Panels (e)-(f) show the region of overlap where both $X$ and $A$ are recovered successfully. The trend between rank $r$ and sparsity $s$ predicted by Thm. 1, eq.(5) and eq.(6), is shown in red in panels (a)-(b).]
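For reproducibility, the data generation for a single trial is a few lines. The sketch below follows the description above; solve_demixing is a hypothetical stand-in for the solver (the convex program of eq.(2), or Algorithm 1 of [18]).

import numpy as np

def generate_trial(n, m, d, r, s, rng):
    # Generate one (Y, R, X0, A0) instance as described in Section 4.
    X0 = rng.standard_normal((n, r)) @ rng.standard_normal((m, r)).T  # rank-r part
    R = rng.standard_normal((n, d))
    R /= np.linalg.norm(R, axis=0)                 # normalized columns
    A0 = np.zeros(d * m)
    support = rng.choice(d * m, size=s, replace=False)
    A0[support] = rng.choice([-1.0, 1.0], size=s)  # Rademacher entries
    A0 = A0.reshape(d, m)
    return X0 + R @ A0, R, X0, A0

rng = np.random.default_rng(0)
Y, R, X0, A0 = generate_trial(n=100, m=100, d=5, r=10, s=50, rng=rng)
# X_hat, A_hat = solve_demixing(Y, R, lam)        # hypothetical solver
# success = (np.linalg.norm(X0 - X_hat) <= 0.02 * np.linalg.norm(X0) and
#            np.linalg.norm(A0 - A_hat) <= 0.02 * np.linalg.norm(A0))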

Fig. 1 shows the recovery of the low-rank part $X$ in panels (a-b), while panels (c-d) show the recovery of the sparse component $A$, for $d = 5$ and $150$, respectively. Next, panels (e-f) show the region of overlap between the low-rank recovery and sparse recovery plots, for $d = 5$ and $d = 150$, respectively; this corresponds to the region in which both $X$ and $A$ are recovered successfully. Further, the red lines in panels (a) and (b) show the trend predicted by our analysis, i.e., eq.(5) and eq.(6), with the parameters hand-tuned for best fit. Indeed, the empirical relationship between rank and sparsity has the same trend as predicted by Thm. 1.

[Fig. 2: Recovery for varying rank of $X$, sparsity of $A$, and number of dictionary elements in $R$. Each plot shows average recovery across 10 trials for varying rank (y-axis) and sparsity (x-axis), with the white region representing correct recovery, for $n = m = 100$. We declare success if $\|X - \widehat{X}\|_F/\|X\|_F \leq 0.02$ and $\|A - \widehat{A}\|_F/\|A\|_F \leq 0.02$, where $\widehat{X}$ and $\widehat{A}$ are the recovered $X$ and $A$, respectively. Panels (a)-(b) show the recovery of the low-rank part $X$, and (c)-(d) show the recovery of the sparse part, for dictionary sizes $d = 5$ and $150$, respectively.]

Similarly, Fig. 2 shows the recovery of the low-rank part $X$ in panels (a-b), while panels (c-d) show the recovery of the sparse component $A$, for $d = 5$ and $150$, respectively, over a much wider range of global sparsity $s$. Indeed, these phase transition plots show that we can successfully recover the components at sparsity levels much greater than those put forth by the theorem. This can be attributed to the deterministic analysis presented here; we conjecture that a randomized analysis of the problem can potentially improve these results.

5. CONCLUSIONS
We analyze a dictionary based generalization of Robust PCA. Specifically, we extend the theoretical guarantees presented in [18] to a setting wherein the dictionary $R$ may have an arbitrary number of columns, and the coefficient matrix $A$ has a global sparsity of $s$, i.e., $\|A\|_0 = s \leq s_{\max}$. We generalize the results by assuming $R$ to be a frame for the thin case and to obey the RIP condition for the fat case, and we eliminate the orthogonality constraint on the rows of the dictionary $R$ and the sparsity constraint on the rows of the coefficient matrix $A$ (as in [18]), rendering the results useful for a potentially wide range of applications. Further, we provide empirical evaluations via phase transition plots in rank and sparsity corresponding to our theoretical results. Motivated by the promising phase transitions observed beyond the sparsity levels tolerated by our analysis, we propose a randomized analysis of the problem, to improve the upper bound on the sparsity, as future work.

6. REFERENCES

[1] I. Jolliffe, Principal Component Analysis, Wiley Online Library, 2002.

[2] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, 1995.

[3] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.

[4] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[5] H. Rauhut, "Compressive sensing and structured random matrices," Theoretical Foundations and Numerical Methods for Sparse Recovery, vol. 9, pp. 1–92, 2010.

[6] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.

[7] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.

[8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, "Stable principal component pursuit," in IEEE International Symposium on Information Theory (ISIT), 2010, pp. 1518–1522.

[9] X. Ding, L. He, and L. Carin, "Bayesian robust principal component analysis," IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419–3430, 2011.

[10] J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.

[11] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis, "Low-rank matrix recovery from errors and erasures," IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4324–4337, 2013.

[12] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," in Advances in Neural Information Processing Systems, 2010, pp. 2496–2504.

[13] X. Li and J. Haupt, "Identifying outliers in large matrices via randomized adaptive compressive sampling," IEEE Transactions on Signal Processing, vol. 63, no. 7, pp. 1792–1807, 2015.

[14] X. Li and J. Haupt, "Locating salient group-structured image features via adaptive compressive sensing," in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015, pp. 393–397.

[15] X. Li and J. Haupt, "Outlier identification via randomized adaptive compressive sampling," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 3302–3306.

[16] M. Rahmani and G. Atia, "Randomized robust subspace recovery for high dimensional data matrices," arXiv preprint arXiv:1505.05901, 2015.

[17] X. Li and J. Haupt, "A refined analysis for the sample complexity of adaptive compressive outlier sensing," in IEEE Statistical Signal Processing Workshop (SSP), 2016, pp. 1–5.

[18] M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5186–5205, 2013.

[19] R. J. Duffin and A. C. Schaeffer, "A class of nonharmonic Fourier series," Transactions of the American Mathematical Society, vol. 72, no. 2, pp. 341–366, 1952.

[20] C. Heil, "What is ... a frame?," Notices of the American Mathematical Society, vol. 60, no. 6, June/July 2013.

[21] P. S. Huang, S. D. Chen, P. Smaragdis, and M. J. Hasegawa, "Singing-voice separation from monaural recordings using robust principal component analysis," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 57–60.

[22] P. Sprechmann, A. M. Bronstein, and G. Sapiro, "Real-time online singing voice separation from monaural recordings using robust low-rank modeling," in ISMIR, 2012, pp. 67–72.

[23] J. L. Starck, Y. Moudden, J. Bobin, M. Elad, and D. L. Donoho, "Morphological component analysis," in Optics & Photonics 2005, International Society for Optics and Photonics, 2005, pp. 59140Q–59140Q.

[24] S. Rambhatla and J. Haupt, "Semi-blind source separation via sparse representations and online dictionary learning," in Asilomar Conference on Signals, Systems and Computers, 2013, pp. 1687–1691.

[25] S. Rambhatla, X. Li, and J. Haupt, "A dictionary based generalization of robust PCA with applications," (in preparation), 2016.

[26] G. A. Watson, "Characterization of the subdifferential of some matrix norms," Linear Algebra and its Applications, vol. 170, pp. 33–45, 1992.