Outlier Pursuit: Robust PCA and Collaborative Filtering
Huan Xu
Dept. of Mechanical Engineering & Dept. of Mathematics
National University of Singapore
Joint w/ Constantine Caramanis, Yudong Chen, Sujay Sanghavi
Outline
- PCA and Outliers
  - Why SVD fails
  - Corrupted features vs. corrupted points
- Our idea + Algorithms
- Results
  - Full observation
  - Missing Data
- Framework for Robustness in High Dimensions
Principal Components Analysis
[Figure: data points organized as a low-rank matrix]
Classical technique:
1. Organize points as matrix
2. Take SVD
3. Top singular vectors span space
Given points that lie on/near a lower-dimensional subspace, find this subspace.
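The three steps of the classical technique can be sketched in a few lines of NumPy. This is a minimal illustration on assumed toy data (points lying exactly on a 2-dimensional subspace of R^10); the names `basis`, `U_r`, and `proj_err` are illustrative and not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: 200 points in R^10 lying exactly on a 2-dimensional subspace.
p, n, r = 10, 200, 2
basis = np.linalg.qr(rng.standard_normal((p, r)))[0]  # orthonormal basis of the true subspace
M = basis @ rng.standard_normal((r, n))               # 1. organize points as columns of a matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)      # 2. take the SVD
U_r = U[:, :r]                                        # 3. top-r left singular vectors span the subspace

# Two subspaces coincide exactly when their orthogonal projectors coincide.
proj_err = np.linalg.norm(U_r @ U_r.T - basis @ basis.T)
print(f"projector difference: {proj_err:.2e}")
```

Comparing projectors rather than basis vectors avoids the sign and rotation ambiguity of singular vectors.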
Fragility
Gross errors in even one or a few points can completely throw off PCA.
Reason: Classical PCA minimizes squared error, which is susceptible to gross outliers.
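A small sketch of this fragility, on assumed toy data: 100 points lie on the line spanned by the first coordinate axis, and a single point is replaced by a gross error along the second axis. The magnitude 1000 and all variable names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 100

u_true = np.zeros(p)
u_true[0] = 1.0
M = np.outer(u_true, rng.standard_normal(n))   # all points lie on span(e1)

def top_direction(X):
    """Leading left singular vector, i.e. the first principal direction."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0]

clean_alignment = abs(top_direction(M) @ u_true)    # 1 means perfect recovery

M_bad = M.copy()
M_bad[:, 0] = 0.0
M_bad[1, 0] = 1000.0                                # one grossly corrupted point along e2

bad_alignment = abs(top_direction(M_bad) @ u_true)  # near 0: PCA now follows the outlier

print(clean_alignment, bad_alignment)
```

Because the squared error of the single corrupted column dominates the sum, the leading principal direction switches to the outlier's direction.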
Two types of gross errors
- Individual entries corrupted (corrupted features)
- Entire columns corrupted (outliers)
... and missing-data versions of both
PCA with Outliers
Objective: find identities of outliers (and hence col. space of true matrix)
Outlier Pursuit - Idea
Standard PCA
Outlier Pursuit - Method
We propose:
Convex surrogate for Rank constraint
Convex surrogate for Column-sparsity
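A sketch of the program these two labels point to, assuming the nuclear norm as the convex surrogate for rank and the sum of column norms as the surrogate for column sparsity. The symbols $M$ (observed matrix), $L$, $C$, and the weight $\lambda$ are assumed notation:

$$
\min_{L,\,C}\ \|L\|_* + \lambda\,\|C\|_{1,2}
\quad \text{subject to} \quad M = L + C,
\qquad \text{where } \|C\|_{1,2} = \sum_{i} \|C_i\|_2 .
$$

Here $\|L\|_*$ is the sum of the singular values of $L$ and $C_i$ is the $i$-th column of $C$.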
When does it (not) work?
When certain directions of the column space of $L_0$ are poorly represented
This vector has large inner product with some coordinate axes
is large
Results
Assumption: Columns of the true $L_0$ are incoherent:
Note:
First consider: Noiseless case
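One way to make "incoherent" concrete is the incoherence parameter commonly used in the matrix-completion literature: with $V$ an orthonormal basis of the row space of a rank-$r$ matrix with $n$ columns, take $(n/r)\max_i \|V^\top e_i\|^2$. The sketch below uses that standard definition as an assumption; the normalization on the slide may differ (for example, it may count only the authentic columns).

```python
import numpy as np

def incoherence(L, r):
    """Return (n / r) * max_i ||V^T e_i||^2 for the top-r right singular vectors V of L."""
    n = L.shape[1]
    _, _, Vt = np.linalg.svd(L, full_matrices=False)
    V = Vt[:r].T                        # n x r orthonormal basis of the row space
    return (n / r) * np.max(np.sum(V**2, axis=1))

n = 50
L_flat = np.ones((4, n))                # energy spread evenly over all columns
L_spiky = np.zeros((4, n))
L_spiky[:, 0] = 1.0                     # all energy concentrated in one column

mu_flat = incoherence(L_flat, r=1)      # smallest possible value: 1
mu_spiky = incoherence(L_spiky, r=1)    # largest possible value: n / r = 50

print(mu_flat, mu_spiky)
```

A small value means no single column carries a direction of the subspace by itself, which is the situation in which an authentic column cannot be mistaken for an outlier.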
Results
Assumption: Columns of the true $L_0$ are incoherent:
Theorem (noiseless case): Our convex program can identify up to a fraction of outliers as long as
Note:
Outer bound: makes the problem un-identifiable
Proof Technique
A point $x^*$ is the optimum of a convex function $f$
if and only if
zero lies in the subgradient (subdifferential) of $f$ at $x^*$.
Steps:
1. Guess a “nice” point (the oracle problem)
2. Show it is the optimum by showing zero is in the subgradient
Proof Technique
Guessing a “nice” optimum (Note: because a column-sparse matrix is also low-rank, “nice” points are non-unique. In problems like matrix completion, compressed sensing, or RPCA with sparse corruption, this is not an issue.)
Oracle Problem:
Its solution is, by definition, a nice point. Rest of proof: show that it is the optimum of the original program, under our assumption.
Proof Technique
Constructing Dual Certificate: need a Q to satisfy
A first guess is to use
This works when each corrupted column is orthogonal, but fails otherwise.
To satisfy the equalities, use a corrected Q. Then show that the inequalities hold under the conditions given in the theorem.
(More detailed) Proof Technique
• One can verify that
• Let then (1) and (2) are satisfied
• To satisfy (3), we need , such that
• This can be constructed via
• Indeed the least-squares solution.
• The inverse operator = , provided the series converges.
• satisfies all three equalities. Then show the inequalities hold.
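The "inverse operator as a series" step relies on the Neumann series: for a linear operator $A$ with norm below 1, $(I - A)^{-1} = \sum_{k \ge 0} A^k$. The specific operator used in the proof is not reproduced here; the sketch below only checks the generic identity on an assumed random 6 x 6 matrix rescaled to spectral norm 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A *= 0.5 / np.linalg.norm(A, 2)         # rescale so the spectral norm is 0.5 < 1

# Accumulate the partial sum I + A + A^2 + ... + A^79.
series = np.zeros_like(A)
term = np.eye(6)
for _ in range(80):
    series += term
    term = term @ A

exact = np.linalg.inv(np.eye(6) - A)
gap = np.linalg.norm(series - exact)
print(f"series vs. exact inverse: {gap:.2e}")
```

When the norm condition fails, the partial sums need not converge, which is why the convergence proviso matters.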
Noisy case
• We observe for unknown noise
• Relax the equality constraint to an inequality, i.e., solve
• Under essentially the same conditions as the noiseless case, noisy Outlier Pursuit succeeds: it finds a solution close to a pair with the correct column space and column support.
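A sketch of the relaxed program, assuming the observation model $M = L_0 + C_0 + N$ with a noise bound $\|N\|_F \le \varepsilon$. The symbols $N$, $\varepsilon$, and $\lambda$ are assumed notation:

$$
\min_{L,\,C}\ \|L\|_* + \lambda\,\|C\|_{1,2}
\quad \text{subject to} \quad \|M - (L + C)\|_F \le \varepsilon .
$$

Setting $\varepsilon = 0$ recovers the equality constraint $M = L + C$ of the noiseless case.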
Implementation Issue
• Outlier Pursuit is an SDP.
• Interior-point methods may not scale.
• We are interested in the structure – can sacrifice accuracy for scalability
• Use proximal gradient algorithms.
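A minimal sketch of a proximal gradient scheme for this kind of problem. It is not the implementation used in the talk: it assumes a penalized form, $\|L\|_* + \lambda\|C\|_{1,2} + \frac{1}{2\mu}\|M - L - C\|_F^2$, in place of the equality constraint, and the values of `lam`, `mu`, and `n_iter` as well as the toy data are hypothetical. The two proximal maps are singular value thresholding (for the nuclear norm) and column-wise shrinkage (for the sum of column norms).

```python
import numpy as np

def svt(X, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(X, tau):
    """Prox of tau * (sum of column l2 norms): shrink each column toward zero."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def objective(M, L, C, lam, mu):
    """Penalized objective: ||L||_* + lam * ||C||_{1,2} + ||M - L - C||_F^2 / (2 mu)."""
    fit = np.linalg.norm(M - L - C) ** 2 / (2.0 * mu)
    return np.linalg.norm(L, "nuc") + lam * np.linalg.norm(C, axis=0).sum() + fit

def outlier_pursuit_pg(M, lam, mu=0.1, n_iter=300):
    """Plain proximal gradient on the penalized objective with step size mu / 2."""
    L = np.zeros_like(M)
    C = np.zeros_like(M)
    step = mu / 2.0                      # 1 / Lipschitz constant of the smooth term
    for _ in range(n_iter):
        G = (L + C - M) / mu             # gradient of the smooth term w.r.t. both L and C
        L, C = svt(L - step * G, step), column_shrink(C - step * G, step * lam)
    return L, C

# Hypothetical toy instance: rank-2 authentic columns plus 5 outlier columns.
rng = np.random.default_rng(0)
p, n, r, n_out = 20, 60, 2, 5
L0 = rng.standard_normal((p, r)) @ rng.standard_normal((r, n))
L0[:, :n_out] = 0.0
C0 = np.zeros((p, n))
C0[:, :n_out] = rng.standard_normal((p, n_out))
M = L0 + C0

lam = 0.4                                # illustrative weight, not tuned
L_hat, C_hat = outlier_pursuit_pg(M, lam)
col_norms = np.linalg.norm(C_hat, axis=0)  # large entries flag candidate outlier columns
```

Each iteration costs one SVD, which is what makes this cheaper than an interior-point solve. How cleanly `col_norms` separates outliers from authentic columns depends on `lam`, `mu`, and the iteration count, none of which are tuned here.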
Performance
[Figure: L + C formulation vs. L + S formulation (the latter from [Chandrasekaran et al.], [Candès et al.])]
More experiments
• 220 samples of digit ‘1’ and 11 samples of digit ‘7’, unlabeled.
Another view…
Mean is solution of
Median is solution of
Standard PCA of M is solution of
Our method is (convex rel. of)
Fragile: can be easily skewed by one or a few points
Robust: skewing requires errors in a constant fraction of the points
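The mean/median contrast is easy to check numerically. The mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations; the five sample values below are assumed toy data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_bad = x.copy()
x_bad[-1] = 1000.0                       # one grossly corrupted point

mean_clean, mean_bad = x.mean(), x_bad.mean()
median_clean, median_bad = np.median(x), np.median(x_bad)

print(mean_clean, mean_bad)              # 3.0 202.0  (mean is dragged by one point)
print(median_clean, median_bad)          # 3.0 3.0    (median does not move)
```

Moving the median of five points requires corrupting at least three of them, which is the scalar version of "error in a constant fraction of points".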
Collaborative Filtering w/ Adversaries
“In an incident that highlights the pitfalls of online recommendation systems, Amazon.com on Friday removed a link to a sex manual that appeared next to a listing for a spiritual guide by well-known Christian televangelist Pat Robertson.”
http://news.cnet.com/2100-1023-976435.html
Manipulation in RS
• Recommendation systems use ALL users’ ratings to predict their future preferences.
• Malicious users try to skew prediction by injecting fake ratings.
• Some popular algorithms (e.g. kNN) are vulnerable to manipulation.
RS via Matrix Completion (conventional wisdom)
• Preference Matrix
• Partial Observations
Given the partial observations, recover L0?
RS via Matrix Completion
• Assumption: User Preferences form Low-rank Matrix
[Figure: the Users x Products preference matrix factors into User Features x Product Features. User 3’s overall preference for Product 2 is the inner product of User 3’s feature vector and Product 2’s feature vector. E.g., Product = Laptop: the product features are performance and portability, and the user features are preference for performance and preference for portability.]
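The low-rank assumption can be illustrated with the laptop example: each product has a (performance, portability) feature vector, each user has a matching preference vector, and a preference is their inner product. All numbers below are made up for illustration, and the products-by-users orientation (columns = users) is an assumption chosen to match the column-corruption model.

```python
import numpy as np

# Rows: products; columns: (performance, portability). Hypothetical values.
product_features = np.array([[0.9, 0.2],
                             [0.3, 0.8],
                             [0.6, 0.6]])

# Rows: users; columns: (preference for performance, preference for portability).
user_features = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [0.7, 0.3]])

# Preference matrix: entry (i, j) is user j's preference for product i.
L0 = product_features @ user_features.T

pref_user3_product2 = L0[1, 2]           # 0.3 * 0.7 + 0.8 * 0.3 = 0.45
rank = np.linalg.matrix_rank(L0)         # 2: one dimension per feature

print(pref_user3_product2, rank)
```

The rank equals the number of latent features, not the number of users or products, which is why a few observed ratings per user can suffice to fill in the rest.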
Robust Matrix Completion (New idea)
[Figure: M = L0 + C0, partially observed. L0 (low-rank): authentic columns. C0 (column-sparse): arbitrarily corrupted columns.]
Problem:
Given the partial observation of M, find L0 and identify the columns of C0
Collaborative Filtering w/ Adversaries
Low-rank matrix (columns = users) that
- is partially observed
- has some corrupted columns
== outliers with missing data!
Our setting:
- Good users == random sampling of incoherent matrix (as in matrix completion)
- Manipulators == completely arbitrary sampling, values
Outlier Pursuit with Missing Data
The equality constraint is imposed only for observed entries.
Now: the row space needs to be incoherent as well, since we are doing both matrix completion and manipulator identification.
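A sketch of the missing-data program, assuming $\Omega$ denotes the set of observed entries and reusing the nuclear-norm and column-norm penalties with weight $\lambda$ (all assumed notation):

$$
\min_{L,\,C}\ \|L\|_* + \lambda\,\|C\|_{1,2}
\quad \text{subject to} \quad L_{ij} + C_{ij} = M_{ij} \ \ \text{for all } (i,j) \in \Omega .
$$

Unobserved entries of $L + C$ are left free, so the nuclear norm has to fill them in. That is the matrix-completion part of the problem and the reason row-space incoherence enters.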
Our Result
Theorem: The convex program's optimum $(\hat{L}, \hat{C})$ is such that $\hat{L}$ has the correct column space and the support of $\hat{C}$ is exactly the set of manipulators, w.h.p., provided
Sampling density
Fraction of users that are manipulators
Note: no assumptions on manipulators
Corollaries
• The algorithm succeeds if
  – there is a constant fraction of corrupted columns, and the fraction of observations is also a constant; or
  – there is a growing number of corrupted columns, and the fraction of observations goes to zero.
Roadmap of the Proof
• Need a Q to satisfy
• Least-squares solution to satisfy (1), (2), and (5)?
• Problem: the least-squares solution involves an infinite series, which breaks independence, while to get a sharp inequality we need to exploit the i.i.d. sampling.
Roadmap of the proof -2
• Rethink the optimality condition: the optimality certificate ensures for any feasible perturbation
• The condition , comes from the need to show the above inequality for any
• However, that is not necessary. One can show that
• We can relax the equality condition to an inequality, but tighten the other inequalities.
• Makes it possible to avoid the infinite series.
Robust Collaborative Filtering
Algo: Partially observed Low-rank + Column-sparse
Algo: Partially observed Sparse + Low-rank
More generally …
Several methods in high-dim. statistics: loss function + regularizer
Our approach: (same) loss function + weighted sum of regularizers
Yields robustness + flexibility in several settings. Today: PCA with outliers + missing data
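Written out as a template, with all symbols assumed for illustration ($\ell$ a loss, $R$, $R_1$, $R_2$ regularizers, $\lambda$'s their weights), the two approaches can be sketched as:

$$
\text{standard:}\quad \min_{\theta}\ \ell(\theta) + \lambda\,R(\theta)
\qquad\qquad
\text{this approach:}\quad \min_{\theta_1,\,\theta_2}\ \ell(\theta_1 + \theta_2) + \lambda_1 R_1(\theta_1) + \lambda_2 R_2(\theta_2) .
$$

In the PCA-with-outliers instance, $\theta_1 = L$ with $R_1$ the nuclear norm and $\theta_2 = C$ with $R_2$ the sum of column norms. Splitting the parameter into an additive sum is itself an assumption here, suggested by the $L + C$ decomposition.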
Papers can be found on my website: http://guppy.mpe.nus.edu.sg/~mpexuh/