An Efficient Greedy Method for Unsupervised Feature Selection
Ahmed Farahat
Joint work with
Ali Ghodsi, and
Mohamed Kamel
{afarahat, aghodsib, mkamel}@uwaterloo.ca
ICDM 2011
Outline
• Introduction
  – Dimension Reduction & Feature Selection
  – Previous Work
• Proposed Work
  – Feature Selection Criterion
  – Recursive Formula
  – Greedy Feature Selection
• Experiments and Results
• Conclusion
Dimension Reduction
• In data mining applications, data instances are typically described by a huge number of features.
  – Images (>2 megapixels)
  – Documents (>10K words)
• Most of these features are irrelevant or redundant.
• Goal: Reduce the dimensionality of the data to:
  – allow a better understanding of the data
  – improve the performance of other learning tasks
Feature Selection vs. Extraction
• Feature Selection (a.k.a. variable selection)
searches for a relevant subset of existing features
(−) a combinatorial optimization problem
(+) features are easy to interpret
• Feature Extraction (a.k.a. feature transformation)
learns a new set of features
(+) unique solutions in polynomial time
(−) features are difficult to interpret
Feature Selection
• Wrapper vs. filter methods:
  – Wrapper methods search for features which enhance the performance of the learning task.
    (+) more accurate, (−) more complex
  – Filter methods analyze the intrinsic properties of the data, and select highly-ranked features according to some criterion.
    (+) less complex, (−) less accurate
• Supervised vs. unsupervised methods
• This work focuses on unsupervised filter methods
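As a concrete (and deliberately simple) illustration of the filter approach, features can be ranked by an intrinsic statistic such as variance. This toy sketch is ours; it is not one of the methods discussed in these slides:

```python
import numpy as np

def variance_filter(A, k):
    """A minimal unsupervised filter: score each feature (column of A) by
    its variance over the instances (rows), and keep the k highest-ranked
    features. Variance is only an illustrative ranking criterion."""
    scores = A.var(axis=0)
    return np.argsort(-scores)[:k].tolist()
```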
Previous Work
• PCA-based
  calculate PCA, associate features with principal components based on their coefficients, and select the features associated with the first principal components (Jolliffe, 2002)
• Sparse PCA-based
  calculate sparse PCA (Zou et al. 2006), and select for each principal component the subset of features with non-zero coefficients
• Convex Principal Feature Selection (CPFS) (Masaeli et al SDM’10)
formulates a continuous optimization problem which minimizes the reconstruction error of the data matrix with sparsity constraints
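A simplified sketch of the PCA-based idea (select, for each leading principal component, the feature with the largest-magnitude coefficient). This is our illustration, not the exact procedure of Jolliffe (2002) or of the PCA-LRG baseline:

```python
import numpy as np

def pca_based_select(A, k):
    """For each of the first k principal components of A (instances as rows,
    features as columns), select the not-yet-chosen feature with the
    largest-magnitude loading. A simplified illustration of PCA-based
    feature selection."""
    X = A - A.mean(axis=0)                    # center each feature
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    selected = []
    for row in Vt[:k]:                        # one feature per component
        for i in np.argsort(-np.abs(row)):    # skip already-chosen features
            if i not in selected:
                selected.append(int(i))
                break
    return selected
```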
Previous Work (Cont.)
• Feature Selection using Feature Similarity (FSFS) (Mitra et al. TPAMI’02)
groups features into clusters and then selects a representative feature for each cluster
• Laplacian Score (LS) (He et al. NIPS’06)
selects features that preserve similarities between data instances
• Multi-Cluster Feature Selection (MCFS) (Cai et al. KDD’10)
selects features that preserve the multi-cluster structure of the data
This Work
• A criterion for unsupervised feature selection
  – minimizes the reconstruction error of the data matrix based on the selected subset of features
• A recursive formula for calculating the criterion
• An effective greedy algorithm for unsupervised feature selection
Feature Selection Criterion

[Figure: the data matrix (instances × features) is approximated by a reconstructed matrix computed, via least squares, from a selected subset of feature columns; the selection should minimize this reconstruction loss.]
Problem 1: (Unsupervised Feature Selection) Find a subset of features S such that

  S* = arg min_S ||A - P^(S) A||_F^2

where A is the data matrix (instances as rows, features as columns), A_S is the sub-matrix of the feature columns indexed by S, and

  P^(S) = A_S (A_S^T A_S)^{-1} A_S^T

is the matrix that projects the columns of A onto the span of the selected features.

This is an NP-hard combinatorial optimization problem.
Feature Selection Criterion (Cont.)
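To make the criterion concrete, here is a minimal NumPy sketch (our own helper names, assuming instances as rows and features as columns), together with the brute-force search over all C(n, k) subsets whose cost is what motivates a greedy approximation:

```python
import numpy as np
from itertools import combinations

def criterion(A, S):
    """F(S) = ||A - P(S) A||_F^2: the squared Frobenius reconstruction
    error of A from the selected feature columns A[:, S], via least squares."""
    A_S = A[:, list(S)]
    T, *_ = np.linalg.lstsq(A_S, A, rcond=None)   # best linear reconstruction
    return float(np.linalg.norm(A - A_S @ T, "fro") ** 2)

def exhaustive_select(A, k):
    """Exact minimizer by enumerating all C(n, k) subsets: the NP-hard
    combinatorial search that is only feasible for tiny n."""
    n = A.shape[1]
    return min(combinations(range(n), k), key=lambda S: criterion(A, S))
```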
Recursive Selection Criterion

Theorem 1: Given a set of features S. For any P ⊂ S,

  ||A - P^(S) A||_F^2 = ||E||_F^2 - ||P_E^(R) E||_F^2

where R = S \ P, E = A - P^(P) A is the residual of reconstructing A from the features in P, and P_E^(R) = E_R (E_R^T E_R)^{-1} E_R^T projects onto the span of the residual columns E_R.
Recursive Selection Criterion (Cont.)

Lemma 1: Given a set of features S. For any P ⊂ S,

  P^(S) = P^(P) + E_R (E_R^T E_R)^{-1} E_R^T

where R = S \ P and E = A - P^(P) A.
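The formula itself was on the slide image, so the decomposition as we have reconstructed it can be spot-checked numerically (random full-rank data; variable names are ours):

```python
import numpy as np

# Spot-check of the projection decomposition in Lemma 1 (as reconstructed):
# P(S) = P(P) + E_R (E_R^T E_R)^{-1} E_R^T, with E = A - P(P) A, R = S \ P.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

def proj(M):
    """Orthogonal projection onto the column span of M (M @ pinv(M))."""
    return M @ np.linalg.pinv(M)

P_idx, R_idx = [0, 1], [2, 3]             # S = P ∪ R = {0, 1, 2, 3}
P_P = proj(A[:, P_idx])
E = A - P_P @ A                           # residual after selecting P
P_S = proj(A[:, P_idx + R_idx])
P_R = proj(E[:, R_idx])                   # projection onto residual columns
assert np.allclose(P_S, P_P + P_R)        # the decomposition holds numerically
```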
Proof of Lemma 1
Proof of Lemma 1 (Cont.)
• Let Z = A_R^T A_R − A_R^T A_P (A_P^T A_P)^{-1} A_P^T A_R be the Schur complement of A_P^T A_P in A_S^T A_S.
• Use the block-wise inversion formula of A_S^T A_S: writing B = A_P^T A_P, C = A_P^T A_R, D = A_R^T A_R (so Z = D − C^T B^{-1} C),

  [ B    C ]^{-1}   [ B^{-1} + B^{-1} C Z^{-1} C^T B^{-1}    −B^{-1} C Z^{-1} ]
  [ C^T  D ]      = [ −Z^{-1} C^T B^{-1}                       Z^{-1}         ]
Recursive Selection Criterion (Cont.)
• Corollary 1: Given a set of features S. For any P ⊂ S,

  P^(S) A = P^(P) A + P_E^(R) E

• Proof:
  – Using Lemma 1, P^(S) A = P^(P) A + E_R (E_R^T E_R)^{-1} E_R^T A. The columns of E_R = (I − P^(P)) A_R are orthogonal to the range of P^(P), so E_R^T P^(P) A = 0, and hence E_R^T A = E_R^T (A − P^(P) A) = E_R^T E, which gives the result.
Recursive Selection Criterion

Theorem 1: Given a set of features S. For any P ⊂ S,

  ||A - P^(S) A||_F^2 = ||E||_F^2 - ||P_E^(R) E||_F^2

where R = S \ P, E = A - P^(P) A, and P_E^(R) = E_R (E_R^T E_R)^{-1} E_R^T.
Proof of Theorem 1
Greedy Selection Criterion

• Problem 2: (Greedy Feature Selection) At iteration t, find feature l such that

  l = arg min_i ||A - P^(S ∪ {i}) A||_F^2

  where S is the set of features selected during the previous t − 1 iterations.

• Using Theorem 1:

  ||A - P^(S ∪ {i}) A||_F^2 = ||E||_F^2 - ||E^T E_i||^2 / (E_i^T E_i)

  where E = A - P^(S) A is the current residual and E_i is its i-th column.

• Problem 2 is equivalent to:

  l = arg max_i ||E^T E_i||^2 / (E_i^T E_i)
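The equivalent maximization can be prototyped directly. This naive NumPy sketch (our own names; features as columns) picks the feature maximizing ||E^T E_i||^2 / (E_i^T E_i) and then deflates the residual; it recomputes the residual Gram matrix every iteration, which is exactly the inefficiency the later slides remove:

```python
import numpy as np

def greedy_select(A, k):
    """Naive greedy unsupervised feature selection: at each iteration pick
    the feature column maximizing ||E^T E_i||^2 / (E_i^T E_i), then project
    the chosen column out of the residual E."""
    E = A.astype(float).copy()
    selected = []
    for _ in range(k):
        G = E.T @ E                       # Gram matrix of the residual
        f = np.sum(G * G, axis=0)         # f_i = ||E^T E_i||^2
        g = np.diag(G).copy()             # g_i = E_i^T E_i
        g[g < 1e-12] = np.inf             # ignore numerically zero columns
        l = int(np.argmax(f / g))
        selected.append(l)
        w = E[:, l] / np.linalg.norm(E[:, l])
        E = E - np.outer(w, w @ E)        # deflate: remove span of E_l
    return selected
```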
Greedy Selection Criterion (Cont.)
• At iteration t: compute the residual E = A - P^(S) A for the current set S, and evaluate ||E^T E_i||^2 / (E_i^T E_i) for every candidate feature i.
• Problems:
  – Memory inefficient: requires the n × n matrix G = E^T E
  – Computationally complex: recomputing E and G costs O(m n^2) per iteration
Greedy Selection Criterion (Cont.)
• At iteration t, define:

  G = E^T E,   f_i = ||G_i||^2,   g_i = G_ii

  so that the greedy criterion becomes l = arg max_i f_i / g_i.

• Calculate E and G recursively: with l the feature selected at iteration t and ω = G_l / √(g_l),

  E ← E − (E_l E_l^T / g_l) E,   G ← G − ω ω^T

• Define f = (f_1, …, f_n) and g = (g_1, …, g_n); these two vectors can then be updated directly across iterations.
Memory-Efficient Selection

• Update formulas for f and g: the two score vectors are maintained across iterations directly, so neither the residual E nor the Gram matrix G has to be stored explicitly.
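The update formulas themselves were on the slide images and are not recoverable from this transcript; the sketch below is our reconstruction from the definitions f_i = ||G_i||^2, g_i = G_ii and the rank-one update G ← G − ω ω^T with ω = G_l / √(g_l). Variable names are ours. Products with G are formed from A and the past rank-one factors, so G is never stored:

```python
import numpy as np

def greedy_select_fg(A, k):
    """Memory-efficient greedy selection (our reconstruction): keep only the
    score vectors f and g and update them by rank-one corrections. Since
    G = A^T A - sum_r w_r w_r^T, any product G x can be computed as
    A^T (A x) - sum_r (w_r . x) w_r without forming G."""
    n = A.shape[1]
    g = np.einsum("ij,ij->j", A, A)       # g_i = ||A_i||^2 at t = 0
    f = np.array([np.sum((A.T @ A[:, i]) ** 2) for i in range(n)])
    ws, selected = [], []
    for _ in range(k):
        score = np.where(g > 1e-10, f / np.maximum(g, 1e-10), -np.inf)
        l = int(np.argmax(score))
        selected.append(l)
        w = A.T @ A[:, l]                 # w = G[:, l] / sqrt(g_l)
        for wr in ws:
            w = w - wr[l] * wr
        w = w / np.sqrt(g[l])
        Gw = A.T @ (A @ w)                # G w, needed for the f update
        for wr in ws:
            Gw = Gw - (wr @ w) * wr
        f = f - 2.0 * w * Gw + (w @ w) * w ** 2   # ||G_i - w_i w||^2
        g = g - w ** 2                            # (G - w w^T)_ii
        ws.append(w)
    return selected
```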
Partition-based Selection
• The greedy selection criterion still has a per-iteration cost that grows with the number of candidate features n.
• At each iteration: n candidate features × n projections.
• Solution:
  – Partition the features into c << n random groups
  – Select the feature which best represents the centroids of these groups
  – Similar update formulas can be developed for f and g
  – Complexity: the per-iteration cost scales with c rather than n
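The grouping step can be sketched as follows (our own helper; the full partition-based variant would then evaluate the greedy criterion against the centroid matrix B instead of against all n features):

```python
import numpy as np

def partition_centroids(A, c, seed=0):
    """Split the n feature columns of A into c roughly equal random groups
    and return the groups plus the m x c matrix of group centroids (mean
    feature per group). A sketch of the preprocessing step of the
    partition-based variant."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    perm = rng.permutation(n)                 # random assignment of features
    groups = np.array_split(perm, c)
    B = np.column_stack([A[:, idx].mean(axis=1) for idx in groups])
    return groups, B
```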
Experiments
Seven methods were compared:
• PCA-LRG: a PCA-based method that selects the features associated with the first k principal components (Masaeli et al. 2010)
• FSFS: the Feature Selection using Feature Similarity method (Mitra et al. 2002)
• LS: the Laplacian Score method (He et al. 2006)
• SPEC: the spectral feature selection method (Zhao et al. 2007)
• MCFS: the Multi-Cluster Feature Selection method (Cai et al. 2010)
• GreedyFS: the basic greedy algorithm (using the recursive update formulas for f and g, but without random partitioning)
• PartGreedyFS: the partition-based greedy algorithm
Data Sets
• These data sets were recently used by Cai et al. (2010) to evaluate different feature selection methods in comparison to the Multi-Cluster Feature Selection (MCFS) method.
Results – k-means
Results – Affinity Propagation
Results – Run Times
Results – Run Times (Cont.)
Conclusion
• This work presents a novel greedy algorithm for unsupervised feature selection:
  – a feature selection criterion which measures the reconstruction error of the data matrix based on the subset of selected features
  – a recursive formula for calculating the feature selection criterion
  – an efficient greedy algorithm for feature selection, and two memory- and time-efficient variants
• It has been empirically shown that the proposed algorithm:
  – achieves better clustering performance
  – is less computationally demanding than methods that give comparable clustering performance
Thank you!
References
• I. Jolliffe, Principal Component Analysis, 2nd ed. Springer, 2002.
• H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," J. Comput. Graph. Stat., 2006.
• M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. Dy, "Convex principal feature selection," SIAM SDM, 2010.
• X. He, D. Cai, and P. Niyogi, "Laplacian score for feature selection," NIPS, 2006.
• Y. Cui and J. Dy, "Orthogonal principal feature selection," Sparse Optimization and Variable Selection Workshop, ICML, 2008.
• Z. Zhao and H. Liu, "Spectral feature selection for supervised and unsupervised learning," ICML, 2007.
• D. Cai, C. Zhang, and X. He, "Unsupervised feature selection for multi-cluster data," KDD, 2010.
• P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pattern Anal. Mach. Intell., 2002.