Margin Based Sample Weighting for Stable Feature Selection

Yue Han, Lei Yu

State University of New York at Binghamton

Post on 21-Dec-2015



Outline

• Introduction

• Related Work

• Hypothesis-Margin Feature Space Transformation

• Margin Based Sample Weighting

• Experimental Study

• Conclusion and Future Work

Introduction

[Figure: data matrix with samples as rows and features (genes or proteins) as columns]

p: number of features; n: number of samples

High-dimensional data: p >> n

Feature Selection:
• Alleviating the effect of the curse of dimensionality.
• Enhancing generalization capability.
• Speeding up the learning process.
• Improving model interpretability.

[Figure: High-dimensional data → Feature selection (filter or wrapper) → Dimension-reduced data → Learning model]

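[Figure: document-term matrix. Rows: documents D1, D2, ..., DM. Columns: terms T1, T2, ..., TN, plus a class column C (Sports, Travel, Jobs, ...). Example rows of term counts: 12 0 ... 6; 3 10 ... 28; 0 11 ... 16.]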

Introduction (cont'd)

[Figure: two training sets, D1 and D2, each a samples-by-features matrix]

Given unlimited sample size of D:
• Feature selection results from D1 and D2 are the same.

When the size of D is limited (n << p for high-dimensional data):
• Feature selection results from D1 and D2 are different.
• Increasing the number of samples could be very costly or impractical.

Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.

Identifying characteristic markers to explain the observed phenomena

Related Work

• Bagging-based Ensemble Feature Selection (Saeys et al., ECML07)
  Draw different bootstrapped samples of the same training set;
  Apply a conventional feature selection algorithm to each sample;
  Aggregate the feature selection results.

• Group-based Stable Feature Selection (Yu et al., KDD08, KDD09)
  Explore the intrinsic feature correlations;
  Identify groups of correlated features;
  Select relevant feature groups.
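The bagging-based ensemble idea listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the base ranker `mean_diff_scores` (absolute difference of class means, binary labels only) is a hypothetical stand-in for "a conventional feature selection algorithm", and mean-rank aggregation is one simple way to combine the results, not necessarily the scheme used in the cited work.

```python
import random

def mean_diff_scores(X, y):
    """Hypothetical base ranker: |mean(class a) - mean(class b)| per feature."""
    labels = sorted(set(y))
    a = [x for x, c in zip(X, y) if c == labels[0]]
    b = [x for x, c in zip(X, y) if c == labels[1]]
    p = len(X[0])
    return [abs(sum(r[f] for r in a) / len(a) - sum(r[f] for r in b) / len(b))
            for f in range(p)]

def ensemble_ranking(X, y, n_bootstraps=20, seed=0):
    """Rank features by summed rank over bootstrapped training sets.

    Assumes y contains exactly two classes; bootstraps that happen to
    contain a single class are redrawn.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rank_sums = [0] * p
    used = 0
    while used < n_bootstraps:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue
        scores = mean_diff_scores([X[i] for i in idx], yb)
        order = sorted(range(p), key=lambda f: -scores[f])  # best feature first
        for rank, f in enumerate(order):
            rank_sums[f] += rank
        used += 1
    # A lower summed rank means the feature was ranked high more consistently.
    return sorted(range(p), key=lambda f: rank_sums[f])

X = [[0, 5], [1, 5], [0, 6], [10, 5], [11, 6], [10, 5]]
y = [0, 0, 0, 1, 1, 1]
ranking = ensemble_ranking(X, y)
```

Here feature 0 separates the two classes in every bootstrap, so it ends up first in `ranking`.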

Hypothesis-Margin Feature Space Transformation

A framework of margin based instance weighting for stable feature selection

Introduce the concept of hypothesis-margin feature space;

Propose the framework of margin based instance weighting for stable feature selection;

Develop an efficient algorithm under the proposed framework.

Hypothesis-Margin Feature Space Transformation

X’ captures the local profile of feature importance for all features at X.

Multiple nearest neighbors can be used to compute the HM of a sample.

[Figure labels: hit, miss]

Hypothesis-Margin Feature Space Transformation (cont'd)

Hypothesis-margin based feature space transformation: (a) original feature space, and (b) hypothesis-margin (HM) feature space.
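A minimal sketch of the transformation may help make the hit/miss idea concrete. The slides do not spell out the formula, so this assumes the usual Relief-style hypothesis margin: for each feature, the average absolute difference to the k nearest misses (nearest samples of other classes) minus the average absolute difference to the k nearest hits (nearest samples of the same class), with neighbors found by Manhattan distance. The function names and the distance choice are assumptions made for illustration.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))

def hm_transform(X, y, k=1):
    """Map every sample X[i] to its HM-space representation X'[i].

    Assumed form (not stated in the slides):
        x'_f = mean_k |x_f - miss_f| - mean_k |x_f - hit_f|
    Requires at least two classes and at least two samples per class.
    """
    transformed = []
    for i, x in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        by_distance = lambda j: manhattan(x, X[j])
        hits = sorted((j for j in others if y[j] == y[i]), key=by_distance)[:k]
        misses = sorted((j for j in others if y[j] != y[i]), key=by_distance)[:k]
        x_new = []
        for f in range(len(x)):
            miss_term = sum(abs(x[f] - X[j][f]) for j in misses) / len(misses)
            hit_term = sum(abs(x[f] - X[j][f]) for j in hits) / len(hits)
            x_new.append(miss_term - hit_term)
        transformed.append(x_new)
    return transformed

X = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
y = [0, 0, 1, 1]
H = hm_transform(X, y, k=1)
```

In this toy data the first feature separates the classes and the second does not, so every transformed sample has a large positive first coordinate and a negative second one.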

Margin Based Sample Weighting

• Discrepancy among samples w.r.t. their local profiles of feature importance (HM feature space).

• Measure the average distance of X’ to all other samples in the HM feature space; a greater average distance indicates a higher outlying degree.

• Overall time complexity: O(n^2 q), where n is the number of samples and q is the dimensionality of D.
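The weighting step above can be sketched as follows. The average-distance computation follows the bullet directly and costs O(n^2 q). How the outlying degree is turned into a weight is not given in the slides; the inverse-distance normalization below is an assumption chosen only so that more outlying samples receive lower weight, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two HM-space vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def average_distances(H):
    """Average distance from each HM-space sample to all other samples."""
    n = len(H)
    return [sum(euclidean(H[i], H[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def sample_weights(H, eps=1e-12):
    """Assumed scheme: weight proportional to 1 / average distance,
    normalized so that the weights sum to 1."""
    inverse = [1.0 / (d + eps) for d in average_distances(H)]
    total = sum(inverse)
    return [v / total for v in inverse]

# Three nearby samples and one outlier in a toy HM feature space.
H = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
avg = average_distances(H)
weights = sample_weights(H)
```

With this toy input the fourth sample has the largest average distance and therefore the smallest weight.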

Experimental Study

Feature Ranking

Feature Subset Selection

Feature Correlation

Stability of a feature selection algorithm is measured as the average of the pair-wise similarity of various feature selection results produced by the same algorithm from different training sets.

Stability Metrics
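The definition above, stability as the average pairwise similarity of results obtained from different training sets, can be written down directly. The sketch below covers the feature-subset case and uses the Jaccard index as the similarity; that choice is an assumption for illustration, and any similarity suited to rankings or correlated feature groups could be passed in instead.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two feature subsets (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def stability(results, similarity=jaccard):
    """Average pairwise similarity over all pairs of selection results."""
    pairs = list(combinations(results, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

# Feature subsets selected from three different training sets.
subsets = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3}]
score = stability(subsets)
```

A score of 1.0 means every training set led to the same subset; values near 0 mean the selected subsets barely overlap.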

Experimental Study (cont'd)

• Experimental Setup

• SVM-RFE: 10 percent of the remaining features eliminated at each iteration.

• En-RFE: 20 bootstrapped training sets to construct the ensemble.

• IW-RFE: k = 10 for hypothesis margin transformation.

• 10 times shuffling and 10-fold cross-validation to generate 100 datasets.
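Two parts of the setup above are easy to make concrete: how 10 shuffles times 10 folds yield 100 training sets, and how many features remain after each SVM-RFE iteration when 10 percent are eliminated at a time. This is a sketch under assumptions: the function names, the fold construction, and the rounding rule (integer floor, at least one feature removed per step) are illustrative choices, not details taken from the slides.

```python
import random

def repeated_kfold_training_sets(n_samples, n_repeats=10, n_folds=10, seed=0):
    """Return n_repeats * n_folds training index lists.

    Each repeat shuffles the sample indices, splits them into n_folds
    folds, and builds one training set per held-out fold.
    """
    rng = random.Random(seed)
    training_sets = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for held_out in range(n_folds):
            train = [i for f, fold in enumerate(folds) if f != held_out
                     for i in fold]
            training_sets.append(sorted(train))
    return training_sets

def rfe_feature_counts(n_features, n_target, percent=10):
    """Number of features remaining after each elimination round."""
    counts = [n_features]
    while counts[-1] > n_target:
        drop = max(1, counts[-1] * percent // 100)  # assumed rounding rule
        counts.append(max(n_target, counts[-1] - drop))
    return counts

training_sets = repeated_kfold_training_sets(50)
schedule = rfe_feature_counts(100, 50)
```

Running a feature selection algorithm on each of the 100 training sets gives the collection of results whose pairwise similarity is averaged to measure stability.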

Consistent improvement in the stability of feature selection results across different stability measures.

Different feature selection algorithms can lead to similarly good classification results.

Conclusion and Future Work

• Introduced the concept of hypothesis-margin feature space

• Proposed the framework of margin based sample weighting for stable feature selection

• Developed an efficient algorithm under the framework

Future work:

• Investigate alternative methods of sample weighting based on HM feature space

• Strategies to combine margin based sample weighting with group-based stable feature selection

Questions?

Thank you!