
Page 1:

Feature Selection in k-Median Clustering

Olvi Mangasarian and Edward Wild

University of Wisconsin - Madison

Page 2:

Principal Objective

Find a reduced number of input-space features such that clustering in the reduced space closely replicates the clustering in the full-dimensional space

Page 3:

Basic Idea

Based on rigorous optimization theory, make a simple but fundamental modification to one of the two steps of the k-median algorithm

In each cluster, find a point closest in the 1-norm to all points in that cluster and to the median of ALL data points

The proposed approach can lead to a feature reduction as high as 64%, with clustering within 4% of that obtained with the original set of features

As the weight given to the data median increases, more features are deleted from the problem

Page 4:

FSKM Example

Start with median at origin

Apply k-median algorithm

As weight of data median increases, features are removed from the problem

Page 5:

Outline of Talk

Ordinary k-median algorithm

Two steps of the algorithm

Feature Selecting k-Median (FSKM) Algorithm

Overall optimization objective

Basic idea

Mathematical optimization formulation

Algorithm statement

Numerical examples

Conclusion & outlook

Page 6:

Ordinary k-Median Algorithm

Given m data points in n-dimensional input feature space

Find k cluster centers with the following property: the sum of the 1-norm distances between each data point and the closest cluster center is minimized

Minimizing this sum of pointwise minima of linear functions is a concave minimization problem and is NP-hard

However, the two-step k-median algorithm terminates in a finite number of steps at a point satisfying the minimum principle necessary optimality condition
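In symbols, with data points $a^1, \dots, a^m \in \mathbb{R}^n$ and cluster centers $c^1, \dots, c^k$, the problem described above can be written as the standard k-median objective (the notation here is ours, chosen to match the description on this slide):

$$\min_{c^1,\dots,c^k \in \mathbb{R}^n} \; \sum_{i=1}^{m} \; \min_{l=1,\dots,k} \; \|a^i - c^l\|_1$$

The inner minimum over cluster centers is what makes the objective nonconvex, hence the alternating two-step algorithm on the next slide.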

Page 7:

Two-Step k-Median Algorithm

(0) Start with k initial cluster centers

(1) Assign each data point to a 1-norm closest cluster center

(2) For each cluster compute a new cluster center that is 1-norm closest to all points in the cluster (median of cluster)

(3) Stop if all cluster centers are unchanged; else go to (1)

Algorithm terminates in a finite number of steps at a point satisfying the minimum principle necessary optimality conditions
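A minimal Python sketch of the two-step iteration above; the function name, random initialization, and iteration cap are illustrative choices, not taken from the slides:

```python
import numpy as np

def k_median(A, k, max_iter=100, seed=0):
    """Two-step k-median: assign points to the 1-norm closest centers,
    then recompute each center as the componentwise median of its cluster."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # (0) start with k initial cluster centers chosen from the data
    centers = A[rng.choice(m, size=k, replace=False)].copy()
    for _ in range(max_iter):
        # (1) assign each point to the 1-norm closest cluster center
        dists = np.abs(A[:, None, :] - centers[None, :, :]).sum(axis=2)  # m x k
        labels = dists.argmin(axis=1)
        # (2) recompute each center as the point 1-norm closest to its cluster,
        #     i.e. the componentwise median of the assigned points
        new_centers = centers.copy()
        for l in range(k):
            pts = A[labels == l]
            if len(pts) > 0:
                new_centers[l] = np.median(pts, axis=0)
        # (3) stop when all cluster centers are unchanged
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```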

Page 8:

Key Change in Step (2) of k-Median Algorithm

Steps (0), (1), and (3) are unchanged; step (2) becomes:

(2) For each cluster, compute a new cluster center that minimizes the sum of 1-norm distances to all points in the cluster plus a weighted 1-norm distance to the median of all data points

The weight of the 1-norm distance to the dataset median determines the number of features deleted:

For a zero weight, no features are suppressed

For a sufficiently large weight, all features are suppressed

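A sketch of the modified step (2) objective for cluster $l$ with assigned points $A_l$, writing $\bar{a}$ for the median of all data points and $\lambda \ge 0$ for the median weight (the exact scaling of the weight in the underlying paper may differ from this simplified form):

$$c^l \in \arg\min_{c \in \mathbb{R}^n} \; \sum_{a^i \in A_l} \|a^i - c\|_1 \; + \; \lambda \, \|c - \bar{a}\|_1$$

If the data is shifted so that $\bar{a} = 0$, the second term becomes a 1-norm penalty $\lambda \|c\|_1$ that drives individual components of the cluster center to zero; a feature is deleted once its component is zero in every cluster center.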

Page 9:

FSKM Theory

Page 10:

Subgradients

A vector $\partial f(x)$ is a subgradient of a convex function $f$ at $x$ if $f(y) - f(x) \ge \partial f(x)'(y - x)$ for all $x, y \in \mathbb{R}^n$

Consider $\|x\|_1$ for $x \in \mathbb{R}^1$ (i.e., $|x|$):

If $x < 0$, the subgradient of $\|x\|_1$ is $-1$

If $x > 0$, the subgradient of $\|x\|_1$ is $1$

If $x = 0$, the subgradient of $\|x\|_1$ is any value in $[-1, 1]$

Page 11:

FSKM Theory (Continued)

Page 12:

Zeroing Cluster Features
(Based on Necessary and Sufficient Optimality Conditions for Nondifferentiable Convex Optimization)

That is, $c_j = 0$ whenever the median weight is large enough relative to the cluster's data in feature $j$, as sketched below
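Because the objective above separates across components when $\bar{a} = 0$, a zeroing condition follows directly from the scalar subgradient facts on the Subgradients slide. One sufficient form, derived here for illustration (the paper's exact condition may be stated differently), is: for cluster $l$ and feature $j$,

$$c_j = 0 \ \text{minimizes} \ \sum_{a^i \in A_l} |a^i_j - c_j| + \lambda |c_j| \quad \text{whenever} \quad \lambda \ \ge \ \Big| \sum_{a^i \in A_l} \operatorname{sign}(a^i_j) \Big|$$

since $\lambda$ can then absorb the sum of signs and place $0$ in the subdifferential of the scalar objective at $c_j = 0$.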

Page 13:

FSKM Algorithm

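A minimal Python sketch of the modified step (2) center computation that produces this zeroing behavior; the helper name and the brute-force search over breakpoints are illustrative, assuming the data median has been shifted to the origin:

```python
import numpy as np

def fskm_center(pts, lam):
    """Modified step (2): for each feature j, minimize
    sum_i |pts[i, j] - c_j| + lam * |c_j| over the scalar c_j,
    assuming the median of all data points is at the origin.
    The objective is piecewise-linear and convex, so its minimum
    is attained at a breakpoint: one of the data values or 0."""
    n = pts.shape[1]
    center = np.zeros(n)
    for j in range(n):
        # 0 is listed first so that ties favor deleting the feature
        candidates = np.append(0.0, pts[:, j])
        cost = lambda c: np.abs(pts[:, j] - c).sum() + lam * abs(c)
        center[j] = min(candidates, key=cost)
    return center
```

Running ordinary k-median and substituting fskm_center for its step (2) at increasing values of lam reproduces the behavior described earlier: more components of every cluster center become zero, and a feature is dropped once its component is zero in all centers.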

Page 14:

FSKM Example (Revisited)

Start with median at origin

Apply k-median algorithm

Compute the per-feature quantities for each cluster: cluster 1 gives 1 for x and 5 for y; cluster 2 gives 0 for x and 4 for y

Maximum over clusters: 1 for x, 5 for y

For a median weight of 1, feature x is removed from the problem

[Figure: the two clusters, labeled 1 and 2, plotted in the (x, y) plane]

Page 15:

Numerical Testing


FSKM tested on five publicly available labeled datasets

Labels were used only to test effectiveness of FSKM

Data is first clustered using k-median, then FSKM is applied to delete one feature at a time

Without using data labels, “error” in FSKM clustering with reduced features is obtained by comparison with the “gold standard” clustering with the full set of features

FSKM clustering error curve obtained without labels is compared with classification error curve obtained using data labels
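One plausible way to compute the label-free clustering "error" described above is the fraction of points whose cluster assignment differs from the gold-standard clustering under the best matching of cluster indices; the permutation matching below is our assumption about how the comparison is made, not a detail taken from the slides:

```python
from itertools import permutations
import numpy as np

def clustering_disagreement(gold_labels, test_labels, k):
    """Fraction of points placed in different clusters, minimized over
    all relabelings of the test clustering's k cluster indices.
    Enumerating permutations is fine for the small k used here."""
    gold = np.asarray(gold_labels)
    test = np.asarray(test_labels)
    best = 1.0
    for perm in permutations(range(k)):
        relabeled = np.array([perm[label] for label in test])
        best = min(best, float(np.mean(relabeled != gold)))
    return best
```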

Page 16:

3-Class Wine Dataset: 178 Points in 13-dimensional Space

Page 17:

Remarks

Curves close together

Largest increase in error as last few features are removed

Reduced 13 features to 4: clustering error < 4%; classification error decreased by 0.56 percentage points

Page 18:

2-Class Votes Dataset: 435 Points in 16-dimensional Space

Page 19:

Remarks

Curves have similar shape

Largest increase in error as last few features are removed

Reduced 16 features to 3: clustering error < 10%; classification error increased by 1.84 percentage points

Page 20:

2-Class WDBC Dataset (Wisconsin Diagnostic Breast Cancer): 569 Points in 30-dimensional Space

Page 21:

Remarks

Curves have similar shape for 14 and fewer features

First 3 features removed cause no change to either error curve

Reduced 30 features to 7: clustering error < 10%; classification error increased by 3.69 percentage points

Page 22:

2-Class Star/Galaxy-Bright Dataset: 2462 Points in 14-dimensional Space

Page 23:

Remarks

Clustering error increases gradually as number of features is reduced

Some features may be obstructing classification

Reduced 14 features to 4: clustering error < 10%; classification error decreased by 1.42 percentage points

Page 24:

2-Class Cleveland Heart Dataset: 297 Points in 13-dimensional Space

Page 25:

Remarks

Largest increase in both curves going from 13 to 9 features

Most features useful?

Reduced 13 features to 8: clustering error < 17%; classification error increased by 7.74 percentage points

Page 26:

Conclusion

FSKM is a fast method for selecting relevant features while maintaining clusters similar to those in the original full-dimensional space

Features selected by FSKM without labels may be useful for labeled data classification as well

FSKM eliminates the costly combinatorial search for an appropriately reduced set of features needed for clustering in lower-dimensional spaces (e.g., 14-choose-6 = 3003 k-median runs to find the best 6 of 14 features for the Star/Galaxy-Bright dataset, compared to 9 k-median runs required by FSKM)
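For reference, the search count quoted above is simply a binomial coefficient:

$$\binom{14}{6} = \frac{14!}{6!\,8!} = 3003$$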

Page 27:

Outlook

Feature & data selection for support vector machines

Sparse kernel approximation methods

Gene expression selection

Incorporation of prior knowledge into learning

Optimization-based clustering may be useful in other machine learning applications

Minimalist supervised & unsupervised learning

Select minimal knowledge for best model

Page 28:

Web Pages (Containing Paper & Talk)

www.cs.wisc.edu/~olvi

www.cs.wisc.edu/~wildt