Feature Grouping-Based Fuzzy-Rough Feature Selection
Richard Jensen, Neil Mac Parthaláin, Chris Cornelis
Outline
• Motivation/Feature Selection (FS)
• Rough set theory
• Fuzzy-rough feature selection
• Feature grouping
• Experimentation
The problem: too much data
• The amount of data is growing exponentially
  – A staggering 4,300% annual growth in global data
• Therefore, there is a need for FS and other data reduction methods
  – The curse of dimensionality is a problem for machine learning techniques
• The complexity of the problem is vast
  – e.g. for FS, the search space is the powerset of the features
Feature selection
• Remove features that are:
  – Noisy
  – Irrelevant
  – Misleading
• Task: find a subset that
  – Optimises a measure of subset goodness
  – Has small/minimal cardinality
• In rough set theory, this is a search for reducts
  – Much research in this area
Rough set theory (RST)
• For a subset of features P, objects are partitioned into equivalence classes [x]P
[Diagram: a set X is approximated from below by its lower approximation (the equivalence classes wholly contained in X) and from above by its upper approximation (the equivalence classes overlapping X)]
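The diagram above can be read off directly as two set operations. A minimal illustrative sketch follows; the toy objects, feature names, and concept X are invented here, not taken from the paper:

```python
from collections import defaultdict

def approximations(objects, P, X):
    """objects: name -> feature dict; P: list of features; X: set of names."""
    # Partition the objects: [x]_P groups objects with equal values on P.
    classes = defaultdict(set)
    for name, row in objects.items():
        classes[tuple(row[f] for f in P)].add(name)
    lower, upper = set(), set()
    for eq in classes.values():
        if eq <= X:      # class wholly inside X -> lower approximation
            lower |= eq
        if eq & X:       # class overlaps X -> upper approximation
            upper |= eq
    return lower, upper

# Toy data: with P = {a}, [o1] = {o1, o2} lies inside X, [o3] = {o3, o4} overlaps it.
objects = {
    "o1": {"a": 0, "b": 1}, "o2": {"a": 0, "b": 1},
    "o3": {"a": 1, "b": 0}, "o4": {"a": 1, "b": 1},
}
lo, up = approximations(objects, ["a"], X={"o1", "o2", "o3"})
```

X is "rough" with respect to P exactly when the two approximations differ, as they do here.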
Rough set feature selection
• By considering more features, concepts become easier to define…
Rough set theory
• Problems:
  – Rough set methods (usually) require data discretisation beforehand
  – Extensions require thresholds, e.g. tolerance rough sets
  – No flexibility in approximations
    • e.g. objects either belong fully to the lower (or upper) approximation, or not at all
Fuzzy-rough sets
• Extends rough set theory
  – Use of fuzzy tolerance instead of crisp equivalence
  – Approximations are fuzzified
  – Collapses to traditional RST when data is crisp
• New definitions:
  – Fuzzy upper approximation: μ(P↑X)(x) = sup_{y∈U} T(μ_RP(x, y), μ_X(y))
  – Fuzzy lower approximation: μ(P↓X)(x) = inf_{y∈U} I(μ_RP(x, y), μ_X(y))
    (T a t-norm, I an implicator, R_P the fuzzy tolerance relation induced by subset P)
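The two definitions can be sketched in a few lines. This is a hedged illustration assuming the Łukasiewicz connective pair (one common choice; the paper may use a different t-norm/implicator) and an invented toy similarity relation:

```python
def I(a, b):                       # Lukasiewicz implicator
    return min(1.0, 1.0 - a + b)

def T(a, b):                       # Lukasiewicz t-norm
    return max(0.0, a + b - 1.0)

def fuzzy_lower(sim, muX, x, U):
    # mu(P down X)(x) = inf_y I(sim(x, y), muX(y))
    return min(I(sim(x, y), muX[y]) for y in U)

def fuzzy_upper(sim, muX, x, U):
    # mu(P up X)(x) = sup_y T(sim(x, y), muX(y))
    return max(T(sim(x, y), muX[y]) for y in U)

# Toy universe of three objects with a single normalised attribute value.
U = [0, 1, 2]
vals = [0.0, 0.5, 1.0]
sim = lambda x, y: 1.0 - abs(vals[x] - vals[y])   # fuzzy tolerance
muX = {0: 1.0, 1: 1.0, 2: 0.0}                    # fuzzy decision concept X
```

Object 0 belongs fully to the lower approximation, while object 2 belongs to the upper approximation only to degree 0.5; membership is now graded rather than all-or-nothing.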
Fuzzy-rough feature selection
• Search for reducts
  – Minimal subsets of features that preserve the fuzzy lower approximations for all decision concepts
• Traditional approach
  – Greedy hill-climbing algorithm used
  – Other search techniques have been applied (e.g. PSO)
• Problems
  – Complexity is problematic for large data (e.g. over several thousand features)
  – No explicit handling of redundancy
Feature grouping
• Idea: we don’t need to consider all features
  – Those that are highly correlated with each other carry the same or similar information
  – Therefore, we can group them and work on a group-by-group basis
• This paper: based on greedy hill-climbing
  – Group-then-rank approach
• Relevancy and redundancy handled by
  – Correlation: similar features grouped together
  – Internal ranking (correlation with the decision feature)
[Diagram: an example group F1 containing features f1, f2, f4, f7, f8, f9]
Forming groups of features
[Diagram: pairwise correlations are calculated from the data using a correlation measure and a threshold τ; features are collected into groups F1 … Fn (handling redundancy), and each group is then internally ranked #1 … #m by correlation with the decision feature (handling relevancy)]
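The initialisation stage above can be sketched as follows. This is an illustrative reading only: the greedy seeding (each feature seeds a group that absorbs all features correlated with it above τ) and the use of Pearson correlation are assumptions, and the toy data is invented:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def form_groups(features, decision, tau):
    names = list(features)                 # one group seeded per feature
    groups = []
    for seed in names:
        # Redundancy: collect features correlated with the seed above tau.
        group = [f for f in names
                 if abs(pearson(features[seed], features[f])) >= tau]
        # Relevancy: rank the group by correlation with the decision feature.
        group.sort(key=lambda f: abs(pearson(features[f], decision)),
                   reverse=True)
        groups.append(group)
    return groups

# f2 is a scaled copy of f1, so they group together; f3 stands alone.
features = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 8], "f3": [4, 1, 3, 2]}
decision = [1, 2, 3, 4]
groups = form_groups(features, decision, tau=0.9)
```

Note that a feature can appear in several groups, as in the worked example later in the deck.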
Selecting features
[Diagram: the internally-ranked feature groups feed a feature subset search: a search mechanism proposes candidate subsets, a subset evaluation scores them, and the selected subset(s) are output]
Fuzzy-rough feature grouping
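The selection loop, as described on the following slides, can be sketched roughly as below. This is a hedged reconstruction, not the paper's pseudocode: `evaluate` stands in for the fuzzy-rough dependency measure M, and the acceptance/termination details are simplified:

```python
def frfg_select(ordered_groups, evaluate, target=1.0):
    """Greedy hill-climbing over internally-ranked feature groups."""
    R, avoids, best = [], set(), evaluate([])
    for group in ordered_groups:            # groups in their ranked order
        fresh = [f for f in group if f not in avoids]
        if not fresh:
            continue                        # group carries no new features
        candidate = fresh[0]                # top-ranked unseen representative
        score = evaluate(R + [candidate])
        if score > best:                    # keep it only if M improves
            R.append(candidate)
            best = score
        avoids.update(group)                # mark the whole group as seen
        if best >= target:
            break                           # reduct-quality subset reached
    return R

# Toy run: a stand-in measure rewards picking f4 and f3.
groups = [["f4", "f1", "f5"], ["f3", "f1", "f4"], ["f2"]]
toy_M = lambda S: len(set(S) & {"f4", "f3"}) / 2
R = frfg_select(groups, toy_M)
```

Because each accepted group pushes all of its members into the avoid set, highly correlated features are never evaluated twice, which is where the computational saving over the plain greedy hill-climber comes from.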
Initial experimentation
• Setup:
  – 10 datasets (9-2557 features)
  – 3 classifiers
  – Stratified 5 x 10-fold cross-validation
• Performance evaluation in terms of
  – Subset size
  – Classification accuracy
  – Execution time
• FRFG compared with
  – Traditional greedy hill-climber (GHC)
  – GA & PSO (200 generations, population size: 40)
Results: average subset size
Results: classification accuracy
[Charts: accuracy results for the JRip and IBk (k = 3) classifiers]
Results: execution times (s)
Conclusion
FRFG
  – Motivation: reduce computational overhead; improve consideration of redundancy
  – Group-then-rank approach
  – Parameter τ determines granularity of grouping
  – Weka implementation available: http://bit.ly/1oic2xM
Future work
  – Automatic determination of parameter τ
  – Experimentation using much larger data, other FS methods, etc.
  – Clustering of features
  – Unsupervised selection?
Thank you!
Simple example
• Dataset of six features
• After initialisation, the following groups are formed:
  – F1 = {f1, f4, f3}
  – F2 = {f2}
  – F3 = {f1, f3}
  – F4 = {f5, f4, f1}
  – etc.
• Within each group, rank determines relevance: e.g. f4 is more relevant than f3
• Ordering of groups (for the greedy hill-climber): F = {F4, F1, F3, F5, F2, F6}
Simple example...
• First group to be considered: F4
  – Feature f4 is preferable over the others
  – So, add it to the current (initially empty) subset R
  – Evaluate M(R + {f4}):
    • If better score than the current best evaluation, store f4
    • Current best evaluation = M(R + {f4})
  – The set of features which appear in F4 ({f1, f4, f5}) is added to the set Avoids
• Next feature group with elements that do not appear in Avoids: F1
• And so on…
[Diagram: F4 = {f5, f4, f1}; F1 = {f1, f4, f3}]
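The walkthrough above can be traced in a few lines. A hedged sketch: the internal ranking of F1 is assumed for illustration, and the evaluation step is collapsed (full FRFG would keep a candidate only when M improves):

```python
# Groups as on the slide, listed best-ranked first.
groups = {
    "F4": ["f4", "f1", "f5"],   # f4 preferable over the others
    "F1": ["f4", "f1", "f3"],   # internal ranking assumed here
}

R, avoids = [], set()
for name in ["F4", "F1"]:       # ordering taken from F = {F4, F1, ...}
    fresh = [f for f in groups[name] if f not in avoids]
    if fresh:
        R.append(fresh[0])      # simplified: accept without checking M
    avoids.update(groups[name]) # all of the group's features now avoided
```

After F4 is processed, Avoids already contains f1, f4 and f5, so from F1 only f3 remains to be considered, exactly as the slide describes.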