Feature Selection

DESCRIPTION

Feature Selection, by Dr. Gheith Abandah. Definition: feature selection is typically a search problem for finding an optimal or suboptimal subset of $m$ features out of the original $M$ features. Its benefits include excluding irrelevant and redundant features.

TRANSCRIPT
Page 1: Feature Selection

Dr. Gheith Abandah
Page 2: Feature Selection

Feature selection is typically a search problem for finding an optimal or suboptimal subset of $m$ features out of the original $M$ features.

Benefits: by excluding irrelevant and redundant features, it allows reducing system complexity and processing time, and often improves the recognition accuracy.

For a large number of features, exhaustive search for the best subset out of the $2^M$ possible subsets is infeasible; for example, $M = 50$ already gives $2^{50} \approx 10^{15}$ subsets.
Page 3: Feature Selection

Feature subset selection is applied on a set of feature values $x_{ijk}$; $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, C$; and $k = 1, 2, \ldots, M$; where $x_{ijk}$ is the $i$th sample of the $j$th class of the $k$th feature. Therefore, the average of the $k$th feature for letter form $j$ is

$$\bar{x}_{jk} = \frac{1}{N} \sum_{i=1}^{N} x_{ijk}.$$

And the overall average of the $k$th feature is

$$\bar{x}_k = \frac{1}{C} \sum_{j=1}^{C} \bar{x}_{jk}.$$
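As a concrete illustration, here is a minimal NumPy sketch of these two averages; the array shape, the indexing convention, and the random data are assumptions made for the example:

```python
import numpy as np

# Hypothetical data: N samples per class, C classes, M features,
# indexed as x[i, j, k] to match x_ijk in the text.
N, C, M = 100, 28, 40
rng = np.random.default_rng(0)
x = rng.normal(size=(N, C, M))

# Average of the kth feature for class j: mean over the N samples.
x_bar_jk = x.mean(axis=0)        # shape (C, M)

# Overall average of the kth feature: mean of the class averages.
x_bar_k = x_bar_jk.mean(axis=0)  # shape (M,)
```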
Page 4: Feature Selection

Feature selection algorithms can generally be classified according to the criterion function used in searching for good features:

1. Wrapper algorithms: the performance of the classifier is used to evaluate the feature subsets.
2. Filter algorithms: some feature evaluation function is used rather than optimizing the classifier's performance.

Wrapper methods are usually slower than filter methods but offer better performance.
Page 5: Feature Selection

Select best individual features: a feature evaluation function is used to rank the individual features, then the highest-ranked $m$ features are selected.

Although these methods can exclude irrelevant features, they often include redundant features.

"The $m$ best features are not the best $m$ features."
Page 6: Feature Selection

Examples:

1. Scatter criterion
2. Symmetric uncertainty
Page 7: Feature Selection

Select the features that have the highest values of the scatter criterion $J_k$, which is a ratio of the mixture scatter to the within-class scatter. The within-class scatter of the $k$th feature is

$$S_{w,k} = \sum_{j=1}^{C} P_j S_{jk},$$

where $S_{jk}$ is the variance of class $j$ and $P_j$ is the prior probability of this class, found by

$$S_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_{jk})^2 \quad \text{and} \quad P_j = \frac{1}{C}.$$
Page 8: Feature Selection

The between-class scatter is the variance of the class centers with respect to the global center and is found by

$$S_{b,k} = \sum_{j=1}^{C} P_j (\bar{x}_{jk} - \bar{x}_k)^2.$$

And the mixture scatter is the sum of the within- and between-class scatters, and equals the variance of all values with respect to the global center:

$$S_{m,k} = S_{w,k} + S_{b,k} = \frac{1}{CN} \sum_{j=1}^{C} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_k)^2.$$
Page 9: Feature Selection

The scatter criterion $J_k$ of the $k$th feature is

$$J_k = \frac{S_{m,k}}{S_{w,k}}.$$

A higher value of this ratio indicates that the feature has a high ability to separate the various classes into distinct clusters.
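The formulas of the last three pages translate almost line for line into NumPy. The following is a minimal sketch, assuming the (N, C, M) data layout of the earlier example and the equal priors $P_j = 1/C$ stated above:

```python
import numpy as np

def scatter_criterion(x):
    """J_k = S_m,k / S_w,k for each feature of x, shape (N, C, M),
    following the formulas above with equal priors P_j = 1/C."""
    x_bar_jk = x.mean(axis=0)                  # class centers, (C, M)
    x_bar_k = x_bar_jk.mean(axis=0)            # global centers, (M,)

    S_jk = ((x - x_bar_jk) ** 2).mean(axis=0)  # class variances, (C, M)
    S_w = S_jk.mean(axis=0)                    # within-class scatter, (M,)
    S_b = ((x_bar_jk - x_bar_k) ** 2).mean(axis=0)  # between-class scatter
    S_m = S_w + S_b                            # mixture scatter
    return S_m / S_w                           # J_k per feature, (M,)

# Individual-feature selection as on page 5: rank by J_k, keep the top m.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 10, 20))
m = 5
best = np.argsort(scatter_criterion(x))[::-1][:m]
```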
Page 10: Feature Selection

First normalize the feature values for zero mean and unit variance by

$$\hat{x}_{ijk} = \frac{x_{ijk} - \bar{x}_k}{\sigma_k}, \qquad \sigma_k^2 = \frac{1}{CN} \sum_{j=1}^{C} \sum_{i=1}^{N} (x_{ijk} - \bar{x}_k)^2.$$

Then the normalized values of continuous features are discretized into $L$ finite levels to facilitate finding probabilities. The corresponding discrete values are $\tilde{x}_{ijk}$. The mutual information of the $k$th feature is

$$I(\mathbf{x}_k, \boldsymbol{\omega}) = \sum_{l=1}^{L} \sum_{j=1}^{C} P(\tilde{x}_{lk}, \omega_j) \log_2 \frac{P(\tilde{x}_{lk}, \omega_j)}{P(\tilde{x}_{lk})\, P(\omega_j)}.$$
Page 11: Feature Selection

The symmetric uncertainty (SU) is derived from the mutual information by normalizing it to the entropies of the feature values and target classes:

$$SU(\mathbf{x}_k, \boldsymbol{\omega}) = \frac{2\, I(\mathbf{x}_k, \boldsymbol{\omega})}{H(\mathbf{x}_k) + H(\boldsymbol{\omega})},$$

where the entropy of a variable $X$ is found by $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$.
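A minimal sketch of this and the previous page for a single feature vector x and label vector y; equal-width binning into L levels is an assumption, since the slides do not fix the discretization scheme:

```python
import numpy as np

def symmetric_uncertainty(x, y, L=10):
    """SU between a continuous feature vector x and discrete labels y,
    following the two pages above: normalize x, discretize it into L
    levels, then SU = 2*I(x;y) / (H(x) + H(y)) with log base 2."""
    x = (x - x.mean()) / x.std()
    # Equal-width discretization into L finite levels (an assumption;
    # the slides do not specify the binning scheme).
    edges = np.linspace(x.min(), x.max(), L + 1)[1:-1]
    xd = np.digitize(x, edges)                   # levels 0 .. L-1

    # Joint probabilities P(x~, omega) from counts.
    classes = np.unique(y)
    p_joint = np.zeros((L, classes.size))
    for j, c in enumerate(classes):
        p_joint[:, j] = np.bincount(xd[y == c], minlength=L)
    p_joint /= x.size
    p_x = p_joint.sum(axis=1)                    # marginal P(x~)
    p_w = p_joint.sum(axis=0)                    # marginal P(omega)

    # Mutual information I(x, omega), skipping empty cells.
    nz = p_joint > 0
    denom = np.outer(p_x, p_w)
    I = (p_joint[nz] * np.log2(p_joint[nz] / denom[nz])).sum()

    # Entropies H(x) and H(omega).
    H_x = -(p_x[p_x > 0] * np.log2(p_x[p_x > 0])).sum()
    H_w = -(p_w[p_w > 0] * np.log2(p_w[p_w > 0])).sum()
    return 2 * I / (H_x + H_w)
```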
Page 12: Feature Selection

- Sequential (complexity $< O(M^2)$):
  - Forward selection, e.g.:
    - Fast correlation-based filter (FCBF)
    - Minimal-redundancy-maximal-relevance (mRMR)
  - Backward selection
  - Bidirectional
- Random:
  - Genetic algorithms, e.g.:
    - Multi-objective genetic algorithms (MOGA)
Page 13: Feature Selection

FCBF selects a subset of relevant features and excludes redundant features.

It uses the symmetric uncertainty $SU(\mathbf{x}_k, \boldsymbol{\omega})$ to estimate the relevance of feature $k$ to the target classes, and uses the symmetric uncertainty between two features $k$ and $o$, $SU(\mathbf{x}_k, \mathbf{x}_o)$, to approximate the redundancy between the two features.
Page 14: Feature Selection

FCBF grows a subset of predominant features by adding the relevant features to the empty set in descending $SU(\mathbf{x}_k, \boldsymbol{\omega})$ order.

Whenever feature $k$ is added, FCBF excludes from consideration for addition to the subset all remaining redundant features $o$ that have $SU(\mathbf{x}_k, \mathbf{x}_o) \ge SU(\mathbf{x}_o, \boldsymbol{\omega})$.

In other words, it excludes every feature whose correlation with an already selected feature is larger than or equal to its correlation with the target classes, as in the sketch below.
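A minimal sketch of this selection loop. It assumes a helper su(a, b) returning the symmetric uncertainty between two vectors, for example the symmetric_uncertainty function sketched earlier (for feature-feature pairs both arguments would be discretized); the relevance threshold of the original FCBF paper is omitted for brevity:

```python
import numpy as np

def fcbf(X, y, su):
    """Minimal FCBF loop for features X (n_samples, M) and labels y.
    `su(a, b)` is an assumed helper computing symmetric uncertainty."""
    M = X.shape[1]
    # Relevance SU(x_k, omega) of each feature to the classes.
    relevance = np.array([su(X[:, k], y) for k in range(M)])
    # Consider features in descending relevance order.
    order = list(np.argsort(relevance)[::-1])
    selected = []
    while order:
        k = order.pop(0)
        selected.append(k)
        # Drop every remaining feature o that is more correlated with
        # the just-selected feature than with the target classes:
        # SU(x_k, x_o) >= SU(x_o, omega).
        order = [o for o in order if su(X[:, k], X[:, o]) < relevance[o]]
    return selected
```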
Page 15: Feature Selection

For the complete set of features $X$, the subset $S$ of $m$ features that has the maximal relevance criterion is the subset that satisfies the maximal mean value of all mutual information values between the individual features $\mathbf{x}_i$ and the class $\boldsymbol{\omega}$:

$$\max D(S, \boldsymbol{\omega}), \qquad D = \frac{1}{m} \sum_{\mathbf{x}_i \in S} I(\mathbf{x}_i, \boldsymbol{\omega}).$$
Page 16: Feature Selection

The subset $S$ of $m$ features that has the minimal redundancy criterion is the subset that satisfies the minimal mean value of all mutual information values between all pairs of features $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$\min R(S), \qquad R = \frac{1}{m^2} \sum_{\mathbf{x}_i, \mathbf{x}_j \in S} I(\mathbf{x}_i, \mathbf{x}_j).$$
Page 17: Feature Selection

In the mRMR algorithm, the subset $S$ of the $m$ best features is grown iteratively using a forward search algorithm. The following criterion is used to add the feature $\mathbf{x}_j$ to the previous subset of $m - 1$ features:

$$\max_{\mathbf{x}_j \in X - S_{m-1}} \left[ I(\mathbf{x}_j, \boldsymbol{\omega}) - \frac{1}{m-1} \sum_{\mathbf{x}_i \in S_{m-1}} I(\mathbf{x}_j, \mathbf{x}_i) \right].$$
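Combining the last three pages, a minimal sketch of the greedy forward search; mi(a, b) is an assumed helper returning the mutual information between two discrete vectors, computed, for example, as in the SU sketch earlier:

```python
import numpy as np

def mrmr(X, y, m, mi):
    """Greedy mRMR: pick m features from X (n_samples, M), using the
    assumed helper `mi(a, b)` for the mutual information between two
    discrete vectors."""
    M = X.shape[1]
    relevance = np.array([mi(X[:, k], y) for k in range(M)])
    S = [int(np.argmax(relevance))]     # start with the most relevant
    while len(S) < m:
        rest = [k for k in range(M) if k not in S]
        # Relevance minus mean redundancy against the current subset.
        scores = [relevance[k] - np.mean([mi(X[:, k], X[:, i]) for i in S])
                  for k in rest]
        S.append(rest[int(np.argmax(scores))])
    return S
```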
Page 18: Feature Selection

Use NSGA to search for the optimal set of solutions with two objectives:

1. Minimize the number of features used in classification.
2. Minimize the classification error.

A sketch of these two objectives follows.
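A minimal sketch of the evaluation side only, assuming random binary feature masks as a stand-in for the genetic search and a placeholder error estimate; a real MOGA would evolve the masks with NSGA selection, crossover, and mutation:

```python
import numpy as np

def pareto_front(points):
    """Indices of non-dominated points, both objectives minimized."""
    return [i for i, p in enumerate(points)
            if not any(np.all(q <= p) and np.any(q < p)
                       for j, q in enumerate(points) if j != i)]

def evaluate(mask, error_of):
    """The two objectives for a binary feature mask: the number of
    selected features and the classification error."""
    return np.array([mask.sum(), error_of(mask)])

rng = np.random.default_rng(2)
M = 20
error_of = lambda mask: rng.uniform(0.1, 0.5)   # placeholder estimate
masks = rng.integers(0, 2, size=(50, M)).astype(bool)
scores = [evaluate(mk, error_of) for mk in masks]
front = [masks[i] for i in pareto_front(scores)]  # Pareto trade-offs
```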
Page 19: Feature Selection

Page 20: Feature Selection

Page 21: Feature Selection