![Page 1: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/1.jpg)
Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets
Removing linear and non-linear attribute correlations
Antonio Canabrava Fraideinberze
Jose F Rodrigues-Jr
Robson Leonardo Ferreira Cordeiro
Databases and Images Group
University of São Paulo
São Carlos - SP - Brazil
![Page 2: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/2.jpg)
How to analyze that data?
Terabytes?
![Page 3: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/3.jpg)
Parallel processing and dimensionality reduction, for sure...
![Page 4: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/4.jpg)
... but how to remove linear and non-linear attribute correlations, besides irrelevant attributes?
![Page 5: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/5.jpg)
... and how to reduce dimensionality without human supervision and in a task-independent way?
![Page 6: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/6.jpg)
Curl-Remover
Medium-dimensionality
![Page 7: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/7.jpg)
Agenda
Fundamental Concepts
Related Work
Proposed Method
Evaluation
Conclusion
![Page 8: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/8.jpg)
![Page 9: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/9.jpg)
Fundamental Concepts
Fractal Theory
![Page 10: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/10.jpg)
![Page 11: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/11.jpg)
![Page 12: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/12.jpg)
![Page 13: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/13.jpg)
Embedded, Intrinsic and Fractal Correlation Dimension
Fractal Correlation Dimension ≅ Intrinsic Dimension
![Page 14: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/14.jpg)
Embedded dimension ≅ 3
Intrinsic dimension ≅ 1
Embedded dimension ≅ 3
Intrinsic dimension ≅ 2
![Page 15: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/15.jpg)
Fractal Correlation Dimension - Box Counting
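For reference, the box-counting form of the fractal correlation dimension is commonly written as below. The symbols $C_{r,i}$, $r_1$ and $r_2$ are notation assumed here; they do not appear on the slides.

```latex
D_2 \equiv \frac{\partial \log \sum_{i} C_{r,i}^{2}}{\partial \log r},
\qquad r \in [r_1, r_2]
```

Here $C_{r,i}$ is the number of points that fall in the $i$-th cell of a grid with cell side $r$, and $[r_1, r_2]$ is the range of cell sides over which the log-log plot is close to a straight line. In practice D2 is estimated as the slope of the line fitted to log(sum of squared counts) versus log(r).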
![Page 16: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/16.jpg)
![Page 17: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/17.jpg)
[Plot; x-axis: log(r)]
![Page 18: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/18.jpg)
![Page 19: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/19.jpg)
Multidimensional Quad-tree [Traina Jr. et al, 2000]
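The box-counting estimate can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a plain dictionary of grid cells stands in for the multidimensional quad-tree, and the function name `box_counting_d2`, the `levels` parameter and the test data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def box_counting_d2(points, levels=6):
    """Estimate D2 as the slope of log(sum of squared counts) versus log(r)."""
    dims = len(points[0])
    # Rescale every attribute to [0, 1] so that all grids cover the same cube.
    lows = [min(p[d] for p in points) for d in range(dims)]
    spans = [(max(p[d] for p in points) - lows[d]) or 1.0 for d in range(dims)]

    log_r, log_s = [], []
    for level in range(1, levels + 1):
        side = 2 ** level  # cells per axis; the cell side is r = 1 / side
        counts = Counter(
            tuple(
                min(int((p[d] - lows[d]) / spans[d] * side), side - 1)
                for d in range(dims)
            )
            for p in points
        )
        log_r.append(math.log(1.0 / side))
        log_s.append(math.log(sum(c * c for c in counts.values())))

    # Least-squares slope of log_s against log_r.
    mean_r = sum(log_r) / levels
    mean_s = sum(log_s) / levels
    num = sum((x - mean_r) * (y - mean_s) for x, y in zip(log_r, log_s))
    den = sum((x - mean_r) ** 2 for x in log_r)
    return num / den


# A straight line embedded in 3 dimensions: embedded dimension 3, intrinsic 1.
rng = random.Random(0)
line = [(t, 2.0 * t, 1.0 - t) for t in (rng.random() for _ in range(20000))]
d2_line = box_counting_d2(line)
```

For the line above the estimate should come out close to 1, matching the "embedded dimension ≅ 3, intrinsic dimension ≅ 1" case. Each `level` plays the role of one tree level, with the cell side halved from one level to the next.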
![Page 20: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/20.jpg)
![Page 21: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/21.jpg)
Related Work
Dimensionality Reduction - Taxonomy 1
Dimensionality Reduction: Supervised Algorithms; Unsupervised Algorithms
Unsupervised Algorithms: Principal Component Analysis; Singular Value Decomposition; Fractal Dimension Reduction
![Page 22: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/22.jpg)
Dimensionality Reduction - Taxonomy 2
Dimensionality Reduction: Feature Extraction; Feature Selection
Feature Extraction: Principal Component Analysis; Singular Value Decomposition
Feature Selection: Fractal Dimension Reduction; Embedded, Filter and Wrapper approaches
![Page 23: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/23.jpg)
Existing methods need supervision, miss non-linear correlations, cannot handle Big Data, or work for classification only.
![Page 24: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/24.jpg)
Proposed Method
![Page 25: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/25.jpg)
General Idea
Removes the E - ⌈D2⌉ least relevant attributes (E: number of attributes, i.e., the embedded dimension; D2: fractal correlation dimension), one at a time, in ascending order of relevance.
![Page 26: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/26.jpg)
Builds partial trees for the full dataset and for each of its E (E-1)-dimensional projections
![Page 27: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/27.jpg)
Data pair: (TreeID + cell spatial position, partial count of points)
![Page 28: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/28.jpg)
Sums partial point counts and reports log(r) and log(sum2) for each tree
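The map and reduce steps described on these slides can be simulated locally. The sketch below is only an illustration under stated assumptions: the names `mapper` and `reducer`, the convention that tree `E` is the full dataset while tree `k` drops attribute `k`, the inclusion of the tree level in the key, and the requirement that points are already scaled to [0, 1) are all choices made for this example, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict


def mapper(points, levels=3):
    """Build partial trees for one data split.

    Returns {(tree_id, level, cell): partial_count}. Points are assumed to be
    scaled to [0, 1). tree_id == E is the full dataset; tree_id == k is the
    (E-1)-dimensional projection that leaves attribute k out.
    """
    dims = len(points[0])
    partial = Counter()
    for p in points:
        for tree_id in range(dims + 1):
            kept = [d for d in range(dims) if d != tree_id]
            for level in range(1, levels + 1):
                side = 2 ** level
                cell = tuple(min(int(p[d] * side), side - 1) for d in kept)
                partial[(tree_id, level, cell)] += 1
    return partial


def reducer(partials):
    """Sum partial counts and report (log r, log sum of squared counts) per tree."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # adds the counts of equal keys
    sums = defaultdict(int)
    for (tree_id, level, _cell), count in totals.items():
        sums[(tree_id, level)] += count * count
    report = defaultdict(list)
    for (tree_id, level), total in sorted(sums.items()):
        report[tree_id].append((math.log(0.5 ** level), math.log(total)))
    return dict(report)


points = [(0.1, 0.1), (0.9, 0.9), (0.1, 0.9), (0.6, 0.2)]
report = reducer([mapper(points[:2]), mapper(points[2:])])
```

Because the reducer only adds counts, splitting the data across any number of mappers gives the same report as a single pass. The slope of each tree's (log r, log sum) pairs then gives D2 for the full dataset or pD2 for a projection.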
![Page 29: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/29.jpg)
Computes D2 for the full dataset and pD2 for each of its E (E-1)-dimensional projections
![Page 30: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/30.jpg)
Spots the least relevant attribute, i.e., the one not in the projection that minimizes | D2 - pD2 |
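The elimination loop itself is short once a D2 estimator is available. The following is a serial sketch under assumptions, not the distributed algorithm: `select_features`, `project` and the caller-supplied `estimate_d2` are hypothetical names, and recomputing D2 on the remaining attributes at every round is an assumption about how the comparison is repeated.

```python
import math


def project(points, attributes):
    """Keep only the listed attribute positions of every point."""
    return [tuple(p[a] for a in attributes) for p in points]


def select_features(points, estimate_d2):
    """Backward elimination driven by a caller-supplied D2 estimator.

    Returns (kept, removed): attribute indices, with `removed` in removal
    order, i.e., least relevant first.
    """
    kept = list(range(len(points[0])))  # E attributes at the start
    n_to_remove = len(kept) - math.ceil(estimate_d2(points))
    n_to_remove = max(0, min(n_to_remove, len(kept) - 1))  # keep at least one
    removed = []
    for _ in range(n_to_remove):
        current = estimate_d2(project(points, kept))

        def gap(a):
            # |D2 - pD2| for the projection that leaves attribute `a` out.
            rest = [k for k in kept if k != a]
            return abs(current - estimate_d2(project(points, rest)))

        least_relevant = min(kept, key=gap)
        kept.remove(least_relevant)
        removed.append(least_relevant)
    return kept, removed
```

Any estimator with the signature `estimate_d2(points) -> float` can be plugged in, for example a box-counting estimate. Ties are broken in favor of the lowest attribute index, which is an arbitrary choice of this sketch.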
![Page 31: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/31.jpg)
Spots the second least relevant attribute …
![Page 32: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/32.jpg)
3 Main Issues
![Page 33: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/33.jpg)
1. Too much data to be shuffled: one data pair per cell/tree
![Page 34: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/34.jpg)
2. One data pass per irrelevant attribute
![Page 35: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/35.jpg)
3. Not enough memory for mappers
![Page 36: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/36.jpg)
Proposed Method
Curl-Remover
Issue 1 - Too much data to be shuffled: one data pair per cell/tree.
Our solution - Two-phase dimensionality reduction:
a) Serial feature selection on a tiny data sample (one reducer), used only to speed up processing;
b) All mappers project the data into a fixed subspace.
![Page 37: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/37.jpg)
Builds/reports N (2 or 3) tree levels of lowest resolution…
![Page 38: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/38.jpg)
… plus the points projected into the M (2 or 3) most relevant attributes of the sample
![Page 39: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/39.jpg)
Builds the full trees from their low-resolution level cells and the projected points
![Page 40: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/40.jpg)
High-resolution cells are never shuffled
![Page 41: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/41.jpg)
Issue 2 - One data pass per irrelevant attribute.
Our solution - Stores/reads the tree level of highest resolution, instead of the original data.
![Page 42: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/42.jpg)
Rdb = cost to read the dataset;
TWRtree = cost to transfer, write and read the last tree level in the next reduce step;
If (Rdb > TWRtree) then write the tree.
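The decision rule can be written as a one-line comparison once both costs are estimated. The cost model below (bytes divided by throughput) and all parameter names are assumptions made for illustration; the slides do not say how Rdb and TWRtree are measured.

```python
def should_write_tree(dataset_bytes, tree_bytes, read_bps, write_bps, network_bps):
    """Return True when re-reading the dataset costs more than reusing the tree."""
    r_db = dataset_bytes / read_bps  # Rdb: cost to read the dataset again
    twr_tree = (
        tree_bytes / network_bps     # transfer the last tree level
        + tree_bytes / write_bps     # write it
        + tree_bytes / read_bps      # read it back in the next reduce step
    )
    return r_db > twr_tree
```

With a 1 TB dataset and a 1 GB last tree level at equal throughputs, the tree is written. When the tree is as large as the dataset, reading the data again is cheaper.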
![Page 43: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/43.jpg)
![Page 44: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/44.jpg)
Writes the tree's last level to HDFS
![Page 45: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/45.jpg)
Reads the tree's last level from HDFS
![Page 46: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/46.jpg)
Reads the dataset only twice
![Page 47: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/47.jpg)
Issue 3 - Not enough memory for mappers.
Our solution - Sorts data in mappers and reports “tree slices” whenever needed.
![Page 48: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/48.jpg)
Each mapper sorts its local points and builds “tree slices” while monitoring memory consumption
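One way to picture the "tree slices" is a mapper that orders its points by grid cell and flushes the counts whenever a budget is reached. This is a rough sketch only: the slides do not specify the sort key or how memory is monitored, so sorting by finest-level cell and capping the number of cells per slice (`max_cells`) are assumptions, as are the names `cell_of` and `tree_slices`.

```python
from itertools import groupby


def cell_of(point, side):
    """Grid cell of a point scaled to [0, 1), with `side` cells per axis."""
    return tuple(min(int(v * side), side - 1) for v in point)


def tree_slices(points, side=8, max_cells=1000):
    """Yield dicts {cell: count}, each holding at most `max_cells` cells."""
    ordered = sorted(cell_of(p, side) for p in points)
    current = {}
    for cell, group in groupby(ordered):
        current[cell] = sum(1 for _ in group)  # equal cells are contiguous
        if len(current) >= max_cells:          # stand-in for a memory check
            yield current
            current = {}
    if current:
        yield current


points = [(0.1, 0.1), (0.1, 0.2), (0.9, 0.9), (0.5, 0.5), (0.3, 0.8), (0.9, 0.95)]
slices = list(tree_slices(points, side=4, max_cells=2))
```

In this sketch two slices never share a finest-level cell, because sorting makes equal cells contiguous. Coarser-level cells derived from them may still appear in more than one slice.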
![Page 49: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/49.jpg)
[Figure; axes: X, Y]
![Page 50: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/50.jpg)
Reports “tree slices” with very little overlap
![Page 51: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/51.jpg)
![Page 52: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/52.jpg)
Evaluation
Datasets
Sierpinski - Sierpinski Triangle + 1 linearly correlated attribute + 2 non-linearly correlated attributes. 5 attributes, 1.1 billion points;
Sierpinski Hybrid - Sierpinski Triangle + 1 non-linearly correlated attribute + 2 random attributes. 5 attributes, 1.1 billion points;
Yahoo! Network Flows - communication patterns between end-users on the web. 12 attributes, 562 million points;
Astro - high-resolution cosmological simulation. 6 attributes, 1 billion points;
Hepmass - physics-related dataset with particles of unknown mass. 28 attributes, 10.5 million points;
Hepmass Duplicated - Hepmass + 28 correlated attributes. 56 attributes, 10.5 million points.
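A small dataset in the spirit of the synthetic Sierpinski data can be generated with the chaos game. The sketch below is illustrative only: the particular correlated attributes (`2x + y`, `x²`, `sin(3y)`) are hypothetical choices, since the slides do not say which functions were used, and `sierpinski_dataset` is a name introduced for this example.

```python
import math
import random


def sierpinski_dataset(n, seed=0):
    """Chaos-game Sierpinski triangle plus three derived attributes."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.0, 0.0  # start at a vertex, which belongs to the triangle
    rows = []
    for _ in range(n):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2  # jump halfway to a random vertex
        rows.append((
            x,
            y,
            2 * x + y,        # linearly correlated (hypothetical choice)
            x * x,            # non-linearly correlated (hypothetical choice)
            math.sin(3 * y),  # non-linearly correlated (hypothetical choice)
        ))
    return rows


rows = sierpinski_dataset(1000)
```

The Sierpinski triangle has fractal dimension log 3 / log 2 ≈ 1.585, so for data like this E = 5 and ⌈D2⌉ = 2, and the rule E - ⌈D2⌉ would ask for three attributes to be removed.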
![Page 53: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/53.jpg)
Fractal Dimension
Hepmass
![Page 54: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/54.jpg)
Hepmass Duplicated
![Page 55: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/55.jpg)
Comparison with sPCA - Classification
![Page 56: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/56.jpg)
8% more accurate and 7.5% faster than sPCA
![Page 57: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/57.jpg)
Comparison with sPCA
Percentage of Fractal Dimension after selection
![Page 58: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/58.jpg)
![Page 59: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/59.jpg)
Conclusions
Accuracy - eliminates both linear and non-linear attribute correlations, as well as irrelevant attributes; 8% more accurate than sPCA;
Scalability - linear scalability on the data size (theoretical analysis); experiments with up to 1.1 billion points;
Unsupervised - it requires neither a training set nor the user to guess the number of attributes to be removed;
Semantics - it is a feature selection method, so it maintains the semantics of the attributes;
Generality - it suits analytical tasks in general, not only classification.
![Page 60: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/60.jpg)
![Page 61: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/61.jpg)
![Page 62: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/62.jpg)
![Page 63: Effective and Unsupervised Fractal-based Feature Selection for Very Large Datasets: removing linear and non-linear attribute correlations](https://reader031.vdocuments.site/reader031/viewer/2022030214/5899eac51a28ab96418b640d/html5/thumbnails/63.jpg)