TRANSCRIPT
Unsupervised Learning Techniques to Diversifying and Pruning Random Forest
Dr Mohamed Medhat Gaber
School of Computing Science and Digital Media, Robert Gordon University
27 January 2015
Dr Mohamed Medhat Gaber Diversifying and Pruning Random Forest
Acknowledgement
Work done in collaboration with PhD student Khaled Fawagreh and co-supervisor Dr Eyad Elyan
1 Background: Data Classification; Ensemble Classification; Ensemble Diversity; Random Forests
2 Clustering and Ensemble Diversity: CLUB-DRF; Experimental Study
3 Outlier Scoring and Ensemble Diversity: LOFB-DRF; Experimental Study
4 Summary and Future Work
What is Data Classification?
Data classification is the process of assigning a class (a label) to a data instance, based on the values of a set of predictive attributes (features).
The process has two stages:
1 Model construction: a potentially large number of "labelled" instances are fed to a classification technique to build a model (classifier).
2 Model usage: once the model is constructed, it can be deployed and used to classify "unlabelled" instances.
A large number of techniques have been proposed to address the data classification process (e.g., decision trees, artificial neural networks, and support vector machines).
Predictive accuracy has been the major concern when designing a new classification technique, followed by the time needed for model construction and usage.
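The two stages above can be sketched in a few lines of Python. This is a minimal illustration only, not the classifiers discussed in the talk: the model here is a single-attribute threshold rule, and all names and data are made up for the example.

```python
# Minimal two-stage classification sketch. The "model" is a trivial
# one-feature threshold rule; names and data are illustrative only.

def construct_model(labelled, feature, threshold):
    """Stage 1: model construction from labelled instances.
    Predict the majority class on each side of the threshold."""
    left = [c for f, c in labelled if f[feature] <= threshold]
    right = [c for f, c in labelled if f[feature] > threshold]
    majority = lambda cs: max(set(cs), key=cs.count)
    return {"feature": feature, "threshold": threshold,
            "left": majority(left), "right": majority(right)}

def use_model(model, instance):
    """Stage 2: model usage -- classify an unlabelled instance."""
    side = "left" if instance[model["feature"]] <= model["threshold"] else "right"
    return model[side]

train = [((1.0,), "A"), ((1.2,), "A"), ((3.0,), "B"), ((3.5,), "B")]
model = construct_model(train, feature=0, threshold=2.0)
print(use_model(model, (0.9,)))  # → A
print(use_model(model, (4.0,)))  # → B
```

The split between `construct_model` and `use_model` mirrors the two stages: the labelled data is only needed once, at construction time.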
Decision Tree Classification Techniques
Almost all decision trees are constructed using a similar procedure
Attributes (features) are represented in internal nodes, with their values given on the links for tree traversal (a variation of this exists for binary decision trees)
Leaf nodes hold class labels
Decision trees mainly vary in the goodness measure used to find the best attribute to split on (e.g., information gain, gain ratio, Gini index, and chi-square)
The first attribute, which is called the root, is the best attribute (according to some goodness measure) to split on
An iterative process to build subtrees follows, finding the best attribute (attribute = value) to split on at each iteration
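One of the goodness measures named above, the Gini index, is easy to compute directly. The sketch below is illustrative (not from the talk): it scores a candidate split by the weighted impurity of its partitions, which the tree builder would minimise.

```python
# Gini index sketch for scoring candidate splits (illustrative only).
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(partitions):
    """Weighted Gini impurity after a split; the best attribute is
    the one that minimises this value."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

pure = gini_of_split([["A", "A"], ["B", "B"]])   # perfectly separating split
mixed = gini_of_split([["A", "B"], ["A", "B"]])  # uninformative split
print(pure, mixed)  # → 0.0 0.5
```

Information gain works the same way with entropy in place of Gini impurity; the selection loop over attributes is unchanged.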
Ensemble Classification
Combining a number of classifiers that vote towards the winning class has been thoroughly investigated by the machine learning and data mining communities.
Bagging, boosting and stacking are among the major approaches to building ensembles of classifiers.
Bagging uses bootstrap sampling to generate a diverse set of samples from the dataset.
Boosting builds classifiers in a sequence, encouraging later classifiers to become expert in classifying instances incorrectly classified by earlier classifiers in the sequence.
Stacking uses a hierarchy of classifiers that generates a new dataset on which a single classifier is built.
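The voting step shared by these approaches can be sketched in a few lines. The toy "classifiers" below are placeholder functions, not any real ensemble method; the point is only how the winning class is decided.

```python
# Majority-vote combination sketch: each ensemble member votes and
# the most common class wins (toy classifiers, illustrative only).
from collections import Counter

def ensemble_predict(classifiers, instance):
    votes = [clf(instance) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three stand-in "classifiers" that disagree on an instance:
clfs = [lambda x: "spam", lambda x: "spam", lambda x: "ham"]
print(ensemble_predict(clfs, instance=None))  # → spam
```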
Diversity and Predictive Accuracy
Diversity among members of the ensemble is key to predictive accuracy
There are many ways to measure such diversity; it is not a straightforward process
Regardless of the measure used, diversity has been the target of a number of 'diversity creation' methods
Bagging and boosting enforce diversity by input manipulation
Stacking typically imposes diversity by using a number of different classifiers
Error-correcting codes manipulate the output to create diversity
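One common pairwise diversity measure, shown here as an illustration of the many options alluded to above, is the disagreement rate: the fraction of instances on which two classifiers predict differently, averaged over all pairs.

```python
# Mean pairwise disagreement sketch: one of many diversity measures
# (illustrative; the talk does not commit to a specific measure).
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers differ."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over all pairs of ensemble members."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Predictions of three classifiers on four instances:
preds = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1]]
print(mean_pairwise_disagreement(preds))  # → 0.5
```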
Random Forests: An Overview
An ensemble classification and regression technique introduced by Leo Breiman
It generates a diversified ensemble of decision trees by adopting two methods:
A bootstrap sample is used for the construction of each tree (bagging), resulting in approximately 63.2% unique samples, with the rest repeated
At each node split, only a subset of features is drawn randomly to assess the goodness of each feature/attribute (typically √F or log2 F features are used, where F is the total number of features)
Trees are allowed to grow without pruning
Typically 100 to 500 trees are used to form the ensemble
It is now considered among the best performing classifiers
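The 63.2% figure follows from bootstrap sampling: drawing n instances with replacement leaves any given instance out with probability (1 - 1/n)^n, which tends to e^(-1) ≈ 0.368, so about 1 - e^(-1) ≈ 63.2% of instances appear at least once. A quick simulation (illustrative, fixed seed) confirms this, and the same snippet shows the √F feature-subset size:

```python
# Verify the ~63.2% unique-sample fraction of a bootstrap sample,
# and compute the sqrt(F) feature subset size (illustrative).
import math
import random

random.seed(0)
n = 10_000
bootstrap = [random.randrange(n) for _ in range(n)]  # sample with replacement
unique_fraction = len(set(bootstrap)) / n
print(round(unique_fraction, 3), round(1 - math.exp(-1), 3))

F = 100                        # total number of features (example value)
mtry = int(math.sqrt(F))       # features tried at each split: 10
print(mtry)
```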
Random Forest Tops State-of-the-art Classifiers
179 classifiers
121 datasets (the whole UCI repository at the time of the experiment)
Random Forest was ranked first, followed by SVM with a Gaussian kernel
Reference
Fernandez-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
Improving Random Forests
Source: Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: from early developments to recent advancements. Systems Science & Control Engineering: An Open Access Journal, 2(1), pp. 602-609.
How is Diversity Related to Clustering?
The aim of any clustering algorithm is to produce cohesive clusters that are well separated
A good clustering model diversifies among members of different clusters
Inspired by this observation, we hypothesised that if the trees in the Random Forest are clustered, we can use a small subset (typically one tree) from each cluster to produce a diversified Random Forest
The benefits are twofold:
Increased diversification
A smaller ensemble, leading to faster classification of unlabelled instances
CLUB-DRF
We termed the method CLUster-Based Diversified Random Forests (CLUB-DRF)
Three stages are followed:
A Random Forest is induced using the traditional method
Trees are clustered according to their classification pattern
One or more representatives are chosen from each cluster to form the pruned Random Forest
[Figure: the CLUB-DRF pipeline. A training set feeds the Random Forest algorithm to produce the parent RF with trees t1 … tn; their classification patterns C(t1, T) … C(tn, T) feed a clustering algorithm yielding clusters 1 … k; representative selection produces the pruned CLUB-DRF with trees t1 … tk, which is evaluated on the testing set.]
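The three stages can be sketched as follows. This is a deliberately simplified stand-in: the talk clusters classification patterns with k-modes, whereas the sketch below just groups trees with identical prediction vectors and keeps one representative per group; tree ids and votes are made up.

```python
# Simplified CLUB-DRF pruning sketch: represent each tree by its
# vector of predictions (its classification pattern), group identical
# patterns (a crude stand-in for the k-modes clustering used in the
# actual method), and keep one representative per group.

def club_drf_prune(prediction_vectors):
    """prediction_vectors: {tree_id: tuple of class predictions}.
    Returns one representative tree id per distinct pattern."""
    clusters = {}
    for tree_id, pattern in prediction_vectors.items():
        clusters.setdefault(pattern, []).append(tree_id)
    # Keep the first member of each cluster as its representative.
    return sorted(members[0] for members in clusters.values())

votes = {
    "t1": ("A", "B", "A"),
    "t2": ("A", "B", "A"),  # same pattern as t1 → same cluster
    "t3": ("B", "B", "A"),
    "t4": ("A", "A", "A"),
}
print(club_drf_prune(votes))  # → ['t1', 't3', 't4']
```

The pruned ensemble keeps one tree per behaviour, so the surviving trees disagree with each other by construction, which is exactly the diversification the clustering stage is after.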
CLUB-DRF Settings
A number of settings are needed, as follows:
The clustering algorithm used
The number of clusters of trees
The number of trees representing each cluster
The criteria for choosing the representatives:
Random
Best performing
Experimental Setup
We tested the technique over 15 datasets from the UCI repository
We generated 500 trees for the main Random Forest
We used k-modes to cluster the trees
We used the following values for k: 5, 10, 15, 20, 25, 30, 35, and 40
We used one representative tree per cluster, based on its Out Of Bag (OOB) performance
The repeated hold-out method was used to estimate the performance
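The repeated hold-out procedure can be sketched as follows. This is an illustration only; the split ratio (70/30), number of repetitions (10), and dataset are hypothetical stand-ins for the settings used in the experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Repeated hold-out: re-split the data each repetition and average the scores.
accs = []
for rep in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=rep)  # a fresh random split per repeat
    rf = RandomForestClassifier(n_estimators=100, random_state=rep).fit(X_tr, y_tr)
    accs.append(rf.score(X_te, y_te))

print(round(float(np.mean(accs)), 3), round(float(np.std(accs)), 3))
```

Averaging over repeated random splits reduces the variance of the accuracy estimate compared to a single hold-out split.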
Summarised Results
[Figure: bar chart comparing CLUB-DRF and RF; the y-axis is the number of datasets (0 to 9) on which each method wins, grouped by ensemble size (10, 20, 30, and 40 trees).]
Pruning Results
Sample of Detailed Results
How is Diversity Related to Outlier Detection?
Outliers are out-of-the-norm instances that are thought to be generated by a different mechanism
By analogy, trees that are significantly different (diverse) from the set of other trees in the Random Forest can be seen as outliers
The Local Outlier Factor (LOF) assigns a real number to each instance to represent its peculiarity
Inspired by this analogy, we hypothesised that a diverse ensemble of trees can be formed using an outlier detection method
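A small illustration of LOF scoring, using scikit-learn's LocalOutlierFactor as a stand-in implementation (the data and parameters are invented for the example): a point far from any dense region receives a much higher score than the points around it.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
inliers = rng.normal(0.0, 0.5, size=(100, 2))  # a dense cluster of normal points
outlier = np.array([[5.0, 5.0]])               # a point generated by a different mechanism
X = np.vstack([inliers, outlier])

# Fit LOF; negative_outlier_factor_ holds -LOF, so negate it to get raw scores.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_

# The isolated point receives a much higher LOF score than the cluster members.
print(round(float(scores[-1]), 2), round(float(scores[:-1].max()), 2))
```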
LOFB-DRF
We termed the method Local Outlier Factor Based Diversified Random Forests (LOFB-DRF)
It follows similar steps to CLUB-DRF
Each tree is assigned a LOF value
Trees are then chosen according to two criteria:
Predictive accuracy
LOF value
LOFB-DRF Settings
A number of settings are needed, as follows:
The LOF setting for the number of nearest neighbours
The option for combining LOF with predictive accuracy:
Using LOF only, ruling out predictive accuracy
Using a combination strategy
Experimental Setup
We tested the technique over 10 datasets from the UCI repository
We generated 500 trees for the main Random Forest
We used LOF with 40 nearest neighbours
We used rank = normal(LOF) × accuracy for each tree, where normal(LOF), accuracy ∈ [0, 1]
Trees with the highest rank are chosen as representatives
We used the following numbers of representative trees: 5, 10, 15, 20, 25, 30, 35, and 40
The repeated hold-out method was used to estimate the performance
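The ranking step above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: trees are represented by their prediction vectors on the training data, scikit-learn's LocalOutlierFactor stands in for the LOF implementation used in the experiments, and the min-max normalisation of LOF is one plausible reading of normal(LOF).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import LocalOutlierFactor

X, y = make_classification(n_samples=300, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree's classification pattern and its accuracy on the data.
patterns = np.array([tree.predict(X) for tree in rf.estimators_])
accuracy = (patterns == y).mean(axis=1)

# LOF over the trees' patterns; a higher score means a more diverse (outlying) tree.
lof = LocalOutlierFactor(n_neighbors=40)
lof.fit(patterns)
scores = -lof.negative_outlier_factor_

# Combine the two criteria: rank = normal(LOF) x accuracy, both in [0, 1].
normal_lof = (scores - scores.min()) / (scores.max() - scores.min())
rank = normal_lof * accuracy

# Keep the top-ranked trees as the pruned LOFB-DRF ensemble.
n_reps = 10
pruned = [rf.estimators_[i] for i in np.argsort(rank)[::-1][:n_reps]]
print(len(pruned))
```

Multiplying the two factors favours trees that are both diverse (high LOF) and individually accurate, rather than either criterion alone.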
Summarised Results
[Figure: bar chart comparing LOF-DRF and RF; the y-axis is the number of datasets (0 to 6) on which each method wins, grouped by ensemble size (10, 20, 30, and 40 trees).]
Pruning Results
Sample of Detailed Results
Summary
Random Forest has proved its superiority over the last few years
Two methods were presented in this talk, aiming at diversifying and pruning Random Forests
Results showed the potential of these two methods to further enhance the predictive accuracy of the method
The high level of pruning makes these techniques candidates for real-time applications, as the number of trees to be traversed is significantly reduced
Future Work
In CLUB-DRF:
Exploring other methods for choosing tree representatives from each cluster (e.g., varying the number of representatives per cluster)
Using other clustering techniques
In LOFB-DRF:
Exploring other options for combining the LOF value and predictive accuracy
Using LOF and predictive accuracy for the choice of tree representatives in each cluster
Applying both methods to other ensemble classification techniques
Q & A
Thanks for listening!
Contact Details
Dr Mohamed Medhat Gaber
E-mail: [email protected]
Webpage: http://mohamedmgaber.weebly.com/
LinkedIn: https://www.linkedin.com/profile/view?id=21808352
Twitter: https://twitter.com/mmmgaber
ResearchGate: https://www.researchgate.net/profile/Mohamed_Gaber16