
Classifier Inspired Scaling for Training Set Selection

Walter Bennette

DISTRIBUTION A: Approved for public release; distribution unlimited. 16 May 2016. Case #88ABW-2016-2511

Outline

· Instance-based classification
· Training set selection
  - ENN
  - DROP3
  - CHC
· Scaling approaches
  - Stratified
  - Classifier inspired
· Experimental results

Instance-based classification

What if there is a large amount of data?

What if there is a huge amount of data?

What if there is a serious amount of data?
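Instance-based classifiers such as k-nearest neighbors (kNN) keep the entire training set and defer all work to prediction time, so every query must be compared against every stored instance. A minimal brute-force sketch (mine, not from the slides; standard kNN in Python) makes the cost visible:

    import numpy as np

    def knn_predict(X, y, query, k=3):
        # Brute-force kNN vote: one distance per stored instance, so the cost
        # of every single prediction grows linearly with the training set size.
        d = np.linalg.norm(X - query, axis=1)          # distance to every stored instance
        nn = np.argsort(d)[:k]                         # indices of the k closest instances
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        return votes.argmax()                          # majority class among the neighbors

With n stored instances, each prediction costs O(n) distance computations, which is exactly why the growing-data questions above bite.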

Training set selection (TSS)

· Instead of maintaining all of the training data
· Keep only the necessary data points

Edited Nearest Neighbors (ENN)

Formulation:
· An instance is removed from the training data if it does not agree with the majority of its k nearest neighbors

Effect:
· Makes decision boundaries smoother
· Doesn't remove much data
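A compact Python sketch of the ENN rule above (mine; unoptimized O(n²)). k = 3 matches the 3-NN setting used later in the experiments:

    import numpy as np

    def enn_filter(X, y, k=3):
        # Edited Nearest Neighbors: keep an instance only if it agrees with the
        # majority vote of its k nearest neighbors in the rest of the data.
        keep = []
        for i in range(len(X)):
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf                              # an instance is not its own neighbor
            nn = np.argsort(d)[:k]
            votes = np.bincount(y[nn], minlength=y.max() + 1)
            if votes.argmax() == y[i]:                 # neighborhood majority agrees: keep
                keep.append(i)
        return np.array(keep)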


DROP3

Formulation:

    DROP3(Training set TR): Selection set S
        Let S = TR after applying ENN.
        For each instance Xi in S:
            Find the k+1 nearest neighbors of Xi in S.
            Add Xi to each of those neighbors' lists of associates.
        For each instance Xi in S:
            Let with = # of associates of Xi classified correctly with Xi as a neighbor.
            Let without = # of associates of Xi classified correctly without Xi.
            If without ≥ with:
                Remove Xi from S.
                For each associate a of Xi:
                    Remove Xi from a's list of nearest neighbors.
                    Find a new nearest neighbor for a.
                    Add a to its new neighbor's list of associates.
        Return S.

· Iterative procedure that compares the accuracy of an instance's associates with and without that instance

Effect:
· Removes much more data than ENN
· Maintains acceptable accuracy
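A brute-force Python rendering of the pseudocode above (mine; reuses enn_filter from the ENN sketch). Instead of maintaining associate lists, it recomputes neighbor votes over all retained instances at O(n²) cost per check; this yields the same removal decision, because removing Xi only changes the predictions of points that had Xi as a neighbor, and the pseudocode's bookkeeping exists purely to avoid that cost. Note that full DROP3 also orders instances by distance to their nearest enemy before this pass, a detail the slide's pseudocode and this sketch both omit:

    import numpy as np

    def vote(X, y, pool, target, k=3):
        # Majority class of `target` among its k nearest neighbors drawn from `pool`.
        pool = [j for j in pool if j != target]
        d = np.linalg.norm(X[pool] - X[target], axis=1)
        nn = [pool[j] for j in np.argsort(d)[:k]]
        return np.bincount(y[nn], minlength=y.max() + 1).argmax()

    def drop3(X, y, k=3):
        S = list(enn_filter(X, y, k))                  # step 1: ENN noise filter
        for i in list(S):                              # one pass, as in the pseudocode
            others = [j for j in S if j != i]
            with_i = sum(vote(X, y, S, j, k) == y[j] for j in others)
            without_i = sum(vote(X, y, others, j, k) == y[j] for j in others)
            if without_i >= with_i:                    # others do at least as well without Xi
                S = others                             # so remove Xi from the selection
        return np.array(S)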


Genetic algorithm (CHC)

Formulation:
· A chromosome is a subset of the training data
· A binary gene represents each instance
· Fitness = α · Accuracy + (1 − α) · Reduction

Effectiveness:
· Removes a large amount of data
· Achieves acceptable accuracy
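The fitness function is what drives the genetic search: α trades classification accuracy against data reduction. A sketch of evaluating one chromosome (mine; α = 0.5 is an assumed weighting, since the slides give no value, and the vote helper from the DROP3 sketch is reused):

    import numpy as np

    def fitness(mask, X, y, alpha=0.5, k=3):
        # Fitness = alpha * Accuracy + (1 - alpha) * Reduction for one chromosome.
        # `mask` is the binary gene string: mask[i] == 1 keeps training instance i.
        S = list(np.flatnonzero(mask))
        if not S:
            return 0.0                                 # an empty subset classifies nothing
        correct = sum(vote(X, y, S, i, k) == y[i] for i in range(len(X)))
        accuracy = correct / len(X)                    # kNN accuracy using only the subset
        reduction = 1 - len(S) / len(X)                # fraction of training data removed
        return alpha * accuracy + (1 - alpha) * reduction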


Scaling

· As datasets grow, TSS becomes more and more expensive
· May be prohibitive
· The vast majority of scaling approaches rely on a stratified approach

No scaling

Stratified scaling
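The stratified idea: partition the training data into disjoint strata, run the expensive TSS method independently on each small stratum, and take the union of the selections. A minimal sketch (mine; random strata for brevity, whereas stratified TSS normally keeps each stratum's class proportions close to the full dataset's):

    import numpy as np

    def stratified_tss(X, y, tss, n_strata=5, seed=0):
        # Run `tss` (e.g. the drop3 sketch) on disjoint strata and union the results.
        # `tss(X, y)` must return indices, local to its input, of the kept instances.
        rng = np.random.default_rng(seed)
        strata = np.array_split(rng.permutation(len(X)), n_strata)
        selected = []
        for stratum in strata:                         # each run sees only ~n/n_strata points
            keep_local = np.asarray(tss(X[stratum], y[stratum]), dtype=int)
            selected.extend(stratum[keep_local])       # map local indices back to global ones
        return np.array(sorted(selected))

For example, stratified_tss(X, y, drop3) runs the DROP3 sketch on each stratum instead of on the whole dataset at once.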

Representative Data Detection (ReDD)

· Lin et al. 2015
· Used for support vector machines and did not consider data reduction

Our approach

Classifier inspired approach
· Based heavily on ReDD
· Used for kNN, and monitors data reduction

The filter

The "Balance" dataset
· Determine scale positions
  - Balanced
  - Leaning right
  - Leaning left
· Attributes
  - Left weight
  - Left distance
  - Right weight
  - Right distance


Experimentation

Parameters:
· Learn a Random Forest for the filter
· Split the data into 1/3 and 2/3 portions (see the sketch below)

Design:
· Perform TSS with ENN, CHC, and DROP3, using 3-NN
· Compare no scaling, stratified, and classifier inspired scaling
· Calculate reduction, accuracy, and computation time with 10-fold CV
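Putting those parameters together, my reading of the classifier-inspired (ReDD-style) pipeline is: run the expensive TSS method only on the 1/3 split, train a Random Forest "filter" to imitate its keep/remove decisions, and apply that cheap filter to the remaining 2/3. This is a sketch under assumptions, the filter's input features among them (the slides do not say whether the class label is an input; attributes alone are used here):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def classifier_inspired_tss(X, y, tss, seed=0):
        # Learn TSS keep/remove decisions on 1/3 of the data, then apply the
        # learned filter to the other 2/3 instead of running TSS on it.
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(X))
        small, rest = order[:len(X) // 3], order[len(X) // 3:]
        keep_local = np.asarray(tss(X[small], y[small]), dtype=int)
        kept = np.zeros(len(small), dtype=int)
        kept[keep_local] = 1                           # target: 1 = kept by TSS, 0 = removed
        filt = RandomForestClassifier(random_state=seed).fit(X[small], kept)
        rest_kept = rest[filt.predict(X[rest]) == 1]   # cheap filtering of the large split
        return np.concatenate([small[keep_local], rest_kept])

Any of the earlier TSS sketches drops straight in, e.g. classifier_inspired_tss(X, y, drop3).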

Datasets

· 10 experimental datasets from KEEL

Result figures: reduction, accuracy, and time

Results

· Maintains accuracy (mostly)
· Maintains data reduction
· Slower than the stratified approach, but may improve for larger datasets

Future work

· Perform experiments on many more datasets
· Apply to very large datasets
· Investigate whether accuracy damage can be spotted a priori

Conclusion

Promising candidate for scaling Training Set Selection to large datasets


Questions

Walter Bennette walter.bennette.1@us.af.mil 315-330-4957
