On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Gunarto Sindoro Njoo, Yu-Hsiang Peng, Kuo-Wei Hsu, Wen-Chih Peng
DSAA 2014
TRANSCRIPT

[Slide 1]
[Slide 2]
Introduction
• Smartphones are getting smarter and smarter
[Slide 3]
Introduction

| Device | RAM | Storage | Power |
|---|---|---|---|
| Computer | 2–8 GB on average | >500 GB | hundreds of watts |
| Smartphone | 512 MB–3 GB | >4 GB | a few watts |
| Sensor hub | 16–64 KB | 64–256 KB | milliwatts |
[Slide 4]
Activity Inference Process
• Raw data
• Discretization
  • MDLP
  • LGD
• Classifier construction
  • Decision Tree
  • Naïve Bayes
  • k-Nearest Neighbor
  • SVM
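The pipeline's first step turns continuous sensor readings into symbolic values. As a minimal sketch, a simple equal-width binning stands in here for MDLP/LGD (which choose cut points more carefully and are not shown); the function name and bin labels are illustrative, not from the slides.

```python
def discretize(values, n_bins=3, labels=("low", "medium", "high")):
    """Equal-width binning of raw sensor readings into symbolic values.

    A stand-in for MDLP/LGD, which pick cut points from the data
    rather than splitting the range uniformly.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values equal
    out = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp max value into last bin
        out.append(labels[idx])
    return out

# Toy accelerometer-magnitude readings:
readings = [0.1, 0.2, 4.8, 5.1, 9.7]
print(discretize(readings))  # -> ['low', 'low', 'medium', 'medium', 'high']
```

The discretized symbols are what the later slides call "values" in feature-value pairs.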
[Slide 5]
Activity Inference Process
• Raw data
• Discretization
  • MDLP
  • LGD
• Feature-value selection
  • ONEFVAS
  • GIFVAS
  • CBFVAS
• Classifier construction
  • Decision Tree
  • Naïve Bayes
  • k-Nearest Neighbor
  • SVM
[Slide 6]
Feature-Value Selection
• What is a feature-value?
  • A range of a sensor reading
  • e.g. accelerometer magnitude high, GPS at home, light bright
• Why use feature-values?
  • A sensor reading's relation to an activity may or may not be relevant
  • e.g. accelerometer magnitude readings
(Figure: example accelerometer traces labeled "Accelerometer: low", "Accelerometer: low", "Accelerometer: high")
[Slide 7]
FEATURE-VALUE METHODS
• One-Cut (ONEFVAS)
• Iteration-based (GIFVAS)
• Correlation-based (CBFVAS)
[Slide 8]
One-Cut (ONEFVAS)
• Entropy-based selection using a fixed threshold
• e.g. keep only feature-value pairs with entropy ≤ 0.5
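The one-cut rule above can be sketched as follows: compute the entropy of the activity labels observed with each feature-value pair, and keep the pair only if that entropy is at or below the threshold. This is a minimal sketch; the data layout (`samples` as pairs of feature-value sets and an activity label) and function names are assumptions, not the paper's actual code.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def one_cut_select(samples, threshold=0.5):
    """Keep feature-value pairs whose activity-label entropy <= threshold.

    samples: list of (feature_value_pairs, activity), where
             feature_value_pairs is a set of (feature, value) tuples.
    """
    labels_by_fv = defaultdict(list)
    for fv_pairs, activity in samples:
        for fv in fv_pairs:
            labels_by_fv[fv].append(activity)
    return {fv for fv, labels in labels_by_fv.items()
            if entropy(labels) <= threshold}

samples = [
    ({("accel", "high")}, "running"),
    ({("accel", "high")}, "running"),   # pure: entropy 0, kept
    ({("accel", "low")}, "sitting"),
    ({("accel", "low")}, "standing"),   # confusing: entropy 1.0, discarded
]
print(one_cut_select(samples))  # -> {('accel', 'high')}
```

A pure pair like ("accel", "high") survives the cut; a pair split evenly between two activities has entropy 1.0 and is dropped.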
[Slide 9]
Iteration-based (GIFVAS)
• Loop over the threshold, selecting feature-values iteratively
• Evaluate accuracy at each iteration
• If the accuracy drop is large, cancel the selection for that iteration and tag the affected feature-values as special
• Special feature-values are retained until the last iteration
• A special feature-value is either:
  • Frequent but confusing
  • Pure but infrequent
(Figures: accuracy (%) and remaining feature-value pairs (%) vs. entropy threshold, for thresholds from 1.000 down to 0.080)
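The iterative loop above can be sketched as follows. This is a hedged sketch of the idea, not the paper's implementation: `entropy_by_fv`, `evaluate_accuracy` (a user-supplied function that trains and scores a classifier on a candidate selection), the threshold schedule, and the `max_drop` tolerance are all assumed names and parameters.

```python
def gifvas_select(entropy_by_fv, evaluate_accuracy, thresholds, max_drop=0.05):
    """Iteratively tighten the entropy threshold; if accuracy drops too much,
    undo that iteration and mark the removed pairs as 'special' so they are
    kept until the end."""
    selected = set(entropy_by_fv)
    special = set()
    best_acc = evaluate_accuracy(selected)
    for t in sorted(thresholds, reverse=True):  # loosest threshold first
        to_remove = {fv for fv in selected - special
                     if entropy_by_fv[fv] > t}
        acc = evaluate_accuracy(selected - to_remove)
        if best_acc - acc > max_drop:
            special |= to_remove        # big drop: cancel, tag as special
        else:
            selected -= to_remove       # acceptable: commit the removal
            best_acc = acc
    return selected

# Toy example: "gps_home" is confusing (high entropy) but needed for accuracy,
# so it gets tagged special and survives; "light_dim" is safely removed.
entropy_by_fv = {"accel_high": 0.1, "gps_home": 0.9, "light_dim": 0.95}
acc_fn = lambda sel: 0.9 if "gps_home" in sel else 0.6
print(gifvas_select(entropy_by_fv, acc_fn, [0.92, 0.8, 0.5]))
```

This captures why the slide's special pairs ("frequent but confusing") are retained: removing them costs more accuracy than their entropy alone suggests.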
[Slide 10]
Correlation-based (CBFVAS)
• Uses Pearson correlation at the feature level
• Uses entropy at the feature-value level
• For each feature-value pair:
  • Generate the correlated feature-values
  • Sort the correlated feature-values by entropy
  • Keep only the best-N feature-values; discard the rest
(Figures: accuracy (%) and model size (KB) vs. best-N feature-values remained, correlation-based vs. original)
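One way to read the steps above is: for each feature, pool the feature-values of all features strongly correlated with it (Pearson), rank that pool by entropy, and keep only the best N per pool. The sketch below follows that reading; the grouping strategy, data structures (`fv_entropy` mapping pairs to entropy, `feature_corr` mapping feature pairs to Pearson r), and the `corr_threshold` cutoff are all assumptions, not the paper's definition.

```python
def cbfvas_select(fv_entropy, feature_corr, n_best=2, corr_threshold=0.8):
    """Keep the n_best lowest-entropy feature-values per correlated group.

    fv_entropy:   {(feature, value): entropy}
    feature_corr: {frozenset({f1, f2}): Pearson r}
    """
    kept = set()
    features = {f for f, _ in fv_entropy}
    for f in features:
        # Pool f's own feature-values plus those of strongly correlated features.
        group = [fv for fv in fv_entropy
                 if fv[0] == f
                 or abs(feature_corr.get(frozenset({f, fv[0]}), 0.0)) >= corr_threshold]
        group.sort(key=lambda fv: fv_entropy[fv])  # lowest entropy = purest first
        kept |= set(group[:n_best])
    return kept

# Toy example: accel and gyro are highly correlated, so their feature-values
# compete for the same best-N slots; light stands alone.
fv_entropy = {("accel", "low"): 0.2, ("accel", "high"): 0.1,
              ("gyro", "low"): 0.5, ("light", "dim"): 0.9}
feature_corr = {frozenset({"accel", "gyro"}): 0.9,
                frozenset({"accel", "light"}): 0.1,
                frozenset({"gyro", "light"}): 0.2}
print(cbfvas_select(fv_entropy, feature_corr))
```

With N = 2, the redundant ("gyro", "low") pair loses to the purer accelerometer pairs in its group, which is how correlated redundancy shrinks the model.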
[Slide 11]
Experiments
• Environment:
  • Intel Quad Core 2.66 GHz, 8 GB RAM
  • Java 7
  • Weka 3.6.11 (all default parameters)
• Datasets:
  • Collected from 11 participants
  • At least 2 different activities, up to 6 activities
  • 3 weeks on average, 2 months maximum
• Classifier algorithms:
  • Naïve Bayes
  • Decision Tree (J48)
  • SVM (SMO)
  • k-Nearest Neighbor (kNN)
[Slide 12]
Experiments (Model Size)
• Feature-value selection is not effective on Naïve Bayes
• In general, feature-value selection works best on decision trees
(Figures: model size (%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD and MDLP, for Naïve Bayes, Decision Tree, kNN, and SVM)
[Slide 13]
Experiments (LGD)
(Figures: model size (%) and accuracy (%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD, for Decision Tree, kNN, and SVM)
• ONEFVAS gives the biggest saving in model size, but its accuracy is low
• CBFVAS gives the most stable accuracy while still reducing model size well
[Slide 14]
Experiments (MDLP)
(Figures: model size (%) and accuracy (%) for Original, ONEFVAS, GIFVAS, CBFVAS under MDLP, for Decision Tree and kNN)
• Accuracy is more stable, and the reductions in model size remain substantial
• In most cases, the decision tree gets the most benefit
[Slide 15]
Conclusions
• Proposed feature-value selection methods for reducing model size:
  • ONEFVAS – using an entropy threshold
  • GIFVAS – iterating over the entropy threshold
  • CBFVAS – using correlation and entropy
• The proposed methods reduce model size while maintaining accuracy
• Performance varies with the discretization and classification algorithms
  • The decision tree gets the most benefit
[Slide 16]
Thank you
On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Presented by: Gunarto Sindoro [email protected]