On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Gunarto Sindoro Njoo, Yu-Hsiang Peng, Kuo-Wei Hsu, Wen-Chih Peng
DSAA 2014
TRANSCRIPT

[Slide 1]
[Slide 2]
Introduction
• Smartphones are getting smarter and smarter
[Slide 3]
Introduction

| Device | RAM | Storage | Power |
|---|---|---|---|
| Computer | 2–8 GB on average | >500 GB | hundreds of watts |
| Smartphone | 512 MB–3 GB | >4 GB | a few watts |
| Sensor hub | 16–64 KB | 64–256 KB | milliwatts |
[Slide 4]
Activity Inference Process
• Raw data
• Discretization
  • MDLP
  • LGD
• Classifier construction
  • Decision Tree
  • Naïve Bayes
  • k-Nearest Neighbor
  • SVM
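The pipeline's first step turns continuous sensor readings into symbolic values. As a minimal sketch, a simple equal-width binning stands in here for MDLP/LGD (which choose cut points more carefully and are not shown); the function name and bin labels are illustrative, not from the slides.

```python
def discretize(values, n_bins=3, labels=("low", "medium", "high")):
    """Equal-width binning of raw sensor readings into symbolic values.

    A stand-in for MDLP/LGD, which pick cut points from the data
    rather than splitting the range uniformly.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values equal
    out = []
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp max value into last bin
        out.append(labels[idx])
    return out

# Toy accelerometer-magnitude readings:
readings = [0.1, 0.2, 4.8, 5.1, 9.7]
print(discretize(readings))  # -> ['low', 'low', 'medium', 'medium', 'high']
```

The discretized symbols are what the later slides call "values" in feature-value pairs.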
[Slide 5]
Activity Inference Process
• Raw data
• Discretization
  • MDLP
  • LGD
• Feature-value selection
  • ONEFVAS
  • GIFVAS
  • CBFVAS
• Classifier construction
  • Decision Tree
  • Naïve Bayes
  • k-Nearest Neighbor
  • SVM
[Slide 6]
Feature-Value Selection
• What is a feature-value?
  • A range of a sensor reading
  • e.g. accelerometer magnitude high, GPS at home, light bright
• Why use feature-values?
  • A sensor reading's relation to an activity may or may not be relevant
  • e.g. accelerometer magnitude readings
(Figure: example accelerometer traces labeled "Accelerometer: low", "Accelerometer: low", "Accelerometer: high")
[Slide 7]
FEATURE-VALUE METHODS
• One-Cut (ONEFVAS)
• Iteration-based (GIFVAS)
• Correlation-based (CBFVAS)
[Slide 8]
One-Cut (ONEFVAS)
• Entropy-based selection using a fixed threshold
• e.g. keep only feature-value pairs with entropy ≤ 0.5
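The one-cut rule above can be sketched as follows: compute the entropy of the activity labels observed with each feature-value pair, and keep the pair only if that entropy is at or below the threshold. This is a minimal sketch; the data layout (`samples` as pairs of feature-value sets and an activity label) and function names are assumptions, not the paper's actual code.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def one_cut_select(samples, threshold=0.5):
    """Keep feature-value pairs whose activity-label entropy <= threshold.

    samples: list of (feature_value_pairs, activity), where
             feature_value_pairs is a set of (feature, value) tuples.
    """
    labels_by_fv = defaultdict(list)
    for fv_pairs, activity in samples:
        for fv in fv_pairs:
            labels_by_fv[fv].append(activity)
    return {fv for fv, labels in labels_by_fv.items()
            if entropy(labels) <= threshold}

samples = [
    ({("accel", "high")}, "running"),
    ({("accel", "high")}, "running"),   # pure: entropy 0, kept
    ({("accel", "low")}, "sitting"),
    ({("accel", "low")}, "standing"),   # confusing: entropy 1.0, discarded
]
print(one_cut_select(samples))  # -> {('accel', 'high')}
```

A pure pair like ("accel", "high") survives the cut; a pair split evenly between two activities has entropy 1.0 and is dropped.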
[Slide 9]
Iteration-based (GIFVAS)
• Loop over the threshold, selecting feature-values iteratively
• Evaluate accuracy at each iteration
• If the accuracy drop is large, cancel the selection for that iteration and tag the affected feature-values as special
• Special feature-values are retained until the last iteration
• A special feature-value is either:
  • Frequent but confusing
  • Pure but infrequent
(Figures: accuracy (%) and remaining feature-value pairs (%) vs. entropy threshold, for thresholds from 1.000 down to 0.080)
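The iterative loop above can be sketched as follows. This is a hedged sketch of the idea, not the paper's implementation: `entropy_by_fv`, `evaluate_accuracy` (a user-supplied function that trains and scores a classifier on a candidate selection), the threshold schedule, and the `max_drop` tolerance are all assumed names and parameters.

```python
def gifvas_select(entropy_by_fv, evaluate_accuracy, thresholds, max_drop=0.05):
    """Iteratively tighten the entropy threshold; if accuracy drops too much,
    undo that iteration and mark the removed pairs as 'special' so they are
    kept until the end."""
    selected = set(entropy_by_fv)
    special = set()
    best_acc = evaluate_accuracy(selected)
    for t in sorted(thresholds, reverse=True):  # loosest threshold first
        to_remove = {fv for fv in selected - special
                     if entropy_by_fv[fv] > t}
        acc = evaluate_accuracy(selected - to_remove)
        if best_acc - acc > max_drop:
            special |= to_remove        # big drop: cancel, tag as special
        else:
            selected -= to_remove       # acceptable: commit the removal
            best_acc = acc
    return selected

# Toy example: "gps_home" is confusing (high entropy) but needed for accuracy,
# so it gets tagged special and survives; "light_dim" is safely removed.
entropy_by_fv = {"accel_high": 0.1, "gps_home": 0.9, "light_dim": 0.95}
acc_fn = lambda sel: 0.9 if "gps_home" in sel else 0.6
print(gifvas_select(entropy_by_fv, acc_fn, [0.92, 0.8, 0.5]))
```

This captures why the slide's special pairs ("frequent but confusing") are retained: removing them costs more accuracy than their entropy alone suggests.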
[Slide 10]
Correlation-based (CBFVAS)
• Uses Pearson correlation at the feature level
• Uses entropy at the feature-value level
• For each feature-value pair:
  • Generate the correlated feature-values
  • Sort the correlated feature-values by entropy
  • Keep only the best-N feature-values; discard the rest
(Figures: accuracy (%) and model size (KB) vs. best-N feature-values remained, correlation-based vs. original)
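One way to read the steps above is: for each feature, pool the feature-values of all features strongly correlated with it (Pearson), rank that pool by entropy, and keep only the best N per pool. The sketch below follows that reading; the grouping strategy, data structures (`fv_entropy` mapping pairs to entropy, `feature_corr` mapping feature pairs to Pearson r), and the `corr_threshold` cutoff are all assumptions, not the paper's definition.

```python
def cbfvas_select(fv_entropy, feature_corr, n_best=2, corr_threshold=0.8):
    """Keep the n_best lowest-entropy feature-values per correlated group.

    fv_entropy:   {(feature, value): entropy}
    feature_corr: {frozenset({f1, f2}): Pearson r}
    """
    kept = set()
    features = {f for f, _ in fv_entropy}
    for f in features:
        # Pool f's own feature-values plus those of strongly correlated features.
        group = [fv for fv in fv_entropy
                 if fv[0] == f
                 or abs(feature_corr.get(frozenset({f, fv[0]}), 0.0)) >= corr_threshold]
        group.sort(key=lambda fv: fv_entropy[fv])  # lowest entropy = purest first
        kept |= set(group[:n_best])
    return kept

# Toy example: accel and gyro are highly correlated, so their feature-values
# compete for the same best-N slots; light stands alone.
fv_entropy = {("accel", "low"): 0.2, ("accel", "high"): 0.1,
              ("gyro", "low"): 0.5, ("light", "dim"): 0.9}
feature_corr = {frozenset({"accel", "gyro"}): 0.9,
                frozenset({"accel", "light"}): 0.1,
                frozenset({"gyro", "light"}): 0.2}
print(cbfvas_select(fv_entropy, feature_corr))
```

With N = 2, the redundant ("gyro", "low") pair loses to the purer accelerometer pairs in its group, which is how correlated redundancy shrinks the model.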
[Slide 11]
Experiments
• Environment:
  • Intel Quad Core 2.66 GHz, 8 GB RAM
  • Java 7
  • Weka 3.6.11 (all default parameters)
• Datasets:
  • Collected from 11 participants
  • At least 2 different activities, up to 6 activities
  • 3 weeks on average, 2 months maximum
• Classifier algorithms:
  • Naïve Bayes
  • Decision Tree (J48)
  • SVM (SMO)
  • k-Nearest Neighbor (kNN)
[Slide 12]
Experiments (Model Size)
• Feature-value selection is not effective on Naïve Bayes
• In general, feature-value selection works best on decision trees
(Figures: model size (%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD and MDLP, for Naïve Bayes, Decision Tree, kNN, and SVM)
[Slide 13]
Experiments (LGD)
(Figures: model size (%) and accuracy (%) for Original, ONEFVAS, GIFVAS, CBFVAS under LGD, for Decision Tree, kNN, and SVM)
• ONEFVAS gives the biggest saving in model size, but its accuracy is low
• CBFVAS gives the most stable accuracy while still reducing model size well
[Slide 14]
Experiments (MDLP)
(Figures: model size (%) and accuracy (%) for Original, ONEFVAS, GIFVAS, CBFVAS under MDLP, for Decision Tree and kNN)
• Accuracy is more stable, and the reductions in model size remain substantial
• In most cases, the decision tree gets the most benefit
[Slide 15]
Conclusions
• Proposed feature-value selection methods for reducing model size:
  • ONEFVAS – using an entropy threshold
  • GIFVAS – iterating over the entropy threshold
  • CBFVAS – using correlation and entropy
• The proposed methods reduce model size while maintaining accuracy
• Performance varies with the discretization and classification algorithms
  • The decision tree gets the most benefit
[Slide 16]
Thank you
On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Presented by: Gunarto Sindoro [email protected]