
Boosting Neural Networks

Published by Holger Schwenk and Yoshua Bengio, Neural Computation, 12(8):1869-1887, 2000.

Presented by Yong Li

Outline

1. Introduction

2. AdaBoost

3. Three versions of AdaBoost for neural networks

4. Results

5. Conclusions

6. Discussions

Introduction

• Boosting – a general method for improving the performance of a learning algorithm.

• AdaBoost is a relatively recent boosting algorithm.

• There are many empirical studies of AdaBoost using decision trees as base classifiers (Breiman, 1996; Drucker and Cortes, 1996; among others).

• There is also some theoretical understanding (Schapire et al., 1997; Breiman, 1998; Schapire, 1999).

Introduction

• But applications had all been to decision trees; at that time there were no applications to multi-layer artificial neural networks.

• The questions this paper tries to answer:

– Does AdaBoost work as well for neural networks as for decision trees?

– Does it behave in a similar way?

– And more?

AdaBoost (Adaptive Boosting)

• It is often possible to increase the accuracy of a classifier by averaging the decisions of an ensemble of classifiers.

• Two popular ensemble methods: Bagging and Boosting.

– Bagging improves generalization performance through a reduction in variance while maintaining or only slightly increasing bias. (A minimal bagging sketch follows below.)

– AdaBoost constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns.
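As a rough illustration of the bagging half of this comparison, here is a minimal Python sketch, assuming integer class labels and a hypothetical train_fn(X, y) helper that stands in for whatever base learner is used (neither name appears in the paper):

import numpy as np

def bagging_predict(train_fn, X_train, y_train, X_test, n_classifiers=10, seed=0):
    # Train each member on a bootstrap resample (sampling with replacement)
    # and combine the members by majority vote.  Labels are assumed to be
    # integers 0..n_classes-1.
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n)              # bootstrap indices
        model = train_fn(X_train[idx], y_train[idx])
        all_preds.append(model.predict(X_test))
    all_preds = np.stack(all_preds)                   # (n_classifiers, n_test)
    # majority vote per test example
    return np.array([np.bincount(col).argmax() for col in all_preds.T])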

AdaBoost

• AdaBoost.M2 is used in the experiments.
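For orientation, below is a minimal sketch of the basic AdaBoost reweighting loop. It is written as plain discrete AdaBoost, not the full AdaBoost.M2 with its pseudo-loss over (example, wrong-label) pairs that the paper uses; train_fn(X, y, D) is again a hypothetical helper that trains a base classifier under example weights D:

import numpy as np

def adaboost(train_fn, X, y, n_rounds=10):
    # Basic reweighting loop of discrete AdaBoost (a simplified sketch).
    n = len(X)
    D = np.full(n, 1.0 / n)                 # D_1(i): uniform initial weights
    classifiers, alphas = [], []
    for t in range(n_rounds):
        h = train_fn(X, y, D)
        miss = h.predict(X) != y
        eps = float(np.sum(D[miss]))        # weighted training error
        if eps == 0.0 or eps >= 0.5:        # weak-learning condition violated
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # raise the weight of misclassified examples, lower the others
        D = D * np.exp(np.where(miss, alpha, -alpha))
        D = D / D.sum()
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas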

Applying AdaBoost to neural networks

• Three versions of AdaBoost are compared in this paper (a sketch of the differences follows after this list):

– (R) Training the t-th classifier with a fixed training set

– (E) Training the t-th classifier using a different training set at each epoch

– (W) Training the t-th classifier by directly weighting the cost function of the t-th neural network
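The sketch below is one way to picture how the three variants differ in what the t-th network actually sees during an epoch; it is an interpretation of the paper's description with illustrative names, not the authors' code:

import numpy as np

def epoch_training_data(X, y, D, version, rng, fixed_idx=None):
    # How each variant presents examples to the t-th network, given the
    # current boosting distribution D over the n training examples.
    #   (R) one weighted resample drawn before training, reused every epoch
    #   (E) a fresh weighted resample drawn at every epoch
    #   (W) the original data, with D(i) weighting each example's cost
    n = len(X)
    if version == "R":
        return X[fixed_idx], y[fixed_idx], None       # fixed_idx drawn once from D
    if version == "E":
        idx = rng.choice(n, size=n, p=D)              # new resample this epoch
        return X[idx], y[idx], None
    if version == "W":
        return X, y, D                                # e.g. loss = sum_i D[i] * CE_i
    raise ValueError("version must be 'R', 'E', or 'W'")

For (W), the returned weights would enter a weighted cost, e.g. a cross-entropy summed with per-example factors D(i).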

Results

• Experiments are performed on three data sets.

– The online data set collected at Paris 6 University
  • 22 attributes ([-1, 1]^22), 10 classes
  • 1200 examples for learning and 830 examples for testing

– UCI Letter
  • 16 attributes and 26 classes
  • 16000 for training and 4000 for testing

– Satimage data set
  • 36 attributes and 6 classes
  • 4435 for training and 2000 for testing


Results of online data

• Some conclusions:

– Boosting is better than Bagging
– AdaBoost is less useful for very big networks
– The (E) and (W) versions are better than (R)

Results of online data

• The generalization error continues to decrease after the training error reaches zero.

Results of online data

The number of examples with a high margin increases as more classifiers are combined by boosting.

Note: there are contrary results in the literature regarding the cumulative margin distribution.
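A sketch of how such voting margins can be computed from the ensemble's predictions, following the usual definition of the boosting margin (votes for the true class minus the strongest wrong class, normalized to [-1, 1]); names and shapes are illustrative:

import numpy as np

def voting_margins(preds, alphas, y, n_classes):
    # Margin of each example under the combined vote: (weight on the true
    # class minus the largest weight on any wrong class) / total weight,
    # so it lies in [-1, 1] and is positive iff the ensemble is correct.
    # `preds` has shape (n_classifiers, n_examples) with integer labels.
    alphas = np.asarray(alphas, dtype=float)
    n = preds.shape[1]
    votes = np.zeros((n, n_classes))
    for a, p in zip(alphas, preds):
        votes[np.arange(n), p] += a
    votes /= alphas.sum()
    true_vote = votes[np.arange(n), y]
    votes[np.arange(n), y] = -np.inf                  # mask the true class
    return true_vote - votes.max(axis=1)

# cumulative margin distribution: fraction of examples with margin <= theta
# m = voting_margins(preds, alphas, y_test, n_classes)
# theta = np.sort(m); fraction = np.arange(1, len(m) + 1) / len(m)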

Results of online data

Bagging has no significant influence on the margin distribution

The results for the UCI Letter and Satimage data sets

• Only the (E) and (W) versions are applied; they obtain the same results.

• The same conclusions are drawn as for the online data. (Some results are omitted.)

Conclusion

• AdaBoost can significantly improve neural classifiers.

– Does AdaBoost work as well for neural networks as for decision trees?
  • Answer: Yes

– Does it behave in a similar way?
  • Answer: Yes

– Overfitting?
  • Still there

– Other questions
  • Short answers

Discussions

• Empirically shows AdaBoost works well for neural networks

• The algorithm description is misleading:
– D_t(i) vs. D_t(i, y)
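For reference, in the standard AdaBoost.M2 formulation of Freund and Schapire (1997) the two quantities are related as follows; this is the textbook version, quoted only to frame the discussion, and may not match the paper's notation exactly:

w_t(i, y): a weight maintained for every example i and every incorrect label y \neq y_i

W_t(i) = \sum_{y \neq y_i} w_t(i, y), \qquad D_t(i) = \frac{W_t(i)}{\sum_j W_t(j)}, \qquad q_t(i, y) = \frac{w_t(i, y)}{W_t(i)}

\epsilon_t = \tfrac{1}{2} \sum_i D_t(i) \Big( 1 - h_t(x_i, y_i) + \sum_{y \neq y_i} q_t(i, y)\, h_t(x_i, y) \Big)

So D_t(i), the per-example distribution, is proportional to the sum of the pair weights w_t(i, y) over the incorrect labels.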