RESULTS OF THE WCCI 2006 PERFORMANCE PREDICTION CHALLENGE Isabelle Guyon Amir Reza Saffari Azar Alamdari Gideon Dror


Page 1: RESULTS OF THE WCCI 2006  PERFORMANCE PREDICTION  CHALLENGE Isabelle Guyon

RESULTS OF THE WCCI 2006 PERFORMANCE PREDICTION

CHALLENGE

Isabelle Guyon
Amir Reza Saffari Azar Alamdari

Gideon Dror

Page 2:

Part I

INTRODUCTION

Page 3:

Model selection

• Selecting models (neural net, decision tree, SVM, …).

• Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …).

• Selecting variables or features (space dimensionality reduction).

• Selecting patterns (data cleaning; data reduction, e.g., by clustering).

Page 4:

Performance prediction

How good are you at predicting how good you are?

• Practically important in pilot studies.

• Good performance predictions render model selection trivial: simply pick the model with the best predicted performance.

Page 5:

Why a challenge?

• Stimulate research and push the state-of-the art.

• Move towards fair comparisons and give a voice to methods that work but may not be backed up by theory (yet).

• Find practical solutions to real problems.

• Have fun…

Page 6:

History

• USPS/NIST.
• Unipen (with Lambert Schomaker): 40 institutions share 5 million handwritten characters.
• KDD cup, TREC, CASP, CAMDA, ICDAR, etc.
• NIPS challenge on unlabeled data.
• Feature selection challenge (with Steve Gunn): success! ~75 entrants, thousands of entries.
• Pascal challenges.
• Performance prediction challenge …

[Timeline: 1980–2005]

Page 7:

Challenge

• Date started: Friday September 30, 2005.

• Date ended: Monday March 1, 2006.

• Duration: 21 weeks.

• Estimated number of entrants: 145.

• Number of development entries: 4228.

• Number of ranked participants: 28.

• Number of ranked submissions: 117.

Page 8:

Datasets

Dataset | Domain         | Type          | Features | Training ex. | Validation ex. | Test ex.
ADA     | Marketing      | Dense         | 48       | 4147         | 415            | 41471
GINA    | Digits         | Dense         | 970      | 3153         | 315            | 31532
HIVA    | Drug discovery | Dense         | 1617     | 3845         | 384            | 38449
NOVA    | Text classif.  | Sparse binary | 16969    | 1754         | 175            | 17537
SYLVA   | Ecology        | Dense         | 216      | 13086        | 1308           | 130858

http://www.modelselect.inf.ethz.ch/

Page 9:

BER distribution

[Figure: histograms of test BER over all entries, one panel per dataset (ADA, GINA, HIVA, NOVA, SYLVA); x-axis: test BER (0 to 0.5), y-axis: number of entries (0 to 150).]

Page 10:

Results

Overall winners for ranked entries:

• Ave. rank: Roman Lutz with LB tree mix cut adapted.
• Ave. score: Gavin Cawley with Final #2.
• ADA: Marc Boullé with SNB(CMA) + 10k F(2D) tv or SNB(CMA) + 100k F(2D) tv.
• GINA: Kari Torkkola & Eugene Tuv with ACE+RLSC.
• HIVA: Gavin Cawley with Final #3 (corrected).
• NOVA: Gavin Cawley with Final #1.
• SYLVA: Marc Boullé with SNB(CMA) + 10k F(3D) tv.
• Best AUC: Radford Neal with Bayesian neural networks.

Page 11:

Part II

PROTOCOL and

SCORING

Page 12:

Protocol

• Data split: training/validation/test.
• Data proportions: 10/1/100.
• Online feedback on validation data.
• Validation label release one month before end of challenge.
• Final ranking on test data using the five last complete submissions for each entrant.

Page 13:

Performance metrics

• Balanced Error Rate (BER): average of error rates of positive class and negative class.

• Guess error: ΔBER = abs(testBER − guessedBER).

• Area Under the ROC Curve (AUC).
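The two error measures above can be made concrete with a minimal sketch (in Python for illustration; the challenge kit itself is Matlab-based, and these function names are our own):

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """BER: average of the error rates on the positive and the negative class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = y_true == -1
    err_pos = np.mean(y_pred[pos] != 1)   # fraction of positives misclassified
    err_neg = np.mean(y_pred[neg] != -1)  # fraction of negatives misclassified
    return 0.5 * (err_pos + err_neg)

def guess_error(test_ber, guessed_ber):
    """Guess error: absolute gap between measured and guessed BER."""
    return abs(test_ber - guessed_ber)
```

Because BER averages the two per-class error rates, a classifier that always predicts the majority class still scores 0.5 on it, which is why the challenge prefers it to plain error rate on unbalanced data such as HIVA.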

Page 14:

Optimistic guesses

[Figure: guessed BER versus test BER (both 0 to 0.7) for ADA, GINA, HIVA, NOVA, SYLVA.]

Page 15:

Scoring method

E = testBER + ΔBER × [1 − exp(−ΔBER/σ)], where ΔBER = abs(testBER − guessedBER).

[Figure: challenge score as a function of guessed BER; the test BER is marked.]
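For illustration, the challenge score E = testBER + ΔBER × [1 − exp(−ΔBER/σ)] can be computed directly (a Python sketch; `sigma` stands for the σ scale in the slide and must be supplied by the caller):

```python
import math

def challenge_score(test_ber, guessed_ber, sigma):
    """E = testBER + dBER * (1 - exp(-dBER / sigma)),
    with dBER = |testBER - guessedBER|.

    An exact guess costs nothing; a guess error much larger than
    sigma adds roughly the full dBER to the score.
    """
    d_ber = abs(test_ber - guessed_ber)
    return test_ber + d_ber * (1.0 - math.exp(-d_ber / sigma))
```

The exponential term makes the penalty negligible for guess errors well below σ and close to the full ΔBER for guess errors well above it.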

Page 16:

ΔBER/σ

[Figure: ΔBER/σ (log scale, 10⁻⁴ to 10⁴) versus test BER (0 to 0.5) for ADA, GINA, HIVA, NOVA, SYLVA; when ΔBER/σ is large, E ≈ testBER + ΔBER.]

Page 17:

Score

E = testBER + ΔBER × [1 − exp(−ΔBER/σ)]

[Figure: challenge score versus log(γ) on GINA for Roman Lutz, Gavin Cawley, Radford Neal, Corinne Dahinden, Wei Chu, and Nicolai Meinshausen; the score ranges between testBER and testBER + ΔBER.]

Page 18:

Score (continued)

[Figure: challenge score versus log(γ) for each dataset (ADA, GINA, SYLVA, HIVA, NOVA).]

Page 19:

Part III

RESULT ANALYSIS

Page 20:

What did we expect?

• Learn about new competitive machine learning techniques.

• Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).

• Drive research in the direction of refining such methods (ongoing benchmark).

Page 21:

Method comparison

[Figure: ΔBER (log scale) versus test BER for the five datasets (SYLVA, GINA, NOVA, ADA, HIVA), with entries grouped by method family: TREE, NN/BNN, NB, and LD/SVM/KLS/GP.]

Page 22:

Danger of overfitting

[Figure: BER (0 to 0.5) versus time (days, 0 to 160) over the course of the challenge for ADA, GINA, HIVA, NOVA, SYLVA. Full line: test BER; dashed line: validation BER.]

Page 23:

How to estimate the BER?

• Statistical tests (Stats): Compute it on training data; compare with a “null hypothesis” e.g. the results obtained with a random permutation of the labels.

• Cross-validation (CV): Split the training data many times into training and validation sets; average the validation results.

• Guaranteed risk minimization (GRM): Use of theoretical performance bounds.
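The CV strategy above can be sketched as follows (a minimal Python illustration, not the challenge kit; the `fit`/`predict` callables and split settings are our own):

```python
import numpy as np

def cv_ber_estimate(X, y, fit, predict, n_splits=100, val_frac=0.1, seed=0):
    """Estimate the BER by averaging many random 90%/10%
    train/validation splits of the training data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_val = max(1, int(val_frac * n))
    bers = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, tr = perm[:n_val], perm[n_val:]
        model = fit(X[tr], y[tr])                   # train on ~90% of the data
        y_hat = np.asarray(predict(model, X[val]))  # predict the held-out ~10%
        pos, neg = y[val] == 1, y[val] == -1
        e_pos = np.mean(y_hat[pos] != 1) if pos.any() else 0.0
        e_neg = np.mean(y_hat[neg] != -1) if neg.any() else 0.0
        bers.append(0.5 * (e_pos + e_neg))
    return float(np.mean(bers))
```

Averaging over many random splits reduces the variance of the estimate compared with a single held-out set, at the cost of repeated training.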

Page 24:

Stats / CV / GRM ???

Page 25:

Top ranking methods

• Performance prediction:
  – CV with many splits, 90% train / 10% validation.
  – Nested CV loops.

• Model selection:
  – Use of a single model family.
  – Regularized risk / Bayesian priors.
  – Ensemble methods.
  – Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO).

Page 26:

Other methods

• Use of training data only:
  – Training BER.
  – Statistical tests.

• Bayesian evidence.

• Performance bounds.

• Bilevel optimization.

Page 27:

Part IV

CONCLUSIONS AND FURTHER WORK

Page 28:

Open problems

Bridge the gap between theory and practice…
• What are the best estimators of the variance of CV?
• What should k be in k-fold?
• Are other cross-validation methods better than k-fold (e.g., bootstrap, 5x2CV)?
• Are there better “hybrid” methods?
• What search strategies are best?
• More than 2 levels of inference?

Page 29:

Future work

• Game of model selection.

• JMLR special topic on model selection.

• IJCNN 2007 challenge!

Page 30:

Benchmarking model selection?

• Performance prediction: Participants just need to provide a guess of their test performance. If they can solve that problem, they can perform model selection efficiently. Easy and motivating.

• Selection of a model from a finite toolbox: In principle a more controlled benchmark, but less attractive to participants.

Page 31:

CLOP

• CLOP=Challenge Learning Object Package.

• Based on the Spider developed at the Max Planck Institute.

• Two basic abstractions:– Data object– Model object

http://clopinet.com/isabelle/Projects/modelselect/MFAQ.html

Page 32:

CLOP tutorial

At the Matlab prompt:

% Wrap the training data in a data object.
D = data(X, Y);
% Hyperparameters for kernel ridge regression.
hyper = {'degree=3', 'shrinkage=0.1'};
% Create the model and train it.
model = kridge(hyper);
[resu, model] = train(model, D);
% Apply the trained model to test data.
tresu = test(model, testD);
% Models can be chained, e.g., standardization followed by kernel ridge.
model = chain({standardize, kridge(hyper)});

Page 33:

Conclusions

• Twice the volume of participation of the feature selection challenge.

• Top methods as before (in a different order):
  – Ensembles of trees.
  – Kernel methods (RLSC/LS-SVM, SVM).
  – Bayesian neural networks.
  – Naïve Bayes.

• Danger of overfitting.

• Triumph of cross-validation?