Competitions in machine learning: the fun, the art, and the science
Isabelle Guyon, Clopinet, Berkeley, California
http://clopinet.com/challenges
My Itinerary
My Kids
My Company
Recent Projects
Melanoma App
Drug toxicity via flow cytometry
How to keep up with the state of the art?
Organize Challenges!
This Year:
Unsupervised and Transfer Learning Challenges
http://clopinet.com/ul http://clopinet.com/gesture
Gesture and action recognition
Image or video indexing/retrieval
Recognition of sign languages
Handwriting recognition
Text processing
Ecology
Applications
Free registration, cash prizes, 2 workshops (IJCNN 2011, ICML 2011), proceedings in JMLR W&CP, much fun! http://clopinet.com/ul
Just starting …
First UTL challenge
Large database of American Sign Language developed at Boston University (http://www.bu.edu/asllrp/), including 3,800 signs with 3,000 unique class labels produced by native signers, and 15 short narratives, for a total of over 11,000 sign tokens.
Gesture Recognition Challenge
STEP 1: Develop a system that can learn new signs from a few examples. First screening of the competitors based on their performance on a validation dataset.
STEP 2: At the site of the live competition, train the system with the given “flash cards” to recognize new signs. Second screening of the competitors based on the learning curves of their systems.
STEP 3: Perform short sequences of signs (charades) in front of an audience. Win if your system gets the best recognition rate.
Live Competition
DECEMBER 2010
NIPS 2010: Launch of the UTL challenge.
JUNE 2011
Workshop at CVPR 2011 (accepted). Launch of the Gesture Recognition challenge.
JULY 2011
Workshop at ICML 2011 (planned).
AUGUST 2011
Workshop at IJCNN 2011 (accepted). Results of the UTL challenge.
NOVEMBER 2011
Live Gesture Recognition Competition at ICCV 2011 (planned).
When and Where?
Lessons learned!
• Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier-to-understand learning machines.
[Figure: data matrix X with m examples and n features, reduced to n' selected features.]
NIPS 2003 Feature Selection Challenge
[Figure: application domains (Bioinformatics, Quality control, Machine vision, Customer knowledge, OCR/HWR, Market Analysis, Text Categorization, System diagnosis) placed by number of examples versus number of variables/features, both axes on log scales from 10 to 10^6.]
Applications of Feature Selection
Filter: All features → Filter → Feature subset → Predictor
Wrapper: All features → Wrapper (multiple feature subsets) → Predictor
Embedded method: All features → Embedded method → Feature subset → Predictor
Filters, Wrappers, and Embedded Methods
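To make the filter branch concrete, here is a minimal sketch that scores each feature independently of any predictor, using its absolute Pearson correlation with the target, and keeps the top k. It assumes numeric features and a real-valued target; `pearson` and `filter_select` are hypothetical helper names. Wrappers and embedded methods would instead use the predictor itself to judge subsets.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information about y
    return cov / (sx * sy)

def filter_select(X, y, k):
    """Filter method (sketch): rank features by |correlation with y|, keep top k."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)             # highest score first
    return sorted(j for _, j in scores[:k])  # indices of the kept features

X = [[1, 0, 5], [2, 1, 3], [3, 0, 4], [4, 1, 1]]
y = [1, 2, 3, 4]
print(filter_select(X, y, k=2))  # [0, 2]
```

The selected subset is then handed to whatever predictor you like. The filter never consults that predictor, which is what makes it cheap but blind to feature interactions.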
1) For each feature subset, train the predictor on the training data.
2) Select the feature subset that performs best on the validation data.
–Repeat and average if you want to reduce variance (cross-validation).
3) Test on the test data.
[Figure: data matrix of M samples × N variables/features, split row-wise into m1 training, m2 validation, and m3 test examples.]
Split the data into 3 sets: training, validation, and test.
Bilevel Optimization
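The three steps above can be sketched as an exhaustive wrapper. The inner level fits a predictor on the training set for each subset. The outer level picks the subset with the lowest validation error, and the test set is touched only once at the end. The nearest-centroid predictor, the toy data, and the helper names (`train_centroids`, `error_rate`, `wrapper_select`) are illustrative assumptions, and the cross-validation variant is not shown.

```python
from itertools import combinations

def train_centroids(X, y, subset):
    """Inner level: fit a nearest-centroid classifier on the chosen features."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        vec = [row[j] for j in subset]
        if label not in sums:
            sums[label] = [0.0] * len(subset)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def error_rate(model, X, y, subset):
    """Fraction of examples whose nearest centroid carries the wrong label."""
    errors = 0
    for row, label in zip(X, y):
        vec = [row[j] for j in subset]
        pred = min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, model[c])))
        errors += pred != label
    return errors / len(y)

def wrapper_select(train, valid, n_features):
    """Outer level: try every non-empty subset, keep the best on validation.

    Subsets are visited from smallest to largest and ties are not replaced,
    so the smallest best-performing subset wins.
    """
    best = None
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            model = train_centroids(*train, subset)
            err = error_rate(model, *valid, subset)
            if best is None or err < best[0]:
                best = (err, subset, model)
    return best

# Toy data: feature 0 separates the classes, feature 1 is noise.
train = ([[0, 5], [1, 0], [9, 4], [10, 1]], [0, 0, 1, 1])
valid = ([[2, 1], [8, 5]], [0, 1])
test = ([[1, 4], [9, 0]], [0, 1])

val_err, subset, model = wrapper_select(train, valid, n_features=2)
test_err = error_rate(model, *test, subset)  # the test set is used only here
print(subset, val_err, test_err)  # (0,) 0.0 0.0
```

Enumerating all 2^N − 1 subsets is only feasible for tiny N. That cost is what the complexity comparison on the next slide is about.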
Method | Number of subsets tried | Complexity C
Exhaustive search wrapper | 2^N | N
Nested subsets, feature ranking | N(N+1)/2 or N | log N

With high probability:
$$\text{Generalization\_error} \le \text{Validation\_error} + \epsilon(C/m_2)$$
m2: number of validation examples, N: total number of features, n: feature subset size.
[Figure: error as a function of the feature subset size n.]
Try to keep C of the order of m2.
Complexity of Feature Selection
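The orders in the table can be checked numerically. The slide does not fix the constant or the base of the logarithm. This sketch assumes C = ln(number of subsets tried), which reproduces the listed orders: N for exhaustive search, and log N for nested subsets and ranking. The values N = 1000 and m2 = 100 are illustrative choices, not figures from the slides.

```python
import math

def complexity(num_subsets):
    """Assumed complexity measure: natural log of the number of subsets tried."""
    return math.log(num_subsets)

N = 1000   # total number of features (illustrative)
m2 = 100   # number of validation examples (illustrative)

exhaustive = complexity(2 ** N)          # N * ln 2: linear in N
nested = complexity(N * (N + 1) // 2)    # slightly less than 2 ln N
ranking = complexity(N)                  # ln N

for name, c in [("exhaustive", exhaustive), ("nested", nested), ("ranking", ranking)]:
    print(f"{name}: C = {c:.1f}, C/m2 = {c / m2:.2f}")
# exhaustive: C = 693.1, C/m2 = 6.93
# nested: C = 13.1, C/m2 = 0.13
# ranking: C = 6.9, C/m2 = 0.07
```

With only 100 validation examples, the exhaustive wrapper's C is far above m2. The nested and ranking strategies stay well below it, in line with the advice to keep C of the order of m2.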
[Figure: causal graph with the variables Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, and Fatigue.]
WCCI 2008: Causation and Prediction Challenge
Simple univariate predictive model with binary target and features: all relevant features correlate perfectly with the target, and all irrelevant features are drawn at random. With 98% confidence, abs(feat_weight) < w and $\sum_i w_i x_i < v$.
ng: number of “good” (relevant) features
nb: number of “bad” (irrelevant) features
m: number of training examples.
Insensitivity to Irrelevant Features
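The setting on this slide can be simulated to see why irrelevant features receive small weights. This sketch assumes ±1 coding for the target and features, and it takes a hypothetical univariate weight equal to the average of x·y over the m training examples. The slide's exact expressions for the bounds w and v are not reproduced here.

```python
import random

def univariate_weights(ng, nb, m, seed=0):
    """Simulate the slide's setting under an assumed +/-1 coding.

    Relevant ("good") features are exact copies of the target; irrelevant
    ("bad") features are drawn uniformly at random. Each feature's weight is
    the average of x * y over the m training examples (an assumed score).
    """
    rng = random.Random(seed)
    y = [rng.choice((-1, 1)) for _ in range(m)]
    good = [list(y) for _ in range(ng)]
    bad = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(nb)]

    def weight(x):
        return sum(xi * yi for xi, yi in zip(x, y)) / m

    return [weight(x) for x in good], [weight(x) for x in bad]

good_w, bad_w = univariate_weights(ng=10, nb=1000, m=100)
print(min(good_w))                  # 1.0: every relevant weight is exactly 1
print(max(abs(w) for w in bad_w))   # the largest irrelevant weight, well below 1
```

Each irrelevant weight is an average of m independent ±1 terms, so its spread shrinks roughly like 1/sqrt(m). Re-running with a larger m should therefore push the irrelevant weights further below the relevant ones.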
Active Learning Challenge AISTATS & WCCI 2010
Web platform: Server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland. Computer admin.: Thomas Fuchs, ETH Zurich. Webmaster: Olivier Guyon, MisterP.net, France.
Protocol review and advising:
• David W. Aha, Naval Research Laboratory, USA.
• Abe Schneider, Knexus Research, USA.
• Graham Taylor, NYU, New York, USA.
• Andrew Ng, Stanford Univ., Palo Alto, California, USA.
• Vassilis Athitsos, University of Texas at Arlington, Texas, USA.
• Ivan Laptev, INRIA, France.
• Jitendra Malik, UC Berkeley, California, USA.
• Christian Vogler, ILSP Athens, Greece.
• Sudeep Sarkar, University of South Florida, USA.
• Philippe Dreuw, RWTH Aachen University, Germany.
• Richard Bowden, Univ. Surrey, UK.
• Greg Mori, Simon Fraser University, Canada.
Data collection and preparation:
• Vassilis Athitsos, University of Texas at Arlington, Texas, USA.
• Isabelle Guyon, Clopinet, California, USA.
• Graham Taylor, NYU, New York, USA.
• Ivan Laptev, INRIA, France.
• Jitendra Malik, UC Berkeley, California, USA.
Baseline methods and beta testing: The following researchers, who are experienced in the domain, will provide baseline results:
• Vassilis Athitsos, University of Texas at Arlington, Texas, USA.
• Graham Taylor, NYU, New York, USA.
• Andrew Ng, Stanford Univ., Palo Alto, California, USA.
• Yann LeCun, NYU, New York, USA.
• Ivan Laptev, INRIA, France.
The following researchers, who were top-ranking participants in past challenges but are not experienced in the domain, will also give it a try:
• Alexander Borisov (Intel, USA)
• Hugo-Jair Escalante (INAOE, México)
• Amir Saffari (Graz Univ., Austria)
• Alexander Statnikov (NYU, USA)
Credits
1) Feature Extraction, Foundations and Applications. I. Guyon, S. Gunn, et al. Springer, 2006.
http://clopinet.com/fextract-book
2) Challenges in Machine Learning. Collection published by Microtome. Papers on the challenges reprinted from JMLR and JMLR W&CP.
Resources
Join the Hall of Fame!