openml for algorithm selection and configuration
TRANSCRIPT
![Page 1: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/1.jpg)
OpenMLN E T W O R K E D M A C H I N E L E A R N I N G
F O R A L G O R I T H M S E L E C T I O N A N D O P T I M I S A T I O N
J O A Q U I N VA N S C H O R E N , T U E I N D H O V E N , 2 0 1 5
![Page 2: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/2.jpg)
Meta-learning:
• Learn from experience how to select + optimize learning algorithms + workflows
Requires:
• Large amounts of real datasets
• Wide range of state-of-the-art algorithms
• Huge amounts of experimentation: explore how algorithms/params behave on many kinds of data
![Page 3: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/3.jpg)
Millions of real, open datasets are generated • Drug activity, gene expressions, astronomical observations, text,…
Extensive toolboxes exist to analyse data • MLR, SKLearn, RapidMiner, KNIME, WEKA, AmazonML, AzureML,…
Massive amounts of experiments are run, most of this information is lost forever (in people’s heads) • No learning across people, labs, fields • Start from scratch, slows research and innovation
![Page 4: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/4.jpg)
Let’s connect machine learning environments to network, so that we can organize, learn from experience
Connect minds, collaborate globally in real time
Train algorithms to automate data science
![Page 5: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/5.jpg)
F R I C T I O N L E S S , N E T W O R K E D M A C H I N E L E A R N I N G
Easy to use: Integrated in ML environments. Automated, reproducible sharingOrganized data: Experiments connected to data, code, people anywhereEasy to contribute: Post single dataset, algorithm, experiment, commentReward structure*: Build reputation and trust (e-citations, social interaction)
OpenML
![Page 6: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/6.jpg)
Data (ARFF) uploaded or referenced, versioned analysed, characterized, organised online
![Page 7: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/7.jpg)
analysed, characterized, organised online
![Page 8: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/8.jpg)
Tasks contain data, goals, procedures. Readable by tools, automates experimentation
Train-test splits
Evaluation measure
Results organized online: realtime overview
![Page 9: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/9.jpg)
Results organized online: realtime overview
Train-test splits
Evaluation measure
![Page 10: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/10.jpg)
Flows (code) from various tools, versioned.Integrations + APIs (REST, Java, R, Python,…)
![Page 11: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/11.jpg)
Integrations + APIs (REST, Java, R, Python,…)
![Page 12: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/12.jpg)
reproducible, linked to data, flows and authorsExperiments auto-uploaded, evaluated online
![Page 13: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/13.jpg)
Experiments auto-uploaded, evaluated online
![Page 14: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/14.jpg)
ExploreReuseShare
![Page 15: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/15.jpg)
![Page 16: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/16.jpg)
• Search by keywords or properties
• Filters
• Tagging
Data
![Page 17: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/17.jpg)
• Wiki-like descriptions
• Analysis and visualisation of features
Data
![Page 18: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/18.jpg)
• Wiki-like descriptions
• Analysis and visualisation of features
• Auto-calculation of large range of meta-features
Data
![Page 19: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/19.jpg)
• Simple: Number/perc/log of instances, classes, features, dimensionality, NAs, …
• Statistical: Default acc, Min/max/mean distinct values, skewness, kurtosis, stdev, mutual information, …
• Information-theoretic: Entropy (class, num/nom feats), signal-noise ratio,…
• Landmarkers (AUC, ErrRate, Kappa): Decision stumps, Naive Bayes, kNN, RandomTree, J48(pruned), JRip, NBTree, lin.SVM, …
• Subsampling landmarkers (partial learning curves)
• Streaming landmarkers: Evaluations in previous window, previous best, per-algorithm significant win/loss
• Change detectors: HoeffdingAdwin, HoeffdingDDM
• Domain specific meta-features (e.g. molecular descriptors)
• Many not included yet
Meta-features (120+)
![Page 20: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/20.jpg)
• Example: Classification on click prediction dataset, using 10-fold CV and AUC
• People submit results (e.g. predictions)
• Server-side evaluation (many measures)
• All results organized online, per algorithm, parameter setting
• Online visualizations: every dot is a run plotted by score
Tasks
![Page 21: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/21.jpg)
• Leaderboards visualize progress over time: who delivered breakthroughs when, who built on top of previous solutions
• Collaborative: all code and data available, learn from others
• Real-time: clear who submitted first, others can improve immediately
![Page 22: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/22.jpg)
Classroom challenges
![Page 23: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/23.jpg)
• All results obtained with same flow organised online
• Results linked to data sets, parameter settings -> trends/comparisons
• Visualisations (dots are models, ranked by score, colored by parameters)
![Page 24: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/24.jpg)
• Detailed run info
• Author, data, flow, parameter settings, result files, …
• Evaluation details (e.g., results per sample)
![Page 25: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/25.jpg)
REST API (new) Requires OpenML accountPredictive URLs
![Page 26: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/26.jpg)
R API Idem for Java, Python Tutorial: http://www.openml.org/guide
List datasets List tasks
List flowsList runs and results (soon)
![Page 27: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/27.jpg)
Share
Explore
Reuse
![Page 28: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/28.jpg)
Website1) Search, 2) Table view 3) CVS export*
![Page 29: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/29.jpg)
R API Idem for Java, Python Tutorial: http://www.openml.org/guide
Download datasets Download tasks
Download flows
![Page 30: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/30.jpg)
R API
Download run
Download run with predictions
Idem for Java, Python Tutorial: http://www.openml.org/guide
![Page 31: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/31.jpg)
ReuseExplore
Share
![Page 32: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/32.jpg)
WEKAOpenML extension in plugin manager
![Page 33: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/33.jpg)
MOA
![Page 34: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/34.jpg)
RapidMiner: 3 new operators
![Page 35: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/35.jpg)
R API
Run a task (with mlr):
![Page 36: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/36.jpg)
R API
Run a task (with mlr):
And upload:
![Page 37: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/37.jpg)
Studies (e-papers) - Online counterpart of a paper, linkable- Add data, code, experiments (new or old)- Public or shared within circle
Circles Create collaborations with trusted researchersShare results within team prior to publication
Altmetrics - Measure real impact of your work - Reuse, downloads, likes of data, code, projects,…- Online reputation (more sharing)
Online collaboration (soon)
![Page 38: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/38.jpg)
OpenML Community
Jan-Jun 2015
Used all over the world400 7-day, 1700 30-day active users, growing1000s of datasets, flows, 450000+ runs
![Page 39: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/39.jpg)
Opportunities for automated algorithms selection + configuration• Many datasets, flows, experiments: much larger meta-learning
studies than possible before• More meta-features, better meta-models• Meta-data to speed up algorithm configuration, learn over time• APIs: Algorithm selection and configuration can be built on top
of OpenML, reusing and sharing data
![Page 40: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/40.jpg)
OpenML in drug discovery
ChEMBL databaseSM
ILEs'
Molecular'proper0es'
(e.g.'MW,'LogP)'
Fingerprints'(e.g.'FP2,'FP3,'FP4,'MACSS)'
MW #LogP #TPSA #b1 #b2 #b3 #b4 #b5 #b6 #b7 #b8 #b9#!!
377.435 !3.883 !77.85 !1 !1 !0 !0 !0 !0 !0 !0 !0 !!341.361 !3.411 !74.73 !1 !1 !0 !1 !0 !0 !0 !0 !0 !…!197.188 !-2.089 !103.78 !1 !1 !0 !1 !0 !0 !0 !1 !0 !!346.813 !4.705 !50.70 !1 !0 !0 !1 !0 !0 !0 !0 !0!
! ! ! ! ! ! !.!! ! ! ! ! ! !: !!
10.000+ regression datasets
1.4M compounds,10k proteins, 12,8M activities
Predict which compounds (drugs) will inhibit certain proteins (and hence viruses, parasites,…)
2750 targets have >10 compounds, x 4 fingerprints
![Page 41: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/41.jpg)
OpenML in drug discovery
cforest
ctree
earth
fnn
gbm
glmnetlassolmnnetpcrplsr
rforest
ridge
rtree
ECFP4_1024* FCFP4_1024* ECFP6_1024+ FCFP6_1024*
0.697* 0.427* 0.725+ 0.627*
|msdfeat< 0.1729
mutualinfo< 0.7182
skewresp< 0.3315
nentropyfeat< 1.926
netcharge1>=0.9999
netcharge3< 13.85
hydphob21< −0.09666
glmnet
fnn pcr
rforest
ridge rforest
rforest ridge
Predict best algorithm with meta-models
MW #LogP #TPSA #b1 #b2 #b3 #b4 #b5 #b6 #b7 #b8 #b9#!!
377.435 !3.883 !77.85 !1 !1 !0 !0 !0 !0 !0 !0 !0 !!341.361 !3.411 !74.73 !1 !1 !0 !1 !0 !0 !0 !0 !0 !…!197.188 !-2.089 !103.78 !1 !1 !0 !1 !0 !0 !0 !1 !0 !!346.813 !4.705 !50.70 !1 !0 !0 !1 !0 !0 !0 !0 !0!
! ! ! ! ! ! !.!! ! ! ! ! ! !: !!
Metafeatures: - simple, statistical, info-theoretic, landmarkers - target: aliphatic index, hydrophobicity, net
charge, mol. weight, sequence length, …
![Page 42: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/42.jpg)
I. Olier et al. MetaSel@ECMLPKDD 2015
Best algorithm overall (10f CV)
![Page 43: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/43.jpg)
I. Olier et al. MetaSel@ECMLPKDD 2015
Random Forest meta-classifier (10f CV)
![Page 44: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/44.jpg)
I. Olier et al. MetaSel@ECMLPKDD 2015
Random Forest stacker (10f CV)
![Page 45: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/45.jpg)
Frugal learningWhich algorithms are fast (and low-memory) but accurate?
In Python Notebook:
![Page 46: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/46.jpg)
Frugal learningWhich algorithms are fast (and low-memory) but accurate?
![Page 47: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/47.jpg)
Frugal learningWhich algorithms are fast (and low-memory) but accurate?
![Page 48: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/48.jpg)
Meta-learning on streamsStream data in OpenML: ‘best’ algorithm changes over time Concept drift
• Use meta-learning to select the best models at each point in time
![Page 49: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/49.jpg)
Meta-learning on streamsStream data in OpenML: ‘best’ algorithm changes over time
Evaluation
• Base-level:
• Interleaved train-then-test (prequential evaluation)
• Accuracy on data streams using predicted method [default, best]
• Meta-level:
• Leave one stream out
• Accuracy of predicting best model [0,1]
![Page 50: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/50.jpg)
Meta-learning on streams
• Streaming ensembles (current work)
• Train multiple models, use meta-learning to weight their votes
• Stacking / Cascading
• BLast (Best-Last), J. van Rijn et al. ICDM 2015
• Simply choose model that performed best in previous window. Equivalent to state-of-the-art (Leveraging Bagging)
![Page 51: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/51.jpg)
Algorithm selection (ranking)
J. van Rijn et al. IDA 2015
Learning curves in OpenML
• Identify k nearest prior datasets by distance between partial curves:
• Build a ranking of algorithms to run, start with overall best abest • Draw random algorithm acompetitor
• If acompetitor wins most, add to ranking, repeat
For new dataset: evaluate learning curves up to T (e.g. 256 instances) Fraction of full CV No other meta-features
![Page 52: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/52.jpg)
0
0.02
0.04
0.06
0.08
0.1
1 4 16 64 256 1024 4096 16384 65536
Accu
racy
Los
s
Time (seconds)
Best on SampleAverage Rank
PCCPCC (A3R, r = 1)
For 53 classifiers on 39 datasets Multi-objective function A3R:
J. van Rijn et al. IDA 2015
Algorithm selection (ranking)
![Page 53: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/53.jpg)
Towards automating machine learning
Human scientists
meta-data, models, evaluations
Automated processes
Data cleaning Algorithm Selection
Parameter Optimization Workflow construction
Post processing
API
Connect your tools and services to OpenML, so they may learn
![Page 54: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/54.jpg)
Bruce Mau
When everything is connected to everything else, for better or for worse, everything matters.
![Page 55: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/55.jpg)
- Open Source, on GitHub- Regular workshops, hackathons
Join OpenML
Next workshop: - Lorentz Center (Leiden),
14-18 March 2016
![Page 56: OpenML for Algorithm Selection and Configuration](https://reader030.vdocuments.site/reader030/viewer/2022021502/58a123581a28abb91b8b5f01/html5/thumbnails/56.jpg)
T H A N K Y O U
Joaquin Vanschoren
Jan van Rijn
Bernd Bischl
Matthias Feurer
Michel Lang
Nenad Tomašev
Giuseppe Casalicchio
Luis Torgo
You?
#OpenMLFarzan Majdani
Jakob Bossek