Machine Learning Software + Intro WEKA Oliver Brdiczka Equipe PRIMA INRIA Rhône-Alpes

Page 1: Machine Learning Software + Intro WEKA · 2006-02-14 · WEKA: clusterers WEKA contains “clusterers”for finding groups of similar instances in a dataset Implemented schemes are:

Machine Learning Software + Intro WEKA

Oliver Brdiczka

Equipe PRIMA, INRIA Rhône-Alpes

Outline

• Machine Learning Software
  • MATLAB
  • Orange
  • Torch3
  • R language
  • WEKA
  • YALE
• Short introduction to WEKA

Machine Learning Software

• MATLAB toolboxes:
  • Many toolboxes for different machine learning areas
  • E.g.: SPIDER, PRTools (pattern recognition), BNT (Bayesian networks) …
  • Requires a MATLAB license (or use Scilab)

Machine Learning Software

• Orange (University of Ljubljana)
  • Focus on data mining + visualization
  • C++ components + Python scripting
  • GUI
  • Linux, MS Windows, Macintosh
  • GNU General Public License

Machine Learning Software

• TORCH3 (IDIAP)
  • (Statistical) machine learning library
  • C++ library
  • Linux, MS Windows
  • BSD license

• R language
  • Language/environment for statistical computing and graphics (free implementation of the S language)
  • Implemented mainly in C, Fortran, and R
  • Linux, MS Windows, Macintosh
  • GNU General Public License

Machine Learning Software

• WEKA (University of Waikato, New Zealand)
  • Machine learning/data mining software
  • Java-based
  • GUI
  • GNU General Public License

Machine Learning Software

• YALE (University of Dortmund)
  • Environment for machine learning experiments (Experiment editor)
  • Java-based
  • GUI
  • Integration of WEKA learners
  • GNU General Public License

Machine Learning Software

• Others…
  • Libraries for specific learning algorithms:
    • HMM: ghmm (C++), jahmm (Java)
    • Graphical Bayesian Models: gmtk (C++)
    • …

Short introduction to WEKA

WEKA: main features

• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms

WEKA: data format ARFF

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
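To make the structure of the ARFF example concrete, here is a minimal sketch of a reader for it in Python. It is not WEKA's own parser: the function name `parse_arff` is hypothetical, and the code assumes only the constructs that appear in the weather example (nominal and numeric attributes, comma-separated rows, no quoting, no missing values).

```python
# Minimal ARFF reader: an illustrative sketch, not WEKA's parser.
# Assumption: only nominal/numeric attributes and plain comma-separated rows.

WEATHER_ARFF = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a simple ARFF string.

    attributes is a list of (name, spec) pairs, where spec is either the
    string "numeric" or a list of the allowed nominal values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # Convert numeric columns to float, keep nominal values as strings.
            rows.append([
                float(value) if spec == "numeric" else value
                for (_, spec), value in zip(attributes, values)
            ])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                spec = spec.lower()
            attributes.append((name, spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(WEATHER_ARFF)
```

The header declares the schema (one `@attribute` line per column, in order) and every line after `@data` is one instance, so the reader only needs to switch mode once.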

WEKA: data import

• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
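As an illustration of how two of the listed file formats relate, the following sketch converts CSV text into ARFF text using only the Python standard library. This is not how WEKA performs the import; `csv_to_arff` and `_is_number` are hypothetical helpers, and the type rule (a column is numeric if every value parses as a number, nominal otherwise) is an assumption made for the example.

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into a simple ARFF string.

    Assumption: a column is numeric if all its values parse as float;
    otherwise it is nominal, with values listed in order of first appearance.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # ignore empty lines
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        column = [row[i] for row in rows]
        if all(_is_number(v) for v in column):
            lines.append("@attribute %s numeric" % name)
        else:
            values = list(dict.fromkeys(column))  # unique, order preserved
            lines.append("@attribute %s {%s}" % (name, ", ".join(values)))
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines) + "\n"


SAMPLE_CSV = "outlook,temperature,play\nsunny,85,no\novercast,83,yes\n"
arff = csv_to_arff(SAMPLE_CSV, "weather")
```

The main difference between the two formats is that ARFF states the attribute types explicitly, whereas with CSV they have to be inferred from the data.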

WEKA: filters

• WEKA contains filters for:
  • Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
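Two of the filter types named above, normalization and discretization, can be sketched in a few lines of Python. The functions `normalize` and `discretize` are hypothetical and do not reproduce WEKA's filter classes or their options; the sketch assumes min-max scaling to [0, 1] and equal-width binning, applied to the temperature column of the weather example.

```python
def normalize(values):
    """Min-max scale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map each value to the index of its equal-width bin (0 .. bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall into bin number `bins`, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


temperature = [85, 80, 83, 70, 68]
scaled = normalize(temperature)
binned = discretize(temperature, 3)
```

Both functions transform one attribute at a time and leave the number of instances unchanged, which is the general shape of an attribute-level filter.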

WEKA: classifiers

• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  • Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
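To illustrate what predicting a nominal quantity looks like, here is a minimal sketch of one scheme from the list, an instance-based (k-nearest-neighbour) classifier. It is not WEKA's implementation: `knn_predict` is a hypothetical function, and the sketch assumes Euclidean distance, majority voting, and only the two numeric attributes (temperature, humidity) of the weather example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training instances. `train` is a list of (features, label) pairs."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# (temperature, humidity) -> play, taken from the weather example
train = [
    ((85, 85), "no"),
    ((80, 90), "no"),
    ((83, 86), "yes"),
    ((70, 96), "yes"),
    ((68, 80), "yes"),
]

prediction_k1 = knn_predict(train, (84, 86), k=1)
prediction_k3 = knn_predict(train, (84, 86), k=3)
```

With `k=1` the query takes the label of its single closest instance, while `k=3` lets the two next-closest instances outvote it, so the two predictions differ here.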

WEKA: clusterers

• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  • k-Means, EM, Cobweb, FarthestFirst …
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
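As a rough illustration of the first scheme in the list, the sketch below implements a basic k-means loop in Python. It is not WEKA's clusterer: the function `kmeans` is hypothetical, and the sketch assumes squared Euclidean distance and a deterministic initialisation that takes the first k points as starting centroids (real implementations usually initialise randomly).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Cluster `points` into k groups; return (centroids, assignment)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0), (2.0, 1.0)]
centroids, assignment = kmeans(points, 2)
```

On this toy data the loop separates the three points near (1, 1) from the two points near (9, 9) after two update steps.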

WEKA: API documentation

• Javadoc

WEKA: User Interfaces

• Simple Command Line Interface
• Explorer
  • Filters, classifiers, clusterers, visualization
• Experimenter
  • Comparing different learning algorithms
• Knowledge Flow
  • Graphical programming tool

Conclusion

• Important tool for machine learning problems
• Used by many research groups
• Many extensions are available for WEKA:
  • Spectral clustering, time series mining, grid computing, document classification and clustering, vector quantization, rule discovery, parallel processing …

References (web)

• MATLAB toolboxes:
  • SPIDER: http://www.kyb.tuebingen.mpg.de/bs/people/spider/main.html
  • PRTools: http://www.prtools.org/
  • BNT: http://bnt.sourceforge.net/

References (web)

• Orange: http://www.ailab.si/orange
• TORCH3: http://www.torch.ch/
• R language: http://www.r-project.org/

References (web)

• WEKA: http://www.cs.waikato.ac.nz/ml/weka/
• YALE: http://www-ai.cs.uni-dortmund.de/SOFTWARE/YALE/index.html
• Other:
  • Jahmm: http://www.run.montefiore.ulg.ac.be/~francois/software/jahmm/
  • Ghmm: http://www.ghmm.org/
  • Gmtk: http://ssli.ee.washington.edu/~bilmes/gmtk/