MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Page 1: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX Vocabulary
A Lightweight Interchange Format for Machine Learning Experiments

Diego Esteves et al.

Department of Computer Science, AKSW
University of Leipzig

17 Sep 2015 - SEMANTiCS

Diego Esteves et al. (University of Leipzig) MEX Vocabulary 17 Sep 2015 - SEMANTiCS 1 / 30

Page 2: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Outline

1 Introduction
    Problem
    Motivation
    Challenges
    State of the Art

2 MEX
    The Inspiration
    The Architecture
    Examples

3 Conclusion and Future Work

Page 3: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - The Problem

The Problem

How should we represent results of machine learning experiments in a common, comprehensive and interoperable format?

Page 4: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Example 1: Collaborative Project

Three universities are collaborating on a research project.

How can they achieve a high level of interoperability?

A uses the Weka toolkit [1].

B uses DL-Learner [2].

C uses the Accord Framework [3].

[1] http://www.cs.waikato.ac.nz/ml/weka/
[2] http://dl-learner.org/
[3] http://accord-framework.net/

Page 5: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Example 2: Hands on...

A complex script-based scenario

You are working on your research about stock market predictions and want to store the data for further analysis.

E.g., a script that takes 2 days to run a multi-level machine learning algorithm.

Page 6: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Example 3: Reading or Reviewing a Paper

You are a reviewer or scientist...

Sometimes it is hard to understand the solution proposed in a research paper.

The ACL POS Tagging website (State of the art) exemplifies a good use case for MEX on the web [1].

Furthermore, in both cases the task/reading is error-prone and time-consuming.

[1] http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)

Page 7: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Solution

Machine-readable data

Page 8: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Solution

Existing Standards:

Comma-Separated Values (CSV)

eXtensible Markup Language (XML)

JavaScript Object Notation (JSON)

Value-Object (VO)

Data-Transfer-Objects (DTO)

Database Management System (DBMS)

Page 9: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Solution

Page 10: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - 3 Drawbacks

1 The lack of schema definition: you always have to define the schema yourself and share your model afterwards.

2 A DBMS is technology-dependent and does not provide reasoning and inference capabilities.

3 The lack of semantic information.

Page 11: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - Problem: an example

Page 12: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Motivation - The Problem

The Optimal Scenario

How should we represent results of machine learning experiments in a common (1), comprehensive but not complex (2), lightweight (3), interoperable (4) and flexible (5) format, taking into consideration a low effort level (6) for implementation?

Page 13: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

State of the Art

Related Work

Page 14: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

State of the Art - Platforms for e-science workflows

Name                            Description
MyExperiment [DeRoure2009]      A collaborative environment where scientists can publish their workflows and experiment plans
Wings [Gil2011]                 A semantic approach to creating very large scientific workflows
OpenTOX [Tcheremenskaia2012]    An interoperable predictive toxicology framework
OpenML [Vanschoren2014]         A frictionless, collaborative environment for exploring machine learning

Page 15: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

State of the Art - Ontologies

Name                       Description
Expose [Vanschoren2010]    Data mining experiments; used in conjunction with Experiment Databases
OntoDM [Panov2013]         Data mining investigations
DMOP [Keet2015]            Data Mining OPtimization Ontology: it supports informed decision-making at various choice points of the data mining process

Page 16: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX.aksw.org

Page 17: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

The abstraction - What we want to describe

Machine Learning Definition by T. Mitchell

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E” – Tom Mitchell

ML Concepts              MEX Classes
experience E             mexcore:ExecutionCollection
task T                   mexalgo:Algorithm
performance measure P    mexperf:ExecutionPerformance
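
The mapping between Mitchell's concepts and the MEX classes can be written down as a small lookup. This is only a minimal sketch: the enum `MlConcept`, its methods and the demo class are hypothetical helpers and not part of the MEX API; only the three prefixed class names are taken from the slide.

```java
import java.util.Arrays;
import java.util.Optional;

// Demo entry point: prints the three concept-to-class pairs.
class MlConceptDemo {
    public static void main(String[] args) {
        for (MlConcept c : MlConcept.values()) {
            System.out.println(c.symbol() + " -> " + c.mexClass());
        }
    }
}

// Hypothetical helper, not part of the MEX API: it only records the
// concept-to-class mapping shown on the slide.
enum MlConcept {
    EXPERIENCE("E", "mexcore:ExecutionCollection"),
    TASK("T", "mexalgo:Algorithm"),
    PERFORMANCE_MEASURE("P", "mexperf:ExecutionPerformance");

    private final String symbol;
    private final String mexClass;

    MlConcept(String symbol, String mexClass) {
        this.symbol = symbol;
        this.mexClass = mexClass;
    }

    String symbol() { return symbol; }

    String mexClass() { return mexClass; }

    // Looks up a concept by Mitchell's symbol (E, T or P).
    static Optional<MlConcept> bySymbol(String symbol) {
        return Arrays.stream(values())
                .filter(c -> c.symbol.equals(symbol))
                .findFirst();
    }
}
```

For example, `MlConcept.bySymbol("T")` yields the entry whose `mexClass()` is `mexalgo:Algorithm`, while an unknown symbol yields an empty `Optional`.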

Page 18: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX - 3 Vocabularies

MEX Core
formalizes the key entities for representing the basic steps of machine learning executions

MEX Algorithm
represents the context of machine learning algorithms and their associated characteristics

MEX Performance
provides the basic entities for representing the experimental results of executions of machine learning algorithms

Page 19: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX Vocabulary (:mexalgo + :mexcore + :mexperf) and Related Ontologies

[Figure: size of MEX (7+14+10=31) compared with the related ontologies OntoDM, Expose and DMOP. Other values shown in the chart: 402, 778, 858, 757.]

Page 20: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX - Interlinking the 3 layers: mexalgo, mexcore and mexperf

Page 21: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

:mexalgo

Page 22: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

:mexcore

Page 23: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

:mexperf

Page 24: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX - ACL POS Tagging website metadata

Page 25: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Next chapter ;-)

RDF? Ontology? Jena? Dublin Core...? SPARQL? OWL? PROV-O, What?

Page 26: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

public static void main(String[] args) {

    // Describe the experiment: author, configuration id, features, tool and algorithms
    MyMEX_10 mex = new MyMEX_10();
    mex.setAuthorName("D Esteves");
    String eid = "E001S001";

    mex.addConf(eid).setDescription("hello world experiment");
    mex.Conf(eid).addFeature("min;max;op;close");
    mex.Conf(eid).Implementation().set(enumImplementation.Weka);
    mex.Conf(eid).addAlgorithm(enumAlgorithm.SupportVectorMachines);
    mex.Conf(eid).addAlgorithm(enumAlgorithm.NaiveBayes);
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("C", "10^3");
    mex.Conf(eid).Algorithm(enumAlgorithm.SupportVectorMachines).addParameter("alpha", "0.2");
    ...

    /* your code here */
    ...

    // Log the performance measures of the overall execution
    String exid = mex.Conf(eid).addExecutionOverall.addPerformance(enumMeasures.ACCURACY, .96);
    exid = mex.Conf(eid).ExecutionOverall(exid).addPerformance(enumMeasures.TPR, .78);
    ...

    // Write the logged experiment to a file
    MEXSerializer_10.getInstance().parse(mex);
}

Page 27: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Conclusion - D. Esteves et al.

Requirement         Argumentation
lightweight         7: this is the minimal number of classes you need for representing a basic execution. 31: this is the number of the most important entities in the 3 layers.
flexible            Single or overall executions. Choose your inputs/outputs.
low effort-level    MEX provides APIs which encapsulate the semantic knowledge, so you can avoid extra implementation effort and just log your inputs and outputs.
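
The "low effort-level" argument can be illustrated with a small wrapper that keeps vocabulary terms out of the caller's code. This is a minimal sketch under stated assumptions and not the MEX API: the class `ExperimentLog`, its methods, the `ex:` prefix and the property names `ex:measure` and `ex:value` are hypothetical; only the class name `mexperf:ExecutionPerformance` is taken from the slides. The output is Turtle-like text for illustration, not a validated serialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper, not the MEX API: callers log plain values and the
// class decides which vocabulary terms are written.
class ExperimentLog {
    private final String experimentId;
    private final List<String> statements = new ArrayList<>();
    private int counter = 0;

    ExperimentLog(String experimentId) {
        this.experimentId = experimentId;
    }

    // Records one performance value and returns this log for chaining.
    ExperimentLog addPerformance(String measure, double value) {
        counter++;
        String node = "ex:" + experimentId + "_perf" + counter;
        // ex:measure and ex:value are placeholder property names.
        statements.add(String.format(Locale.ROOT,
                "%s a mexperf:ExecutionPerformance ; ex:measure \"%s\" ; ex:value %.2f .",
                node, measure, value));
        return this;
    }

    // Returns the recorded statements in insertion order, one per line.
    String serialize() {
        return String.join("\n", statements);
    }

    public static void main(String[] args) {
        ExperimentLog log = new ExperimentLog("E001S001")
                .addPerformance("accuracy", 0.96)
                .addPerformance("tpr", 0.78);
        System.out.println(log.serialize());
    }
}
```

The caller only supplies a measure name and a value; the choice of class and property names lives in one place, which is the kind of encapsulation the table row describes.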

Page 28: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Conclusion - D. Esteves et al.

Requirement      Argumentation
common           The concepts behind vocabularies allow us to achieve a high level of abstraction, generalization and formalization of concepts.
interoperable    Vocabularies are the current best choice for representing real-world entities.
comprehensive    Classification, regression and clustering problems are covered.

Page 29: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

Conclusion - D. Esteves et al.

1 Produces provenance metadata.
2 Allows querying results.
3 Defines an interoperable format for sharing machine learning experiments.
4 Benefits meta-learning [Vilalta2002] approaches.
5 Tends to minimize the probability of misinterpretation of persuasive and informative aspects [Gillen2006].
6 MEX is flexible and lightweight.
7 Experiment Databases [Blockeel2007][Vanschoren2012] need an interchange format for experiments.
8 MEX provides APIs which facilitate the file generation process.
9 Benchmark systems [Usbeck2014] can benefit from a standard format.
10 Generate your LaTeX table automatically.
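
The last point, generating a LaTeX table from logged results, could look roughly like the following. This is a sketch only: the class `LatexTable` and its input (a map from measure name to value) are hypothetical, and the slides do not show how MEX itself produces the table. The sample values are the accuracy and TPR figures from the code slide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: renders logged measures as a LaTeX tabular.
class LatexTable {

    // Builds a two-column table, one row per measure, in map iteration order.
    static String render(Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("\\begin{tabular}{lr}\n");
        sb.append("Measure & Value \\\\\n");
        sb.append("\\hline\n");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            sb.append(String.format(Locale.ROOT, "%s & %.2f \\\\\n",
                    e.getKey(), e.getValue()));
        }
        sb.append("\\end{tabular}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in insertion order.
        Map<String, Double> measures = new LinkedHashMap<>();
        measures.put("accuracy", 0.96);
        measures.put("TPR", 0.78);
        System.out.print(render(measures));
    }
}
```

Measure names are inserted verbatim, so names containing LaTeX special characters such as `_` or `%` would need escaping before use.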

Page 30: MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experiments

MEX - D. Esteves et al.

Thank you so much for your attention!
mex.aksw.org
