Debellor: Data Mining Platform with Stream Architecture

Marcin Wojnarski

Warsaw University, Poland

Outline

Debellor – data mining platform

Motivation

Main features

Architecture: Cell, data streaming, multi-threading

Available in ver. 0.6

Future releases

Summary

Debellor

Language: Java

Licence: open source (GPL)

Download: www.debellor.org

Debello – to conquer (Latin). Debellor – conqueror of data

Debellor – data mining platform

[Diagram: Debellor as a common platform over Rseslib, Weka, TA-Lib, LibSVM and own… algorithms]

Motivation

Demand for more complex algorithms.

Necessity to combine elementary algorithms.

Motivation

1. Data Processing Network (DPN)

[Diagram: a network of nodes labelled Load, Preprocess, Preprocess, Predict, Save, Load, Visualize]

Motivation

2. Committee of algorithms

[Diagram: Classifier A, Classifier B and Classifier C feed into Voting]
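
The committee idea can be illustrated with a minimal majority-vote sketch. It is not taken from Debellor: the class names `VotingCommittee` and `VotingDemo`, the `threshold` helper, and the tie-breaking rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical committee (not a Debellor class): each member maps an input to a class label.
class VotingCommittee<X> {
    private final List<Function<X, String>> members;

    VotingCommittee(List<Function<X, String>> members) {
        this.members = members;
    }

    // Returns the label with the most votes. On a tie, the label that
    // reached the winning count first is returned (an arbitrary choice).
    String classify(X input) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<X, String> member : members) {
            String label = member.apply(input);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }
}

class VotingDemo {
    // Toy member: answers "pos" when the input exceeds a threshold.
    static Function<Double, String> threshold(double t) {
        return x -> x > t ? "pos" : "neg";
    }

    // Three toy classifiers standing in for Classifier A, B and C.
    static VotingCommittee<Double> committee() {
        return new VotingCommittee<>(Arrays.asList(threshold(1), threshold(2), threshold(3)));
    }

    public static void main(String[] args) {
        System.out.println(committee().classify(2.5));
        System.out.println(committee().classify(1.5));
    }
}
```

For the input 2.5 two of the three toy members vote "pos"; for 1.5 two of them vote "neg".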

Motivation

3. Nested algorithms

[Diagram: K-means nested inside an RBF neural network]

Requirements

Versatile

Efficient

Simple

Features of Debellor

All types of data processing algorithms

Extendible data types

Stream architecture, for large data sets

Multi-threading

Immutability of data objects, for safety

Algorithm = Cell

Cell cell = new RseslibClassifier("C45");
cell.set("pruning", "true");

Cell – data source

cell.open();
Sample s1 = cell.next(),
       s2 = cell.next(),
       ...
cell.close();
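
The open / next / close protocol can be sketched in a self-contained way as follows. The interface `MiniSource`, the class `CountingSource`, and the convention that `next()` returns `null` at the end of the stream are assumptions for illustration, not Debellor's actual `Cell` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cell acting as a data source (not Debellor's Cell class).
interface MiniSource {
    void open();     // prepare to produce samples
    Double next();   // next sample, or null when the stream is exhausted (assumed convention)
    void close();    // release resources
}

// Produces 0.0, 1.0, ..., count-1 one at a time, never holding them all in memory.
class CountingSource implements MiniSource {
    private final int count;
    private int position;
    private boolean opened;

    CountingSource(int count) {
        this.count = count;
    }

    public void open() {
        position = 0;
        opened = true;
    }

    public Double next() {
        if (!opened) throw new IllegalStateException("call open() first");
        if (position >= count) return null;
        return (double) position++;
    }

    public void close() {
        opened = false;
    }
}

class SourceDemo {
    // Drains a source using the open / next / close protocol.
    static List<Double> drain(MiniSource source) {
        List<Double> samples = new ArrayList<>();
        source.open();
        try {
            for (Double s = source.next(); s != null; s = source.next()) {
                samples.add(s);
            }
        } finally {
            source.close();   // always close, even if next() throws
        }
        return samples;
    }

    public static void main(String[] args) {
        System.out.println(drain(new CountingSource(3)));
    }
}
```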

Cell – data receiver

cell.setSource(anotherCell);

[Diagram: anotherCell → cell]
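
How a receiver might pull samples from the cell set as its source can be sketched as below. `PipeCell`, `ListSource` and `Scale` are hypothetical classes written for this example; only the method names `setSource`, `open`, `next` and `close` come from the slides.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal cell (not Debellor's Cell class): it may pull samples from a source cell.
abstract class PipeCell {
    protected PipeCell source;

    void setSource(PipeCell source) {
        this.source = source;
    }

    void open() {
        if (source != null) source.open();
    }

    abstract Double next();   // null signals end of stream (assumed convention)

    void close() {
        if (source != null) source.close();
    }
}

// A source cell that replays an in-memory list.
class ListSource extends PipeCell {
    private final List<Double> data;
    private Iterator<Double> it;

    ListSource(List<Double> data) {
        this.data = data;
    }

    @Override
    void open() {
        it = data.iterator();
    }

    @Override
    Double next() {
        return it.hasNext() ? it.next() : null;
    }
}

// A receiver cell: asks its source for one sample at a time and rescales it.
class Scale extends PipeCell {
    private final double factor;

    Scale(double factor) {
        this.factor = factor;
    }

    @Override
    Double next() {
        Double in = source.next();
        return in == null ? null : in * factor;
    }
}

class PipeDemo {
    // Pulls every sample out of the last cell of a chain and sums them.
    static double sum(PipeCell cell) {
        double total = 0;
        cell.open();
        for (Double s = cell.next(); s != null; s = cell.next()) total += s;
        cell.close();
        return total;
    }

    public static void main(String[] args) {
        PipeCell anotherCell = new ListSource(Arrays.asList(1.0, 2.0, 3.0));
        PipeCell cell = new Scale(10.0);
        cell.setSource(anotherCell);
        System.out.println(sum(cell));
    }
}
```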

Trainable Cell

cell.setSource(…);
cell.learn();

[Diagram: learn() takes the cell from EMPTY to TRAINED]
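
The EMPTY to TRAINED transition can be sketched with a toy trainable cell that learns the mean of its training stream. `MeanLearner`, its `State` enum and the `apply` method are hypothetical; the slides only name `setSource` and `learn`.

```java
import java.util.Arrays;

// Hypothetical trainable cell (not Debellor's API): learn() reads the whole
// source stream once and moves the cell from EMPTY to TRAINED.
class MeanLearner {
    enum State { EMPTY, TRAINED }

    private Iterable<Double> source;
    private State state = State.EMPTY;
    private double mean;

    void setSource(Iterable<Double> source) {
        this.source = source;
    }

    void learn() {
        if (source == null) throw new IllegalStateException("no source set");
        double sum = 0;
        long n = 0;
        for (double x : source) {   // one pass over the training stream
            sum += x;
            n++;
        }
        if (n == 0) throw new IllegalStateException("empty training stream");
        mean = sum / n;
        state = State.TRAINED;
    }

    State state() {
        return state;
    }

    // Uses the trained model: centers a value around the learned mean.
    double apply(double x) {
        if (state != State.TRAINED) throw new IllegalStateException("call learn() first");
        return x - mean;
    }
}

class MeanLearnerDemo {
    public static void main(String[] args) {
        MeanLearner cell = new MeanLearner();
        System.out.println(cell.state());
        cell.setSource(Arrays.asList(1.0, 2.0, 3.0, 6.0));
        cell.learn();
        System.out.println(cell.state() + " " + cell.apply(5.0));
    }
}
```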

Data Streaming

[Diagram: cells A and B connected in two modes, BATCH and STREAM]

It is the cell that is responsible for asking for data.
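
The difference between the two modes can be sketched as follows. In the batch variant the whole data set is materialized before the consumer sees anything; in the stream variant the consumer asks for samples one by one, so only the requested samples are ever produced. `LazySource` and `BatchVsStream` are hypothetical names, and the `produced()` counter exists only to make the effect observable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical source A: generates the integers 0..size-1 only when asked.
class LazySource implements Iterator<Integer> {
    private final int size;
    private int produced = 0;

    LazySource(int size) {
        this.size = size;
    }

    public boolean hasNext() {
        return produced < size;
    }

    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return produced++;
    }

    // Number of samples generated so far.
    int produced() {
        return produced;
    }
}

class BatchVsStream {
    // BATCH: everything is materialized in memory before B can start.
    static List<Integer> batch(LazySource a) {
        List<Integer> all = new ArrayList<>();
        while (a.hasNext()) all.add(a.next());
        return all;
    }

    // STREAM: B pulls samples one at a time and stops when it has enough.
    static int sumOfFirst(LazySource a, int wanted) {
        int sum = 0;
        for (int i = 0; i < wanted && a.hasNext(); i++) sum += a.next();
        return sum;
    }

    public static void main(String[] args) {
        LazySource streamed = new LazySource(1_000_000);
        System.out.println(sumOfFirst(streamed, 3) + " after producing " + streamed.produced());
    }
}
```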

Benefits of streaming

[Diagram: training of k-means; labels: X, X, "crash!"]
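
To illustrate why streaming helps with large data, here is a minimal sketch of a sequential (MacQueen-style) k-means update in one dimension: each sample is seen once and then discarded, so memory use does not grow with the data size. This is a swapped-in textbook technique chosen for illustration; the slides do not say how Debellor's stream-based k-means works, and `StreamingKMeans1D` is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sequential k-means in one dimension (illustrative, not Debellor's implementation).
class StreamingKMeans1D {
    private final double[] centers;
    private final long[] counts;

    StreamingKMeans1D(double[] initialCenters) {
        centers = initialCenters.clone();
        counts = new long[centers.length];
    }

    // Index of the center closest to x.
    int nearest(double x) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (Math.abs(x - centers[i]) < Math.abs(x - centers[best])) best = i;
        }
        return best;
    }

    // Moves the nearest center toward x; the center becomes the running mean
    // of all samples assigned to it so far.
    void update(double x) {
        int c = nearest(x);
        counts[c]++;
        centers[c] += (x - centers[c]) / counts[c];
    }

    // Consumes a stream sample by sample; nothing is buffered.
    void learn(Iterator<Double> stream) {
        while (stream.hasNext()) update(stream.next());
    }

    double[] centers() {
        return centers.clone();
    }
}

class StreamingKMeansDemo {
    public static void main(String[] args) {
        StreamingKMeans1D km = new StreamingKMeans1D(new double[] {0.0, 10.0});
        km.learn(Arrays.asList(1.0, 9.0, 3.0, 11.0).iterator());
        System.out.println(Arrays.toString(km.centers()));
    }
}
```

With the initial centers 0 and 10 and the stream 1, 9, 3, 11, the samples 1 and 3 go to the first center and 9 and 11 to the second, which end at the means 2 and 10.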

Multi-threading

[Diagram: cells A and B, with Thread_1]

Multi-threading

A.newThread();

[Diagram: cells A and B, with Thread_1 and Thread_2]
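
One way to picture a cell running in its own thread is a producer thread that fills a bounded buffer while the consumer keeps calling `next()` from its own thread. The slides do not describe how `newThread()` is implemented, so the sketch below is an assumption throughout: `ThreadedSource`, the buffer size of 4, and the end-of-stream marker are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical wrapper (not Debellor's API): runs an upstream source in a second thread.
class ThreadedSource {
    private static final Object END = new Object();   // end-of-stream marker
    private final BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(4);
    private boolean finished;

    // Starts a producer thread that drains the upstream iterator into the bounded buffer.
    ThreadedSource(Iterator<Integer> upstream) {
        Thread producer = new Thread(() -> {
            try {
                while (upstream.hasNext()) buffer.put(upstream.next());   // blocks when the buffer is full
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Thread_2");
        producer.setDaemon(true);
        producer.start();
    }

    // Called from the consumer's thread; returns null at end of stream.
    Integer next() {
        if (finished) return null;
        try {
            Object item = buffer.take();   // blocks until the producer has something
            if (item == END) {
                finished = true;
                return null;
            }
            return (Integer) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }
}

class ThreadedDemo {
    // Consumer B: sums everything that A produces in the other thread.
    static int sum(Iterator<Integer> upstream) {
        ThreadedSource a = new ThreadedSource(upstream);
        int total = 0;
        for (Integer s = a.next(); s != null; s = a.next()) total += s;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5, 6).iterator()));
    }
}
```

The bounded buffer keeps the producer from running arbitrarily far ahead of the consumer, so memory use stays constant, as in the single-threaded streaming case.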

Available in version 0.6

Rseslib algorithms: classifiers (~20 algorithms)

Weka algorithms: ARFF reader, classifiers (~60), filters (47)

Debellor algorithms: Train&Test evaluation, k-means for large data (stream-based)

Data types: numeric and symbolic features, vectors of features, vectors of vectors of …

Future releases

Multi-input & multi-output cells

Composite cells (e.g. meta-learning)

Serialization and copying

Summary

Platform

Stream architecture

Extendible

Multi-threaded

Weka & Rseslib partially integrated

www.debellor.org
