what is jubatus (short)

What is Jubatus? How it works for you? NTT SIC Hiroki Kumazaki

Upload: kumazaki-hiroki

Post on 22-Jun-2015


TRANSCRIPT

Page 1: What is jubatus (short)

What is Jubatus? How does it work for you?

NTT SIC
Hiroki Kumazaki

Page 2: What is jubatus (short)

Jubatus is…
• A Distributed Online Machine-Learning framework
  – An OSS developed in Japan
• GPL2.0
• Distributed
  – Fault-Tolerance
  – Scale out
• Online
  – Fixed time computation
• Machine-Learning
  – More than “word count”!

Page 3: What is jubatus (short)

Architecture
• The ML model is combined with a feature extractor

[Diagram labels: Jubatus RPC, Jubatus Server, Feature Extractor, Machine Learning Model]

Page 4: What is jubatus (short)

Architecture

• Multilanguage client library
  – gem, pip, cpan, maven: ready!
  – It essentially uses MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.

[Diagram labels: Client, Jubatus RPC]
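Since the clients sit on MessagePack-RPC, a call is a small array on the wire. The sketch below builds a request array and checks a response array using the MessagePack-RPC message layout ([0, msgid, method, params] for requests, [1, msgid, error, result] for responses). Serializing to bytes with a MessagePack library and the TCP transport are left out, and the method name and arguments in the usage lines are placeholders for illustration, not the actual Jubatus RPC signatures.

```python
REQUEST = 0   # MessagePack-RPC type tag for a request
RESPONSE = 1  # MessagePack-RPC type tag for a response


def build_request(msgid, method, params):
    """Return the array a client would serialize and send."""
    return [REQUEST, msgid, method, list(params)]


def read_response(message, expected_msgid):
    """Validate a decoded response array and return its result."""
    msg_type, msgid, error, result = message
    if msg_type != RESPONSE:
        raise ValueError("not a response message")
    if msgid != expected_msgid:
        raise ValueError("response does not match the request id")
    if error is not None:
        raise RuntimeError("remote error: %r" % (error,))
    return result


# Placeholder method name and arguments, for illustration only.
request = build_request(1, "some_method", ["arg1", 2])
decoded_reply = [RESPONSE, 1, None, "ok"]
print(request)
print(read_response(decoded_reply, expected_msgid=1))
```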

Page 5: What is jubatus (short)

Architecture
• Many ML algorithms
  – Classifier
  – Recommender
  – Anomaly Detection
  – Clustering
  – Regression
  – Graph Mining

Useful!

Page 6: What is jubatus (short)

Classifier
• Task: Classification of Datum

Sample task: classify which programming language a piece of source code is written in.

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

It’s Python.

    def fib(a)
      if a == 1 or a == 0
        1
      else
        return fib(a-1) + fib(a-2)
      end
    end

    if __FILE__ == $0
      puts fib(ARGV[0].to_i)
    end

It’s Ruby.

Page 7: What is jubatus (short)

Classifier
• Set configuration in the Jubatus server

[Diagram labels: Classifier, Feature Extractor]

Feature Extractor configuration:

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }

Page 8: What is jubatus (short)

Classifier
• Configuration JSON
  – It does the “feature vector design”
  – A very important step for machine learning

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }

Annotations on the slide:
• Settings for extracting features from strings
• Define a function named “bigram”
• “ngram” is a built-in function
• Pass “2” to “ngram” to create “bigram”
• Apply “bigram” to all data
• Feature weights are based on tf-idf (see Wikipedia: tf-idf)
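As a rough illustration of what "sample_weight": "tf" combined with "global_weight": "idf" means, the sketch below reweights raw bigram counts with one common tf-idf variant, tf × log(N / df). This is an assumption for illustration: the exact formula Jubatus uses may differ, and because Jubatus learns online it would keep running document statistics rather than see all documents at once as this batch version does. The function name and the sample counts are made up.

```python
import math
from collections import Counter


def tfidf(count_vectors):
    """Reweight {bigram: count} dicts by tf * log(N / df)."""
    n_docs = len(count_vectors)
    # Document frequency: in how many documents each bigram appears.
    doc_freq = Counter()
    for counts in count_vectors:
        doc_freq.update(counts.keys())
    return [
        {key: tf * math.log(n_docs / doc_freq[key]) for key, tf in counts.items()}
        for counts in count_vectors
    ]


# Made-up counts: "de" and "ef" occur in both documents, so their weight is 0.
vectors = tfidf([
    {"de": 1, "ef": 1, "):": 2},
    {"de": 1, "ef": 1, "en": 2},
])
print(vectors[0])
print(vectors[1])
```

Bigrams shared by every document get a weight of zero under this variant, which is the point of the global weight: they carry no information about which class a document belongs to.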

Page 9: What is jubatus (short)

Classifier
• The Feature Extractor becomes a “bigram extractor”

[Diagram labels: Classifier, bigram extractor]

Page 10: What is jubatus (short)

Feature Extractor
• What does the bigram extractor do?

Input to the bigram extractor:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

Output (Feature Vector):

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...
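The extraction step above can be sketched in a few lines: slide a two-character window over the text and count each pair. This is a minimal stand-in for the idea, not Jubatus's own ngram implementation; the function name char_ngrams is made up for this example.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


features = char_ngrams("import sys")
print(features["im"], features["mp"], features["po"])
```

Here each key occurs once, as in the table on the slide; on longer input the values are raw occurrence counts.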

Page 11: What is jubatus (short)

Classifier
• Train the model with feature vectors

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...

    key   value
    pu    1
    ut    1
    ...   ...
    {|    ...
    |m    1
    m|    1
    {|    1
    en    1
    nd    1

    key   value
    @a    1
    $_    1
    ...   ...
    my    ...
    su    1
    ub    1
    us    1
    se    1
    ...   ...

[Diagram: the feature vectors are fed into the Classifier]

Page 12: What is jubatus (short)

Classifier
• Set configuration in the Jubatus server

    "method": "AROW",
    "parameter": {
      "regularization_weight": 1.0
    }

[Diagram labels: Feature Extractor (bigram extractor), Classifier]

Classifier Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights (AROW)
• Normal Herd
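To give a feel for how an online linear classifier consumes sparse feature vectors one example at a time, here is a minimal multiclass perceptron, the simplest algorithm in the list above. It is a sketch and not Jubatus's implementation: AROW and the other listed algorithms use more elaborate update rules, and the class name and the tiny feature vectors are invented for the example.

```python
class SparsePerceptron:
    """Multiclass perceptron over sparse {feature: value} dictionaries."""

    def __init__(self):
        self.weights = {}  # label -> {feature: weight}

    def score(self, label, features):
        w = self.weights[label]
        return sum(w.get(key, 0.0) * value for key, value in features.items())

    def classify(self, features):
        # Highest-scoring known label; None before any training.
        if not self.weights:
            return None
        return max(self.weights, key=lambda label: self.score(label, features))

    def train(self, label, features):
        # One online step: update only when the current guess is wrong.
        self.weights.setdefault(label, {})
        predicted = self.classify(features)
        if predicted == label:
            return
        right, wrong = self.weights[label], self.weights[predicted]
        for key, value in features.items():
            right[key] = right.get(key, 0.0) + value
            wrong[key] = wrong.get(key, 0.0) - value


# Invented bigram vectors standing in for a Python and a Ruby sample.
python_vec = {"de": 1, "ef": 1, "):": 1, "im": 1}
ruby_vec = {"de": 1, "ef": 1, "en": 1, "nd": 1}

model = SparsePerceptron()
for _ in range(5):
    model.train("python", python_vec)
    model.train("ruby", ruby_vec)

print(model.classify({"im": 1, "):": 1}))
```

Each call to train touches only the features present in that one example, which is what keeps the per-example cost fixed regardless of how much data has been seen.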

Page 13: What is jubatus (short)

Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification

[Diagram label: AROW]

    key   value
    si    1
    il    1
    ...   ...
    {|    1
    ...   ...

It’s [image: predicted language]

Page 14: What is jubatus (short)

Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification

[Diagram label: AROW]

    key   value
    re    1
    ):    1
    ...   ...
    s[    1
    ...   ...

It’s [image: predicted language]

Page 15: What is jubatus (short)

Via RPC
• Invoke feature extraction and classification from the client via RPC

[Diagram labels: bigram extractor, AROW]

    lang = client.classify([sourcecode])

Input source code:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

Feature vector:

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...

It may be [image: predicted language]

Page 16: What is jubatus (short)

What can the classifier do?
• You can
  – estimate the topic of tweets
  – trash spam mail automatically
  – monitor server failures from syslog
  – estimate user sentiment from blog posts
  – detect malicious attacks
  – find which feature is the best clue for classification
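The last item, finding the best clue, amounts to reading the learned weights of a linear model: the features with the largest weight for a label are its strongest clues. The sketch below assumes the weights are available as a plain label -> {feature: weight} dictionary; that layout, the function name, and the numbers are all hypothetical, and it does not show how to obtain weights from a running Jubatus server.

```python
def top_clues(weights, label, k=3):
    """Return the k highest-weighted (feature, weight) pairs for label."""
    ranked = sorted(weights[label].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical learned weights for one label.
weights = {"ruby": {"en": 1.0, "nd": 0.8, "de": 0.1, "):": -1.0}}
print(top_clues(weights, "ruby", k=2))
```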

Page 17: What is jubatus (short)

How to use?
• See the examples in http://github.com/jubatus/jubatus-example
  – gender
  – shogun
  – malware classification
  – language detection