Jubatus Feature Conversion and How Linear Classifiers Work

Jubatus Feature Conversion and How Linear Classifiers Work. 2011/11/07, Preferred Infrastructure, Inc., Yuya Unno (@unnonouno)

Upload: jubatusofficial

Post on 04-Dec-2014


TRANSCRIPT

  • 1. Jubatus Feature Conversion and How Linear Classifiers Work. 2011/11/07, Preferred Infrastructure, Inc., Yuya Unno (@unnonouno)

2. Jubatus

4-5. Vectors: (0, 1, 0, 2.5, -1, ...), (1, 0.5, 0.1, -2, 3, ...), (0, 1, 0, 1.5, 2, ...). Algorithms: SVM, LogReg, PA, CW, AROW, Naive Bayes, CNB, DT, RF, ANN, K-means, Spectral Clustering, MMC, LSI, LDA, GM, HMM, MRF, CRF, ...

6. Jubatus

7. Jubatus

8. Jubatus: x, y; Twitter, Tweet

9. Jubatus: NTT, PFI, Jubatus. Counts: NTT: 1. N-gram: Ju: 1, ub: 2, ba: 1, at: 1, tu: 1, us: 1

11. Jubatus

14. Jubatus: Perceptron (1958); Passive Aggressive (PA) (2003); Confidence Weighted Learning (CW) (2008); AROW (2009); Normal HERD (NHERD) (2010)

15. Jubatus: Iterative Parameter Mixture; MapReduce

16. Jubatus

18. Jubatus

19. Jubatus

22. Jubatus

23. Jubatus

25-28. ( [ ("user/id", "ippy"), ("user/name", "Loren Ipsum"), ("message", "Hello World") ], [ ("user/age", 29), ("user/income", 100000) ] )

29.
    {
      "string_filter_types": {},
      "string_filter_rules": [],
      "num_filter_types": {},
      "num_filter_rules": [],
      "string_types": {},
      "string_rules": [
        { "key": "*", "type": "space", "sample_weight": "bin", "global_weight": "bin" }
      ],
      "num_types": {},
      "num_rules": [
        { "key": "*", "type": "num" }
      ]
    }

30. "string_rules": [ { "key": "*", "type": "space", "sample_weight": "bin", "global_weight": "bin" } ]; space; num_rules

31. "string_types": { "bigram": { "method": "ngram", "char_num": "2" } }, "string_rules": [ { "key": "message", "type": "bigram", "sample_weight": "tf", "global_weight": "bin" } ]. Callouts: method (in XXX_types), rule (in XXX_rules)

32. ("message", "Hello World"); RT: ("twit", "Thanks RT: Here you are"); ("nationality", "JAPAN")

33. "string_filter_types": { "detag": { "method": "regexp", "pattern": "<[^>]*>", "replace": "" } }, "string_filter_rules": [ { "key": "message", "type": "detag", "suffix": "-detagged" } ]; xxx_types; xxx_rules

34. "string_types": { "mecab": { "method": "dynamic", "function": "create", "path": "/usr/local/lib/libmecab_splitter.so" } }; XXX_types; method: dynamic; path: .so; function

35-38.
    #include <...>

    class my_splitter : public jubatus::word_splitter {
     public:
      void split(const string& string,
                 vector<pair<size_t, size_t> >& ret_boundaries) {
        // do something
      }
    };

    extern "C" {
      my_splitter* create(const map<string, string>& params) {
        return new my_splitter();
      }
    }

39. etc.
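The converter rules on slides 29 to 31 can be illustrated with a small stand-alone sketch. It applies a whitespace splitter with "bin" weights to every string key, a character-bigram splitter with "tf" weights to the "message" key, and passes numeric values through, using the datum from slide 25. This is not the Jubatus implementation or its API: the function names, the `SPLITTERS` registry, and the feature-name format are assumptions made for illustration, and "global_weight" is treated as a constant 1 (its "bin" setting).

```python
from collections import Counter
from fnmatch import fnmatchcase


def split_space(text):
    """Split on whitespace, standing in for the "space" type."""
    return text.split()


def split_ngram(text, n=2):
    """Return every character n-gram of text (n=2 gives bigrams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical registry playing the role of "string_types".
SPLITTERS = {"space": split_space, "bigram": lambda t: split_ngram(t, 2)}


def sample_weight(count, mode):
    """Return 1.0 for mode 'bin', or the raw term frequency for 'tf'."""
    return 1.0 if mode == "bin" else float(count)


def convert(datum, string_rules, num_rules):
    """Turn (string_values, num_values) into a sparse feature dict."""
    string_values, num_values = datum
    features = {}
    for key, value in string_values:
        for rule in string_rules:
            if not fnmatchcase(key, rule["key"]):
                continue  # this rule does not apply to this key
            counts = Counter(SPLITTERS[rule["type"]](value))
            for token, count in counts.items():
                # Feature-name format is an assumption, not Jubatus's own.
                name = f"{key}${token}@{rule['type']}"
                features[name] = sample_weight(count, rule["sample_weight"])
    for key, value in num_values:
        if any(fnmatchcase(key, rule["key"]) for rule in num_rules):
            features[f"{key}@num"] = float(value)
    return features


datum = (
    [("user/id", "ippy"), ("user/name", "Loren Ipsum"), ("message", "Hello World")],
    [("user/age", 29), ("user/income", 100000)],
)
string_rules = [
    {"key": "*", "type": "space", "sample_weight": "bin", "global_weight": "bin"},
    {"key": "message", "type": "bigram", "sample_weight": "tf", "global_weight": "bin"},
]
num_rules = [{"key": "*", "type": "num"}]

features = convert(datum, string_rules, num_rules)
for name, value in sorted(features.items()):
    print(name, value)
```

Each (key, token, type) triple becomes one dimension of a sparse vector, which is the form a linear classifier such as Perceptron or PA consumes. Keeping the rule type in the feature name stops the word "Hello" and a bigram from the same key from colliding.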