a brief introduction to machine learning
TRANSCRIPT
- 1. A brief introduction to Machine Learning https://ihower.tw RubyConf China 2017
- 2. a.k.a. ihower (https://ihower.tw). Ruby on Rails developer; Ruby user since 2006
- 3. Agenda Rails
- 4. 1.
- 7. Training Data -> Algorithm -> Model
- 9. Yes/No questions: Binary Classification
- 10.
      21  M  120
      50  F  800
      41  F  400
      35  M  300
      36  M  500
      28  M   60
      55  F  950
- 11.
      X1 (Feature 1)  X2 (Feature 2)  Y (Label)
      21              120              1
      50              800             -1
      41              400              1
      35              300              1
      36              500             -1
      28               60             -1
      55              950              1
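- 12. Perceptron: each example has d features (x_1 ... x_d). The perceptron takes a weighted sum over features 1 to d and compares it with a threshold: h(x) = sign(w_1*x_1 + ... + w_d*x_d - threshold)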
- 13. How do we find w, the weight for each feature? Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/
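- 15. Perceptron Learning Algorithm (PLA): start with w = 0; find an example X_n that the current w classifies wrongly; correct w using that example; repeat until no mistakes remain. Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/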
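The PLA loop can be sketched in plain Ruby. This is a minimal illustration, not the code from the talk: the toy data, the method names (`train_pla`, `predict`), and the `max_updates` safety cap are all assumptions made for the example. A constant 1.0 is prepended to each feature vector so that `w[0]` plays the role of the threshold.

```ruby
# Minimal Perceptron Learning Algorithm (PLA) sketch.
# Each example is [features, label] with label +1 or -1.

def predict(w, x)
  score = w.zip([1.0] + x).sum { |wi, xi| wi * xi }
  score > 0 ? 1 : -1
end

def train_pla(examples, max_updates: 1000)
  w = Array.new(examples.first.first.size + 1, 0.0) # start from w = 0

  max_updates.times do
    # Find one example that the current w classifies wrongly.
    mistake = examples.find { |x, y| predict(w, x) != y }
    return w if mistake.nil? # no mistakes left: done

    x, y = mistake
    # Correct w: move it toward x when y = +1, away from x when y = -1.
    w = w.zip([1.0] + x).map { |wi, xi| wi + y * xi }
  end
  w # gave up after max_updates (data may not be linearly separable)
end

# Hypothetical, linearly separable toy data (not from the slides).
TOY_EXAMPLES = [
  [[2.0, 3.0], 1],
  [[3.0, 4.0], 1],
  [[-1.0, -2.0], -1],
  [[-2.0, -1.0], -1]
].freeze

weights = train_pla(TOY_EXAMPLES)
TOY_EXAMPLES.each do |x, y|
  puts "x=#{x.inspect} label=#{y} predicted=#{predict(weights, x)}"
end
```

On data that is not linearly separable the loop never reaches zero mistakes, which is why the sketch caps the number of updates.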
- 27. Pocket PLA: when the data is not linearly separable, keep the best w found so far "in the pocket". Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/
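A rough sketch of the pocket idea, under stated assumptions: it reuses the X1, X2, Y rows from the earlier slide, always picks the first misclassified example (the usual formulation picks one at random), and stops after a fixed number of updates. The method names and the `updates` parameter are hypothetical.

```ruby
# Pocket PLA sketch: do perceptron updates, but remember ("pocket")
# the weights that made the fewest mistakes on the training data.

def classify(w, x)
  score = w.zip([1.0] + x).sum { |wi, xi| wi * xi }
  score > 0 ? 1 : -1
end

def count_mistakes(w, examples)
  examples.count { |x, y| classify(w, x) != y }
end

def train_pocket(examples, updates: 50)
  w = Array.new(examples.first.first.size + 1, 0.0)
  best_w = w
  best_mistakes = count_mistakes(w, examples)

  updates.times do
    mistake = examples.find { |x, y| classify(w, x) != y }
    break if mistake.nil?

    x, y = mistake
    w = w.zip([1.0] + x).map { |wi, xi| wi + y * xi }

    # Only replace the pocketed weights when the new ones are better.
    mistakes = count_mistakes(w, examples)
    if mistakes < best_mistakes
      best_w = w
      best_mistakes = mistakes
    end
  end
  [best_w, best_mistakes]
end

# The seven [X1, X2], Y rows shown on the earlier slide.
SLIDE_EXAMPLES = [
  [[21.0, 120.0], 1], [[50.0, 800.0], -1], [[41.0, 400.0], 1],
  [[35.0, 300.0], 1], [[36.0, 500.0], -1], [[28.0, 60.0], -1],
  [[55.0, 950.0], 1]
].freeze

pocket_w, pocket_mistakes = train_pocket(SLIDE_EXAMPLES)
puts "pocket weights: #{pocket_w.inspect}, mistakes: #{pocket_mistakes}"
```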
- 29. Supervised Learning, Semi-supervised Learning, Unsupervised Learning, Reinforcement Learning
- 30. Supervised learning: the training data has a label. Tasks: Classification, Regression
- 31. Semi-supervised learning: only some of the data has a label; the rest has no label
- 32. Unsupervised learning: Clustering (K-means), GAN. Credit: https://en.wikipedia.org/wiki/Cluster_analysis
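To make K-means concrete, here is a minimal sketch in plain Ruby. The points, the initial centroids, and the method names are made up for illustration; real libraries choose the initial centroids more carefully and handle empty clusters differently.

```ruby
# Minimal K-means sketch on 2-D points.

def squared_distance(a, b)
  a.zip(b).sum { |ai, bi| (ai - bi)**2 }
end

def mean_point(points)
  points.transpose.map { |coords| coords.sum / coords.size.to_f }
end

def k_means(points, centroids, max_iterations: 100)
  max_iterations.times do
    # Assignment step: group every point under its nearest centroid index.
    clusters = points.group_by do |point|
      centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
    end

    # Update step: move each centroid to the mean of its cluster.
    # A centroid with no points keeps its old position.
    updated = centroids.each_with_index.map do |centroid, i|
      clusters.key?(i) ? mean_point(clusters[i]) : centroid
    end

    return updated if updated == centroids # nothing moved: converged
    centroids = updated
  end
  centroids
end

# Hypothetical data: two visually obvious groups.
POINTS = [
  [1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
  [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]
].freeze
INITIAL_CENTROIDS = [[0.0, 0.0], [10.0, 10.0]].freeze

final_centroids = k_means(POINTS, INITIAL_CENTROIDS)
puts final_centroids.inspect
```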
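- 33. Reinforcement learning: no explicit answer (y) is given; the learner gets feedback on its actions instead. https://en.wikipedia.org/wiki/Reinforcement_learning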
- 34. Linear Regression; Linear Classifier; Logistic Regression; Decision Tree, Random Forest; Naive Bayes classifiers; Support Vector Machine (SVM); Artificial Neural Network (ANN)
- 35. Linear Regression, trained with Gradient Descent
- 36. Gradient Descent: how do we find the w that minimizes f(w)? Image Credit: edX BerkeleyX: CS190.1x Scalable Machine Learning
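As a rough illustration of the idea, the sketch below fits a one-parameter model y = w * x by gradient descent on the mean squared error f(w). The data, the learning rate, and the iteration count are assumptions chosen so that the example converges; they are not taken from the slides.

```ruby
# Gradient descent sketch for f(w) = mean((w * x - y)^2).

# Derivative of f with respect to w: mean(2 * (w * x - y) * x).
def gradient(w, xs, ys)
  xs.zip(ys).sum { |x, y| 2.0 * (w * x - y) * x } / xs.size.to_f
end

def gradient_descent(xs, ys, learning_rate: 0.01, iterations: 1000)
  w = 0.0
  # Repeatedly step against the gradient, i.e. downhill on f(w).
  iterations.times { w -= learning_rate * gradient(w, xs, ys) }
  w
end

# Hypothetical data generated from y = 2 * x.
XS = [1.0, 2.0, 3.0, 4.0].freeze
YS = [2.0, 4.0, 6.0, 8.0].freeze

learned_w = gradient_descent(XS, YS)
puts "learned w = #{learned_w.round(4)}"
```

A learning rate that is too large makes the steps overshoot and diverge; one that is too small needs many more iterations.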
- 37. http://lukaszkujawa.github.io//gradient-descent.html
- 38.
      # https://github.com/daugaard/linear-regression
      # gem install ruby_linear_regression
      require 'ruby_linear_regression'
      x_data = [ [1,1], [2,2], [3,3], [4,4] ]
      y_data = [ 1,2,3,4 ]
      linear_regression = RubyLinearRegression.new
      linear_regression.load_training_data(x_data, y_data)
      linear_regression.train_gradient_descent(0.0005, 1000, true)
      prediction_x = [5, 5]
      prediction_y = linear_regression.predict(prediction_x)
      puts prediction_y
- 39. Nonlinear Transformation Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/
- 40. Logistic Regression: the output is a value between 0 and 1, a "soft" binary classification
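The 0 to 1 output comes from passing a linear score through the logistic (sigmoid) function. A minimal sketch, assuming hand-picked weights rather than trained ones (the names `sigmoid`, `probability`, and `HYPOTHETICAL_WEIGHTS` are illustrative only):

```ruby
# The logistic (sigmoid) function squashes any score into (0, 1).
def sigmoid(score)
  1.0 / (1.0 + Math.exp(-score))
end

# "Soft" prediction: a probability instead of a hard +1 / -1 answer.
def probability(w, x)
  sigmoid(w.zip(x).sum { |wi, xi| wi * xi })
end

# Hypothetical weights; a real model would learn them from data.
HYPOTHETICAL_WEIGHTS = [1.0, 1.0].freeze

[[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]].each do |x|
  puts "x=#{x.inspect} probability=#{probability(HYPOTHETICAL_WEIGHTS, x).round(3)}"
end
```

Thresholding the probability at 0.5 turns the soft output back into a hard yes/no decision.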
- 41.
      # gem install liblinear-ruby
      # https://github.com/kei500/liblinear-ruby
      # https://www.practicalai.io/implementing-classification-using-logistic-regression-in-ruby/
      require 'liblinear'
      model = Liblinear.train(
        { solver_type: Liblinear::L2R_LR },      # L2-regularized logistic
        [-1, -1, 1, 1],                          # labels
        [[-2, -2], [-1, -1], [1, 1], [2, 2]],    # training data
      )
      puts Liblinear.predict(model, [0.5, 0.5])
- 42. From Binary to Multiclass: OVA (One-Versus-All), also called One-vs-Rest (OvR), builds one binary classification model per class and combines them
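The combining step of One-Versus-All can be sketched as follows: each class has its own binary scorer, and the class whose scorer is most confident wins. The class names and weights here are invented for illustration, and training the per-class scorers is left out.

```ruby
# One-Versus-All sketch: one linear scorer per class, highest score wins.

def score(w, x)
  w.zip(x).sum { |wi, xi| wi * xi }
end

def predict_one_vs_all(class_weights, x)
  # max_by yields [label, weights] pairs and returns the winning pair.
  class_weights.max_by { |_label, w| score(w, x) }.first
end

# Hypothetical per-class weights ("this class versus everything else").
CLASS_WEIGHTS = {
  'cat'  => [1.0, 0.0],
  'dog'  => [0.0, 1.0],
  'bird' => [-1.0, -1.0]
}.freeze

puts predict_one_vs_all(CLASS_WEIGHTS, [3.0, 1.0])
puts predict_one_vs_all(CLASS_WEIGHTS, [-2.0, -3.0])
```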
- 43. Decision Tree: each node splits the data on one feature
- 44. http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
- 45.
      # https://github.com/igrigorik/decisiontree
      # gem install decisiontree
      # gem install graphr
      require 'decisiontree'
      attributes = ['Age', 'Education', 'Income', 'Marital Status']
      training = [
        ['36-55', 'Masters', 'High', 'Single', 1],
        ['18-35', 'High School', 'Low', 'Single', 0],
        ['< 18', 'High School', 'Low', 'Married', 1]
      ]
      dec_tree = DecisionTree::ID3Tree.new(attributes, training, 1, :discrete)
      dec_tree.train
      test = ['< 18', 'High School', 'Low', 'Single', 0]
      decision = dec_tree.predict(test)
      puts "Predicted: #{decision} ... True decision: #{test.last}"
      dec_tree.graph("discrete")
- 46. SVM (Support Vector Machine): finds the separating boundary with the largest margin; the points closest to the boundary are the support vector candidates. Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/
- 47.
      # gem install rb-libsvm
      # https://www.practicalai.io/implementing-classification-using-a-svm-in-ruby/
      require 'libsvm'
      # This library is namespaced.
      problem = Libsvm::Problem.new
      parameter = Libsvm::SvmParameter.new
      parameter.cache_size = 1 # in megabytes
      parameter.eps = 0.001
      parameter.c = 10
      examples = [ [1,0,1], [-1,0,-1] ].map {|ary| Libsvm::Node.features(ary) }
      labels = [1, -1]
      problem.set_examples(labels, examples)
      model = Libsvm::Model.train(problem, parameter)
      pred = model.predict(Libsvm::Node.features(1, 1, 1))
      puts "Example [1, 1, 1] - Predicted #{pred}"
- 48. awesome-ruby https://github.com/arbox/machine-learning-with-ruby
- 49. FAQ ? ??
- 50. 2.
- 53. ? ? Features ? Features ? feature feature ? ? ? ?
- 54.
- 55. Exploratory data analysis; splitting the data into Train data and Test data
- 56. (HTML)
- 57. Data preprocessing: scale numeric features to the range 0 to 1; encode categorical features with one-hot encoding
- 58. One-Hot Encoding: the categories { IE, Chrome, Safari } become three features. IE => [ 1 0 0 ], Chrome => [ 0 1 0 ], Safari => [ 0 0 1 ]. Alternative: Feature Hashing
- 59. Feature Extraction: needs Domain Knowledge, e.g. Geolocation
- 60. Bag of Words.
      (1) John likes to watch movies. Mary likes movies too.
      (2) John also likes to watch football games.
      Vocabulary: [ "John", "likes", "to", "watch", "movies", "also", "football", "games", "Mary", "too" ]
      (1) => [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
      (2) => [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
      https://zh.wikipedia.org/zh-cn/
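The two count vectors can be reproduced with a short Ruby sketch. The tokenizer below (letters only, case-sensitive) is a simplifying assumption; real pipelines usually lowercase and handle punctuation more carefully.

```ruby
# Bag of Words: count how often each vocabulary word occurs in a text.

VOCABULARY = %w[John likes to watch movies also football games Mary too].freeze

def tokenize(text)
  text.scan(/[A-Za-z]+/) # keep runs of letters, drop punctuation
end

def bag_of_words(text, vocabulary)
  counts = Hash.new(0)
  tokenize(text).each { |word| counts[word] += 1 }
  vocabulary.map { |word| counts[word] }
end

SENTENCE_1 = 'John likes to watch movies. Mary likes movies too.'
SENTENCE_2 = 'John also likes to watch football games.'

puts bag_of_words(SENTENCE_1, VOCABULARY).inspect
puts bag_of_words(SENTENCE_2, VOCABULARY).inspect
```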
- 61. Feature 100% ?
- 62. Overfitting. Image Credit: edX BerkeleyX: CS190.1x Scalable Machine Learning
- 63. Overfitting. Credit: http://www.csie.ntu.edu.tw/~htlin/mooc/
- 65. KPI
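- 67. How to deal with Overfitting? Too many features make Overfitting more likely. Options: fewer features, regularization, Early Stopping, etc.
- 68. How to deal with Underfitting? Results are poor on both the Training dataset and the Validation dataset. Option: add more features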
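- 69. Split the data into a Training Dataset and a Testing Dataset, e.g. 7:3. Poor results on both the Training dataset and the Validation dataset => Underfitting. Good results on the Training dataset but poor results on the Validation dataset => Overfitting. What then? Adjust the model and the Gradient Descent settings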
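A 7:3 split is easy to sketch in plain Ruby. The method name, the fixed random seed, and the stand-in data are assumptions for illustration; shuffling first avoids a split that depends on the original row order.

```ruby
# Split rows into a training part and a testing part (default 7:3).
def train_test_split(rows, train_ratio: 0.7, seed: 42)
  # A seeded generator keeps the split reproducible between runs.
  shuffled = rows.shuffle(random: Random.new(seed))
  cut = (rows.size * train_ratio).round
  [shuffled.first(cut), shuffled.drop(cut)]
end

ROWS = (1..10).to_a.freeze # stand-in for real examples

training_rows, testing_rows = train_test_split(ROWS)
puts "training: #{training_rows.size} rows, testing: #{testing_rows.size} rows"
```

The testing rows should only be used for the final check; comparing training and validation results is what tells underfitting and overfitting apart.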
- 70. ....
- 71. Ruby ML Cross-Validation .... Toy Project
- 72. Sorry Ruby Python R
- 73. Python tools: Scikit-learn (Python Machine Learning), Pandas (Data frames), Matplotlib, Jupyter notebook, PySpark
- 74. Scikit-learn
- 75. http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
- 76. Jupyter notebook Demo
- 77.
- 78. 3.
- 79. Credit: http://www.cnblogs.com/gpcuster/archive/2008/06/03/1213008.html
- 80. Credit: Neuron
- 83. w? https://zh.wikipedia.org/wiki/
- 84. MNIST: each image is 28*28 = 784 pixels. With one Hidden layer of 100 Neurons and 10 outputs, the network has 784*100 + 100*10 = 79,400 weights. http://yann.lecun.com/exdb/mnist/
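The weight count follows from multiplying the sizes of each pair of adjacent layers. A small sketch of that arithmetic, assuming a fully connected network (the method names are illustrative; bias terms are counted separately because the slide's formula leaves them out):

```ruby
# Count parameters of a fully connected network from its layer sizes.

def weight_count(layer_sizes)
  # Every neuron connects to every neuron in the next layer.
  layer_sizes.each_cons(2).sum { |inputs, outputs| inputs * outputs }
end

def bias_count(layer_sizes)
  layer_sizes.drop(1).sum # one bias per non-input neuron
end

MNIST_LAYERS = [784, 100, 10].freeze # input pixels, hidden neurons, digits

puts "weights: #{weight_count(MNIST_LAYERS)}"
puts "biases:  #{bias_count(MNIST_LAYERS)}"
```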
- 85.
      # https://github.com/tangledpath/ruby-fann
      require 'ruby-fann'
      train = RubyFann::TrainData.new(
        :inputs => [[0.3, 0.4, 0.5], [0.1, 0.2, 0.3]],
        :desired_outputs => [[0.7], [0.8]])
      fann = RubyFann::Standard.new(
        :num_inputs => 3, :hidden_neurons => [2, 8, 4, 3, 4], :num_outputs => 1)
      # 1000 max_epochs, 10 errors between reports and 0.1 desired MSE (mean-squared-error)
      fann.train_on_data(train, 1000, 10, 0.1)
      puts fann.run([0.3, 0.2, 0.4])
- 86. http://playground.tensorflow.org
- 88. feature 1. 2.
- 89. Convolutional Neural Network (CNN): from ANN to CNN by adding Convolution Layers and Max pooling
- 90. https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_
- 91. http://www.image-net.org
- 93. Frameworks: Google TensorFlow, Facebook Caffe, CNTK, MXNet (Amazon), PyTorch, Theano
- 94. Hardware: Nvidia GPU (CUDA), Google TPU, Intel Movidius, Apple Neural Engine chip (iPhone X)
- 95. Keras (https://keras.io): a high-level API that runs on top of TensorFlow, CNTK, or Theano
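- 96.
      from keras.models import Sequential
      from keras.layers import Dense, Activation, Dropout
      from keras.optimizers import RMSprop
      # X_train, Y_train, X_test, Y_test, batch_size and nb_epoch are
      # assumed to be prepared earlier (e.g. from the MNIST dataset).
      model = Sequential()
      model.add(Dense(512, input_shape=(784,)))
      model.add(Activation('relu'))
      model.add(Dropout(0.2))
      model.add(Dense(512))
      model.add(Activation('relu'))
      model.add(Dropout(0.2))
      model.add(Dense(10))
      model.add(Activation('softmax'))
      model.summary()
      model.compile(loss='categorical_crossentropy',
                    optimizer=RMSprop(),
                    metrics=['accuracy'])
      # Keras 2 renamed the nb_epoch argument to epochs.
      history = model.fit(X_train, Y_train,
                          batch_size=batch_size, epochs=nb_epoch,
                          verbose=1, validation_data=(X_test, Y_test))
      score = model.evaluate(X_test, Y_test, verbose=0)
      print('Test score:', score[0])
      print('Test accuracy:', score[1])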
- 97. 4. Rails
- 98. Ruby (are you serious?) Python R Apache Spark
- 99. Options: load the trained model in Ruby. But how about the feature engineering part? Wrap Python in a Web API (Flask); RPC (gRPC); pycall (call a Python function from Ruby); or use a cloud API (GCP, AWS, Azure)
- 100. Apple Core ML https://developer.apple.com/documentation/coreml
- 101. Cloud Solutions: Amazon Web Services, Google Cloud Platform, Microsoft Azure
- 102. Google Cloud: Cloud Machine Learning Engine. Machine Learning as an API: Cloud Vision, Cloud Speech, Cloud Natural Language, Cloud Translation API, Cloud Video Intelligence
- 103. https://cloud.google.com/vision/
      # gem install google-cloud-vision
      # export GOOGLE_CLOUD_PROJECT=
      # export GOOGLE_CLOUD_KEYFILE=
      require "google/cloud/vision"
      vision = Google::Cloud::Vision.new
      image = vision.image "demo-image.jpg"
      puts image.labels
- 104. What's next?
- 105. Data Scientist vs. Data Engineer. A.I. is a feature; we will just use it as a gem
- 106. Top-down or bottom-up? http://www.csdn.net/article/2015-08-27/2825551 https://machinelearningmastery.com/dont-implement-machine-learning-algorithms/ Coursera
- 107. http://www.csie.ntu.edu.tw/~htlin/mooc/
       https://www.slideshare.net/tw_dsconf/ss-62245351
       edX BerkeleyX: CS190.1x Scalable Machine Learning
       SciRuby Machine Learning Current Status and Future: https://speakerdeck.com/mrkn/sciruby-machine-learning-current-status-and-future
       Feature Engineering in Machine Learning: https://www.slideshare.net/tw_dsconf/feature-engineering-in-machine-learning
       Python Machine Learning, Packt Publishing
       Data Science from Scratch, O'Reilly