machine learning 101 dkom 2017

Machine Learning 101 Fred Verheul

Upload: fredverheul

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Machine learning 101 dkom 2017

Machine Learning 101

Fred Verheul

Page 2: Machine learning 101 dkom 2017

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

Page 3: Machine learning 101 dkom 2017

What is Machine Learning?

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
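The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the temperature-conversion task, the function names and the least-squares fit are all made up for the example.

```python
def celsius_to_fahrenheit(celsius):
    """Traditional programming: a human writes the rule (the program)."""
    return celsius * 9 / 5 + 32


def learn_linear_program(data, outputs):
    """Machine learning: derive the rule from data plus known outputs.

    Fits y = slope * x + intercept by ordinary least squares and
    returns the fitted rule as a callable 'program'.
    """
    n = len(data)
    mean_x = sum(data) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(data, outputs))
    variance = sum((x - mean_x) ** 2 for x in data)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Data and the outputs observed for it (hypothetical measurements).
data = [-10.0, 0.0, 25.0, 100.0]
outputs = [14.0, 32.0, 77.0, 212.0]

learned_program = learn_linear_program(data, outputs)
print(round(learned_program(37.0), 1))        # rule learned from examples
print(round(celsius_to_fahrenheit(37.0), 1))  # rule written by hand
```

In the first function the computer is handed the program; in the second it is handed examples and returns a program that reproduces them.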

Page 4: Machine learning 101 dkom 2017

Prediction is hard…

Page 5: Machine learning 101 dkom 2017

Sweet spot for Machine Learning

• It’s impossible to write down the rules in code:
  • Too many rules
  • Too many factors influencing the rules
  • Too finely tuned
  • We just don’t know the rules (image recognition)

• Lots of labeled data (examples) available (e.g. historical data)

Page 6: Machine learning 101 dkom 2017

Basic Machine Learning ‘workflow’

Training Phase: Training data → Feature Vectors + Labels → Machine Learning Algorithm → Predictive Model

Operational Phase: New data → Feature Vectors → Predictive Model → Prediction
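The two phases might look as follows in code. This is a minimal sketch, assuming a nearest-centroid classifier as the learning algorithm; the slides do not name an algorithm, and the `train` and `predict` functions as well as the cat/dog measurements are hypothetical.

```python
from collections import defaultdict
import math


def train(feature_vectors, labels):
    """Training phase: build a predictive model from labeled feature vectors.

    The 'model' here is simply one centroid (mean vector) per label.
    """
    grouped = defaultdict(list)
    for vector, label in zip(feature_vectors, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(model, feature_vector):
    """Operational phase: apply the model to the feature vector of new data."""
    return min(model, key=lambda label: math.dist(model[label], feature_vector))


# Hypothetical training data: [height_cm, weight_kg] with a label per example.
training_vectors = [[30, 4], [35, 5], [60, 25], [65, 30]]
training_labels = ["cat", "cat", "dog", "dog"]

model = train(training_vectors, training_labels)
print(predict(model, [33, 6]))
print(predict(model, [58, 22]))
```

The model is built once from labeled examples and then reused for every new feature vector, which is the split between the two phases.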

Page 7: Machine learning 101 dkom 2017

Training Phase in more detail

Raw data → Data preparation → Feature Vectors → Training Data / Test data → Model Building (by ML algorithm), aka ‘learning’ → Model Evaluation → Predictive Model

Feedback loop

Data preparation: data cleansing, data transformation, normalization, feature extraction
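The steps of the training phase can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the helper names, the min-max normalization, the 25% hold-out, the nearest-centroid model and the toy records are not taken from the slides.

```python
import math
import random


def prepare(raw_rows):
    """Data preparation: cleanse rows, then min-max normalize each feature."""
    # Data cleansing: drop rows that contain a missing value.
    clean = [(features, label) for features, label in raw_rows if None not in features]
    columns = list(zip(*(features for features, _ in clean)))
    lows = [min(column) for column in columns]
    spans = [(max(column) - min(column)) or 1.0 for column in columns]
    # Normalization: rescale every feature to the range [0, 1].
    vectors = [
        [(value - low) / span for value, low, span in zip(features, lows, spans)]
        for features, _ in clean
    ]
    return vectors, [label for _, label in clean]


def split(vectors, labels, test_fraction=0.25, seed=0):
    """Hold out part of the prepared data as test data."""
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))

    def pick(chosen):
        return [vectors[i] for i in chosen], [labels[i] for i in chosen]

    return pick(indices[n_test:]), pick(indices[:n_test])


def build_model(vectors, labels):
    """Model building ('learning'): one centroid per label."""
    model = {}
    for label in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == label]
        model[label] = [sum(column) / len(members) for column in zip(*members)]
    return model


def evaluate(model, vectors, labels):
    """Model evaluation: accuracy on data that was not used for building."""
    hits = sum(
        min(model, key=lambda label: math.dist(model[label], v)) == y
        for v, y in zip(vectors, labels)
    )
    return hits / len(vectors)


raw = [
    ([1.0, 10.0], "low"), ([2.0, 12.0], "low"), ([1.5, 11.0], "low"), ([2.5, 9.0], "low"),
    ([8.0, 50.0], "high"), ([9.0, 55.0], "high"), ([8.5, 52.0], "high"), ([9.5, 48.0], "high"),
    ([None, 30.0], "low"),  # incomplete record, removed during cleansing
]

vectors, labels = prepare(raw)
(train_x, train_y), (test_x, test_y) = split(vectors, labels)
model = build_model(train_x, train_y)
test_accuracy = evaluate(model, test_x, test_y)
print(f"test accuracy: {test_accuracy:.2f}")
```

If the test accuracy were poor, the feedback loop would mean going back to data preparation or model building and trying again.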

Page 8: Machine learning 101 dkom 2017

Examples of ML tasks

Supervised learning
• Regression: target is numeric
• Classification: target is categorical

Unsupervised learning
• Clustering
• Dimensionality reduction

Page 9: Machine learning 101 dkom 2017

Modeling: so many algorithms…

Page 10: Machine learning 101 dkom 2017

ML Algorithms: by Representation

Collection of candidate models/programs, aka hypothesis space

Decision trees

Instance-based

Neural networks

Model ensembles

Page 11: Machine learning 101 dkom 2017

ML Algorithms: by Evaluation

Evaluation: quality measure for a model

Regression

Example metric: Root Mean Squared Error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

(y_i: actual value, \hat{y}_i: predicted value, n: number of examples)

Binary classification: confusion matrix

Example: medical test for a disease

Accuracy: (8 + 971) / 1000 -> 97.9%

Better evaluation metrics:
• Precision: 8 / (8 + 19)
• Recall: 8 / (8 + 2)
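The metrics on this slide can be computed directly. The sketch below assumes the standard reading of the slide's figures (8 true positives, 19 false positives, 2 false negatives, 971 true negatives); the variable names and the RMSE sample values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Confusion-matrix counts as read from the slide's formulas (assumed labels).
tp, fp, fn, tn = 8, 19, 2, 971

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # share of positive predictions that are right
recall = tp / (tp + fn)                      # share of actual positives that are found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")

# Hypothetical regression example: errors of 1, 0 and -2.
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

Accuracy comes out at 0.979, yet precision is only about 0.296: most positive test results are false alarms, which accuracy alone hides because the disease is rare in this sample. Recall is 0.8.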

Page 12: Machine learning 101 dkom 2017

ML Algorithms: by Optimization

Optimization: how the algorithm ‘learns’; depends on representation and evaluation

Greedy Search (an example of combinatorial optimization)

Gradient Descent (or in general: Convex Optimization)

Linear Programming (or in general: Constrained/Nonlinear Optimization)
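As one concrete case, gradient descent can be sketched for a straight-line model with mean squared error as the evaluation measure. The model, learning rate, step count and data are assumptions chosen for illustration, not something the slides specify.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

Here the representation is the family of straight lines, the evaluation is mean squared error, and the optimization is the loop that adjusts `w` and `b`.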

Page 13: Machine learning 101 dkom 2017

Training error vs test error
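The gap between training error and test error can be shown with a small experiment. This is a sketch under assumptions of my own: a k-nearest-neighbour classifier on a made-up one-dimensional dataset in which two training labels are deliberately wrong (noise).

```python
def knn_predict(train_xs, train_labels, x, k):
    """Majority vote among the k training points closest to x (labels 0/1)."""
    nearest = sorted(zip(train_xs, train_labels), key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def error_rate(train_xs, train_labels, xs, labels, k):
    """Fraction of points in (xs, labels) that the k-NN model gets wrong."""
    wrong = sum(knn_predict(train_xs, train_labels, x, k) != y for x, y in zip(xs, labels))
    return wrong / len(xs)


# True rule: label 1 when x > 5.5. The labels at x=3 and x=8 are noisy.
train_xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train_labels = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
test_xs = [1.4, 2.6, 3.4, 4.6, 6.4, 7.6, 8.4, 9.6]
test_labels = [0, 0, 0, 0, 1, 1, 1, 1]

for k in (1, 3):
    train_error = error_rate(train_xs, train_labels, train_xs, train_labels, k)
    test_error = error_rate(train_xs, train_labels, test_xs, test_labels, k)
    print(f"k={k}: training error={train_error:.2f}, test error={test_error:.2f}")
```

With k=1 the model memorizes the training data, noise included: training error is 0.00 but test error is 0.50. With k=3 the training error rises to 0.20 while the test error drops to 0.00, so a low training error by itself says little about how well a model generalizes.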

Page 14: Machine learning 101 dkom 2017

Data Science for Business

• Focuses more on general principles than specific algorithms

• Not math-heavy, but does contain some math

• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do

• Book website: http://data-science-for-biz.com/DSB/Home.html

Page 15: Machine learning 101 dkom 2017

What has NOT been covered (1)

• Deep learning / Neural Networks
  • Covered in other presentations at DKOM
  • Also recommended for further reading (deep dive): http://neuralnetworksanddeeplearning.com/index.html

• Specifics of ML algorithms
  • All over the internet… e.g. at http://machinelearningmastery.com/

Page 16: Machine learning 101 dkom 2017

What has NOT been covered (2)

• Libraries (examples):
  • TensorFlow, Caffe, Theano, Keras
  • SciPy & scikit-learn
  • Spark MLlib (Scala/Java/Python)

• Programming languages:

Page 17: Machine learning 101 dkom 2017

What has NOT been covered (3)

• SAP products:

• SAP HANA, SAP HANA Vora, SAP BO Predictive Analytics(!), HCP Predictive Services

• New machine learning platform

• Hardware

• Nvidia talk about GPUs

Page 18: Machine learning 101 dkom 2017

What has NOT been covered (4)

• Ethics and algorithmic transparency:

Page 19: Machine learning 101 dkom 2017

What has NOT been covered (5)

• The Data Science & Data Mining Process:

Page 20: Machine learning 101 dkom 2017

What has NOT been covered (6)

• How to integrate ML into your business application

• I hope SAP is figuring that out as we speak ;-)

• Have a look at SAP Predictive Analytics Integrator

• https://help.sap.com/pai

Page 21: Machine learning 101 dkom 2017

Take-aways

• Goal of ML: generalize from training data (not optimization!!)

• No magic! Just some clever algorithms…

• Increasingly important non-technical aspects:
  • Ethics
  • Algorithmic transparency

Page 22: Machine learning 101 dkom 2017

Thank you

[email protected]
@SOAPEOPLE

Fred Verheul
Big Data Consultant
+31 6 3919 [email protected]