
Page 1:

Parallel and Scalable Machine Learning – Introduction to Machine Learning Models

Prof. Dr.-Ing. Morris Riedel, Associate Professor, School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland; Research Group Leader, Juelich Supercomputing Centre, Forschungszentrum Juelich, Germany

February 17, 2020, Juelich Supercomputing Centre, Germany

Supervised Learning with a Simple Learning Model

LECTURE 3

Page 2:

Review of Lecture 2 – Introduction to Machine Learning Fundamentals


[11] C. Shearer, CRISP-DM model, Journal Data Warehousing, 5:13

[12] Image sources: Species Iris Group of North America Database, www.signa.org


[13] Perceptron Visualization

Page 3:

Outline of the Training Course


1. Parallel & Scalable Machine Learning driven by HPC

2. Introduction to Machine Learning Fundamentals

3. Supervised Learning with a Simple Learning Model

4. Artificial Neural Networks (ANNs)

5. Introduction to Statistical Learning Theory

6. Validation and Regularization

7. Pattern Recognition Systems

8. Parallel and Distributed Training of ANN

9. Supervised Learning with Deep Learning

10. Unsupervised Learning – Clustering

11. Clustering with HPC

12. Introduction to Deep Reinforcement Learning

Page 4:

Outline

Supervised Learning & Multi-Class Classification Problems
 Supervised Learning Revisited & Role of Deep Learning Frameworks
 Formalization of Machine Learning Fundamentals & Perceptron Model
 MNIST & Multi-Class Classification Problems
 Relevance of Data Exploration, Data Preparation & Normalization
 Multi-Output Perceptron Learning Model

Supervised Learning & Theory of Generalization
 Formalization of Supervised Learning & Mathematical Building Blocks
 Feasibility of Learning & Understanding the Theory of Generalization
 Role of Learning Algorithms and the Final Hypothesis
 Different Models in the Hypothesis Set & Unlimited 'Degrees of Freedom'
 Using the Training Dataset both as Training Dataset and as Testing Dataset


Page 5:

Supervised Learning and Multi-Class Classification Problems


Page 6:

Learning Approaches – What 'Learning from Data' Means – Revisited

Supervised Learning
 The majority of methods in this course follow this approach
 Example: credit card approval based on previous customer applications

Unsupervised Learning
 Often applied before other learning to obtain a higher-level data representation
 Example: coin recognition in a vending machine based on weight and size

Reinforcement Learning
 Typical 'human way' of learning
 Example: a toddler tries to touch a hot cup of tea (again and again)

The basic meaning of learning is 'to use a set of observations to uncover an underlying process'. The three different learning approaches are supervised, unsupervised, and reinforcement learning.

[1] Image sources: Species Iris Group of North America Database, www.signa.org

[2] A.C. Cheng et al., ‘InstaNAS: Instance-aware Neural Architecture Search’, 2018


Page 7:

Learning Approaches – Supervised Learning – Revisited

Each observation of the predictor measurement(s) has an associated response measurement: input data and output data (the output guides the learning process as a 'supervisor')

Goal: fit a model that relates the response to the predictors
 Prediction: aims at accurately predicting the response for future observations
 Inference: aims at better understanding the relationship between the response and the predictors

 Supervised learning approaches fit a model that relates the response to the predictors
 Supervised learning approaches are used in classification algorithms such as SVMs
 Supervised learning works with data = [input, correct output]


Page 8:

Deep Learning Frameworks using GPUs – also good for Artificial Neural Networks

TensorFlow
 One of the most popular deep learning frameworks available today
 Execution on multi-core CPUs or many-core GPUs

[3] Tensorflow Web page

 Tensorflow is an open source library for deep learning models using a flow graph approach
 Tensorflow nodes model mathematical operations, and graph edges between the nodes are so-called tensors (also known as multi-dimensional arrays)
 The Tensorflow tool supports the use of CPUs and GPUs (much faster than CPUs)
 Tensorflow works with the high-level deep learning tool Keras in order to create models fast
 New versions of Tensorflow ship with Keras as well as many further tools

Keras
 Often used in combination with low-level frameworks like Tensorflow
 Keras is a high-level deep learning library implemented in Python that works on top of existing, rather low-level deep learning frameworks like Tensorflow, CNTK, or Theano
 Deep learning models created with Keras run seamlessly on CPU and GPU via the low-level deep learning frameworks
 The key idea behind the Keras tool is to enable faster experimentation with deep networks

[4] Keras Web page
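To make the TensorFlow/Keras relationship concrete, a minimal environment check is sketched below; it assumes a TensorFlow installation that bundles Keras (version numbers depend on the local system):

    import tensorflow as tf
    from tensorflow import keras

    print(tf.__version__)           # TensorFlow release installed on the system
    print(keras.backend.backend())  # 'tensorflow' -- Keras runs on top of the TF backend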


Page 9:

Test JURECA Access – Important Tutorial Setup Steps

In the case we use Jupyter-JSC (https://jupyter-jsc.fz-juelich.de/)
 Start a JupyterLab
 Open a new terminal and enter the following commands (as explained at https://jupyter-jsc.fz-juelich.de/hub/static/files/projects.html):

    wget --no-check-certificate https://jupyter-jsc.fz-juelich.de/static/files/symlinks.sh
    bash ~/symlinks.sh

 Create a new folder where you can store all your personal items (practicals and new files):

    mkdir /p/project/training2001/$USER

 Copy your own copy of the practicals:

    cd /p/project/training2001/$USER
    cp -R /p/project/training2001/practicals .

Backup: in the case we use an SSH client (MobaXterm)
 Open the terminal
 Create a new folder where you can store all your personal items (practicals and new files):

    mkdir /p/project/training2001/$USER

 Copy your own copy of the practicals:

    cd /p/project/training2001/$USER
    cp -R /p/project/training2001/practicals .


Page 10:

Perceptron Model – Mathematical Notation for one Neuron

(Figure: one neuron – the input data and a constant enter as inputs, are multiplied by trainable weights, summed together with a bias, and passed through a non-linear activation function to produce the output; the sum is a linear combination of the input data)

Simplify the perceptron learning model formula with techniques from linear algebra for mathematical convenience, i.e. $\hat{y} = f\left(\sum_{i=1}^{m} w_i x_i + b\right) = f(\mathbf{w}^T \mathbf{x} + b)$
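As an illustration of this formula, a small numpy sketch of one neuron follows; the input values, weights, bias, and the choice of tanh as activation are all hypothetical:

    import numpy as np

    def neuron(x, w, b, activation=np.tanh):
        # linear combination of the input data plus bias, then non-linear activation
        return activation(np.dot(w, x) + b)

    x = np.array([0.5, -1.0, 2.0])   # hypothetical input data
    w = np.array([0.1, 0.4, -0.2])   # hypothetical trainable weights
    print(neuron(x, w, b=0.3))       # output of the single neuron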


Page 11:

Handwritten Character Recognition MNIST Dataset

Metadata
 Not a very challenging dataset, but good for benchmarks & tutorials

When working with the dataset
 The dataset is not in any standard image format like jpg, bmp, or gif (i.e. a file format not known to a graphics viewer)
 Data samples are stored in a simple file format that is designed for storing vectors and multidimensional matrices (i.e. numpy arrays)
 The pixels of the handwritten digit images are organized row-wise, with pixel values ranging from 0 (white background) to 255 (black foreground)
 Images contain grey levels as a result of an anti-aliasing technique used by the normalization algorithm that generated this dataset

(10 class classification problem)

 The Handwritten Character Recognition MNIST dataset is a subset of a larger dataset from the US National Institute of Standards and Technology (NIST)
 MNIST handwritten digits include corresponding labels with values 0-9, so MNIST is a labeled dataset
 MNIST digits have been size-normalized to 28 x 28 pixels & are centered in a fixed-size image for direct processing
 Two separate files for training & test: 60000 training samples (~47 MB) & 10000 test samples (~7.8 MB)

(Keras downloads the data into ~/.keras/datasets in the NPZ file format of numpy, which provides storage of array data using gzip compression)
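Loading the dataset through the standard Keras API is a one-liner; a minimal sketch (the first call downloads mnist.npz into ~/.keras/datasets, later calls reuse the cached file):

    from tensorflow.keras.datasets import mnist

    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    print(X_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
    print(X_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)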


Page 12:

MNIST Dataset – Data Access in Python & HPC Download Challenges

Warning for very secure HPC environments
 Note that HPC batch nodes often do not allow for downloads of remote files (if the data was downloaded before, it is ok!)

(Figure: login node and compute nodes behind a scheduler; only the login node can reach the internet)

(Keras downloads the data into ~/.keras/datasets in the NPZ file format of numpy, which provides storage of array data using gzip compression)

 A useful workaround for downloading remotely stored datasets and files is to start the Keras script on the login node and, after the data download, stop the script; it can then be properly executed on batch nodes for training & inference


Page 13:

Exercises – Check MNIST Data


Page 14:

DEEP Cluster: Check MNIST Data (in SSH) or JupyterLab

Execute the data exploration script on the login node to download the MNIST data
 Load the following module environment:

    python explore-MNIST-training.py


Page 15:

MNIST Dataset – Training/Testing Datasets & One Character Encoding

Work on two disjoint datasets
 One for training only (i.e. training set)
 One for testing only (i.e. test set)
 The exact separation is a rule of thumb per use case (e.g. 90% training, 10% test)
 Practice: if you get a dataset, take the test data away immediately ('throw it into the corner and forget about it during modelling')
 Once we have learned from training data, it carries an 'optimistic bias'
 Usually start by exploring the dataset and its format & labels

(Figure: training examples (historical records, ground-truth data, examples) split into a 'training set' and a 'test set')

Different phases in machine learning
 The training phase is a hypothesis search
 The testing phase checks if we are on the right track once the hypothesis is clear
 The validation phase is for model selection (set fixed parameters and set model types)


Page 16:

MNIST Dataset – Data Exploration Script Training Data & JupyterLab Example

Loading MNIST training datasets (X) with labels (Y) stored in a binary numpy format

Format is 28 x 28 pixel values with grey levels from 0 (white background) to 255 (black foreground)

A small helper function prints row-wise one 'hand-written' character with the grey levels stored in the training dataset
 Should reveal the nature of the number (aka label)

Example: loop over the training dataset (e.g. the first 10 characters as shown here)
 At each loop interval the 'hand-written' character (X) is printed in 'matrix notation' together with its label (Y)
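A sketch of such a helper is given below; the function name is hypothetical, and the arrays are assumed to come from mnist.load_data():

    def print_character(image):
        # print one 28 x 28 'hand-written' character row-wise as grey levels
        for row in image:
            print(''.join('%4d' % pixel for pixel in row))

    for i in range(10):              # e.g. the first 10 characters
        print_character(X_train[i])  # character (X) in 'matrix notation'
        print('label:', y_train[i])  # corresponding label (Y)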

[5] Jupyter Web Page


Page 17:

MNIST Dataset with Perceptron Learning Model – Need for Reshape

Two-dimensional dataset (28 x 28)
 Does not fit well as input to the Perceptron Model
 Need to prepare the data even more
 Reshape the data: we need one long vector

Note that the reshape from two-dimensional MNIST data to one long vector means that we lose the surrounding context

Losing the surrounding context is one factor why, later in this lecture, deep learning networks achieve substantially better performance, e.g. by keeping the surrounding context


Page 18:

MNIST Dataset – Reshape & Normalization – Example

(Figure: the two-dimensional original input is reshaped into one long input vector of length 784; after normalization the numbers are between 0 and 1)
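In code, the reshape and normalization amount to the following sketch (sizes match the MNIST training and test sets described earlier):

    RESHAPED = 784  # 28 * 28 pixels as one long input vector

    X_train = X_train.reshape(60000, RESHAPED).astype('float32') / 255
    X_test = X_test.reshape(10000, RESHAPED).astype('float32') / 255
    # pixel values 0..255 become numbers between 0 and 1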


Page 19:

Exercises – Perform Reshape & Normalization on Different Training / Testing Data


Page 20:

MNIST Dataset & Multi Output Perceptron Model

10 Class Classification Problem
 Use 10 Perceptrons for 10 outputs with a softmax activation function (enables probabilities for the 10 classes)

 Note that the output units are independent among each other, in contrast to neural networks with one hidden layer
 The output of softmax gives class probabilities
 The non-linear activation function 'softmax' represents a generalization of the sigmoid function – it squashes an n-dimensional vector of arbitrary real values into an n-dimensional vector of real values in the range of 0 and 1 – here it aggregates the 10 answers provided by the Dense layer with 10 neurons

(Figure: Dense layer of 10 neurons summing with 10 biases, followed by a Softmax layer with softmax activation producing output probabilities; NB_CLASSES = 10, input m = 784, parameters = 784 * 10 + 10 bias = 7850)
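A sketch of this model in Keras follows the annotations above; the variable names are conventions, not prescribed by the slide:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Activation

    NB_CLASSES = 10  # 10 class classification problem

    model = Sequential()
    model.add(Dense(NB_CLASSES, input_shape=(784,)))  # 784 * 10 weights + 10 biases = 7850 parameters
    model.add(Activation('softmax'))                  # squashes the 10 sums into class probabilities
    model.summary()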


Page 21:

MNIST Dataset & Compile Multi Output Perceptron Model

Compile the model
 The optimizer is the algorithm used to update weights while training the model
 Specify the loss function (i.e. objective function) that is used by the optimizer to navigate the space of weights (note: the process of optimization is also called loss minimization, cf. invited lecture by Gabriele Cavallaro)
 Indicate the metric for model evaluation (e.g., accuracy)

Specify the loss function
 Compare prediction vs. given class label
 E.g. categorical crossentropy

 Compile the model to be executed by the Keras backend (e.g. TensorFlow)
 The optimizer Gradient Descent (GD) uses all the training samples available for a step within an iteration
 The optimizer Stochastic Gradient Descent (SGD) converges faster: only one training sample is used per iteration
 The loss function is a multi-class logarithmic loss: with target $t_{i,j}$ and prediction $p_{i,j}$ it is $L = -\sum_{i}\sum_{j} t_{i,j}\,\log p_{i,j}$
 Categorical crossentropy is suitable for multiclass label predictions (default with softmax)
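Put together, the compile step described above might read as follows (a sketch using the standard Keras API):

    from tensorflow.keras.optimizers import SGD

    model.compile(loss='categorical_crossentropy',  # multi-class logarithmic loss
                  optimizer=SGD(),                  # Stochastic Gradient Descent
                  metrics=['accuracy'])             # metric for model evaluation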

[6] Big Data Tips, Gradient Descent


Page 22:

Full Script: MNIST Dataset – Model Parameters & Data Normalization

NB_CLASSES: 10 class problem
 NB_EPOCH: number of times the model is exposed to the overall training set – at each iteration the optimizer adjusts the weights so that the objective function is minimized
 BATCH_SIZE: number of training instances taken into account before the optimizer performs a weight update to the model
 OPTIMIZER: Stochastic Gradient Descent ('SGD') – only one training sample/iteration

 Data is loaded shuffled and split between training and testing set files
 Data preparation, e.g. X_train is 60000 samples / rows of 28 x 28 pixel values that are reshaped into 60000 x 784, including type specification (i.e. float32)
 Data normalization: divide by 255 – the max intensity value – to obtain values in range [0,1]

(Figure: training examples (historical records, ground-truth data, examples) split into a 'training set' and a 'test set')
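As code, the parameters and label preparation might look as sketched below; the concrete epoch and batch-size values are assumptions, and the one-hot conversion of the 0-9 labels is the step that categorical crossentropy requires:

    from tensorflow.keras.utils import to_categorical
    from tensorflow.keras.optimizers import SGD

    NB_CLASSES = 10   # 10 class problem
    NB_EPOCH = 20     # assumption: times the model is exposed to the training set
    BATCH_SIZE = 128  # assumption: training instances per weight update
    OPTIMIZER = SGD() # Stochastic Gradient Descent

    Y_train = to_categorical(y_train, NB_CLASSES)  # labels 0-9 -> one-hot vectors
    Y_test = to_categorical(y_test, NB_CLASSES)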


Page 23:

Full Script: MNIST Dataset – Fitting a Multi Output Perceptron Model

The Sequential() Keras model is a linear pipeline (aka 'a stack') of various neural network layers, including activation functions of different types (e.g. softmax)

Dense() represents a fully connected layer used in ANNs, which means that each neuron in a layer is connected to all neurons located in the previous layer

The non-linear activation function 'softmax' is a generalization of the sigmoid function – it squashes an n-dimensional vector of arbitrary real values into an n-dimensional vector of real values in the range of 0 and 1 – here it aggregates the 10 answers provided by the Dense layer with 10 neurons

The loss function is a multi-class logarithmic loss: the target is $t_{i,j}$ and the prediction is $p_{i,j}$

Train the model ('fit') using the selected batch & epoch sizes on training & test data

(full script continued from previous slide)
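The corresponding fit call might read as follows; the verbose level and the use of the test set as validation data are assumptions consistent with the description above:

    history = model.fit(X_train, Y_train,
                        batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                        verbose=1,
                        validation_data=(X_test, Y_test))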


Page 24:

Exercises – Execute Multi Output Perceptron Model


Page 25:

MNIST Dataset – A Multi Output Perceptron Model – Output & Evaluation

(Figure: Dense layer of 10 neurons summing with 10 biases, softmax activation in a Softmax layer producing output probabilities; NB_CLASSES = 10, input m = 784)
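The evaluation step itself is a single call; a minimal sketch, assuming the trained model and the prepared test arrays from the previous slides:

    score = model.evaluate(X_test, Y_test, verbose=0)
    print('Test loss:', score[0])      # categorical crossentropy on the test set
    print('Test accuracy:', score[1])  # metric configured at compile time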

How to improve the model design by extending the neural network topology?
 Which layers are required?
 Think about the input layer: it needs to match the data – what data did we have?
 Maybe hidden layers? How many hidden layers?
 What activation function for which layer (e.g. maybe ReLU)?
 Think Dense layer – Keras?
 Think about the final activation as softmax output probability


Page 26:

[Video] Multi Output Perceptron – More Details


[8] YouTube Video, The Linear model with Multiple Inputs and Multiple Outputs Detail Explanation

Page 27:

Formalization of Supervised Learning & Theory of Generalization


Page 28:

Exercises – Execute Multi Output Perceptron Model and Test on Training Dataset


Page 29:

Learning Approaches – Supervised Learning – Formalization

Each observation of the predictor measurement(s) has an associated response measurement: input data and output data (the output guides the learning process as a 'supervisor')

Goal: fit a model that relates the response to the predictors
 Prediction: aims at accurately predicting the response for future observations
 Inference: aims at better understanding the relationship between the response and the predictors

 Supervised learning approaches fit a model that relates the response to the predictors
 Supervised learning approaches are used in classification algorithms such as SVMs
 Supervised learning works with data = [input, correct output]


(Figure: training examples (historical records, ground-truth data, examples))

Page 30:

Feasibility of Learning from Data – Formalization

Theoretical framework underlying practical learning algorithms
 E.g. Support Vector Machines (SVMs)
 Best understood for 'Supervised Learning'
 Valid for basically all machine learning algorithms

Theoretical background used to solve 'a learning problem'
 Inferring one 'target function' that maps between input and output
 The learned function can be used to predict output from future input (fitting existing data is not enough)


(Figure: the unknown target function f: X → Y – the ideal function)

Page 31:

Summary Terminologies & Importance of Theory of Generalization

Target Function
 The ideal function that 'explains' the data we want to learn

Labelled Dataset (samples)
 'in-sample' data given to us: $(x_1, y_1), \ldots, (x_N, y_N)$

Learning vs. Memorizing
 The goal is to create a system that works well 'out of sample'
 In other words, we want to classify 'future data' (out of sample) correctly

Dataset Part One: Training set
 Used for training a machine learning algorithm
 Result after using a training set: a trained system

Dataset Part Two: Test set
 Used for testing whether the trained system might work well
 Result after using a test set: the accuracy of the trained model


Page 32:

(Figure: the machine learning diagram so far – the unknown target function f: X → Y (ideal function) is an element we do not exactly (need to) know; the training examples (historical records, ground-truth data, examples) are elements we must and/or should have and that might raise huge demands for storage; the remaining elements, derived from our skillset, can be computationally intensive)

Page 33:

Mathematical Building Blocks (1) – Our Linear Example

(Figure: unknown target function f: X → Y (ideal function); training examples (historical records, ground-truth data, examples); decision boundaries depending on f)

 Iris-virginica if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$
 Iris-setosa if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$
 ($w_i$ and the threshold are still unknown to us)

1. Some pattern exists
2. No exact mathematical formula (i.e. target function)
3. Data exists

(if we knew the exact target function we would not need machine learning; it would not make sense)
(we search for a function similar to the target function)


Page 34:

Feasibility of Learning – Hypothesis Set & Final Hypothesis

The 'ideal function' will remain unknown in learning
 Impossible to know and learn from data
 If it were known, a straightforward implementation would be better than learning
 E.g. hidden features/attributes of the data are not known or not part of the data

But '(function) approximation' of the target function is possible
 Use training examples to learn and approximate it
 The hypothesis set consists of m different hypotheses (candidate functions)

(Figure: from the unknown target function and the hypothesis set, 'select one function' that best approximates it – the final hypothesis)


Page 35:

(Figure: the machine learning diagram extended – the unknown target function f: X → Y (ideal function; an element we do not exactly (need to) know); the training examples (historical records, ground-truth data, examples; elements we must and/or should have, which might raise huge demands for storage); the hypothesis set (set of candidate formulas) and the final hypothesis – the latter two are elements we derive from our skillset and that can be computationally intensive)

Page 36:

Mathematical Building Blocks (2) – Our Linear Example

(Figure: the hypothesis set contains the Perceptron model – a linear model; we search for a function similar to the target function, with decision boundaries depending on f; the final hypothesis is the trained perceptron model, our selected final hypothesis)


Page 37:

The Learning Model: Hypothesis Set & Learning Algorithm

The solution tools – the learning model:
 1. Hypothesis set – a set of candidate formulas / models
 2. Learning algorithm – 'train a system' with known algorithms

(Figure: training examples feed the learning algorithm ('train a system'), which selects from the hypothesis set the final hypothesis)

Our linear example ('solution tools'):
 1. Perceptron Model
 2. Perceptron Learning Algorithm (PLA)
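A compact sketch of the PLA named above is given below; it assumes labels in {-1, +1} and folds the threshold into the weight vector as a bias term (a standard convention, not spelled out on the slide):

    import numpy as np

    def pla(X, y, max_iter=1000):
        # X: training samples with a constant 1 appended for the bias/threshold
        w = np.zeros(X.shape[1])
        for _ in range(max_iter):
            mis = np.where(np.sign(X @ w) != y)[0]  # misclassified examples
            if mis.size == 0:
                break                               # final hypothesis fits the training data
            i = np.random.choice(mis)
            w += y[i] * X[i]                        # PLA update: w <- w + y_i * x_i
        return w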


Page 38:

(Figure: the complete machine learning diagram – the unknown target function f: X → Y (ideal function; an element we do not exactly (need to) know); the training examples (historical records, ground-truth data, examples; elements we must and/or should have, which might raise huge demands for storage); the learning algorithm 'train a system' (a set of known algorithms; derived from our skillset and potentially computationally intensive); the hypothesis set (a set of candidate formulas; also derived from our skillset); and the final hypothesis)

Page 39:

Mathematical Building Blocks (3) – Our Linear Example

(Figure: the mathematical building blocks instantiated for our linear example – unknown target function f: X → Y (ideal function); training examples (historical records, ground-truth data; the training data); learning algorithm 'train a system' (the Perceptron Learning Algorithm, which uses the training dataset); hypothesis set (the Perceptron model – a linear model); final hypothesis (the trained perceptron model, our selected final hypothesis); training phase: find the $w_i$ and threshold that fit the data)


Page 40:

Different Models – Hypothesis Set & Unlimited ‘Degrees of Freedom‘

Hypothesis Set
 (all candidate functions derived from models and their parameters, e.g. a linear perceptron model, a support vector machine model, or an artificial neural network model)

Final Hypothesis
 ('select one function' that best approximates the target function)

 Choosing from various model approaches: each of h1, …, hm is a different hypothesis
 Additionally, a change in the model parameters of h1, …, hm means a different hypothesis too


Page 41:

MNIST Data – Testing on Training Dataset – Results & Discussion

Memorizing vs. Generalization
 How much did we get? We memorized the data and created a model from it
 No unseen data was used – thus no generalization!


Page 42:

Exercises – Train Perceptron on Testing Dataset and Test on Training Dataset


Page 43:

Exercises – Train on Testing Dataset & Test on Training Dataset & Increase Epochs


Page 44:

[Video] Neural Networks Summary


[7] YouTube Video, Neural Networks – A Simple Explanation

Page 45:

Lecture Bibliography


Page 46:

Lecture Bibliography (1)

[1] Species Iris Group of North America Database, Online: http://www.signa.org

[2] Cheng, A.C., Lin, C.H., Juan, D.C., 'InstaNAS: Instance-aware Neural Architecture Search', Online: https://arxiv.org/abs/1811.10201

[3] Tensorflow, Online: https://www.tensorflow.org/

[4] Keras Python Deep Learning Library, Online: https://keras.io/

[5] Jupyter Web Page, Online: https://jupyter.org/

[6] Big Data Tips, ‘Gradient Descent‘, Online: http://www.big-data.tips/gradient-descent

[7] YouTube Video, ’Neural Networks, A Simple Explanation’, Online: http://www.youtube.com/watch?v=gcK_5x2KsLA

[8] YouTube Video, 'The Linear model with Multiple Inputs and Multiple Outputs Detail Explanation', Online: https://www.youtube.com/watch?v=zZg59p0sGVY

[9] Scikit-Learn, Online: https://scikit-learn.org

[10] M. Goetz, M. Riedel et al., 'HPDBSCAN – Highly Parallel DBSCAN', Proceedings of MLHPC Workshop at Supercomputing 2015, Online: https://www.researchgate.net/publication/301463871_HPDBSCAN_highly_parallel_DBSCAN


Page 47:

Lecture Bibliography (2)

[11] C. Shearer, CRISP-DM model, Journal Data Warehousing, 5:13

[12] Species Iris Group of North America Database, Online: http://www.signa.org

[13] YouTube Video, 'Perceptron, Linear', Online: https://www.youtube.com/watch?v=FLPvNdwC6Qo


Page 48:

Acknowledgements


Page 49:

Acknowledgements – High Productivity Data Processing Research Group

(Figure: group overview – PD Dr. G. Cavallaro; Dr. M. Goetz (now KIT); Senior PhD Students A.S. Memon and M.S. Memon; MSc M. Richerzhagen (now other division); MSc P. Glock (now INM-1); MSc C. Bodenstein (now Soccerwatch.tv, DEEP Learning startup); PhD Students E. Erlingsson, S. Bakarat, and R. Sedona; MSc Student G.S. Guðmundsson (Landsvirkjun); statuses range from finished PhDs (2016, 2018, 2019) and completed theses to mid-term and newly started in Spring 2019)

This research group has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 763558 (DEEP-EST EU Project)
