PARALLEL & SCALABLE MACHINE LEARNING & DEEP LEARNING

Prof. Dr.-Ing. Morris Riedel
Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland
Research Group Leader, Juelich Supercomputing Centre, Germany

Short Introduction to Python & Jupyter
September 4th, 2018
Webinar

Cloud Computing & Big Data

PRACTICAL LECTURE 0.1

Outline of the Course

1. Cloud Computing & Big Data
2. Machine Learning Models in Clouds
3. Apache Spark for Cloud Applications
4. Virtualization & Data Center Design
5. Map-Reduce Computing Paradigm
6. Deep Learning driven by Big Data
7. Deep Learning Applications in Clouds
8. Infrastructure-As-A-Service (IAAS)
9. Platform-As-A-Service (PAAS)
10. Software-As-A-Service (SAAS)
11. Data Analytics & Cloud Data Mining
12. Docker & Container Management
13. OpenStack Cloud Operating System
14. Online Social Networking & Graphs
15. Data Streaming Tools & Applications
16. Epilogue

+ additional practical lectures for our hands-on exercises in context, such as this Practical Lecture 0.1 – Short Introduction to Python & Jupyter

(Slide legend: Practical Topics; Theoretical / Conceptual Topics)

Outline

Python Environments
- Python Programming Language
- Powerful NumPy Python Library
- Deep Learning with Tensorflow & Keras
- Jupyter & iPython
- Anaconda Distribution & Installation

Selected Python Demonstrations
- Basic Variables & Hello World
- Simple Loops & If Statements
- Vectors
- Matrices
- Data Preprocessing Script Example

This lecture is not considered to be a full introduction to Python and the use of Jupyter

The goal of this lecture is to make course participants aware of the Python environment they work with in the light of the topics of this course

Python Environments

Python Programming Language – Introduction

Selected Benefits
- Simple & flexible programming language
- A powerful, interpreted programming language
- Has efficient high-level data structures
- Provides a simple but effective approach to object-oriented programming
- Powerful libraries, such as the ‘math’ library for functions on real numbers
- Python script support is offered in almost every cloud computing environment today

Python is an ideal language for fast scripting and rapid application development, which in turn makes it interesting for the machine learning modeling process and for easy access to cloud resources

Our course assignments take advantage of Python, but nobody needs to be a full Python Expert

[1] Webpage Python

Python Programming Language – Importance & Reasoning

Key Benefits for this course (and beyond)
- Relatively small number of lines of code needed
- Great libraries & community support (e.g. numpy, tensorflow, keras, etc.)
- Work with many students reveals: quick & easy to learn for experiments

The machine learning modeling process in general and the deep learning modeling process in particular require iterative and highly flexible approaches when working with ‘big data’

E.g. network topology prototyping, hyper-parameter tuning, data sampling, etc.

Our course assignments take advantage of Python, but nobody needs to be a full Python Expert

[2] F. Chollet, ‘Deep Learning with Python’ Book

Powerful NumPy Python Library

[3] NumPy Library Web Page

NumPy is one of the most important Python libraries used in cloud computing and big data analysis

NumPy is used as an efficient multi-dimensional container of generic data, vectors, matrices, etc.

Instead of the math library, which is used for functions on real numbers, we use NumPy for vectors & matrices

Selected features
- Useful linear algebra and random number capabilities
- Supports powerful N-dimensional array objects
- Interesting broadcasting functions
- Tools for integrating C/C++ and Fortran code
- Particularly good support for work on vectors & matrices, which are useful when working with machine learning & big data

Our course assignments take advantage of NumPy for various aspects like working with vectors

(small number of code lines to create & print a 1x3 row vector with 3 random values)
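
The code screenshot that this caption refers to is not preserved in the transcript. A minimal sketch of such a snippet could look as follows; the seeded generator and the variable names are assumptions made only to keep the example repeatable:

```python
import numpy as np

# Assumption: a seeded generator is used only to make the sketch repeatable.
rng = np.random.default_rng(seed=0)

# One row, three columns: a 1x3 row vector with values drawn uniformly from [0, 1).
row_vector = rng.random((1, 3))

print(row_vector)
print(row_vector.shape)  # (1, 3)
```

The shape (1, 3) is passed explicitly, so the result is a true row vector rather than a rank 1 array.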

‘Big Data‘ drives Deep Learning Models – Revisited

Approach: Learn Features
- Classical Machine Learning
- (Powerful computing evolved)
- Deep (Feature) Learning
- Very successful for image recognition and other emerging areas
- Assumption: data was generated by the interactions of many different factors on different levels (i.e. form a hierarchical representation)

Lectures 6 & 7 provide a short introduction to deep learning models and applications in clouds

[4] H. Lee et al.

Deep Learning in Clouds – Using Python & TensorFlow

Cloud Computing
- Enables large-scale deep learning from ‘big data’
- Deep learning library TensorFlow

Tensorflow is an open source library for deep learning models using a flow graph approach

Tensorflow nodes model mathematical operations, and graph edges between the nodes are so-called tensors (also known as multi-dimensional arrays)

The Tensorflow tool supports the use of CPUs and GPUs (GPU versions are much faster than CPU versions)

Tensorflow works with the high-level deep learning tool Keras in order to create models fast

[5] Tensorflow Deep Learning Framework

Our course assignments take advantage of TensorFlow in conjunction with Python scripts & libs

[6] A Tour of Tensorflow

[7] Distributed & Cloud Computing Book

(works well on graphics processing units = GPUs)

Deep Learning in Clouds – What are Tensors?

A tensor is nothing else than a multi-dimensional array, often used in scientific & engineering environments

Tensors are best understood when comparing them with vectors or matrices and their dimensions

Those tensors ‘flow’ through the deep learning network during the optimization / learning & inference process

[8] Big Data Tips, What is a Tensor?

Lectures 6 & 7 provide a more thorough introduction to tensors of various dimensions

(deep learning networks make heavy use of tensors, and NumPy is well suited to working with them in Python scripts)
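
To make the comparison with vectors and matrices concrete, here is a minimal NumPy sketch (not taken from the slides; the variable names are illustrative) that builds arrays with 0 to 3 dimensions and prints their number of dimensions and shape:

```python
import numpy as np

scalar = np.array(5.0)                          # 0 dimensions: a single number
vector = np.array([1.0, 2.0, 3.0])              # 1 dimension
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])     # 2 dimensions
tensor3 = np.zeros((2, 3, 4))                   # 3 dimensions

# ndim is the number of dimensions, shape the size along each dimension.
for name, array in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, array.ndim, array.shape)
```

In this view a vector is a tensor with one dimension and a matrix is a tensor with two.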

Deep Learning in Clouds – Using Python & Keras

Cloud Computing
- Enables large-scale deep learning from ‘big data’
- Deep learning library Keras on top of TensorFlow

Keras is a high-level deep learning library implemented in Python that works on top of other, rather low-level deep learning frameworks like Tensorflow, CNTK, or Theano

The key idea behind the Keras tool is to enable faster experimentation with deep networks

Created deep learning models run seamlessly on CPU and GPU via low-level frameworks

Our course assignments take advantage of Keras in conjunction with Python scripts & libs

[9] Keras Python Deep Learning Library

keras.layers.Dense(units, activation=None, use_bias=True,
    kernel_initializer='glorot_uniform', bias_initializer='zeros',
    kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, bias_constraint=None)

keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
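
As a rough illustration of what these two signatures stand for, the following sketch uses plain NumPy rather than Keras itself. It assumes the documented behaviour of a dense layer, output = activation(dot(input, kernel) + bias), and the plain gradient descent update for momentum=0.0 and decay=0.0. The helper names `dense_forward` and `sgd_step` and the gradient values are hypothetical:

```python
import numpy as np

def dense_forward(inputs, kernel, bias, activation=None):
    """Compute activation(inputs @ kernel + bias); activation=None keeps it linear."""
    outputs = inputs @ kernel + bias
    return outputs if activation is None else activation(outputs)

def sgd_step(weights, gradient, lr=0.01):
    """Plain SGD update without momentum or decay: w <- w - lr * gradient."""
    return weights - lr * gradient

inputs = np.array([[1.0, 2.0, 3.0]])   # one sample with 3 features
kernel = np.full((3, 2), 0.5)          # 3 inputs mapped to units=2
bias = np.zeros(2)                     # corresponds to bias_initializer='zeros'

outputs = dense_forward(inputs, kernel, bias)
print(outputs)                         # [[3. 3.]]

# Assumption: a made-up gradient, only to show the effect of one update step.
gradient = np.ones_like(kernel)
updated_kernel = sgd_step(kernel, gradient, lr=0.01)
print(updated_kernel[0])               # [0.49 0.49]
```

In Keras itself the layer and the optimizer take care of these computations; the sketch only shows the arithmetic behind `units`, `use_bias` and `lr`.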

Jupyter Tool – Interactive Data Analysis using Clouds

Selected Facts
- Interactive & Web-based computing tool using the Web browser on the local machine
- Facilitates access to different available cloud platforms
- Often used in data analysis
- Part of the Anaconda distribution

Jupyter enables the creation and sharing of source code using so-called ‘Jupyter notebooks’

Jupyter notebooks contain live code, equations, formulas, visualizations & explanatory text

The Jupyter tool is used in practical lectures & our assignments via various Jupyter notebooks

[10] Jupyter Web page

(press ‘Shift + Enter’ to execute the current cell)

Jupyter Tool Example – Starting a Python Command Shell

- Start up Jupyter
- E.g. choose ‘Python 3’ from the ‘New’ menu

Anaconda Distribution

Selected Facts
- Free & open source distribution
- Also installs Python, separate from other installed versions
- Installation of 1,400+ data science packages for Python & statistical computing with R
- Available for Windows, Mac & Linux
- Anaconda significantly simplifies the Windows setup

[11] Anaconda Web page

The Anaconda distribution includes multiple tools for installing & updating Python & its packages

Anaconda represents one of the most popular Python data science platforms (~6 million users)

Recommendation: install Anaconda to experiment locally on the laptop before using clouds

Download and Install Anaconda Distribution (1)

Choose Anaconda Distribution from Anaconda Products

Download and Install Anaconda Distribution (2)

Choose your Operating System & Download (~500 – ~630 MB)

Download and Install Anaconda Distribution (3)

Leave your contact details with email and role

Incentive: You receive a cheat sheet to better work with Anaconda

Download and Install Anaconda Distribution (4)

Download and Install Anaconda Distribution (5)

Installation of Python

Finished

Selected Python Demonstrations

Basic Variables
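
The notebook demonstration on this slide is not preserved in the transcript. A minimal sketch of what a first ‘Hello World’ and a few basic variables might look like (all variable names and values are made up for illustration):

```python
# Hello World: print writes text to the output of the notebook cell.
print("Hello World")

# Variables need no type declaration; the type follows the assigned value.
number_of_lectures = 16                   # int
learning_rate = 0.01                      # float
course = "Cloud Computing & Big Data"     # str
is_practical = True                       # bool

for value in (number_of_lectures, learning_rate, course, is_practical):
    print(type(value).__name__, value)
```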

Basic Loops

For Loop Syntax:

for <variable> in <sequence>:
    <statements>
else:
    <statements>
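
A small sketch of this syntax in use; the list contents are illustrative and not taken from the slides:

```python
topics = ["Python", "NumPy", "Jupyter"]

for topic in topics:
    print("Topic:", topic)
else:
    # The else branch runs once after the loop ends without a break.
    print("All topics printed")

# range(1, 5) yields 1, 2, 3, 4 (the upper bound is excluded).
total = 0
for value in range(1, 5):
    total += value
print(total)  # 10
```

The `else` branch is optional and is skipped when the loop is left early with `break`.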

If Statements
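
The demonstration for this slide is not preserved in the transcript. A minimal sketch of `if` / `elif` / `else`, using the grey level convention from the MNIST slides of this lecture (0 = white background, 255 = black foreground); the function name is hypothetical:

```python
def describe_grey_level(pixel):
    """Classify one pixel value with if / elif / else."""
    if pixel == 0:
        return "white background"
    elif pixel == 255:
        return "black foreground"
    else:
        return "grey level"

for pixel in (0, 128, 255):
    print(pixel, "->", describe_grey_level(pixel))
```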

Working with Vectors – Load NumPy Library

(notice that Python is case sensitive)

(the import could sit in another cell than the current one; was that cell really executed? Otherwise np is unknown; the cells are connected throughout the script)
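
Both remarks can be reproduced with a few lines. This is a minimal sketch rather than the original notebook cell:

```python
import numpy as np   # the alias np must be defined before any cell uses it

vector = np.array([1, 2, 3])
print(vector)

# Python is case sensitive: NP is a different (undefined) name than np.
try:
    NP.array([1, 2, 3])
except NameError as error:
    message = str(error)
    print("NameError:", message)
```

The same `NameError` appears for `np` itself when the cell containing the import has not been executed yet.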

Vectors

Note the difference between rank 1 arrays & true vectors

The difference is important and leads to many bugs in scripts when working with machine & deep learning models
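
A minimal sketch of that difference, assuming ‘true vectors’ means arrays with an explicit row or column dimension (the variable names are illustrative):

```python
import numpy as np

rank1 = np.array([1.0, 2.0, 3.0])   # rank 1 array, shape (3,)
row = rank1.reshape(1, 3)           # true row vector, shape (1, 3)
column = rank1.reshape(3, 1)        # true column vector, shape (3, 1)

print(rank1.shape, row.shape, column.shape)

# Transposing a rank 1 array changes nothing, unlike a true row vector.
print(rank1.T.shape)   # (3,)
print(row.T.shape)     # (3, 1)

# The same data gives different results depending on the shapes involved.
inner = rank1 @ rank1  # scalar inner product
outer = column @ row   # 3 x 3 outer product
print(inner)           # 14.0
print(outer.shape)     # (3, 3)
```

Giving vectors an explicit shape with `reshape` avoids the silent surprises of rank 1 arrays.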

Matrices
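
The matrix demonstration itself is not preserved in the transcript. A minimal sketch of typical first steps with NumPy matrices (the values are made up for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A.shape)   # (2, 2)
print(A.T)       # transpose: rows and columns swapped

elementwise = A * B   # element-wise product
product = A @ B       # matrix product
print(elementwise)    # [[ 5 12] [21 32]]
print(product)        # [[19 22] [43 50]]
```

Note that `*` multiplies element by element, while `@` performs the matrix product.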

Handwritten Character Recognition MNIST Dataset

Metadata
- Subset of a larger dataset from the US National Institute of Standards and Technology (NIST)
- Handwritten digits including corresponding labels with values 0 to 9
- All digits have been size-normalized and centered in a fixed-size image of 28 * 28 pixels for direct processing
- Not a very challenging dataset, but good for experiments / tutorials

Dataset Samples
- Labelled data (10 classes)
- Two separate files for training and test
- 60000 training samples (~47 MB)
- 10000 test samples (~7.8 MB)

[12] MNIST Database Web page

MNIST Dataset – Python/NumPy Binary Files Example

When working with the dataset
- Dataset is not in any standard image format like jpg, bmp, or gif
- One typically needs to write a small program to read and work with the data
- Data samples are stored in a simple file format that is designed for storing vectors and multidimensional matrices (here numpy binary files)
- The pixels of the handwritten digit images are organized row-wise with pixel values ranging from 0 (white background) to 255 (black foreground)
- Images contain grey levels as a result of an anti-aliasing technique used by the normalization algorithm that generated this dataset

MNIST Dataset – Exploration – Training Dataset

MNIST Dataset – Exploration Script

Loading MNIST training datasets (X) with labels (Y) stored in a binary numpy format

Format is 28 x 28 pixel values with grey level from 0 (white background) to 255 (black foreground)

Small helper function that prints row-wise one ‘hand-written’ character with the grey levels stored in the training dataset

Should reveal the nature of the number (aka label)

Loop over the training dataset and the testing dataset (e.g. first 10 characters as shown here)

At each loop iteration the ‘hand-written’ character (X) is printed in ‘matrix notation’ together with its label (Y)
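
The script itself is only described here, not preserved. The sketch below follows the described steps under stated assumptions: synthetic random arrays stand in for the MNIST data, because the names of the NumPy binary files are not given in this transcript (in the lecture the arrays would presumably be read with `np.load`), and the helper name `print_character` is hypothetical:

```python
import numpy as np

def print_character(image):
    """Print one 28 x 28 character row-wise with its grey levels (0..255)."""
    for row in image:
        print(" ".join(f"{int(pixel):3d}" for pixel in row))

# Assumption: synthetic stand-in data instead of the real MNIST files.
rng = np.random.default_rng(seed=0)
X_train = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)  # images
Y_train = rng.integers(0, 10, size=10)                             # labels 0..9

# Loop over the first few samples; print the label (Y) and the character (X).
for i in range(2):
    print("Training sample", i, "label:", Y_train[i])
    print_character(X_train[i])
```

With the real dataset the printed grey levels reveal the shape of the digit, which should match the printed label.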

MNIST Dataset – Exploration – Selected Training Samples

MNIST Dataset – Exploration Script Testing

MNIST Dataset – Reshape & Normalization Example

Loading MNIST training datasets (X) and testing datasets (Y) stored in a binary numpy format with labels for X and Y

Format is 28 x 28 pixel values with grey level from 0 (white background) to 255 (black foreground)

Reshape from 28 x 28 matrix of pixels to 784 pixel values considered to be the input for the neural networks later

Normalization is added for mathematical convenience, since computing with numbers gets easier when they are not too large
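
A minimal sketch of the reshape and normalization steps described above. Synthetic random data stands in for the MNIST arrays, and dividing by the maximum grey level 255 is an assumption about how values between 0 and 1 are obtained:

```python
import numpy as np

# Assumption: synthetic stand-in for the loaded training images.
rng = np.random.default_rng(seed=0)
X_train = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Reshape: each 28 x 28 matrix of pixels becomes one vector of 784 pixel values.
X_flat = X_train.reshape(X_train.shape[0], 28 * 28)

# Normalization: convert to float and divide by the maximum grey level 255.
X_norm = X_flat.astype("float32") / 255.0

print(X_flat.shape)                 # (5, 784)
print(X_norm.min(), X_norm.max())
```

Converting to a float type before dividing keeps the fractional values that an integer array could not hold.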

MNIST Dataset – Reshape & Normalization – Result

(numbers are between 0 and 1)

Lecture Bibliography

Lecture Bibliography (1)

[1] Official Web Page for Python Programming Language, Online: https://www.python.org/

[2] François Chollet, ‘Deep Learning with Python’, Book, ISBN 9781617294433, 384 pages, 2017, Online: https://www.manning.com/books/deep-learning-with-python

[3] NumPy Python Library Web Page, Online: http://www.numpy.org/

[4] H. Lee et al., ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’, Proceedings of the 26th annual International Conference on Machine Learning (ICML), ACM, 2009

[5] Tensorflow Deep Learning Framework, Online: https://www.tensorflow.org/

[6] A Tour of Tensorflow, Online: https://arxiv.org/pdf/1610.01178.pdf

[7] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book, Online: http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049

[8] Big Data Tips, ‘What is a Tensor?’, Online: http://www.big-data.tips/what-is-a-tensor

[9] Keras Python Deep Learning Library, Online: https://keras.io/

[10] Jupyter Web page, Online: http://jupyter.org/

Lecture Bibliography (2)

[11] Anaconda Web page, Online: https://www.anaconda.com/

[12] MNIST Database Web page, Online: http://yann.lecun.com/exdb/mnist/
