PARALLEL & SCALABLE MACHINE LEARNING & DEEP LEARNING
Prof. Dr.-Ing. Morris Riedel
Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland
Research Group Leader, Juelich Supercomputing Centre, Germany
Short Introduction to Python & Jupyter
September 4th, 2018
Webinar
Cloud Computing & Big Data
PRACTICAL LECTURE 0.1
Outline of the Course
1. Cloud Computing & Big Data
2. Machine Learning Models in Clouds
3. Apache Spark for Cloud Applications
4. Virtualization & Data Center Design
5. Map-Reduce Computing Paradigm
6. Deep Learning driven by Big Data
7. Deep Learning Applications in Clouds
8. Infrastructure-As-A-Service (IAAS)
9. Platform-As-A-Service (PAAS)
10. Software-As-A-Service (SAAS)
11. Data Analytics & Cloud Data Mining
12. Docker & Container Management
13. OpenStack Cloud Operating System
14. Online Social Networking & Graphs
15. Data Streaming Tools & Applications
16. Epilogue
+ additional practical lectures for our hands-on exercises in context
Practical Topics
Theoretical / Conceptual Topics
Outline
Python Environments
- Python Programming Language
- Powerful NumPy Python Library
- Deep Learning with TensorFlow & Keras
- Jupyter & IPython
- Anaconda Distribution & Installation
Selected Python Demonstrations
- Basic Variables & Hello World
- Simple Loops & If Statements
- Vectors
- Matrices
- Data Preprocessing Script Example
This lecture is not considered to be a full introduction to Python and the use of Jupyter
The goal of this lecture is to make course participants aware of the Python environment they work with in the light of the topics of this course
Python Environments
Python Programming Language – Introduction
Selected Benefits
- Simple & flexible programming language
- Is an interpreted, powerful programming language
- Has efficient high-level data structures
- Provides a simple but effective approach to object-oriented programming
- Powerful libraries like the ‘math’ library for functions that take real numbers as inputs
- Python script support is offered in almost every cloud computing environment as a programming language today
Python is an ideal language for fast scripting and rapid application development, which in turn makes it interesting for the machine learning modeling process and easy access to cloud resources
Our course assignments take advantage of Python, but nobody needs to be a full Python expert
[1] Webpage Python
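As a small illustration of the ‘math’ library mentioned on this slide, the following sketch calls a few of its functions on real numbers. The values and variable names are assumptions chosen for this example only and do not come from the lecture.

```python
import math

radius = 2.0                      # illustrative value (assumption)
area = math.pi * radius ** 2      # area of a circle with that radius
root = math.sqrt(16.0)            # square root of a real number
decay = math.exp(-1.0)            # e raised to the power -1

print("area :", round(area, 4))
print("root :", root)
print("decay:", round(decay, 4))
```

The functions in ‘math’ work on single real numbers; for whole vectors and matrices the course relies on NumPy instead.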
Python Programming Language – Importance & Reasoning
Key Benefits for this course (and beyond)
- Relatively small number of lines of code needed
- Great libraries & community support (e.g. numpy, tensorflow, keras, etc.)
- Work with many students reveals: quick & easy to learn for experiments
The machine learning modeling process in general and the deep learning modeling process in particular require iterative and highly flexible approaches when working with ‘big data’
- E.g. network topology prototyping, hyper-parameter tuning, data sampling, etc.
Our course assignments take advantage of Python, but nobody needs to be a full Python expert
[2] F. Chollet, ‘Deep Learning with Python’ Book
Powerful NumPy Python Library
[3] NumPy Library Web Page
NumPy is one of the most important Python libraries used in cloud computing and big data analysis
NumPy is used as an efficient multi-dimensional container of generic data, vectors, matrices, etc.
Instead of the math library, which is used for functions with real numbers, we use NumPy for vectors & matrices
Selected features
- Useful linear algebra and random number capabilities
- Supports powerful N-dimensional array objects
- Interesting broadcasting functions
- Tools for integrating C/C++ and Fortran code
- Particularly nicely supports work on vectors & matrices that are useful when working with machine learning & big data
Our course assignments take advantage of NumPy for various aspects like working with vectors
(a small number of code lines is enough to create & print a 1x3 row vector with 3 random values)
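The code shown on the original slide is not part of this transcript. A minimal sketch of what such a snippet could look like, assuming the uniform random generator np.random.rand is acceptable (the slide may have used a different random function):

```python
import numpy as np

# Create a 1x3 row vector filled with random values from [0, 1).
row_vector = np.random.rand(1, 3)

print(row_vector)
print(row_vector.shape)   # one row, three columns
```

Passing both dimensions explicitly gives a true row vector of shape (1, 3) rather than a flat array of length 3.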
‘Big Data’ drives Deep Learning Models – Revisited
Approach: Learn Features
- Classical Machine Learning
- (Powerful computing evolved)
- Deep (Feature) Learning
- Very successful for image recognition and other emerging areas
- Assumption: data was generated by the interactions of many different factors on different levels (i.e. form a hierarchical representation)
Lecture 6 & 7 provide a short introduction to deep learning models and applications in clouds
[4] H. Lee et al.
Deep Learning in Clouds – Using Python & TensorFlow
Cloud Computing
- Enables large-scale deep learning from ‘big data’
- Deep learning library TensorFlow
(works well on graphics processing units = GPUs)
TensorFlow is an open source library for deep learning models using a flow graph approach
TensorFlow nodes model mathematical operations and graph edges between the nodes are so-called tensors (also known as multi-dimensional arrays)
The TensorFlow tool supports the use of CPUs and GPUs (GPU versions are much faster than CPU versions)
TensorFlow works with the high-level deep learning tool Keras in order to create models fast
Our course assignments take advantage of TensorFlow in conjunction with Python scripts & libs
[5] Tensorflow Deep Learning Framework
[6] A Tour of Tensorflow
[7] Distributed & Cloud Computing Book
Deep Learning in Clouds – What are Tensors?
A tensor is nothing else than a multi-dimensional array often used in scientific & engineering environments
Tensors are best understood when comparing them with vectors or matrices and their dimensions
Those tensors ‘flow’ through the deep learning network during the optimization / learning & inference process
(deep learning networks use tensors a lot, and NumPy is a good way to work with them in Python scripts)
Lecture 6 & 7 provide a more thorough introduction to tensors of various dimensions
[8] Big Data Tips, What is a Tensor?
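The comparison of tensors with vectors and matrices can be sketched with NumPy arrays of increasing dimension. The concrete values and the shape of the 3-dimensional example are assumptions chosen for illustration.

```python
import numpy as np

scalar = np.array(5.0)                        # 0 dimensions: a single number
vector = np.array([1.0, 2.0, 3.0])            # 1 dimension
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 dimensions
cube = np.zeros((2, 3, 4))                    # 3 dimensions

# ndim is the number of dimensions, shape the size along each of them.
for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim =", tensor.ndim, "shape =", tensor.shape)
```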
Deep Learning in Clouds – Using Python & Keras
Cloud Computing
- Enables large-scale deep learning from ‘big data’
- Deep learning library Keras on top of TensorFlow
Keras is a high-level deep learning library implemented in Python that works on top of other existing, rather low-level deep learning frameworks like TensorFlow, CNTK, or Theano
The key idea behind the Keras tool is to enable faster experimentation with deep networks
Created deep learning models run seamlessly on CPU and GPU via low-level frameworks
Our course assignments take advantage of Keras in conjunction with Python scripts & libs
[9] Keras Python Deep Learning Library
keras.layers.Dense(units, activation=None, use_bias=True,
                   kernel_initializer='glorot_uniform', bias_initializer='zeros',
                   kernel_regularizer=None, bias_regularizer=None,
                   activity_regularizer=None, kernel_constraint=None,
                   bias_constraint=None)
keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
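To make the Dense signature more concrete, the sketch below reproduces in plain NumPy the operation that the Keras documentation gives for a Dense layer, output = activation(dot(input, kernel) + bias). It is not Keras code: the function name dense_forward, the layer sizes, and the random inputs are assumptions chosen for illustration.

```python
import numpy as np

def dense_forward(inputs, kernel, bias, activation=None):
    """Apply activation(inputs @ kernel + bias) to a batch of inputs."""
    outputs = inputs @ kernel + bias
    return outputs if activation is None else activation(outputs)

rng = np.random.default_rng(0)          # fixed seed, arbitrary choice
fan_in, units = 784, 10                 # hypothetical layer sizes

# Glorot uniform initialization draws from [-limit, limit).
limit = np.sqrt(6.0 / (fan_in + units))
kernel = rng.uniform(-limit, limit, size=(fan_in, units))
bias = np.zeros(units)                  # corresponds to bias_initializer='zeros'

batch = rng.random((4, fan_in))         # 4 made-up input samples
out = dense_forward(batch, kernel, bias)
print(out.shape)
```

With activation=None the layer is purely linear, which matches the default in the signature.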
Jupyter Tool – Interactive Data Analysis using Clouds
Selected Facts
- Interactive & Web-based computing tool using the Web browser on the local machine
- Facilitates access to different available cloud platforms
- Often used in data analysis
- Part of the Anaconda distribution
Jupyter enables the creation and sharing of source code using so-called ‘Jupyter notebooks’
Jupyter notebooks contain live code, equations, formulas, visualizations & explanatory text
The Jupyter tool is used in practical lectures & our assignments via various Jupyter notebooks
(press ‘shift + enter’ to execute a notebook cell)
[10] Jupyter Web page
Jupyter Tool Example – Starting a Python Command Shell
- Start up Jupyter
- E.g. ‘New’ menu, then ‘Python 3’
Anaconda Distribution
Selected Facts
- Free & open source distribution
- Installs Python as well, separate from other installed versions
- Installation of 1,400+ data science packages for Python & statistical computing with R
- Available for Windows, Mac & Linux
- Anaconda significantly simplifies the Windows setup
The Anaconda distribution includes multiple tools for installing & updating Python & its packages
Anaconda represents one of the most popular Python data science platforms (~6 million users)
Recommendation: install Anaconda to experiment locally on the laptop before using clouds
[11] Anaconda Web page
Download and Install Anaconda Distribution (1)
Choose Anaconda Distribution from Anaconda Products
Download and Install Anaconda Distribution (2)
Choose your Operating System & Download (~500 – ~630 MB)
Download and Install Anaconda Distribution (3)
- Leave your contact details with email and role
- Incentive: you receive a cheat sheet to better work with Anaconda
Download and Install Anaconda Distribution (4)
Download and Install Anaconda Distribution (5)
Installation of Python
Finished
Selected Python Demonstrations
Basic Variables
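The notebook cells from the original slide are not part of this transcript. A minimal sketch of a ‘Hello World’ and a few basic variables, with all names and values being assumptions for illustration:

```python
# Print a first message.
print("Hello World")

# Python infers the type of a variable from the assigned value.
count = 3            # int
ratio = 0.5          # float
name = "cloud"       # str
is_ready = True      # bool

total = count * ratio    # int * float gives a float
print(total)
print(type(count).__name__, type(ratio).__name__,
      type(name).__name__, type(is_ready).__name__)
```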
Basic Loops
For Loop Syntax:
for <variable> in <sequence>:
    <statements>
else:
    <statements>
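The for loop syntax, including the optional else branch, can be sketched as follows. The helper name first_negative and the sample values are assumptions made for this example.

```python
# A for loop visits every element of a sequence in order.
squares = []
for number in range(5):
    squares.append(number * number)
print(squares)

def first_negative(values):
    """Return the first negative value, or None if there is none."""
    for value in values:
        if value < 0:
            result = value
            break            # leaving the loop early skips the else branch
    else:
        result = None        # runs only when the loop was not left via break
    return result

print(first_negative([3, 1, -4, 2]))
print(first_negative([3, 1, 4]))
```

The else branch belongs to the loop, not to the if statement inside it, which is why its indentation matches the for line.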
If Statements
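The if statement example from the original slide is not part of this transcript. A minimal sketch, assuming a small hypothetical helper named describe:

```python
def describe(value):
    """Classify a number with if / elif / else."""
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    else:
        return "positive"

for sample in (-2, 0, 7):
    print(sample, "is", describe(sample))
```

Only the first branch whose condition is true is executed; the else branch catches everything that remains.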
Working with Vectors – Load NumPy Library
(note that Python is case sensitive)
(NumPy could have been imported in another cell than the current one; was that cell really executed? Otherwise np is unknown. The cells are connected throughout the script)
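A short sketch of loading NumPy and of the case sensitivity issue noted on this slide. The deliberately wrong name NP is an assumption used only to trigger the error.

```python
import numpy as np   # 'np' is the conventional alias for NumPy

print(np.__version__)

try:
    NP.array([1, 2, 3])     # wrong case: only 'np' was defined by the import
except NameError as error:
    message = str(error)
    print("NameError:", message)
```

The same NameError appears in a notebook when the cell containing the import has not been executed before a cell that uses np.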
Vectors
Note the difference between rank-1 arrays & true vectors
The difference is important and leads to many bugs in scripts when working with machine & deep learning models
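The difference between rank-1 arrays and true vectors can be sketched as follows. The array length of 3 and the variable names are assumptions for illustration.

```python
import numpy as np

rank1 = np.random.rand(3)          # rank-1 array, shape (3,)
column = rank1.reshape(3, 1)       # true column vector, shape (3, 1)
row = rank1.reshape(1, 3)          # true row vector, shape (1, 3)

# Transposing a rank-1 array does not change its shape.
print(rank1.shape, rank1.T.shape)

# With a rank-1 array the product collapses to a single number ...
inner = np.dot(rank1, rank1.T)
print(np.ndim(inner))

# ... while column times row gives the full 3x3 outer product.
outer = np.dot(column, column.T)
print(outer.shape)
```

Reshaping explicitly to (3, 1) or (1, 3) avoids the silent shape surprises that rank-1 arrays can cause.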
Matrices
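The matrix example from the original slide is not part of this transcript. A minimal sketch of creating matrices and of a few common operations, with the concrete values being assumptions:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B      # multiplies matching entries
product = A @ B          # matrix multiplication (rows times columns)
transposed = A.T         # swaps rows and columns

print(A.shape)
print(elementwise)
print(product)
print(transposed)
```

Note that * and @ give different results; mixing them up is a frequent source of errors.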
Handwritten Character Recognition MNIST Dataset
Metadata
- Subset of a larger dataset from the US National Institute of Standards and Technology (NIST)
- Handwritten digits including corresponding labels with values 0 to 9
- All digits have been size-normalized to 28 * 28 pixels and are centered in a fixed-size image for direct processing
- Not a very challenging dataset, but good for experiments / tutorials
Dataset Samples
- Labelled data (10 classes)
- Two separate files for training and test
- 60000 training samples (~47 MB)
- 10000 test samples (~7.8 MB)
[12] MNIST Database Web page
MNIST Dataset – Python/NumPy Binary Files Example
When working with the dataset
- Dataset is not in any standard image format like jpg, bmp, or gif
- One typically needs to write a small program to read and work with the data
- Data samples are stored in a simple file format that is designed for storing vectors and multidimensional matrices (here numpy binary files)
- The pixels of the handwritten digit images are organized row-wise with pixel values ranging from 0 (white background) to 255 (black foreground)
- Images contain grey levels as a result of an anti-aliasing technique used by the normalization algorithm that generated this dataset
MNIST Dataset – Exploration – Training Dataset
MNIST Dataset – Exploration Script
- Loading MNIST training datasets (X) with labels (Y) stored in a binary numpy format
- Format is 28 x 28 pixel values with grey levels from 0 (white background) to 255 (black foreground)
- Small helper function that prints row-wise one ‘hand-written’ character with the grey levels stored in the training dataset
- Should reveal the nature of the number (aka label)
- Loop over the training dataset and the testing dataset (e.g. first 10 characters as shown here)
- In each loop iteration the ‘hand-written’ character (X) is printed in ‘matrix notation’ together with its label (Y)
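The exploration script itself is not part of this transcript. The steps described above can be sketched as follows, under clearly stated assumptions: the file names train_x.npy and train_y.npy are hypothetical, and random stand-in data is written to a temporary directory so that the sketch runs without the real MNIST files.

```python
import os
import tempfile
import numpy as np

# Stand-in data (assumption): 10 random 'images' and labels, not real MNIST.
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
Y_fake = rng.integers(0, 10, size=10, dtype=np.uint8)

# Hypothetical file names; the course files may be named differently.
data_dir = tempfile.mkdtemp()
np.save(os.path.join(data_dir, "train_x.npy"), X_fake)
np.save(os.path.join(data_dir, "train_y.npy"), Y_fake)

# Load samples (X) and labels (Y) from the binary NumPy format.
X_train = np.load(os.path.join(data_dir, "train_x.npy"))
Y_train = np.load(os.path.join(data_dir, "train_y.npy"))

def format_character(image):
    """Return one character as text: one image row per line, grey levels 0..255."""
    return "\n".join(" ".join(f"{int(pixel):3d}" for pixel in row)
                     for row in image)

# Print the first two samples in 'matrix notation' together with their labels.
for index in range(2):
    print(format_character(X_train[index]))
    print("label:", int(Y_train[index]))
```

With the real dataset, the non-zero grey levels in the printed matrix trace the shape of the digit, which should match the printed label.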
MNIST Dataset – Exploration – Selected Training Samples
MNIST Dataset – Exploration Script Testing
MNIST Dataset – Reshape & Normalization Example
- Loading MNIST training datasets (X) and testing datasets (Y) stored in a binary numpy format with labels for X and Y
- Format is 28 x 28 pixel values with grey levels from 0 (white background) to 255 (black foreground)
- Reshape from a 28 x 28 matrix of pixels to 784 pixel values, considered to be the input for the neural networks later
- Normalization is added for mathematical convenience since computing with the numbers gets easier (values are not too large)
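The reshape and normalization steps can be sketched as follows. The original script is not part of this transcript, so random stand-in data replaces the real MNIST arrays, and dividing by 255 is an assumption about how the normalization was done.

```python
import numpy as np

# Stand-in for loaded MNIST samples (assumption): 5 images of 28 x 28 grey levels.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Reshape each 28 x 28 matrix into one row of 784 pixel values.
X_flat = X_train.reshape(X_train.shape[0], 28 * 28)

# Convert to floating point and scale the grey levels from 0..255 to 0..1.
X_norm = X_flat.astype("float32") / 255.0

print(X_flat.shape)
print(X_norm.min(), X_norm.max())
```

Converting to float32 before dividing matters: the original uint8 values cannot represent fractions.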
MNIST Dataset – Reshape & Normalization – Result
(numbers are between 0 and 1)
Lecture Bibliography
Lecture Bibliography (1)
[1] Official Web Page for Python Programming Language, Online: https://www.python.org/
[2] François Chollet, ‘Deep Learning with Python’, Book, ISBN 9781617294433, 384 pages, 2017, Online: https://www.manning.com/books/deep-learning-with-python
[3] Numpy Python Library Web Page, Online: http://www.numpy.org/
[4] H. Lee et al., ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’, Proceedings of the 26th annual International Conference on Machine Learning (ICML), ACM, 2009
[5] Tensorflow Deep Learning Framework, Online: https://www.tensorflow.org/
[6] A Tour of Tensorflow, Online: https://arxiv.org/pdf/1610.01178.pdf
[7] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book, Online: http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049
[8] Big Data Tips, ‘What is a Tensor?’, Online: http://www.big-data.tips/what-is-a-tensor
[9] Keras Python Deep Learning Library, Online: https://keras.io/
[10] Jupyter Web page, Online: http://jupyter.org/
Lecture Bibliography (2)
[11] Anaconda Web page, Online: https://www.anaconda.com/
[12] MNIST Database Web page, Online: http://yann.lecun.com/exdb/mnist/