Page 1: Deep Learning Student workshop - Delta Course

Deep Learning Student workshop

September, 2017

Page 2

Agenda

⎯ Welcome & Introductions

⎯ Intel® Nervana™ AI Academy for Students

⎯ Intel® & AI

⎯ What is Machine Learning & Data Science

⎯ Deep Learning and Neural Networks

⎯ DL frameworks optimized for Intel® Architecture (IA)

Page 3

Questions? Ask us!

Ben Odom

Developer Evangelist

Bob Duffy

Student Ambassador Program Manager

Meghana Rao

Developer Evangelist

Niven Singh

AI Student Developer Community Manager

Page 4

Announcing: Intel® Nervana™ AI Academy for Students

With the Intel® Nervana™ AI Academy for Students, our goal is to drive awareness of AI innovation at the academic level.

We do this by training students on campus and online, and then showcasing and highlighting their expertise, inspiration and innovation as Intel Student Ambassadors.

⎯ Educate students on campus, in person, and begin to build relationships between students, professors, universities and Intel

⎯ Recruit qualified Student Ambassadors

⎯ Support them with Intel® Architecture (IA) access and training

⎯ Coach and help them to deliver innovative ideas, expert content and training to other students

⎯ Showcase examples of early innovation work by students

Page 5

Intel student ambassadors - Who are they?

They’re just like you!

- Graduate and PhD students who are excited and want to do real work in the field of Deep Learning

- They are subject matter experts who talk about their work at events like SXSW, SIGSE and PyCon, and on campus

- They are active participants, working on projects, papers, articles – content that has their name on it!

- They are curious and inventive thinkers – trying new things, creating demos and working on REALLY cool stuff to share with the community

Page 6

Intel student ambassadors - What are they doing?

Intel Student Ambassadors are working on innovative, real-world, applicable research and projects, like:

- Using smartphone cameras to collect data on mosquitoes and identify harmful vs. harmless ones

- Leveraging neural networks and deep learning to conduct stock price analysis and predictions

- Enabling individuals with speech impediments to use speech-to-text software that recognizes and transcribes their speech.

- Using ML & AI to solve medical problems, such as disease detection and identifying cures for epidemics.

http://devmesh.intel.com

Page 7

Intel & AI

Page 8

Intel® Nervana™ PORTFOLIO

[Figure: the Intel® Nervana™ portfolio, organized into experiences, toolkits, frameworks, libraries and hardware. Items shown include Intel® Nervana™ DL Software & Cloud, Intel® DL Training & Deployment, Computer Vision*, Intel® Computer Vision SDK, Movidius Fathom, Intel® GO™ Automotive SDK, Intel® Nervana™ Graph*, Intel® MKL, MKL-DNN, Intel® MLSL, Intel® DAAL, Intel Distribution, MLlib, BigDL, and compute, memory/storage and networking hardware. *Future]

Page 9

What is data science?

Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

- Wikipedia

Page 10

The data science process

Source: https://en.wikipedia.org/wiki/Data_science

Page 11

How to become a data scientist?

[Figure: skills and traits of a data scientist: Math, Statistics, R, Python, Scala, NoSQL, Machine Learning, Deep Learning, Neural Networks, Visualization, Communication, Story Teller, Domain Knowledge, Engage with "C" Level, Hacker Mindset, Passion, Love the Data]

Page 12

What is machine learning?

Applying algorithms to observed data and making predictions based on that data.

Page 13

Machines learn in two main ways:

Supervised learning & unsupervised learning

Page 14

Supervised Learning

We train the model by feeding it data together with the correct answers, the "ground truth". The model learns from these examples and can then make predictions.

Page 15

Unsupervised Learning

Data is given to the model, but the right answers are not. The model has to make sense of the data on its own, and it can teach you something about the dataset that you were probably not aware of.

Page 16

Types of Supervised and Unsupervised learning

Supervised: Classification, Regression

Unsupervised: Clustering, Recommendation

Page 17

Classification: predict a label for an entity with a given set of features.

Examples: spam prediction, sentiment analysis

Page 18

Regression: predict a real numeric value for an entity with a given set of features.

[Figure: a home's property attributes (address, type, age, parking, school, transit, total sqft, lot size, bathrooms, bedrooms, yard, pool, fireplace) feed a linear regression model that predicts its price ($), plotted against sqft]
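As a minimal sketch of regression in Python (assuming NumPy is available, for example from the Intel Distribution for Python), the snippet below fits a straight line of price against sqft. The four homes and their prices are hypothetical, chosen to lie exactly on price = 50,000 + 200 × sqft so the fitted coefficients are easy to check.

```python
import numpy as np

# Hypothetical toy data: living area (sqft) and sale price ($) for four homes,
# chosen to lie exactly on price = 50,000 + 200 * sqft.
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([250_000.0, 350_000.0, 450_000.0, 550_000.0])

# Least-squares fit of a degree-1 polynomial (a straight line).
# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(sqft, price, deg=1)

# Predict the price of a hypothetical 1,800 sqft home.
predicted = intercept + slope * 1800.0
print(f"price = {intercept:,.0f} + {slope:.1f} * sqft")
print(f"predicted price for 1,800 sqft: ${predicted:,.0f}")
```

With real data the points will not lie on a line, and the fit returns the line with the smallest squared error.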

Page 19

Clustering: group entities with similar features.

[Figure: market segmentation of people by age and play time in hours, grouped into three clusters: casual gamers, non-gamers and serious gamers]
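A minimal k-means sketch of clustering in NumPy. k-means is one common clustering algorithm and is an assumption here, since the slide does not name one; the (age, weekly play hours) points and the starting centroids are hypothetical.

```python
import numpy as np

def kmeans(points, init_centroids, n_iter=10):
    """Plain k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Hypothetical players: (age in years, play time in hours per week).
points = np.array([
    [15, 30], [18, 28], [22, 35],   # young, heavy players
    [30, 8], [35, 10], [40, 6],     # middle-aged, occasional players
    [60, 0], [70, 1], [80, 0],      # older, (almost) never play
], dtype=float)

# Start from one point in each rough group (a deterministic, hypothetical choice).
labels, centroids = kmeans(points, points[[0, 3, 6]])
print(labels)      # cluster index for each player
print(centroids)   # mean (age, hours) of each cluster
```

No labels are given to the algorithm; the three groups emerge from the features alone, which is what makes clustering unsupervised.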

Page 20

Recommendation: recommend an item to a user based on past behavior or preferences of similar users.

[Figure: user info, your past purchase data, purchases of other users and product info feed a recommendation ML method (classifier or matrix based), which produces recommendations such as "You May Also Like" (YMAL)]
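One simple way to use "preferences of similar users" is user-based collaborative filtering. The sketch below is an assumption rather than the method in the slide's figure: it scores items a user has not bought by the cosine similarity between that user and the other users who bought them. The purchase matrix is hypothetical.

```python
import numpy as np

def recommend(purchases, user, n=1):
    """Return the indices of the top-n items `user` has not bought, scored by
    the purchases of other users weighted by cosine similarity to `user`."""
    purchases = np.asarray(purchases, dtype=float)
    norms = np.linalg.norm(purchases, axis=1)
    sims = purchases @ purchases[user] / (norms * norms[user] + 1e-12)
    sims[user] = 0.0                          # ignore the user's own history
    scores = sims @ purchases                 # similarity-weighted purchases per item
    scores[purchases[user] > 0] = -np.inf     # never re-recommend an owned item
    return np.argsort(scores)[::-1][:n]

# Hypothetical purchase matrix: rows are users, columns are items (1 = bought).
purchases = [
    [1, 1, 0, 0],   # user 0
    [1, 1, 1, 0],   # user 1: bought the same items as user 0, plus item 2
    [0, 0, 1, 1],   # user 2
]
print(recommend(purchases, user=0))
```

User 1 has the most similar history to user 0, so item 2 (which user 1 bought) is ranked first for user 0.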

Page 21

Applications of Machine Learning

Fraud Detection

Movie Recommendation

Face Detection

Anomaly Detection

Product Sentiment Analysis

Natural Language Processing

Image Analysis

IoT Analysis

Spam Filtering/Virus Detection

Page 22

Working with data sets

Page 23

Machine Learning Vocabulary - How do you read a data set?

Target: the predicted category or value of the data (the column to predict)

Features: properties of the data used for prediction (the non-target columns)

Example: a single data point within the data (one row)

Label: the target value for a single data point

Page 24

An example data set

sepal length sepal width petal length petal width species

6.7 3.0 5.2 2.3 virginica

6.4 2.8 5.6 2.1 virginica

4.6 3.4 1.4 0.3 setosa

6.9 3.1 4.9 1.5 versicolor

4.4 2.9 1.4 0.2 setosa

4.8 3.0 1.4 0.1 setosa

5.9 3.0 5.1 1.8 virginica

5.4 3.9 1.3 0.4 setosa

4.9 3.0 1.4 0.2 setosa

5.4 3.4 1.7 0.2 setosa

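In this table, the species column is the target, the four measurement columns are the features, each row is an example, and the species value in a given row is that example's label.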
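Loading the ten example rows into NumPy makes the vocabulary concrete: each row is an example, the first four columns form the feature matrix, and the species column is the target. A minimal sketch:

```python
import numpy as np

# The ten rows of the example data set: four measurements, then the species.
rows = [
    (6.7, 3.0, 5.2, 2.3, "virginica"),
    (6.4, 2.8, 5.6, 2.1, "virginica"),
    (4.6, 3.4, 1.4, 0.3, "setosa"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (4.4, 2.9, 1.4, 0.2, "setosa"),
    (4.8, 3.0, 1.4, 0.1, "setosa"),
    (5.9, 3.0, 5.1, 1.8, "virginica"),
    (5.4, 3.9, 1.3, 0.4, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (5.4, 3.4, 1.7, 0.2, "setosa"),
]

features = np.array([row[:4] for row in rows])   # one example per row, 4 feature columns
target = np.array([row[4] for row in rows])      # the label of each example

print(features.shape, target.shape)   # (10, 4) (10,)
print(features[3], target[3])         # the fourth example and its label
```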

Page 25

Training, validation and test data sets

If our dataset is 100,000 homes sold in Portland, a typical split would be:

Train = 70,000 Homes

Validation = 10,000 Homes

Test = 20,000 Homes
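A minimal sketch of the 70/10/20 split in NumPy, assuming the homes are identified by row indices 0 to 99,999; the helper name and the fixed random seed are illustrative choices. Shuffling first keeps each split a random sample of the data.

```python
import numpy as np

def train_val_test_split(n_examples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle row indices and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)          # random order, so each part is a random sample
    n_train = round(n_examples * train_frac)
    n_val = round(n_examples * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]               # everything left over (20% here)
    return train, val, test

train, val, test = train_val_test_split(100_000)
print(len(train), len(val), len(test))   # 70000 10000 20000
```

The model is fit on the training part, choices such as K or the learning rate are compared on the validation part, and the test part is held back for a final, unbiased check.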

Page 26

Setting up your environment

Page 27

What is in a Basic Data Science Toolkit?

Page 28

Intel® Distribution for Python* 2017

Page 29

6 Steps to Jupyter Notebook with Intel Distribution for Python

1. Install Anaconda: https://www.continuum.io/downloads#linux

2. Add the Intel package channel: conda config --add channels intel

3. Create the environment: conda create -n intelpython3 intelpython3_full python=3

4. Activate the environment: source activate intelpython3

5. Run the Jupyter notebook: jupyter notebook --no-browser (use --no-browser only if you are running remotely or using Bash on Windows)

6. Access the notebook: http://localhost:8888

Page 30

linear regression

Page 31

Introduction to Linear Regression

$$y_\beta(x) = \beta_0 + \beta_1 x$$

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

Page 32

Introduction to Linear Regression

$$y_\beta(x) = \beta_0 + \beta_1 x$$

Here $y_\beta(x)$ is the box office revenue, $x$ is the movie budget, $\beta_0$ is coefficient 0 (the intercept) and $\beta_1$ is coefficient 1 (the slope).

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

Page 33

Introduction to Linear Regression

$$y_\beta(x) = \beta_0 + \beta_1 x$$

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$\beta_0$ = 80 million, $\beta_1$ = 0.6

Page 34

Predicting from Linear Regression

$$y_\beta(x) = \beta_0 + \beta_1 x$$

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$\beta_0$ = 80 million, $\beta_1$ = 0.6

For a 160 million budget, the model predicts a gross of roughly 175 million ($80 + 0.6 \times 160 = 176$ million).

Page 35

Which Model Fits the Best?

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

Page 36

Calculating the Residuals

[Figure: box office revenue vs. movie budget, both axes in units of 10^8, marking a predicted value and an observed value]

The residual for observation $i$ is the predicted value minus the observed value:

$$y_\beta\left(x_{obs}^{(i)}\right) - y_{obs}^{(i)}$$

Page 37

Calculating the Residuals

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$$\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}$$

Page 38

Mean Squared Error

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$$\frac{1}{m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$$

Page 39

Minimum Mean Squared Error

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$$\min_{\beta_0,\beta_1} \frac{1}{m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$$

Page 40

Cost Function

[Figure: box office revenue vs. movie budget, both axes in units of 10^8]

$$J(\beta_0, \beta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$$

The extra factor 1/2 does not change which $(\beta_0, \beta_1)$ minimizes the cost; it only cancels the 2 that appears when the square is differentiated.
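A minimal NumPy sketch of the mean squared error and the cost function J. The budget and revenue values are hypothetical, in units of 10^8, so $\beta_0$ = 80 million is written as 0.8; a lower cost means a better-fitting line.

```python
import numpy as np

def mse(beta0, beta1, x, y):
    """Mean squared error of the line beta0 + beta1 * x on observations (x, y)."""
    residuals = beta0 + beta1 * x - y      # predicted minus observed
    return np.mean(residuals ** 2)

def cost(beta0, beta1, x, y):
    """J(beta0, beta1) = (1 / 2m) * sum of squared residuals, i.e. half the MSE."""
    return mse(beta0, beta1, x, y) / 2.0

# Hypothetical observations, in units of 10^8 (budget -> box office revenue).
budget = np.array([0.5, 1.0, 1.5])
gross = np.array([1.0, 1.5, 1.6])

good = cost(0.8, 0.6, budget, gross)   # beta0 = 80 million, beta1 = 0.6
worse = cost(0.0, 1.0, budget, gross)  # a different candidate line
print(good, worse)
```

Comparing the two costs answers "which model fits the best?" for these two candidate lines; minimizing J over all lines is what gradient descent does.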

Page 42

Gradient Descent

Start with a cost function $J(\beta)$:

[Figure: the cost $J(\beta)$ plotted against $\beta$]

Page 43

Gradient Descent

Start with a cost function $J(\beta)$:

[Figure: the cost $J(\beta)$ plotted against $\beta$, with its global minimum marked]

Then gradually move towards the minimum.

Page 44

Gradient Descent with Linear Regression

Now imagine there are two parameters $(\beta_0, \beta_1)$.

Page 45

Gradient Descent with Linear Regression

Now imagine there are two parameters $(\beta_0, \beta_1)$. This is a more complicated surface on which the minimum must be found.

[Figure: the surface $J(\beta_0, \beta_1)$ plotted over $\beta_0$ and $\beta_1$]

Page 46

Gradient Descent with Linear Regression

Now imagine there are two parameters $(\beta_0, \beta_1)$. This is a more complicated surface on which the minimum must be found.

How can we do this without knowing what $J(\beta_0, \beta_1)$ looks like?

[Figure: the surface $J(\beta_0, \beta_1)$ plotted over $\beta_0$ and $\beta_1$]

Page 47

Gradient Descent with Linear Regression

Compute the gradient, $\nabla J(\beta_0, \beta_1)$, which points in the direction of steepest increase!

The negative gradient, $-\nabla J(\beta_0, \beta_1)$, points in the direction of steepest decrease at that point!

[Figure: the surface $J(\beta_0, \beta_1)$ plotted over $\beta_0$ and $\beta_1$]

Page 48

Gradient Descent with Linear Regression

The gradient is a vector whose coordinates are the partial derivatives of $J$ with respect to each parameter:

$$\nabla J(\beta_0, \ldots, \beta_n) = \left\langle \frac{\partial J}{\partial \beta_0}, \ldots, \frac{\partial J}{\partial \beta_n} \right\rangle$$

[Figure: the surface $J(\beta_0, \beta_1)$ plotted over $\beta_0$ and $\beta_1$]
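For the linear regression cost $J(\beta_0, \beta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$, differentiating term by term gives the two gradient coordinates (the 2 from the square cancels the 1/2):

```latex
\frac{\partial J}{\partial \beta_0}
  = \frac{1}{m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)

\frac{\partial J}{\partial \beta_1}
  = \frac{1}{m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right) x_{obs}^{(i)}
```

Both are averages of the residuals, the second weighted by the budget $x_{obs}^{(i)}$, so they can be computed from the data without ever plotting the surface.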

Page 49

Gradient Descent with Linear Regression

Then use the gradient ($\nabla$) of the cost function to calculate the next point ($\omega_1$) from the current one ($\omega_0$), where each point $\omega$ is a pair of parameter values $(\beta_0, \beta_1)$:

$$\omega_1 = \omega_0 - \alpha \nabla J(\omega_0), \qquad J(\beta_0, \beta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$$

[Figure: the surface $J(\beta_0, \beta_1)$ with the points $\omega_0$ and $\omega_1$]

Page 50

Gradient Descent with Linear Regression

Then use the gradient ($\nabla$) of the cost function to calculate the next point ($\omega_1$) from the current one ($\omega_0$), where each point $\omega$ is a pair of parameter values $(\beta_0, \beta_1)$:

$$\omega_1 = \omega_0 - \alpha \nabla J(\omega_0), \qquad J(\beta_0, \beta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}\right)^2$$

The learning rate ($\alpha$) is a tunable parameter that determines the step size.

[Figure: the surface $J(\beta_0, \beta_1)$ with the points $\omega_0$ and $\omega_1$]

Page 51

Gradient Descent with Linear Regression

Each point can be calculated iteratively from the previous one:

$$\omega_2 = \omega_1 - \alpha \nabla J(\omega_1)$$

[Figure: the surface $J(\beta_0, \beta_1)$ with the points $\omega_0$, $\omega_1$ and $\omega_2$]

Page 52

Gradient Descent with Linear Regression

Each point can be calculated iteratively from the previous one:

$$\omega_2 = \omega_1 - \alpha \nabla J(\omega_1)$$

$$\omega_3 = \omega_2 - \alpha \nabla J(\omega_2)$$

[Figure: the surface $J(\beta_0, \beta_1)$ with the points $\omega_0$, $\omega_1$, $\omega_2$ and $\omega_3$ descending towards the minimum]
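Putting the update rule into a loop gives a minimal batch gradient descent sketch for the linear regression cost. The starting point (0, 0), the learning rate α = 0.1, the fixed number of steps and the data (generated from $\beta_0$ = 0.8 and $\beta_1$ = 0.6 in units of 10^8) are all illustrative assumptions.

```python
import numpy as np

def gradient(beta0, beta1, x, y):
    """Gradient of J(beta0, beta1) = (1 / 2m) * sum((beta0 + beta1 * x - y) ** 2)."""
    residuals = beta0 + beta1 * x - y
    return np.mean(residuals), np.mean(residuals * x)   # dJ/dbeta0, dJ/dbeta1

def gradient_descent(x, y, alpha=0.1, n_steps=5000):
    beta0, beta1 = 0.0, 0.0                  # starting point omega_0
    for _ in range(n_steps):
        g0, g1 = gradient(beta0, beta1, x, y)
        # omega_{k+1} = omega_k - alpha * grad J(omega_k)
        beta0, beta1 = beta0 - alpha * g0, beta1 - alpha * g1
    return beta0, beta1

# Hypothetical data in units of 10^8, lying exactly on gross = 0.8 + 0.6 * budget.
budget = np.array([0.5, 1.0, 1.5, 2.0])
gross = 0.8 + 0.6 * budget

beta0_hat, beta1_hat = gradient_descent(budget, gross)
print(beta0_hat, beta1_hat)   # close to 0.8 and 0.6
```

If α is too large the steps overshoot and can diverge; if it is too small, many more steps are needed to reach the minimum.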

Page 53

Modelling Best Practice

Use a cost function to fit each model

Develop multiple models

Compare results and choose the best one

Page 54

k nearest neighbors

Page 55

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes (0 to 20) and age (20 to 60), marked as survived or did not survive]

Page 56

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes and age, with a new point to predict]

Page 57

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes and age, with a new point to predict]

Neighbor count (K = 1): 0 in one class, 1 in the other

Page 58

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes and age, with a new point to predict]

Neighbor count (K = 2): 1 in one class, 1 in the other

Page 59

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes and age, with a new point to predict]

Neighbor count (K = 3): 2 in one class, 1 in the other

Page 60

K Nearest Neighbors Classification

[Figure: patients plotted by number of malignant nodes and age, with a new point to predict]

Neighbor count (K = 4): 3 in one class, 1 in the other
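A minimal KNN classifier sketch with Euclidean distance and majority voting. The (malignant nodes, age) points and their outcomes are hypothetical, and the features are used unscaled. With this data the prediction for the same patient changes between K = 1 and K = 5, which is why choosing K matters.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]                           # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical patients: (number of malignant nodes, age) and outcome.
train_X = np.array([[2, 30], [4, 45], [3, 55], [15, 40], [18, 60], [20, 35]], dtype=float)
train_y = ["survived", "survived", "survived",
           "did not survive", "did not survive", "did not survive"]

query = np.array([12, 42], dtype=float)
for k in (1, 3, 5):
    print(k, knn_predict(train_X, train_y, query, k))
```

"Fitting" here is just keeping `train_X` and `train_y`; all the work happens at prediction time, when the distances to every stored point are computed.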

Page 61

What is Needed to Select a KNN Model?

- Correct value for 'K'

- How to measure closeness of neighbors

[Figure: patients plotted by number of malignant nodes and age]

Page 62

Value of 'K' Affects Decision Boundary

[Figure: two panels of age vs. number of malignant nodes, showing the decision boundary for K=1 and for K=All]

Page 63

Measurement of Distance in KNN

[Figure: patients plotted by number of malignant nodes and age]

Page 64

Measurement of Distance in KNN

[Figure: patients plotted by number of malignant nodes and age]

Page 65

Euclidean Distance

[Figure: patients plotted by number of malignant nodes and age]

Page 66

Euclidean Distance (L2 Distance)

[Figure: the straight-line distance d between two points, with legs ∆Nodes and ∆Age]

$$d = \sqrt{\Delta\text{Nodes}^2 + \Delta\text{Age}^2}$$

Page 67

Manhattan Distance (L1 or City Block Distance)

[Figure: the path between two points measured along ∆Nodes and ∆Age]

$$d = |\Delta\text{Nodes}| + |\Delta\text{Age}|$$
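Both distances are one-liners in Python. A minimal sketch, with two hypothetical (nodes, age) points that differ by 4 nodes and 3 years:

```python
import math

def euclidean(p, q):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """L1 (city block) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p = (3, 40)   # hypothetical patient: 3 malignant nodes, age 40
q = (7, 43)   # hypothetical patient: 7 malignant nodes, age 43
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7
```

The Manhattan distance is never smaller than the Euclidean distance, because it travels along the axes instead of cutting straight across.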

Page 68

Scale is Important for Distance Measurement

[Figure: patients plotted by number of surgeries (1 to 5) and age (20 to 60)]

Page 69

Scale is Important for Distance Measurement

[Figure: patients plotted by number of surgeries (1 to 5) and age (20 to 60), with an additional age scale from 18 to 24]

Page 70

Scale is Important for Distance Measurement

[Figure: patients plotted by number of surgeries (1 to 5) and age (20 to 60), with an additional age scale from 18 to 24 and the nearest neighbors marked]

Nearest Neighbors!

Page 71

Scale is Important for Distance Measurement

"Feature Scaling"

[Figure: the same patients after feature scaling, with number of surgeries and age on comparable scales]

Page 72

Scale is Important for Distance Measurement

"Feature Scaling"

[Figure: the same patients after feature scaling, with number of surgeries and age on comparable scales]

Page 73

Scale is Important for Distance Measurement

"Feature Scaling"

[Figure: the same patients after feature scaling, with number of surgeries and age on comparable scales and the nearest neighbors marked]

Nearest Neighbors!
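A minimal sketch of feature scaling using min-max scaling, one common choice assumed here (standardizing to zero mean and unit variance is another). The (surgeries, age) patients and the query are hypothetical; the scaling ranges are taken from the stored patients and applied to the query as well.

```python
import numpy as np

def fit_min_max(X):
    """Per-column minimum and range, used to rescale features to [0, 1]."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # avoid dividing by zero for a constant column
    return lo, span

def nearest_index(X, q):
    """Index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Hypothetical patients: (number of surgeries, age).
X = np.array([[1, 21], [5, 30], [1, 60], [3, 40]], dtype=float)
q = np.array([5, 20], dtype=float)

unscaled_idx = nearest_index(X, q)                    # age differences dominate
lo, span = fit_min_max(X)
scaled_idx = nearest_index((X - lo) / span, (q - lo) / span)
print(unscaled_idx, scaled_idx)   # 0 1
```

Before scaling, a 1-year age gap outweighs a 4-surgery gap, so patient 0 is "nearest"; after scaling, both features count on the same 0 to 1 range and patient 1, who had the same number of surgeries, becomes the nearest neighbor.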

Page 74

Performance comparison - Linear Regression and KNN

Linear regression

- Fitting involves minimizing a cost function (slow)

- Model has few parameters (memory efficient)

- Prediction involves a simple calculation (fast)

K nearest neighbors

- Fitting involves storing the training data (fast)

- Model stores the entire training set (memory intensive)

- Prediction involves finding the closest neighbors (slow)

Page 75

What is the issue with the linear classifiers we have learned so far?

XOR: the counterexample to all linear models. We need non-linear functions.

X1 X2 y

0 0 0

0 1 1

1 0 1

1 1 0

[Figure: the four XOR points plotted in the X1-X2 plane; no single straight line separates the points with y = 1 from those with y = 0]

Source: https://medium.com/towards-data-science/introducing-deep-learning-and-neural-networks-deep-learning-for-rookies-1-bd68f9cf5883

Page 76

We need layers, usually lots of them, with non-linear transformations.

XOR = (X1 AND NOT X2) OR (NOT X1 AND X2)

[Figure: a threshold network for XOR with two inputs, connection weights of +1 and -2, unit thresholds of 1.5 and 0.5, and an output thresholded to 0 or 1]

X1 X2 y

0 0 0

0 1 1

1 0 1

1 1 0
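The XOR network can be sketched with step-threshold units. The wiring is an assumption read from the slide's numbers: both inputs connect with weight +1 to a hidden unit (threshold 1.5) and to the output unit, the hidden unit connects to the output with weight -2, and the output threshold is 0.5.

```python
def step(z):
    """Threshold unit: 1 if the input exceeds 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden unit: weights +1, +1 and threshold 1.5, so it fires only for (1, 1).
    h = step(x1 + x2 - 1.5)
    # Output unit: inputs with weight +1 each, hidden unit with weight -2, threshold 0.5.
    return step(x1 + x2 - 2 * h - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```

The hidden unit is what a single linear classifier lacks: it detects the (1, 1) case so that the output can switch it off.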

Page 77

This is a fast-growing domain called deep learning. In the machine learning world, we use neural networks, an idea inspired by biology. Each layer learns something.

Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.

--Wikipedia

Page 78

Each layer learns something

[Figure: an image passes through Layer 1, Layer 2, ..., Layer N and a fully connected layer to produce the prediction "Elephant"; learned features are shown for faces, cars, elephants and chairs]

Page 79

What is deep learning good for?

Page 80

Classification and Detection

Detect and label objects in the image

[Figure: an image with detected objects labeled Person, Motorcyclist and Bike]

https://people.eecs.berkeley.edu/~jhoffman/talks/lsda-baylearn2014.pdf

Page 81

Semantic Segmentation

Label every pixel

https://people.eecs.berkeley.edu/~jhoffman/talks/lsda-baylearn2014.pdf

Page 82

Natural Language Object Retrieval

http://arxiv.org/pdf/1511.04164v3.pdf

Page 83

Speech Recognition

The same architecture is used for English and Mandarin Chinese speech recognition

http://svail.github.io/mandarin/

Page 84

The basics of building a neural network

Page 85

Motivation for Neural Nets

• Use biology as inspiration for a mathematical model

• Get signals from previous neurons

• Generate signals (or not) according to inputs

• Pass signals on to next neurons

• By layering many neurons, we can create a complex model

Page 86

Basic Neuron Visualization

[Figure: inputs x1, x2, x3 with weights w1, w2, w3, plus a constant input 1 with weight b, feed a single neuron with an activation function]

$$z = x_1 w_1 + x_2 w_2 + x_3 w_3 + b$$

The neuron's output is $f(z)$, where $f$ is the activation function.
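A minimal sketch of the single neuron: compute the weighted sum z and pass it through an activation function f. The sigmoid activation and the input, weight and bias values are hypothetical.

```python
import math

def neuron(x, w, b, f):
    """Single neuron: weighted sum of inputs plus bias, passed through activation f."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b   # z = x1*w1 + x2*w2 + x3*w3 + b
    return f(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 2.0, 3.0]      # hypothetical inputs x1, x2, x3
w = [0.5, -1.0, 0.25]    # hypothetical weights w1, w2, w3
b = 0.25                 # hypothetical bias
print(neuron(x, w, b, f=lambda z: z))   # z itself: -0.5
print(neuron(x, w, b, f=sigmoid))       # sigmoid(-0.5), about 0.378
```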

Page 87

Types of activation functions

• Sigmoid function: output varies smoothly between 0 and 1

• Tanh function: output varies smoothly between -1 and 1

• ReLU function: f(x) = max(x, 0)

• Step function: output is either 0 or 1
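The four activation functions can be written in a few lines of NumPy. A minimal sketch; note that the value of the step function exactly at 0 is a convention, assumed to be 1 here.

```python
import numpy as np

def sigmoid(z):
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))                       # smooth, output between 0 and 1

def tanh(z):
    return np.tanh(np.asarray(z, dtype=float))            # smooth, output between -1 and 1

def relu(z):
    return np.maximum(np.asarray(z, dtype=float), 0.0)    # f(x) = max(x, 0)

def step(z):
    z = np.asarray(z, dtype=float)
    return np.where(z >= 0, 1.0, 0.0)                     # 0 or 1 (step(0) = 1 by convention)

z = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu, step):
    print(f.__name__, f(z))
```

The smooth functions (sigmoid, tanh) and ReLU have useful gradients almost everywhere, which is why they, rather than the step function, are used when networks are trained with gradient descent.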

Page 88

Why Neural Nets?

• Why not just use a single neuron? Why do we need a larger network?

• A single neuron (like logistic regression) only permits a linear decision boundary.

• Most real-world problems are considerably more complicated!

Page 89

Feedforward Neural Network

[Figure: feedforward network with inputs x1, x2, x3, layers of sigmoid (σ) neurons, and outputs y1, y2, y3]

Page 90

Weights

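[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the weights, the connections between layers, highlighted]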

Page 91

Input Layer

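[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the input layer $x_1, x_2, x_3$ highlighted]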

Page 92

Hidden Layers

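[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the two hidden layers highlighted]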

Page 93

Output Layer

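[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the output layer $y_1, y_2, y_3$ highlighted]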

Page 94

Weights (represented by matrices)

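[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the weight matrices $W^{(1)}, W^{(2)}, W^{(3)}$ labeled between successive layers]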

Page 95

Net Input (sum of weighted inputs, before activation function)

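[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the net inputs $z^{(2)}, z^{(3)}, z^{(4)}$ labeled at the two hidden layers and the output layer]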

Page 96

Activations (output of neurons to next layer)

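[Figure: the feedforward network ($x_1, x_2, x_3$ → two hidden layers of $\sigma$ units → $y_1, y_2, y_3$) with the activations $a^{(1)}, a^{(2)}, a^{(3)}, a^{(4)}$ labeled at each layer]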

Page 97

Matrix representation of computation

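[Figure: inputs $x_1, x_2, x_3$ connected through $W^{(1)}$ to the first hidden layer of four $\sigma$ units, producing $z^{(2)}$ and $a^{(2)}$]

For a single data point (instance):

$x = [x_1, x_2, x_3]$ (also written $x = a^{(1)}$)

$z^{(2)} = x W^{(1)}$

$a^{(2)} = \sigma(z^{(2)})$

• $W^{(1)}$ is a 3x4 matrix

• $z^{(2)}$ is a 4-vector

• $a^{(2)}$ is a 4-vector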

Page 98

Continuing the Computation

For a single training instance (data point):

Input: vector $x$ (a row vector of length 3)

Output: vector $\hat{y}$ (a row vector of length 3)

$z^{(2)} = x W^{(1)}$, $a^{(2)} = \sigma(z^{(2)})$

$z^{(3)} = a^{(2)} W^{(2)}$, $a^{(3)} = \sigma(z^{(3)})$

$z^{(4)} = a^{(3)} W^{(3)}$, $\hat{y} = \mathrm{softmax}(z^{(4)})$

Page 99

Multiple data points

In practice, we do these computations for many data points at the same time by “stacking” the rows into a matrix. But the equations look the same!

Input: matrix $x$ (an n×3 matrix; each row is a single instance)

Output: matrix $\hat{y}$ (an n×3 matrix; each row is a single prediction)

$z^{(2)} = x W^{(1)}$, $a^{(2)} = \sigma(z^{(2)})$

$z^{(3)} = a^{(2)} W^{(2)}$, $a^{(3)} = \sigma(z^{(3)})$

$z^{(4)} = a^{(3)} W^{(3)}$, $\hat{y} = \mathrm{softmax}(z^{(4)})$
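Because the equations are identical for one row or many, the same few lines of NumPy handle a whole batch. The sketch below assumes two hidden layers of four sigmoid units (matching the 3x4 shape of $W^{(1)}$), random hypothetical weights, and no bias terms, as on the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # row-wise softmax; subtracting the row max avoids overflow and leaves the result unchanged
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, W1, W2, W3):
    """x is n x 3 (one instance per row); returns y_hat, also n x 3."""
    a2 = sigmoid(x @ W1)     # z(2) = x W(1),    a(2) = sigma(z(2))
    a3 = sigmoid(a2 @ W2)    # z(3) = a(2) W(2), a(3) = sigma(z(3))
    return softmax(a3 @ W3)  # z(4) = a(3) W(3), y_hat = softmax(z(4))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # hypothetical weights
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(4, 3))

x_batch = rng.normal(size=(5, 3))   # n = 5 hypothetical instances
y_hat = forward(x_batch, W1, W2, W3)
print(y_hat.shape)                  # (5, 3)
print(y_hat.sum(axis=1))            # each row sums to 1

# a single instance is just a batch of one row: the equations do not change
single = forward(x_batch[:1], W1, W2, W3)
print(np.allclose(single, y_hat[:1]))   # True
```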

Page 100

How to Train a Neural Net?

[Figure: input (feature vector) → network → output (label)]

• Put in training inputs, get the output

• Compare the output to the correct answers: look at the loss function J

• Adjust and repeat!

• Backpropagation tells us how to make a single adjustment using calculus.

Page 101

Using Gradient Descent

1. Make a prediction

2. Calculate the loss

3. Calculate the gradient of the loss function w.r.t. the parameters

4. Update the parameters by taking a step in the opposite direction of the gradient

5. Iterate
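A minimal sketch of the five steps above, applied to a deliberately tiny problem rather than a neural network so that each step stays visible: fitting $\hat{y} = wx + b$ with mean squared error to hypothetical data generated from $y = 2x + 1$. The learning rate and iteration count are illustrative choices.

```python
import numpy as np

# hypothetical training data from y = 2x + 1
x = np.linspace(-1, 1, 20)
y = 2 * x + 1

w, b = 0.0, 0.0     # parameters, starting from zero
lr = 0.1            # learning rate (step size)

for _ in range(500):                       # 5. iterate
    y_hat = w * x + b                      # 1. make a prediction
    loss = np.mean((y_hat - y) ** 2)       # 2. calculate the loss (mean squared error)
    dw = np.mean(2 * (y_hat - y) * x)      # 3. gradient of the loss w.r.t. w ...
    db = np.mean(2 * (y_hat - y))          #    ... and w.r.t. b
    w -= lr * dw                           # 4. step opposite to the gradient
    b -= lr * db

print(round(w, 3), round(b, 3))   # approximately 2.0 1.0
```

In a neural network, step 3 is where backpropagation comes in: it supplies the gradient for every weight matrix instead of just `w` and `b`.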

Page 102

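Calculate the loss function

[Figure: the feedforward network's predictions $\hat{y}_1, \hat{y}_2, \hat{y}_3$ compared against the true labels $y_1, y_2, y_3$]

Evaluate: $J(\hat{y}_i, y_i)$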
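The slides do not fix a particular loss $J$. One common choice with softmax outputs is cross-entropy; the sketch below assumes one-hot labels, averages over instances, and uses made-up predictions.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Mean over instances of -sum_k y_k * log(y_hat_k); y is one-hot."""
    return -np.mean(np.sum(y * np.log(y_hat), axis=1))

y = np.array([[0, 1, 0],
              [1, 0, 0]], dtype=float)        # true labels (hypothetical)
y_hat = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1]])           # softmax outputs (hypothetical)

J = cross_entropy(y_hat, y)   # only the probability given to the correct class matters
print(J)                      # -(log 0.8 + log 0.6) / 2
```

A confident correct prediction (probability near 1 on the true class) drives the loss toward 0, while a confident wrong one makes it grow without bound.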

Page 103

Chain Rule

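$\frac{\partial J}{\partial W^{(3)}} = (\hat{y} - y) \cdot a^{(3)}$

$\frac{\partial J}{\partial W^{(2)}} = (\hat{y} - y) \cdot W^{(3)} \cdot \sigma'(z^{(3)}) \cdot a^{(2)}$

$\frac{\partial J}{\partial W^{(1)}} = (\hat{y} - y) \cdot W^{(3)} \cdot \sigma'(z^{(3)}) \cdot W^{(2)} \cdot \sigma'(z^{(2)}) \cdot x$

• Here $\hat{y} - y$ is the error at the output, which is the gradient with respect to $z^{(4)}$ for a softmax output with cross-entropy loss. The "$\cdot$" products are schematic: in matrix form they involve transposes, and each $\sigma'$ factor multiplies elementwise.

• Recall that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

• Though they appear complex, the gradients above are easy to compute!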
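The sketch below implements these gradients in matrix form for the three-layer network. It assumes a softmax output with cross-entropy loss summed over the batch (so the output error is exactly $\hat{y} - y$), no biases, and random hypothetical weights and data. A central finite-difference check compares one entry of each analytic gradient against a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(x, y, W1, W2, W3):
    """Cross-entropy summed over the batch (an assumed choice of J)."""
    y_hat = softmax(sigmoid(sigmoid(x @ W1) @ W2) @ W3)
    return -np.sum(y * np.log(y_hat))

def gradients(x, y, W1, W2, W3):
    """Backpropagation: returns dJ/dW(1), dJ/dW(2), dJ/dW(3)."""
    # forward pass, keeping the activations
    a2 = sigmoid(x @ W1)
    a3 = sigmoid(a2 @ W2)
    y_hat = softmax(a3 @ W3)
    # backward pass; sigma'(z) = sigma(z)(1 - sigma(z)) = a(1 - a)
    d4 = y_hat - y                      # dJ/dz(4)
    d3 = (d4 @ W3.T) * a3 * (1 - a3)    # dJ/dz(3)
    d2 = (d3 @ W2.T) * a2 * (1 - a2)    # dJ/dz(2)
    return x.T @ d2, a2.T @ d3, a3.T @ d4

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                   # 5 hypothetical instances
y = np.eye(3)[rng.integers(0, 3, size=5)]     # hypothetical one-hot labels
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), rng.normal(size=(4, 3))]

grads = gradients(x, y, *Ws)

def numeric_grad(k, i, j, eps=1e-6):
    """Central finite difference of J with respect to entry (i, j) of Ws[k]."""
    plus = [W.copy() for W in Ws]
    minus = [W.copy() for W in Ws]
    plus[k][i, j] += eps
    minus[k][i, j] -= eps
    return (loss(x, y, *plus) - loss(x, y, *minus)) / (2 * eps)

for k in range(3):
    print(f"W({k + 1}): analytic {grads[k][0, 0]:.6f}, numeric {numeric_grad(k, 0, 0):.6f}")
```

Note the order of work: the error is computed at the output first and then pushed backward one layer at a time, reusing `d4` to get `d3` and `d3` to get `d2`.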

Page 104

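Backpropagation

[Figure: the feedforward network with weight matrices $W^{(1)}, W^{(2)}, W^{(3)}$; predictions $\hat{y}_1, \hat{y}_2, \hat{y}_3$ compared with labels $y_1, y_2, y_3$]

Want: $\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(k)}}$ for $k = 1, 2, 3$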

Page 105

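Backpropagation

[Figure: the feedforward network; working backward from the output, the gradient for the last weight matrix is computed first, while $W^{(1)}$ and $W^{(2)}$ are still to be handled]

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(3)}}$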

Page 106

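Backpropagation

[Figure: the feedforward network; the gradients for $W^{(3)}$ and then $W^{(2)}$ have been computed, while $W^{(1)}$ is still to be handled]

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(3)}}$

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(2)}}$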

Page 107

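Backpropagation

[Figure: the feedforward network with the gradients propagated all the way back from the output layer to the first weight matrix]

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(3)}}$

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(2)}}$

$\frac{\partial J(\hat{y}_i, y_i)}{\partial W^{(1)}}$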

Page 108

What we have learnt so far

• Nomenclature required to build a NN:

  • Input, hidden, and output layers

  • Weights, activations

• Training with backpropagation and gradient descent

• Representing it all using matrices

Page 109

Convolutional neural network

Page 110

Convolutional Neural Nets

Primary ideas behind Convolutional Neural Networks:

• Let the neural network learn which kernels are most useful

• Use the same set of kernels across the entire image, so a feature is detected wherever it appears (often called translation invariance)

• Reduces the number of parameters and the “variance” (from the bias-variance point of view)

Page 111

Kernels as Feature Detectors

We can think of kernels as “local feature detectors”.

Vertical Line Detector
-1  1 -1
-1  1 -1
-1  1 -1

Horizontal Line Detector
-1 -1 -1
 1  1  1
-1 -1 -1

Corner Detector
-1 -1 -1
-1  1  1
-1  1  1
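To see a kernel acting as a feature detector, the sketch below slides the vertical line detector over a small hypothetical image containing one vertical line. It uses a plain "valid" cross-correlation with no padding, which is what most deep-learning libraries compute under the name convolution; `conv2d_valid` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: no padding, stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # elementwise product of the kernel with the patch under it, summed
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

vertical = np.array([[-1, 1, -1]] * 3, dtype=float)   # vertical line detector

image = np.zeros((5, 5))
image[:, 2] = 1.0          # a vertical line in the middle column

response = conv2d_valid(image, vertical)
print(response)            # strongest (positive) response where the line sits under the kernel's centre column
```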

Page 112

Without padding, we lose data at the edges

Page 113

Padding the input data
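For stride $s$, padding $p$, kernel size $k$, and input size $n$, the standard formula for a convolution's output size is $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch checks it on a hypothetical 5x5 input with a 3x3 kernel and uses `np.pad` for zero-padding; `output_size` is a hypothetical helper name.

```python
import numpy as np

def output_size(n, k, p=0, s=1):
    """Spatial size of a convolution output: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

image = np.arange(25, dtype=float).reshape(5, 5)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)  # zero border

print(output_size(5, 3))        # 3: without padding, a 3x3 kernel shrinks 5x5 to 3x3
print(padded.shape)             # (7, 7)
print(output_size(5, 3, p=1))   # 5: padding by 1 keeps the output 5x5
```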

Page 114

Pooling: Max-pool

• For each distinct patch, represent it by its maximum value

[Figure: 2x2 max-pool example]
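A minimal NumPy sketch of 2x2 max-pooling over non-overlapping patches (stride 2), assuming the height and width are even. The input values are made up; `max_pool_2x2` is a hypothetical helper.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pool; assumes even height and width."""
    h, w = x.shape
    # split into (h/2, 2, w/2, 2) blocks, then take the max inside each 2x2 block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
# [[6 4]
#  [8 9]]
```

Each output keeps only the strongest response in its patch, halving both spatial dimensions.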

Page 115


CNN for Digit recognition

Page 116

Convolutional Neural Networks (CNN) for Image Recognition

Source: http://cs231n.github.io/

Page 117

LeNet-5

How many total parameters (weights and biases) are in the network?

Conv1: 1*6*5*5 + 6 = 156
Conv3: 6*16*5*5 + 16 = 2416
FC1: 400*120 + 120 = 48120
FC2: 120*84 + 84 = 10164
FC3: 84*10 + 10 = 850
Total: 61706

That is far fewer than a single FC layer with 1200x1200 weights (1,440,000)! Note that the convolutional layers have relatively few weights.
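The parameter arithmetic above can be reproduced with two small helper functions (hypothetical names), counting one bias per output channel or unit. FC1's 400 inputs are the 16 feature maps of size 5x5 left after the convolution and pooling stages.

```python
def conv_params(in_ch, out_ch, k):
    """k x k convolution: one kernel per (input, output) channel pair, plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch

def fc_params(n_in, n_out):
    """Fully connected layer: one weight per connection, plus one bias per output unit."""
    return n_in * n_out + n_out

layers = {
    "Conv1": conv_params(1, 6, 5),
    "Conv3": conv_params(6, 16, 5),
    "FC1": fc_params(16 * 5 * 5, 120),   # 400 inputs
    "FC2": fc_params(120, 84),
    "FC3": fc_params(84, 10),
}
for name, n in layers.items():
    print(name, n)
total = sum(layers.values())
print("Total", total)   # Total 61706
```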

Page 118

Differences between CNN and fully connected networks

Page 119

CONVOLUTIONAL NEURAL NETWORKS

• Each neuron is connected to a small set of nearby neurons in the previous layer

• Neurons share the same set of weights (the kernel) across positions

• Ideal for spatial feature recognition, e.g. image recognition

• Cheaper on resources due to fewer connections

FULLY CONNECTED NEURAL NETWORKS

• Each neuron is connected to every neuron in the previous layer

• Every connection has a separate weight

• Not optimal for detecting local (spatial) features

• Computationally intensive, with heavy memory usage

Page 120

Network architectures

Page 121

AlexNet - Model Diagram

Page 122

VGG16 Diagram

Page 123

VGG

[Figure: three stacked 3x3 convolutional layers: Layer 1 (input), Layer 2, Layer 3]

Page 124

VGG


Page 125

VGG

[Figure: a 3x3 patch of Layer 1 (input) feeding one output in Layer 2]

We can say that the “receptive field” of Layer 2 is 3x3: each output has been influenced by a 3x3 patch of inputs.

Page 126

VGG

[Figure: three stacked 3x3 convolutional layers: Layer 1 (input), Layer 2, Layer 3]

What about Layer 3?

Page 127

VGG

[Figure: one output in Layer 3 and the 3x3 patch of Layer 2 it depends on]

This output on Layer 3 uses a 3x3 patch from Layer 2.

How much of Layer 1 does it use?

Page 128

VGG


Page 129

VGG


Page 130

VGG


Page 131

VGG


Page 132

VGG


Page 133

VGG


Page 134

VGG


Page 135

VGG


Page 136

VGG


Page 137

VGG

[Figure: the 5x5 region of Layer 1 (input) that feeds one output in Layer 3]

Each square in Layer 3 “sees” a 5x5 grid from Layer 1.
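The growth of the receptive field can be computed with a short recurrence: each k x k layer widens the field by (k - 1) times the spacing, in input pixels, between neighbouring outputs of the layer below (that spacing is 1 for stride-1 layers). A sketch, with `receptive_field` as a hypothetical helper:

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels) of one output after a stack of conv layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1            # jump = input-pixel distance between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each k x k layer widens the field by (k - 1) steps
        jump *= s
    return rf

for n in (1, 2, 3):
    print(n, "stacked 3x3 layers ->", receptive_field([3] * n))
# one 3x3 layer sees 3x3, two see 5x5, three see 7x7
```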

Page 138

VGG

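Two 3x3, stride-1 convolutions in a row cover the same receptive field as one 5x5 convolution.

Three 3x3 convolutions cover the same receptive field as one 7x7 convolution.

Parameter count with $C$ input and $C$ output channels (biases ignored):

One 3x3 layer: $3 \times 3 \times C \times C = 9C^2$

Three 3x3 layers: $3 \times 9C^2 = 27C^2$

One 7x7 layer: $7 \times 7 \times C \times C = 49C^2$

$27C^2$ vs. $49C^2$: a ≈45% reduction!

Benefit: fewer parameters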
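The same comparison for a concrete, hypothetical channel count of $C = 64$, with biases ignored as above:

```python
def conv_weights(k, c_in, c_out):
    """Weights in one k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

C = 64                                   # hypothetical channel count
three_3x3 = 3 * conv_weights(3, C, C)    # 27 C^2
one_7x7 = conv_weights(7, C, C)          # 49 C^2
reduction = 1 - three_3x3 / one_7x7      # 22/49, independent of C

print(three_3x3, one_7x7, f"{reduction:.0%}")   # 110592 200704 45%
```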

Page 139

Inception V3 schematic

Page 140

Inception

This whole “block” serves the function of a single convolutional layer in earlier architectures.

Page 141

ResNet

• Add the previous layer's output back into the current layer (a “skip” or residual connection)!

• Similar idea to “boosting”
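A minimal sketch of the residual idea, assuming fully connected layers as a stand-in for the convolutions of a real ResNet block, and hypothetical random weights. The input is added back before the final activation, so the layers only need to learn the difference (the residual) from the identity; with the second layer's weights at zero, the block simply passes relu(x) through.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F is two weight layers with a ReLU in between."""
    f = relu(x @ W1) @ W2    # F(x): the residual the layers have to learn
    return relu(f + x)       # skip connection adds the input back in (shapes must match)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))          # 2 hypothetical instances with 4 features
W1 = rng.normal(size=(4, 4))

# if the second layer's weights are zero, F(x) = 0 and the block reduces to relu(x)
out = residual_block(x, W1, np.zeros((4, 4)))
print(np.allclose(out, relu(x)))     # True
```

This is one intuition for why very deep ResNets train well: a block that is not helping can fall back to (nearly) passing its input through.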

Page 142

examples

Page 143

Unattended baggage detection using Intel®-optimized Caffe*

Source: https://software.intel.com/en-us/articles/unattended-baggage-detection-using-deep-neural-networks-in-intel-architecture

Page 144

Why are Deep Neural Networks called “Deep”?

Source: https://research.facebook.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/

Page 145

Examples of CNN topologies

11/9/2017 Intel Confidential

[Figure: GoogLeNet (2014); legend: Convolution, Pooling, Softmax, Other]

Source: Google white paper and Krizhevsky et al.


Page 146

Diagnosis of heart disease using CNNs

Source: http://cs231n.stanford.edu/reports2016/331_Report.pdf

Using 30 MRI images taken during one cardiac cycle from different axis views to predict VS and VD (systolic and diastolic volumes)

Page 147

Diabetic Retinopathy diagnosis: a Kaggle competition solution from deepsense.io

Images from EyePACS

Source: https://deepsense.io/diagnosing-diabetic-retinopathy-with-deep-learning/

Page 148

Intel® NERVANA™ AI PORTFOLIO

Page 149

Intel® Nervana™ PORTFOLIO

[Figure: the Intel® Nervana™ portfolio drawn as layers labeled experiences, toolkits, Frameworks, libraries, and hardware]

Items shown: Intel® Nervana™ DL Software & Cloud; Computer Vision* (future); Intel® DL Training & Deployment; Intel® Computer Vision SDK; Movidius Fathom; Intel® GO™ Automotive SDK; Intel® Nervana™ Graph*; Intel® MKL; MKL-DNN; Intel® MLSL; Intel® DAAL; Intel Distribution; Mlib; BigDL; Compute; Memory/Storage; Networking

Page 150

AI silicon positioning

[Figure: Intel® AI silicon positioned by workload; the processor logos are not reproduced. Column labels: Batch, Many batch models, Stream, Edge]

Training

• Train machine learning models across a diverse set of dense and sparse data

• Train large deep neural networks

• Train large models as fast as possible: LAKECREST (future)

Inference

• Batch: infer billions of data samples at a time and feed applications within ~1 day

• Stream: infer deep data streams with low latency in order to take action within milliseconds

• Edge: power-constrained environments (… or other Intel® edge processor)

Annotations on the figure: “Option for higher throughput/watt”; “Required for lower latency”

Page 151


Intel® Movidius™ Neural Compute Stick

Get started: https://developer.movidius.com/

Page 152

Intel® Nervana™ Full stack platform

• Nervana Cloud: build an AI proof of concept (POC)

• neon: train DL models quickly

• Intel Nervana Graph: any framework, any hardware

• Intel Nervana HW: industry-leading AI, coming soon

neon deep learning framework: “deep learning by design”

Page 153

Intel® Nervana™ Deep Learning Software

Accelerate time-to-solution by compressing both the compute-intensive and the labor-intensive steps in the innovation cycle, to deliver scalable end-to-end AI solutions.

Multi-user collaboration

Interactive sessions

Model library

Fast training

Batch training

Experiment tracking

Multi-node distribution

Analytics & visualization

Hyperparameter optimization

Batch inference

Model compression

Inference deployment

Export to edge devices

Data curation/processing

Data partitioning

Data labeling

Page 154

Intel® Distribution for Python* 2017

Page 155

DL Framework Optimized for IA: TensorFlow

Page 156

Performance Optimization on Modern Platforms

Hierarchical Parallelism

Coarse-Grained / multi-node

• Domain decomposition

• Scaling: improve load balancing; reduce synchronization events and all-to-all communications

Fine-Grained Parallelism / within node

• Sub-domain: 1) multi-level domain decomposition (e.g. across layers); 2) data decomposition (layer parallelism)

• Utilize all the cores: OpenMP, MPI, TBB…; reduce synchronization events and serial code; improve load balancing

• Vectorize/SIMD: unit-strided access per SIMD lane; high vector efficiency; data alignment

• Efficient memory/cache use: blocking; data reuse; prefetching; memory allocation

Page 157

Example Challenge 1: Data Layout Has a Big Impact on Performance

• Data layout impacts performance:

  • Sequential access to avoid gather/scatter

  • Have iterations in the innermost loop to ensure high vector utilization

  • Maximize data reuse, e.g. weights in a convolution layer

• Converting to/from an optimized layout is sometimes less expensive than operating on an unoptimized layout

[Figure: two 5x5 channels of example values stored in two memory layouts: channel-interleaved (21 8 18 92 .. 1 11 ..) vs. planar, one channel after another (21 18 … 1 .. 8 92 ..); one layout is better optimized for some operations than the other]
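A small NumPy sketch of the two layouts, using the two 5x5 channels of example values from this slide. The planar array follows the CHW (NCHW without the batch axis) convention and the interleaved one HWC (NHWC); `np.ascontiguousarray` makes an actual copy in the new memory order, which is the kind of layout conversion whose cost these slides discuss. Real MKL layouts are blocked formats, which this sketch does not model.

```python
import numpy as np

ch_a = np.array([[21, 18, 32, 6, 3],
                 [1, 8, 0, 3, 26],
                 [40, 9, 22, 76, 81],
                 [23, 44, 81, 32, 11],
                 [5, 38, 10, 11, 1]])
ch_b = np.array([[8, 92, 37, 29, 44],
                 [11, 9, 22, 3, 26],
                 [3, 47, 29, 88, 1],
                 [15, 16, 22, 46, 12],
                 [29, 9, 13, 11, 1]])

chw = np.stack([ch_a, ch_b])                          # planar: shape (C, H, W)
hwc = np.ascontiguousarray(chw.transpose(1, 2, 0))    # interleaved copy: shape (H, W, C)

print(chw.ravel()[:6])   # 21 18 32 6 3 1: one whole channel first
print(hwc.ravel()[:6])   # 21 8 18 92 32 37: channels interleaved per pixel

# converting back is another full copy of the data
back = np.ascontiguousarray(hwc.transpose(2, 0, 1))
print(np.array_equal(back, chw))   # True
```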

Page 158

Example Challenge 2: Minimize Conversion Overhead

• End-to-end optimization can reduce conversions

• Staying in the optimized layout as long as possible becomes one of the tuning goals

• Minimize the number of back-and-forth conversions

• Use graph optimization techniques

[Figure: two Convolution operations and a Max Pool operation, with “Native to MKL layout” and “MKL layout to Native” conversions between them]

Page 159

Optimizing TensorFlow & Other DL Frameworks for Intel® Architecture

• Leverage high-performance compute libraries and tools, e.g. Intel® Math Kernel Library, Intel® Python, Intel® Compiler

• Data format/shape: the right format/shape for maximum performance (blocking, gather/scatter)

• Data layout: minimize the cost of data layout conversions

• Parallelism: use all cores; eliminate serial sections and load imbalance

• Memory allocation: exploit its unique characteristics and the ability to reuse buffers

• Data layer optimizations: parallelization, vectorization, IO

• Optimize hyperparameters: e.g. batch size for more parallelism; learning rate and optimizer to ensure accuracy/convergence

Page 160

Initial Performance Gains on Modern Xeon (2-Socket Broadwell, 22 Cores per Socket)

Benchmark | Metric | Batch Size | Baseline Perf, Training | Baseline Perf, Inference | Optimized Perf, Training | Optimized Perf, Inference | Speedup, Training | Speedup, Inference
ConvNet-Alexnet | Images/sec | 128 | 33.52 | 84.2 | 524 | 1696 | 15.6x | 20.2x
ConvNet-GoogleNet v1 | Images/sec | 128 | 16.87 | 49.9 | 112.3 | 439.7 | 6.7x | 8.8x
ConvNet-VGG | Images/sec | 64 | 8.2 | 30.7 | 47.1 | 151.1 | 5.7x | 4.9x

• Baseline using the TensorFlow 1.0 release with standard compiler knobs

• Optimized performance using TensorFlow with Intel optimizations, built with:

bazel build --config=mkl --copt="-DEIGEN_USE_VML"

Page 161

Initial Performance Gains on Modern Xeon Phi (Knights Landing, 68 Cores)

Benchmark | Metric | Batch Size | Baseline Perf, Training | Baseline Perf, Inference | Optimized Perf, Training | Optimized Perf, Inference | Speedup, Training | Speedup, Inference
ConvNet-Alexnet | Images/sec | 128 | 12.21 | 31.3 | 549 | 2698.3 | 45x | 86.2x
ConvNet-GoogleNet v1 | Images/sec | 128 | 5.43 | 10.9 | 106 | 576.6 | 19.5x | 53x
ConvNet-VGG | Images/sec | 64 | 1.59 | 24.6 | 69.4 | 251 | 43.6x | 10.2x

• Baseline using the TensorFlow 1.0 release with standard compiler knobs

• Optimized performance using TensorFlow with Intel optimizations, built with:

bazel build --config=mkl --copt="-DEIGEN_USE_VML"

Page 162

Additional Performance Gains from Parameter Tuning

• Data format: the CPU prefers the NCHW data format

• Intra_op, inter_op and OMP_NUM_THREADS: set for best core utilization

• Batch size: a higher batch size provides more parallelism

  • Too high a batch size can increase the working set and hurt cache/memory performance

Best Settings for Xeon (Broadwell, 2 Sockets, 44 Cores)

Benchmark | Data Format | Inter_op | Intra_op | KMP_BLOCKTIME | Batch Size
ConvNet-Alexnet | NCHW | 1 | 44 | 30 | 2048
ConvNet-GoogleNet v1 | NCHW | 2 | 44 | 1 | 256
ConvNet-VGG | NCHW | 1 | 44 | 1 | 128

Best Settings for Xeon Phi (Knights Landing, 68 Cores)

Benchmark | Data Format | Inter_op | Intra_op | KMP_BLOCKTIME | OMP_NUM_THREADS | Batch Size
ConvNet-Alexnet | NCHW | 1 | 68 | 30 | 136 | 2048
ConvNet-GoogleNet v1 | NCHW | 2 | 68 | 1 | 68 | 256
ConvNet-VGG | NCHW | 1 | 68 | 1 | 136 | 128
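A hedged sketch of how the tuned values for one row (ConvNet-GoogleNet v1 on Xeon Phi) might be applied from Python. The environment variables are set before TensorFlow is imported, since the OpenMP runtime typically reads them when it initializes. `tf.compat.v1.ConfigProto` with `inter_op_parallelism_threads`/`intra_op_parallelism_threads` exists in TensorFlow 1.13+ and 2.x; in the TensorFlow 1.0 used for these benchmarks the same class was `tf.ConfigProto`. The helper name `apply_cpu_settings` is hypothetical, and the code still runs (setting only the environment variables) when TensorFlow is not installed.

```python
import os

def apply_cpu_settings(inter_op, intra_op, kmp_blocktime, omp_num_threads):
    """Set OpenMP variables and return TensorFlow thread-pool options (hypothetical helper)."""
    # must happen before TensorFlow/MKL initialize their thread pools
    os.environ["OMP_NUM_THREADS"] = str(omp_num_threads)
    os.environ["KMP_BLOCKTIME"] = str(kmp_blocktime)
    return {"inter_op_parallelism_threads": inter_op,
            "intra_op_parallelism_threads": intra_op}

# ConvNet-GoogleNet v1 row of the Xeon Phi (Knights Landing) table
threads = apply_cpu_settings(inter_op=2, intra_op=68, kmp_blocktime=1, omp_num_threads=68)

try:
    import tensorflow as tf
    config = tf.compat.v1.ConfigProto(**threads)   # TF 1.0 spelling: tf.ConfigProto(**threads)
    # TF 1.x-style usage: sess = tf.compat.v1.Session(config=config)
except ImportError:
    config = None   # TensorFlow not installed; only the environment variables were set
```

The data format (NCHW) and batch size from the table are chosen in the model code itself, not through these settings.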

Page 163

Q&A

Page 164

Social Media & Survey Prize Winners

Page 165

Want to learn more? Check out the Intel® Nervana™ AI Academy for students

software.intel.com/AIStudents

Page 166

backup

Page 167

Intel tools and libraries

Page 168

Intel® Distribution for Python*

• Ready access to a set of tools and techniques for high performance on Intel® Architecture

• Accelerated Python packages - NumPy, SciPy, pandas, scikit-learn, Jupyter, matplotlib, and mpi4py

• Integrated with Intel® Math Kernel Library (Intel® MKL), Intel® Data Analytics Acceleration Library (Intel® DAAL) and pyDAAL, Intel® MPI Library, and Intel® Threading Building Blocks (Intel® TBB)

• Get out-of-the-box performance that is closer to native code speeds.

• Speed up data analytics with pyDAAL and parallelize Python workloads.

• Manage packages and Jupyter Notebooks easily with conda, Anaconda Cloud, and pip.

Learn more: https://software.intel.com/en-us/intel-distribution-for-python

Page 169


Intel® Math Kernel Library (MKL)

• Features highly optimized, threaded and vectorized functions to maximize performance on Intel® Architecture and compatible processors

• Linear Algebra, Fast Fourier Transforms (FFT), Neural Network, Vector Math and Statistics functions

• Standard APIs for immediate performance results

• Utilizes de facto standard C and Fortran APIs for compatibility with BLAS, LAPACK and FFTW functions from other math libraries

• Available with both free community-supported and paid support licenses

Learn more: https://software.intel.com/en-us/intel-mkl

Page 170


Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)

• A library of DNN performance primitives optimized for Intel architectures

• A set of highly optimized building blocks intended to accelerate compute-intensive parts of deep learning applications, particularly DNN frameworks such as Caffe, TensorFlow, Theano and Torch

• Distributed as source code through GitHub

• Implemented in C++ and provides both C++ and C APIs

• Allows the functionality to be used from a wide range of high-level languages, such as Python or Java

Learn more: https://01.org/mkl-dnn/overview

Page 171

Intel® Data Analytics Acceleration Library (Intel® DAAL)

• Features highly tuned functions for deep learning, classical machine learning, and data analytics performance across the spectrum of Intel® architecture devices

• Intel® DAAL addresses all stages of the Big Data Ecosystem

• Includes Python*, C++, and Java* APIs and connectors to popular data sources including Spark* and Hadoop*

• Free and open source community-supported versions are available, as well as paid versions that include premium support.

Learn more: https://software.intel.com/en-us/intel-daal

Page 172


Intel® Machine Learning Scaling Library for Linux* OS

• A library providing an efficient implementation of communication patterns used in deep learning.

• Built on top of MPI, allows for use of other communication libraries

• Optimized to drive scalability of communication patterns

• Works across various interconnects: Intel® Omni-Path Architecture, InfiniBand*, and Ethernet

• Common API to support Deep Learning frameworks (Caffe*, Theano*, Torch*, etc.)

Learn more: https://github.com/01org/MLSL

Page 173


BigDL: Distributed Deep Learning Library for Apache Spark*

• Write deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters

• Rich deep learning support - numeric computing (via Tensor) and high level neural networks; load pre-trained Caffe or Torch models into Spark programs using BigDL

• Extremely high performance - uses Intel® MKL and multi-threaded programming in each Spark task

• Efficiently scale out to “Big Data Scale” using Apache Spark

Learn more: https://github.com/intel-analytics/BigDL

Page 174

Trusted analytics platform

• Facilitates data ingestion, preparation, and analysis with parallel processing and distributed analytics.

• The software leverages Apache Spark*, Intel® Data Analytics Acceleration Library, and Intel® Math Kernel Library for optimized distributed analytics and parallel processing on Intel® processors.

• Accelerates the modeling process with Intel optimized computational machine-learning and deep-learning algorithms, as well as graph operations, scoring engine, and pipelines.

• Integrates with industry-leading software frameworks such as Apache Spark, TensorFlow*, and Superset to expedite application development and enable deep-learning and visualization techniques.

Learn more: https://software.intel.com/en-us/bigdata/tap