Azure Big Data & Machine Learning
Matthias Gessenay & Roman A. Kahr
Agenda
Introduction to Azure Data Science Tools
Azure Data Lake
Hadoop
Azure Jupyter Notebooks
Azure Machine Learning Studio
Machine Learning
Regression vs. Classification vs. Neural Network
Sampling Problems
Case: Building a Predictive Web Service with Azure ML
Data Analysis
Implementing the Algorithm & Web Service
Matthias Gessenay
Co-Founder Corporate Software AG
Microsoft Professional Program – Data Science, ITIL Expert, MCSA/E/ITP/A/E
Senior Consultant & Trainer
Roman A. Kahr
Corporate Software AG
Microsoft Data Science Professional
Consultant & Trainer
Corporate Software
Founded 2011 in Biel/Bienne
Microsoft Partner
Gold Cloud Productivity
Gold Collaboration and Content
Gold Project and Portfolio Management
Gold Data Analytics
17 Consultants
Azure Data Science Tools – Big Picture
Bring together all your Data
Exploding Data Volumes
Unstructured Data
No bounds – no cost tradeoff
Improve performance
On-prem infrastructure too slow
Difficult to build a distributed on-prem infrastructure
Cost-intensive
Scalability
Data Lake
Two Parts:
Data Lake Store
Data Lake Analytics
U-SQL
Similar to T-SQL
Range of extensions: R, Python, ...
200x more storage
Pay-as-you-go
1 TB ≈ $35 per year
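To make the pay-as-you-go point concrete, here is a small cost comparison. It assumes the slide's figure of roughly $35 per TB per year (not current Azure pricing). The data volume is hypothetical and grows from 1 TB to 12 TB over a year. Pay-as-you-go bills each month only for what is actually stored, while the alternative provisions capacity for the 12 TB peak up front.

```python
# Rate taken from the slide (~$35 per TB per year); an assumption, not a price list.
RATE_PER_TB_YEAR = 35.0

def pay_as_you_go_cost(monthly_tb, rate=RATE_PER_TB_YEAR):
    """Bill each month only for the terabytes stored in that month."""
    return sum(tb * rate / 12 for tb in monthly_tb)

def provisioned_cost(monthly_tb, rate=RATE_PER_TB_YEAR):
    """Reserve capacity for the peak volume for the whole year."""
    return max(monthly_tb) * rate

# Hypothetical growth: 1 TB in month 1 up to 12 TB in month 12.
growth = list(range(1, 13))
print(round(pay_as_you_go_cost(growth), 2))  # 227.5
print(provisioned_cost(growth))              # 420.0
```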
HDInsight
Hadoop as it is commonly used
Principles:
Split the work across many nodes
Separate data (storage) from analytics (compute)
Pros of Azure
Scalability
Pay-as-you-go/cost optimization
Fast deployment
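The "split work" principle is the MapReduce idea behind Hadoop. Each node processes its own slice of the input, and the partial results are then grouped by key and combined. The word-count sketch below simulates those phases in a single Python process. It is not HDInsight's API. On a real cluster, the framework distributes the map and reduce steps and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: combine all values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(splits):
    # On a cluster each split would be mapped on a different node; here it is a loop.
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(chain.from_iterable(mapped)))

splits = [["Azure Hadoop", "azure cluster"], ["Hadoop cluster azure"]]
print(word_count(splits))  # {'azure': 3, 'hadoop': 2, 'cluster': 2}
```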
Jupyter Notebook
Virtual instance to run R or Python
Nice interface
Highly performant
Perfectly integrated into the Azure ecosystem
Well suited for presenting analyses!
Demo (Analyzing the Data)
Machine Learning
“Giving the computer the ability to learn without being explicitly programmed”
Supervised learning
Regression
Unsupervised learning
Clustering, neural nets
Reinforcement learning
AlphaGo
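Clustering is a good example of unsupervised learning. Below is a minimal one-dimensional k-means sketch in plain Python. No labels are given. The algorithm groups the points by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data and starting centroids are made up for the example. Practical implementations add smarter initialization and a convergence check.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group unlabeled 1-D points into len(centroids) clusters."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```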
Machine Learning – Sampling Issue
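Inductive: conclusions about the whole population are drawn from a sample
Issue before data science: capacity! Only a sample of the data could be processed
Price paid: inaccuracy (sampling error)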
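The standard error of an estimate is one way to quantify the price paid. The sketch below uses the textbook formula for a sample proportion, SE = sqrt(p(1 - p)/n), together with a normal-approximation 95% interval. The 80% on-time rate is a hypothetical value, not a figure from the demo. The output shows that a sample 100 times larger shrinks the error by a factor of 10.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n observations."""
    return math.sqrt(p * (1 - p) / n)

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval (normal approximation)."""
    return z * standard_error(p, n)

# Hypothetical: 80% of the sampled flights were on time.
for n in (100, 10_000):
    print(n, round(standard_error(0.8, n), 4), round(margin_of_error(0.8, n), 4))
# 100 0.04 0.0784
# 10000 0.004 0.0078
```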
Problems with an on-prem solution
Say you have a working, performant algorithm
How do you consume the data?
Request/Response API
How do you work with additional data?
How do you manage the costs?
Azure Machine Learning Studio
Free (for the moment)
Computing power scales with demand
Prewritten modules
Possibility to use R code
Existing API to consume a trained model
Prewritten Web Applications are shared
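Any HTTP client can call a request/response API. The sketch below builds such a call with Python's standard library. The payload layout (`Inputs` → `input1` → `ColumnNames`/`Values`, plus `GlobalParameters`) and the Bearer-token header are modeled on the format used by classic Azure ML Studio web services. Treat both as assumptions and check them against the API help page generated for your own service. The endpoint URL, API key, and column names are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from your web service's API page.
ENDPOINT = "https://example.invalid/score"
API_KEY = "YOUR_API_KEY"

def build_scoring_request(columns, rows, url=ENDPOINT, api_key=API_KEY):
    """Build (but do not send) a request/response scoring call.

    The payload layout is an assumption modeled on classic Azure ML Studio.
    """
    payload = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )

def score(request, timeout=30):
    """Send the request and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical feature columns and one flight to score.
request = build_scoring_request(["Carrier", "DepartureHour", "DayOfWeek"], [["C1", "7", "1"]])
# result = score(request)  # uncomment once ENDPOINT and API_KEY are real
```

The request is built separately from sending, so the same payload can be inspected or logged before it goes over the network.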
Demo
~2 million flights
Arrivals/departures at Zurich
Set of attributes
Objective: train a model to predict whether a flight is on time, and publish the trained model as a web service that end users can query through a web application
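The target for this demo is a yes/no label, which has to be derived from the raw times. The slides do not say how "on time" was defined. The sketch below assumes a flight counts as on time if it arrives less than 15 minutes after schedule, a common industry convention. It also assumes timestamps arrive as `YYYY-MM-DD HH:MM` strings. Both are assumptions for illustration.

```python
from datetime import datetime

# Assumption: the slides do not define "on time"; 15 minutes is a common convention.
ON_TIME_THRESHOLD_MIN = 15
TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format

def delay_minutes(scheduled, actual):
    """Minutes between scheduled and actual time (negative if early)."""
    delta = datetime.strptime(actual, TIME_FORMAT) - datetime.strptime(scheduled, TIME_FORMAT)
    return delta.total_seconds() / 60

def on_time_label(scheduled, actual, threshold=ON_TIME_THRESHOLD_MIN):
    """Binary target: 1 = on time, 0 = delayed."""
    return int(delay_minutes(scheduled, actual) < threshold)

print(on_time_label("2017-05-01 10:00", "2017-05-01 10:14"))  # 1
print(on_time_label("2017-05-01 23:50", "2017-05-02 00:20"))  # 0 (30 min late, across midnight)
```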
Logistic Regression
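Problem with linear regression for a yes/no (0/1) outcome:
Predictions can fall below 0
Predictions can rise above 1
Fit is bad!
Solution: logistic regression:
Output bounded below by 0
Output bounded above by 1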
Regression vs. Classification?
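Here is a minimal plain-Python sketch of the contrast on this slide. A linear model's output is unbounded, while passing it through the sigmoid keeps every prediction strictly between 0 and 1. The model is fitted with batch gradient descent on the log-loss, using a tiny made-up dataset with one standardized feature and hypothetical labels. Thresholding the predicted probability (here at 0.5) is one common way to turn this regression output into a classification.

```python
import math

def linear(x, w, b):
    """Plain linear model: output is unbounded (can be < 0 or > 1)."""
    return w * x + b

def sigmoid(z):
    """Numerically stable logistic function; output lies strictly in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    return sigmoid(linear(x, w, b))

def fit(xs, ys, lr=0.5, epochs=1000):
    """Batch gradient descent on the log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, w, b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def classify(x, w, b, threshold=0.5):
    """Turn the predicted probability into a class label."""
    return int(predict_proba(x, w, b) >= threshold)

# Made-up data: one standardized feature, labels 0/1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit(xs, ys)
print(linear(3.0, w, b) > 1, 0 < predict_proba(3.0, w, b) < 1)  # True True
print([classify(x, w, b) for x in (-1.5, 1.5)])  # [0, 1]
```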
Method
Jupyter
Merge and clean data
Transform the data
Analyze
Azure ML
Choose the type of machine learning
Build a predictive model
Publish a web service
Build a web application
Outcome: a simple website that predicts whether a flight is on time
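The Jupyter half of the method (merge, clean, transform, analyze) could look roughly like the plain-Python sketch below. The column names (`flight_no`, `carrier`, `scheduled`, `delay_min`), the airline lookup table, and the toy rows are all hypothetical. In a notebook, the ~2 million real rows would more likely be handled with a dataframe library.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def merge(flights, airlines):
    """Left-join flights to an airline lookup table on the 'carrier' code."""
    names = {a["carrier"]: a["name"] for a in airlines}
    return [{**f, "airline": names.get(f["carrier"], "unknown")} for f in flights]

def clean(rows):
    """Drop rows with a missing delay and duplicate (flight_no, scheduled) records."""
    seen, out = set(), []
    for r in rows:
        key = (r["flight_no"], r["scheduled"])
        if r.get("delay_min") is None or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out

def transform(rows):
    """Derive hour-of-day and weekday (0 = Monday) from the scheduled time."""
    out = []
    for r in rows:
        ts = datetime.strptime(r["scheduled"], "%Y-%m-%d %H:%M")
        out.append({**r, "hour": ts.hour, "weekday": ts.weekday()})
    return out

def analyze(rows, by="hour"):
    """Mean delay in minutes per group, sorted by group key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append(r["delay_min"])
    return {k: mean(v) for k, v in sorted(groups.items())}

# Toy rows, made up for illustration.
flights = [
    {"flight_no": "F1", "carrier": "C1", "scheduled": "2017-05-01 07:10", "delay_min": 5},
    {"flight_no": "F1", "carrier": "C1", "scheduled": "2017-05-01 07:10", "delay_min": 5},
    {"flight_no": "F2", "carrier": "C2", "scheduled": "2017-05-01 07:40", "delay_min": 25},
    {"flight_no": "F3", "carrier": "C1", "scheduled": "2017-05-01 18:00", "delay_min": None},
    {"flight_no": "F4", "carrier": "C1", "scheduled": "2017-05-01 18:30", "delay_min": 12},
]
airlines = [{"carrier": "C1", "name": "Airline One"}]
rows = transform(clean(merge(flights, airlines)))
print(analyze(rows))  # {7: 15, 18: 12}
```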