machine learning - eclipsepedia › images › 5 › 5f › d4_zimmermann.pdf · q-learning...
TRANSCRIPT
![Page 1: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/1.jpg)
Machine Learning «A Gentle Introduction» @ZimMatthias Matthias Zimmermann BSI Business Systems Integration AG
![Page 2: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/2.jpg)
«So what? IBM’s chess system did this 20 years ago?»
![Page 3: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/3.jpg)
2014 Google buys DeepMind for $660m …
![Page 4: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/4.jpg)
Markov Decision Process Environment (Atari Breakout) Agent performing Actions (Left, Right, Release Ball) State (Bricks, location / direction of ball, …) Rewards (A Brick is hit)
Deep Reinforcement Learning
![Page 5: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/5.jpg)
Q-Learning (simplified)
Markov Decision Process Q(s, a) Highest sum of future Rewards for action a in state s
initialize Q randomly
set initial state s0 repeat
execute a to maximize Q(si, a)
observe r and new state si+1
set Q = update(Q, r, a, si+1)
set si = si+1
until terminated
Deep Reinforcement Learning
![Page 6: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/6.jpg)
Deep Q Learning (DQN) Q Learning Q(s, a) = Deep Neural Network (DNN) Retrain DNN regularly (using it’s own experience)
Deep Reinforcement Learning
Action a Left, Right, Release
DNN Q(s, a)
State s
![Page 7: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/7.jpg)
ML Concepts Demos Recent Advances Food for Thought
… Agenda
![Page 8: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/8.jpg)
Machine Learning Concepts
![Page 9: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/9.jpg)
Data Models Training and Evaluation ML Topics
![Page 10: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/10.jpg)
Challenges Getting the RIGHT data for the task And LOTs of it There is never enough data …
Real World Lessons Data is crucial for successful ML projects Most boring and timeconsuming task Most underestimated task
Getting the Data
![Page 11: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/11.jpg)
Rosemary, Rosmarinus officinalis
Sentiment Analysis
1245 NEGATIVE \ shallow , noisy and pretentious . 14575 POSITIVE \ one of the most splendid entertainments to emerge from the french film industry in years
Iris or Flower set or example for outlier detection?
86211,B,12.18,17.84,77.79, … 862261,B,9.787,19.94,62.11, … 862485,B,11.6,12.84,74.34, … 862548,M,14.42,19.77,94.48, … 862009,B,13.45,18.3,86.6, …
![Page 12: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/12.jpg)
Data Models Training and Evaluation ML Topics
![Page 13: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/13.jpg)
2012, ImageNet, G. Hinton
![Page 14: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/14.jpg)
Data Models Training and Evaluation ML Topics
![Page 15: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/15.jpg)
Training Iterationen
Error Rate
Training Data Test Data
«Underfitting» more training needed
«Overfitting» too much training
![Page 16: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/16.jpg)
Data Models Training and Evaluation ML Topics
![Page 17: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/17.jpg)
Supervised Learning • Learning from Examples • Right Answers are known
Unsupervised Learning • Discover Structure in Data • Anomaly Detection/Data Compression
Reinforcement Learning • Interaction with Dynamic Environment
![Page 18: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/18.jpg)
Demo Time
![Page 19: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/19.jpg)
Demo 1 Unsupervised Learning Anomaly Detection Breast Cancer Data Auto-encoder Demo 2 Supervised Learning Pattern recognition Handwritten character recognition Convolutional neural network Demo 3 Natual Language Processing Happy to show at #EclipseScout Booth
Demos
![Page 20: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/20.jpg)
Data Breast Tissue Features (floats) «12.86,18,83.19,506.3,0.09934, … »
Model Auto-encoder Network Only train on healthy data Use reconstruction error Deeplearning4j Deep Learning Library Open Source (Apache) Java
Anomaly Detection WDBC Wisconsin Diagnostic Breast Cancer Data
86211,B,12.18,17.84,77.79, … 862261,B,9.787,19.94,62.11, … 862485,B,11.6,12.84,74.34, … 862548,M,14.42,19.77,94.48, … 862009,B,13.45,18.3,86.6, …
![Page 21: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/21.jpg)
Auto-Encoder Network
Input Features Reconstructed Features
Reconstruction Error
Core Layer
Compression Layer Expansion Layer
![Page 22: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/22.jpg)
https://github.com/matthiaszimmermann/ml_demo
![Page 23: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/23.jpg)
Data Which digit is this? Collect our own data
Model Deep Neural Network (LeNet-5) Deeplearning4j Deep Learning Library Open Source (Apache) Java
Pattern Recognition Handwritten Digits
![Page 24: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/24.jpg)
0 0.000 1 0.000 2 0.000 3 0.000 4 0.993
5 0.000 6 0.000 7 0.000 8 0.000 9 0.007
Input Image Recognition Results Deep Neural Network
Ou
tpu
t La
yer
De
nse
La
yer
Su
bsa
mp
ling
La
yer
Co
nvo
luti
on
al
La
yer
Hig
hle
vel Fe
atu
res
Inp
ut
La
yer
No
rma
lize
d I
ma
ge
Co
nvo
luti
on
al
La
yer
Lo
wle
vel Fe
atu
res
Su
bsa
mp
ling
La
yer
![Page 25: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/25.jpg)
![Page 26: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/26.jpg)
https://github.com/BSI-Business-Systems-Integration-AG/anagnostes
![Page 27: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/27.jpg)
Recent Advances
![Page 28: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/28.jpg)
2014, Stanford
http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf
![Page 29: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/29.jpg)
2017, Cornell + Adobe
https://arxiv.org/abs/1703.07511
![Page 30: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/30.jpg)
2016, Google
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
![Page 31: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/31.jpg)
http://www.graphics.stanford.edu/~niessner/thies2016face.html
2016, Erlangen, Max-
Plank, Stanford
![Page 32: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/32.jpg)
Modern Art «Turing Test» 2017, Reutgers,
Facebook, …
https://www.technologyreview.com/s/608195/machine-creativity-beats-some-modern-art/
![Page 33: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/33.jpg)
2011
2015
2016
2017
2014
ML Libraries
![Page 34: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/34.jpg)
Food for Thought
![Page 35: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/35.jpg)
Games Backgammon 1979, chess 1997, Jeopardy! 2011, Atari games 2014, Go 2016, Poker (Texas Hold’em) 2017 Visual CAPTCHAs 2005, face recognition 2007, traffic sign reading 2011, ImageNet 2015, lip-reading 2016 Other Age estimation from pictures 2013, personality judgement from Facebook «likes» 2014, conversational speech recognition 2016, contemporary art, 2017
ML performance >= Human Levels (2017)
![Page 36: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/36.jpg)
![Page 37: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/37.jpg)
Positive Outcomes Statement by Lee Sedol
![Page 38: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/38.jpg)
Socalizing Go to talks, conferences, Meetups
Increase Context Twitter, Blogs, … arxiv.org (state of the art ML publications)
Doing GitHub (deeplearning4j/deeplearning4j, BSI-Business-Systems-Integration-AG/anagnostes, …) For the latest frameworks: learn Python ;-)
Learn more?
![Page 39: Machine Learning - Eclipsepedia › images › 5 › 5f › D4_Zimmermann.pdf · Q-Learning (simplified) Markov Decision Process Q(s, a) Highest sum of future Rewards for action a](https://reader034.vdocuments.site/reader034/viewer/2022042406/5f20a1003fd600530b59fcbb/html5/thumbnails/39.jpg)
Thanks! @ZimMatthias