machine learning at geeky base 2
Machine Learning
http://www.bigdata-madesimple.com/
Kan Ouivirach
Geeky Base (2015)
About Me
Research & Development Engineer
www.kanouivirach.com
Kan Ouivirach
Outline
• What is Machine Learning?
• Main Types of Learning
• Model Validation, Selection, and Evaluation
• Applied Machine Learning Process
• Cautions
What is Machine Learning?
–Arthur Samuel (1959)
“Field of study that gives computers the ability to learn without being explicitly programmed.”
–Tom Mitchell (1997)
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.”
Statistics vs. Data Mining vs. Machine Learning vs. …?
Programming vs. Machine Learning?
Programming
“Given a specification of a function f, implement f that meets the specification.”
Machine Learning
“Given example (x, y) pairs, induce f such that y = f(x) for given pairs and generalizes
well for unseen x”
–Peter Norvig (2014)
http://www.thinkgeek.com/product/f0ba/
What do you see?
11111110 11100101 00001010
While the computer sees this
Machine Learning and Feature Representation
Learning Algorithm
Input
Feature Representation
Dog and Cat?
http://thisvsthatshow.com/
Applications of Machine Learning
• Search Engines
• Medical Diagnosis
• Object Recognition
• Stock Market Analysis
• Credit Card Fraud Detection
• Speech Recognition
• etc.
Recommendation System on Amazon.com
http://www.npr.org/sections/money/2011/11/15/142366953/the-tuesday-podcast-from-harvard-economist-to-casino-ceo
Gary Loveman, Caesars Entertainment Corporation
God’s Eye, Fast & Furious 7
http://www.standbyformindcontrol.com/2015/04/furious-7-gets-completely-untethered/
PREDictive POLicing - type of crime, place of crime, and time of crime
http://www.predpol.com/
Speech Recognition from Microsoft
Robot Localization
https://github.com/mjl/particle_filter_demo
Machine Learning Tasks
Classification
Regression
Similarity Matching
Clustering
Co-Occurrence Grouping
Profiling
Link Prediction
Data Reduction
Causal Modeling
Main Types of Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
y = f(x)
Given x, y pairs, find a function f that will map new x to a proper y.
Supervised Learning Problems
• Regression
• Classification
Regression
Linear Regression
y = wx + b
http://thisvsthatshow.com/
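A minimal sketch of fitting y = wx + b, assuming ordinary least squares as the fitting method (the slide does not name one). The data points are hypothetical and were chosen to lie exactly on y = 2x + 1, so the recovered w and b are easy to check.

```python
def fit_line(xs, ys):
    """Return (w, b) that minimize the squared error of y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


# Hypothetical points on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```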
Classification
k-Nearest Neighbors
http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/
Perceptron
Processor
Input 0
Input 1
Output
One or more inputs, a processor, and a single output
Perceptron Algorithm
Inputs: 12 and 4
Weights: 0.5 and -1
Processor: (12 x 0.5) + (4 x -1) = 2
Output: sign(2) = +1
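The arithmetic on this slide can be sketched in a few lines. The function names are illustrative, and treating sign(0) as +1 is an assumption, since the slide does not cover that case.

```python
def sign(value):
    """Return +1 for a non-negative value, -1 otherwise (assumed convention)."""
    return 1 if value >= 0 else -1


def perceptron_output(inputs, weights):
    """Weighted sum of the inputs, passed through the sign function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sign(weighted_sum)


inputs = [12, 4]      # values from the slide
weights = [0.5, -1]   # values from the slide
output = perceptron_output(inputs, weights)
print(output)  # (12 x 0.5) + (4 x -1) = 2, and sign(2) = +1
```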
Perceptron’s Goal
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
w0x0 + w1x1
How Perceptron Learning Works
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
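A rough sketch of perceptron learning, assuming the classic update rule (add target x input to the weights whenever a point is misclassified). The four training points are hypothetical, there is no bias term (matching w0x0 + w1x1), and sign(0) is again taken as +1.

```python
def sign(value):
    return 1 if value >= 0 else -1


def predict(weights, inputs):
    return sign(sum(w * x for w, x in zip(weights, inputs)))


def train_perceptron(samples, labels, max_epochs=10):
    """Repeat passes over the data until no point is misclassified."""
    weights = [0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in zip(samples, labels):
            if predict(weights, inputs) != target:
                # Move the weights toward the correct side for this point.
                weights = [w + target * x for w, x in zip(weights, inputs)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights


# Hypothetical, linearly separable data: label +1 when x0 > x1, else -1.
samples = [(2, 1), (3, 1), (1, 2), (1, 3)]
labels = [1, 1, -1, -1]
weights = train_perceptron(samples, labels)
predictions = [predict(weights, s) for s in samples]
print(weights, predictions)
```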
Let’s implement k-Nearest Neighbors!
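One possible starting point, assuming Euclidean distance and a simple majority vote. The training points and the "cat"/"dog" labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2D training data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

An odd k avoids tied votes when there are only two classes.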
Probability Theory
https://seisanshi.wordpress.com/tag/probability/
Calculating Conditional Probability
• Probability that I eat bread for breakfast, P(A), is 0.6.
• Probability that I eat steak for lunch, P(B), is 0.5.
• Given I eat steak for lunch, the probability that I eat bread for breakfast, P(A | B), is 0.7.
• What is P(B | A)?
• What about when A and B are independent?
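The two questions can be worked through with Bayes' rule, P(B | A) = P(A | B) * P(B) / P(A). The variable names below are illustrative. Note that the slide's numbers describe dependent events, since P(A | B) = 0.7 differs from P(A) = 0.6.

```python
p_a = 0.6          # P(A): bread for breakfast
p_b = 0.5          # P(B): steak for lunch
p_a_given_b = 0.7  # P(A | B)

# Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))  # 0.35 / 0.6, about 0.5833

# If A and B were independent, knowing A would tell us nothing about B.
p_b_given_a_if_independent = p_b
print(p_b_given_a_if_independent)
```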
[Diagram: class Ck with attributes A1, A2, A3, …, An]
P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)
With the independence assumption, we then have
P(Ck | A1, …, An) ∝ P(Ck) * Prod P(Ai | Ck)
Naive Bayes
Naive Bayes
No.  Content              Spam?
1    Party                Yes
2    Sale Discount        Yes
3    Party Sale Discount  Yes
4    Python Party         No
5    Python Programming   No
Naive Bayes
We want to find out whether “Party Programming” is spam.
P(Spam | Party, Programming) ∝ P(Spam) * P(Party | Spam) * P(Programming | Spam)
P(NotSpam | Party, Programming) ∝ P(NotSpam) * P(Party | NotSpam) * P(Programming | NotSpam)
We need to know
P(Spam), P(NotSpam)
P(Party | Spam), P(Party | NotSpam)
P(Programming | Spam), P(Programming | NotSpam)
Naive Bayes
P(Spam) = ? P(NotSpam) = ?
P(Party | Spam) = ? P(Party | NotSpam) = ?
P(Programming | Spam) = ? P(Programming | NotSpam) = ?
Naive Bayes
P(Spam) = 3/5 P(NotSpam) = 2/5
P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2
P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2
Naive Bayes
P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0
P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1
P(NotSpam | Party, Programming) > P(Spam | Party, Programming)
“Party Programming” is NOT spam.
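The same numbers can be reproduced from the five example messages. This is a minimal sketch that counts, per class, the fraction of messages containing each word and applies no smoothing, matching the slide's arithmetic. The function and variable names are illustrative.

```python
from fractions import Fraction

# (words in the message, is it spam?) taken from the table on the slides.
messages = [
    ({"Party"}, True),
    ({"Sale", "Discount"}, True),
    ({"Party", "Sale", "Discount"}, True),
    ({"Python", "Party"}, False),
    ({"Python", "Programming"}, False),
]


def class_score(words, is_spam):
    """Unnormalized score: P(class) times the product of P(word | class)."""
    in_class = [content for content, label in messages if label == is_spam]
    score = Fraction(len(in_class), len(messages))
    for word in words:
        containing = sum(1 for content in in_class if word in content)
        score *= Fraction(containing, len(in_class))
    return score


query = ["Party", "Programming"]
spam_score = class_score(query, True)       # 3/5 * 2/3 * 0
not_spam_score = class_score(query, False)  # 2/5 * 1/2 * 1/2
print(spam_score, not_spam_score)
print("spam" if spam_score > not_spam_score else "not spam")
```

Because “Programming” never appears in a spam message, the spam score is exactly zero whatever the other words are. Practical implementations usually add smoothing to avoid this.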
Decision Tree
Outlook
  Sunny: Humidity
    High: No
    Normal: Yes
  Overcast: Yes
  Rain: Wind
    Strong: No
    Weak: Yes
Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Mild  High      Strong  Yes
D4   Rain      Cool  Normal    Strong  No
Play tennis?
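A decision tree is a chain of nested questions, so the tree on this slide can be sketched as plain conditionals. This assumes the reading Outlook at the root, Humidity under Sunny, Wind under Rain, and Overcast leading straight to Yes. The function name is illustrative.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the tree from the root (Outlook) down to a Yes/No leaf."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unknown outlook: {outlook}")


# Rows D1 to D4 from the table: (Outlook, Humidity, Wind). Temp is unused by the tree.
days = {
    "D1": ("Sunny", "High", "Weak"),
    "D2": ("Sunny", "High", "Strong"),
    "D3": ("Overcast", "High", "Strong"),
    "D4": ("Rain", "Normal", "Strong"),
}
predictions = {day: play_tennis(*features) for day, features in days.items()}
print(predictions)
```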
Support Vector Machines
Support Vector Machines
[Figure: the “Kernel Trick” maps the data from the current coordinate system (x, y) to a new coordinate system (x, z)]
Support Vector Machines
http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/
3 support vectors
Unsupervised Learning
f(x)
Given x, find a function f that gives a compact description of x.
Unsupervised Learning
• k-Means Clustering
• Hierarchical Clustering
• Gaussian Mixture Models (GMMs)
k-Means Clustering
http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894
Recommendation
Should I recommend “The Last Witch Hunter” to Roofimon? (User-Based)
The Hunger Game
Warcraft The Beginning
The Good Dinosaur
The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the most similar user to Roofimon
What should the rating be?
Should I recommend “The Last Witch Hunter” to Roofimon? (Item-Based)
The Hunger Game
Warcraft The Beginning
The Good Dinosaur
The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the most similar item to The Last Witch Hunter
What should the rating be?
Should I recommend “The Last Witch Hunter” to Roofimon? (Matrix Factorization)
The Hunger Game
Warcraft The Beginning
The Good Dinosaur
The Last Witch Hunter
Roofimon 5 4 3 ?
User Scary Kiddy
Roofimon 2 5
Movie Scary Kiddy
TLWH 3/4 1/4
(2 x 3/4) + (5 x 1/4) = 2.75
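The predicted rating is the dot product of the user's factor vector and the movie's factor vector. A small sketch using the slide's numbers follows; the dictionary layout and function name are illustrative, and how the factors are learned is not covered here.

```python
from fractions import Fraction

# Latent factors in the order (Scary, Kiddy), as given on the slide.
user_factors = {"Roofimon": (2, 5)}
movie_factors = {"TLWH": (Fraction(3, 4), Fraction(1, 4))}


def predict_rating(user, movie):
    """Dot product of the user's and the movie's factor vectors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))


rating = predict_rating("Roofimon", "TLWH")
print(float(rating))  # (2 x 3/4) + (5 x 1/4) = 2.75
```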
Anomaly Detection
http://modernfarmer.com/2013/11/farm-pop-idioms/
http://boxesandarrows.com/designing-screens-using-cores-and-paths/
Let’s try k-Means!
1D k-Means Clustering
• Given these items: {2, 4, 10, 12, 3, 20, 30, 11, 25}
• Given these initial centroids: m1 = 2 and m2 = 4
• Find me the final clusters!
Initialize → Assign → Update Centroids → Converge? (Yes: Done; No: back to Assign)
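The exercise can be checked with a short sketch of the assign/update loop. It assumes absolute difference as the distance, breaks ties in favor of the first centroid, and stops when the centroids no longer change. The function name is illustrative.

```python
def k_means_1d(items, centroids, max_iterations=100):
    """Alternate between assigning items and updating centroids until stable."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assign: each item goes to its nearest centroid (first one wins ties).
        clusters = [[] for _ in centroids]
        for item in items:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(item - centroids[i])
            )
            clusters[nearest].append(item)
        # Update: each centroid becomes the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        # Converge? Stop once the centroids stay the same.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


items = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = k_means_1d(items, [2, 4])
print(clusters, centroids)
```

With the given initial centroids this settles on {2, 3, 4, 10, 11, 12} with centroid 7 and {20, 25, 30} with centroid 25.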
Recap: Supervised vs. Unsupervised?
Reinforcement Learning
y = f(x)
Given x and z, find a function f that generates y.
z
Flappy Bird Hack using Reinforcement Learning
http://sarvagyavaish.github.io/FlappyBirdRL/
Model Validation
I’ve got a perfect classifier!
https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti
http://blog.csdn.net/love_tea_cat/article/details/25972921
Overfitting (High Variance)
Normal fit Overfitting
http://blog.csdn.net/love_tea_cat/article/details/25972921
Underfitting (High Bias)
Normal fit Underfitting
How to Avoid Overfitting and Underfitting
• Using more data does NOT always help.
• Recommended steps:
• find a good number of features;
• perform cross validation;
• use regularization when overfitting is found.
Model Selection
Model Selection
• Use cross validation to find the best parameters for the model.
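As an illustration of the idea, the sketch below picks k for a 1D k-nearest-neighbors classifier by 3-fold cross validation. Everything here is an assumption made for the example: the toy data, the candidate values of k, the fold count, and the helper names. Samples left over when the data do not divide evenly into folds are only ever used for training.

```python
from collections import Counter


def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each consecutive fold."""
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        test = list(range(fold * fold_size, (fold + 1) * fold_size))
        train = [i for i in range(n_samples) if i not in test]
        yield train, test


def knn_predict_1d(train_xs, train_ys, query, k):
    """Majority label among the k training values closest to `query`."""
    pairs = sorted(zip(train_xs, train_ys), key=lambda pair: abs(pair[0] - query))
    return Counter(label for _, label in pairs[:k]).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, n_folds=3):
    """Average accuracy of k-NN over all held-out folds."""
    correct = 0
    total = 0
    for train, test in k_fold_splits(len(xs), n_folds):
        train_xs = [xs[i] for i in train]
        train_ys = [ys[i] for i in train]
        for i in test:
            correct += int(knn_predict_1d(train_xs, train_ys, xs[i], k) == ys[i])
            total += 1
    return correct / total


# Hypothetical data, interleaved so every fold contains both classes.
xs = [1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14]
ys = ["low" if x <= 6 else "high" for x in xs]

candidate_ks = [1, 3, 5]
scores = {k: cv_accuracy(xs, ys, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```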
Model Evaluation
Metrics
• Accuracy
• True Positive, False Positive, True Negative, False Negative
• Precision and Recall
• F1 Score
• etc.
Let’s evaluate this Giving Cats system!
Give me cats!
3 True Positives
1 False Positive
2 False Negatives
4 True Negatives
System
User
Precision and Recall
http://en.wikipedia.org/wiki/Precision_and_recall
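Using the counts from the Giving Cats example (3 true positives, 1 false positive, 2 false negatives, 4 true negatives), the metrics listed earlier work out as follows. The formulas are the standard definitions. The variable names are illustrative.

```python
tp, fp, fn, tn = 3, 1, 2, 4  # counts from the Giving Cats example

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all decisions that were right
precision = tp / (tp + fp)                  # of the items returned, how many were cats
recall = tp / (tp + fn)                     # of all the cats, how many were returned
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, round(f1, 4))
```

Here accuracy is 0.7, precision is 0.75, recall is 0.6, and F1 is about 0.667.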
False Positive or False Negative?
Metrics Summary
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Applied Machine Learning Process
http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/
Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000)
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Define the Problem
https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/
Prepare Data
http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/
Spot Check Algorithms
https://www.flickr.com/photos/withassociates/4385364607/sizes/l/
If two models fit the data equally well, choose the simpler one.
Present Results
http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/
http://newventurist.com/
• Curse of dimensionality
• Correlation does NOT imply causation.
• Learn many models, not just ONE.
• More data beats a cleverer algorithm.
• Data alone are not enough.
A Few Useful Things to Know about Machine Learning, Pedro Domingos (2012)
Some Cautions
–John G. Richardson
“Learning Best Through Experience”
https://studio.azureml.net/
Machine Learning and Feature Representation
Learning Algorithm
Input
— Feature engineering is the key. —
Feature Representation
Garbage In - Garbage Out
http://blog.marksgroup.net/2013/05/zoho-crm-garbage-in-garbage-out-its.html
Example of Feature Engineering
Width (m)  Length (m)  Cost (baht)
100        100         1,200,000
500        50          1,300,000
100        80          1,000,000
400        100         1,500,000
Are the data good to model the area’s cost?
Area (m²)  Cost (baht)
10,000     1,200,000
25,000     1,300,000
8,000      1,000,000
40,000     1,500,000
Engineer features.
They look better here.
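The engineered feature is the product of the two raw features. A small sketch of that step, using the width, length, and cost values from the first table (the dictionary keys are illustrative):

```python
# Raw features from the first table.
plots = [
    {"width_m": 100, "length_m": 100, "cost_baht": 1_200_000},
    {"width_m": 500, "length_m": 50, "cost_baht": 1_300_000},
    {"width_m": 100, "length_m": 80, "cost_baht": 1_000_000},
    {"width_m": 400, "length_m": 100, "cost_baht": 1_500_000},
]

# Engineer one feature (area) from two raw ones (width and length).
for plot in plots:
    plot["area_m2"] = plot["width_m"] * plot["length_m"]

areas = [plot["area_m2"] for plot in plots]
print(areas)
```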
Can we do better?
Deep Learning at Microsoft’s Speech Group
Recommended Books
http://www.barnstable.k12.ma.us/domain/210
https://github.com/zkan/intro-to-machine-learning