machine learning at geeky base 2

Machine Learning

http://www.bigdata-madesimple.com/

Kan Ouivirach

Geeky Base (2015)


About Me

Research & Development Engineer

www.kanouivirach.com

Kan Ouivirach


Outline

• What is Machine Learning?

• Main Types of Learning

• Model Validation, Selection, and Evaluation

• Applied Machine Learning Process

• Cautions

What is Machine Learning?

–Arthur Samuel (1959)

“Field of study that gives computers the ability to learn without being explicitly programmed.”

–Tom Mitchell (1997)

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Statistics vs. Data Mining vs. Machine Learning vs. …?

Programming vs. Machine Learning?

Programming

“Given a specification of a function f, implement f that meets the specification.”

Machine Learning

“Given example (x, y) pairs, induce f such that y = f(x) for given pairs and generalizes well for unseen x”

–Peter Norvig (2014)

Why is Machine Learning so hard?

http://veronicaforand.com/


http://www.thinkgeek.com/product/f0ba/

What do you see?

11111110 11100101 00001010

While the computer sees this


Machine Learning and Feature Representation

Learning Algorithm

Input

Feature Representation

Dog and Cat?

http://thisvsthatshow.com/

Applications of Machine Learning

• Search Engines

• Medical Diagnosis

• Object Recognition

• Stock Market Analysis

• Credit Card Fraud Detection

• Speech Recognition

• etc.

Recommendation System on Amazon.com

http://www.npr.org/sections/money/2011/11/15/142366953/the-tuesday-podcast-from-harvard-economist-to-casino-ceo

Caesars Entertainment Corporation: Gary Loveman


God’s Eye: Fast & Furious 7

http://www.standbyformindcontrol.com/2015/04/furious-7-gets-completely-untethered/


PREDictive POLicing: type of crime, place of crime, and time of crime

http://www.predpol.com/


Speech Recognition from Microsoft

Robot Localization

https://github.com/mjl/particle_filter_demo


Machine Learning Tasks

Classification

Regression

Similarity Matching

Clustering

Co-Occurrence Grouping

Profiling

Link Prediction

Data Reduction

Causal Modeling

Main Types of Learning

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

Supervised Learning

y = f(x)

Given x, y pairs, find a function f that will map new x to a proper y.

Supervised Learning Problems

• Regression

• Classification

Regression

Linear Regression

y = wx + b

http://thisvsthatshow.com/
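A minimal sketch of fitting y = wx + b by ordinary least squares. The slide does not say how w and b are found, so the closed-form fit and the sample points below are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```

Predicting for a new x is then just `w * x + b`.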

Classification

k-Nearest Neighbors

http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/


Perceptron

Processor

Input 0

Input 1

Output

One or more inputs, a processor, and a single output

Perceptron Algorithm

Inputs: 12 and 4

Weights: 0.5 and -1

Processor: (12 x 0.5) + (4 x -1) = 2

Output: sign(2) = +1
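The computation on this slide can be sketched in a few lines. The function names are illustrative, and treating sign(0) as +1 is an assumption the slide does not cover.

```python
def sign(value):
    # Assumption: a weighted sum of exactly 0 is mapped to +1
    return 1 if value >= 0 else -1


def perceptron_output(inputs, weights):
    """Weighted sum of the inputs, passed through sign()."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return sign(total)


# Inputs 12 and 4 with weights 0.5 and -1: (12 x 0.5) + (4 x -1) = 2
output = perceptron_output([12, 4], [0.5, -1])
print(output)  # 1
```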

Perceptron’s Goal

https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/

w0x0 + w1x1


How Perceptron Learning Works
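A hedged sketch of the classic perceptron update rule: whenever a sample is misclassified, nudge each weight by learning_rate x error x input. The learning rate, the epoch limit, and the four training points are hypothetical; the slide's own figure is not recoverable from the text.

```python
def sign(value):
    return 1 if value >= 0 else -1


def predict(weights, inputs):
    # Same form as w0x0 + w1x1, followed by sign()
    return sign(sum(w * x for w, x in zip(weights, inputs)))


def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """samples: list of (inputs, target) with target in {-1, +1}."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, inputs)  # 0, +2, or -2
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
        if mistakes == 0:  # every sample classified correctly
            break
    return weights


# Hypothetical linearly separable data
samples = [((2, 3), 1), ((3, 4), 1), ((-1, -2), -1), ((-2, -1), -1)]
weights = train_perceptron(samples)
```

The loop only stops early when the data are linearly separable; otherwise it runs until `max_epochs`.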



Let’s implement k-Nearest Neighbors!
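One possible implementation, as a minimal sketch: sort the training points by Euclidean distance to the query and take a majority vote among the k closest. The training points and labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (features, label). Returns the majority label
    among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two classes
train = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
    ((4.0, 4.2), "dog"), ((4.1, 3.9), "dog"), ((3.8, 4.0), "dog"),
]
print(knn_predict(train, (1.1, 1.0)))  # cat
print(knn_predict(train, (4.0, 4.0)))  # dog
```

`math.dist` needs Python 3.8 or later.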

Probability Theory

https://seisanshi.wordpress.com/tag/probability/

Calculating Conditional Probability

• Probability that I eat bread for breakfast, P(A), is 0.6.

• Probability that I eat steak for lunch, P(B), is 0.5.

• Given I eat steak for lunch, the probability that I eat bread for breakfast, P(A | B), is 0.7.

• What is P(B | A)?

• What about when A and B are independent?
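The questions above can be worked through with Bayes' rule, P(B | A) = P(A | B) * P(B) / P(A). A small sketch using the numbers on this slide:

```python
p_a = 0.6          # P(A): bread for breakfast
p_b = 0.5          # P(B): steak for lunch
p_a_given_b = 0.7  # P(A | B)

# Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))  # 0.5833

# If A and B were independent, conditioning would change nothing:
# P(B | A) = P(B) and P(A | B) = P(A).
p_b_given_a_if_independent = p_b
```

Because P(A | B) = 0.7 differs from P(A) = 0.6, the events in this example are not independent.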

Naive Bayes

(Diagram: class node Ck linked to attribute nodes A1, A2, A3, …, An)

P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)

With the independence assumption, we then have

P(Ck | A1, …, An) ∝ P(Ck) * Prod_i P(Ai | Ck)

Naive Bayes

No.  Content              Spam?
1    Party                Yes
2    Sale Discount        Yes
3    Party Sale Discount  Yes
4    Python Party         No
5    Python Programming   No

Naive Bayes


P(Spam) = ? P(NotSpam) = ?

P(Party | Spam) = ? P(Party | NotSpam) = ?

P(Programming | Spam) = ? P(Programming | NotSpam) = ?

Naive Bayes


P(Spam) = 3/5 P(NotSpam) = 2/5

P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2

P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2

Naive Bayes

P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0

P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1

P(NotSpam | Party, Programming) > P(Spam | Party, Programming)

“Party Programming” is NOT spam.
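The same calculation as a short sketch, using exact fractions. As on the slides, P(word | class) is taken to be the share of messages in that class containing the word, and the scores are unnormalized. No smoothing is applied, which is why one unseen word drives the spam score to zero.

```python
from fractions import Fraction

# The five training messages from the table: (words, is_spam)
emails = [
    ({"Party"}, True),
    ({"Sale", "Discount"}, True),
    ({"Party", "Sale", "Discount"}, True),
    ({"Python", "Party"}, False),
    ({"Python", "Programming"}, False),
]


def prior(is_spam):
    count = sum(1 for _, label in emails if label == is_spam)
    return Fraction(count, len(emails))


def likelihood(word, is_spam):
    in_class = [words for words, label in emails if label == is_spam]
    return Fraction(sum(word in words for words in in_class), len(in_class))


def score(words, is_spam):
    """Unnormalized posterior: prior times the product of likelihoods."""
    result = prior(is_spam)
    for word in words:
        result *= likelihood(word, is_spam)
    return result


message = ["Party", "Programming"]
spam_score = score(message, True)        # 3/5 * 2/3 * 0
not_spam_score = score(message, False)   # 2/5 * 1/2 * 1/2
print(spam_score, not_spam_score)  # 0 1/10
```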

Decision Tree

Outlook
  Sunny: Humidity
    High: No
    Normal: Yes
  Overcast: Yes
  Rain: Wind
    Strong: No
    Weak: Yes

Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Mild  High      Strong  Yes
D4   Rain      Cool  Normal    Strong  No

Play tennis?
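A decision tree is just nested if/else tests. The sketch below hand-codes the tree from this slide, assuming its branches read Sunny → Humidity, Overcast → Yes, Rain → Wind (reconstructed from the figure labels). It is not a tree-learning algorithm.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the hand-coded tree from the root (Outlook) to a leaf."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    # Remaining branch: Rain
    return "Yes" if wind == "Weak" else "No"


# Rows from the table: (day, outlook, humidity, wind, actual play)
days = [
    ("D1", "Sunny", "High", "Weak", "No"),
    ("D2", "Sunny", "High", "Strong", "No"),
    ("D3", "Overcast", "High", "Strong", "Yes"),
    ("D4", "Rain", "Normal", "Strong", "No"),
]
predictions = {day: play_tennis(o, h, w) for day, o, h, w, _ in days}
print(predictions)  # {'D1': 'No', 'D2': 'No', 'D3': 'Yes', 'D4': 'No'}
```

Temp is left out because this tree never tests it.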

Support Vector Machines

“Kernel Trick”

Current Coordinate System (axes x, y)

New Coordinate System (axes x, z)
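A small sketch of why the trick works: a kernel function computed in the original coordinates can equal a dot product in a new, higher-dimensional coordinate system, so the new coordinates never have to be built. The slide does not name a kernel; the degree-2 polynomial kernel and the two sample points are assumptions.

```python
import math


def poly_kernel(a, b):
    """Degree-2 polynomial kernel: (a . b)^2, computed in 2-D."""
    return (a[0] * b[0] + a[1] * b[1]) ** 2


def feature_map(p):
    """Explicit 3-D coordinates whose dot product equals poly_kernel."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


a, b = (1.0, 2.0), (3.0, 4.0)  # hypothetical points
kernel_value = poly_kernel(a, b)                    # original 2-D space
mapped_value = dot(feature_map(a), feature_map(b))  # new 3-D space
```

Both values agree (121, up to floating-point rounding), which is what lets an SVM work in the new space while only ever evaluating `poly_kernel`.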


http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/

3 support vectors


Unsupervised Learning

f(x)

Given x, find a function f that gives a compact description of x.

Unsupervised Learning

• k-Means Clustering

• Hierarchical Clustering

• Gaussian Mixture Models (GMMs)

k-Means Clustering

http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894


Recommendation

Should I recommend “The Last Witch Hunter” to Roofimon? (User-Based)

The Hunger Games

Warcraft: The Beginning

The Good Dinosaur

The Last Witch Hunter

Kan 5 4 1 3

Roofimon 5 4 3 ?

Juacompe 1 3 3

John 4 1

What should the rating be?

Find the most similar user to Roofimon

Should I recommend “The Last Witch Hunter” to Roofimon? (Item-Based)

The Hunger Games


The Good Dinosaur


Kan 5 4 1 3

Roofimon 5 4 3 ?

Juacompe 1 3 3

John 4 1

Find the most similar item to The Last Witch Hunter

What should the rating be?

Should I recommend “The Last Witch Hunter” to Roofimon? (Matrix Factorization)

The Hunger Games


The Good Dinosaur


Roofimon 5 4 3 ?

User Scary Kiddy

Roofimon 2 5

Movie Scary Kiddy

TLWH 3/4 1/4

(2 x 3/4) + (5 x 1/4) = 2.75
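The prediction step is a dot product between a user's factor vector and a movie's factor vector. A minimal sketch with the numbers from this slide; learning the factors from known ratings is not shown.

```python
from fractions import Fraction

# Latent factors in the order (Scary, Kiddy), as given on the slide
user_factors = {"Roofimon": (2, 5)}
movie_factors = {"TLWH": (Fraction(3, 4), Fraction(1, 4))}


def predict_rating(user, movie):
    """Dot product of the user's and the movie's factor vectors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))


predicted = predict_rating("Roofimon", "TLWH")
print(float(predicted))  # 2.75
```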

Anomaly Detection

http://modernfarmer.com/2013/11/farm-pop-idioms/


http://boxesandarrows.com/designing-screens-using-cores-and-paths/


Let’s try k-Means!

1D k-Means Clustering

• Given these items: {2, 4, 10, 12, 3, 20, 30, 11, 25}

• Given these initial centroids: m1 = 2 and m2 = 4

• Find me the final clusters!

Initialize → Assign → Update Centroids → Converge? (Yes → Done; No → Assign)
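One way to work the exercise, sketched as the assign/update loop: each item goes to its nearest centroid, each centroid moves to the mean of its items, and the loop stops when the centroids no longer change. Sending ties to the lower-numbered centroid is an assumption; here it does not affect the final clusters.

```python
def k_means_1d(items, centroids, max_iterations=100):
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assign: put every item in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for item in items:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(item - centroids[i]))
            clusters[nearest].append(item)
        # Update: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters, centroids


items = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = k_means_1d(items, [2, 4])
print([sorted(c) for c in clusters])  # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(centroids)                      # [7.0, 25.0]
```

The centroids move 2, 4 → 2.5, 16 → 3, 18 → 4.75, 19.6 → 7, 25 and then stop changing.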

Recap: Supervised vs. Unsupervised?

Reinforcement Learning

y = f(x)

Given x and z, find a function f that generates y.


Flappy Bird Hack using Reinforcement Learning

http://sarvagyavaish.github.io/FlappyBirdRL/

Model Validation

I’ve got a perfect classifier!

https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti


http://blog.csdn.net/love_tea_cat/article/details/25972921

Overfitting (High Variance)

Normal fit Overfitting



Underfitting (High Bias)

Normal fit Underfitting


How to Avoid Overfitting and Underfitting

• Using more data does NOT always help.

• Recommended steps:

  • find a good number of features;

  • perform cross-validation;

  • use regularization when overfitting is found.

Model Selection

Model Selection

• Use cross validation to find the best parameters for the model.
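A minimal sketch of k-fold cross-validation for picking a parameter: split the sample indices into folds, score every candidate value on each held-out fold, and keep the candidate with the best mean score. The `evaluate` callback and its signature are hypothetical; it should train on the training indices and return a score on the validation indices.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold.
    No shuffling is done; shuffle the data first if its order matters."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds get one extra sample
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def cross_validate(evaluate, n_samples, n_folds, candidates):
    """Return (best_candidate, mean_scores); higher scores are better."""
    mean_scores = {}
    for candidate in candidates:
        scores = [evaluate(candidate, train, validation)
                  for train, validation in k_fold_splits(n_samples, n_folds)]
        mean_scores[candidate] = sum(scores) / len(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

For example, `cross_validate(evaluate, len(data), 5, [1, 3, 5, 7])` could compare values of k for k-Nearest Neighbors.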

Model Evaluation

Metrics

• Accuracy

• True Positive, False Positive, True Negative, False Negative

• Precision and Recall

• F1 Score

• etc.

Let’s evaluate this Giving Cats system!

Give me cats!

3 True Positives

1 False Positive

2 False Negatives

4 True Negatives

System

User
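The counts for the Giving Cats system (3 TP, 1 FP, 2 FN, 4 TN) are enough to compute the usual metrics. A short sketch using the standard definitions:

```python
tp, fp, fn, tn = 3, 1, 2, 4

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all decisions that were right
precision = tp / (tp + fp)                  # of the items returned, how many were cats
recall = tp / (tp + fn)                     # of all the cats, how many were returned
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 3))  # 0.7 0.75 0.6 0.667
```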

Precision and Recall

http://en.wikipedia.org/wiki/Precision_and_recall


False Positive or False Negative?

Metrics Summary

https://en.wikipedia.org/wiki/Receiver_operating_characteristic


Applied Machine Learning Process

http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/


Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000)

https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining


Define the Problem

https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/


Prepare Data

http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/


Spot Check Algorithms

https://www.flickr.com/photos/withassociates/4385364607/sizes/l/


If two models fit the data equally well, choose the simpler one.

Improve Results

http://www.mobilemechanicprosaustin.com/


Present Results

http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/


http://newventurist.com/

Some Cautions

• Curse of dimensionality

• Correlation does NOT imply causation.

• Learn many models, not just ONE.

• More data beats a clever algorithm.

• Data alone are not enough.

“A Few Useful Things to Know about Machine Learning”, Pedro Domingos (2012)

–John G. Richardson

“Learning Best Through Experience”

https://studio.azureml.net/


Machine Learning and Feature Representation

Learning Algorithm

Input

— Feature engineering is the key. —

Feature Representation

Garbage In - Garbage Out

http://blog.marksgroup.net/2013/05/zoho-crm-garbage-in-garbage-out-its.html


Example of Feature Engineering

Width (m)  Length (m)  Cost (baht)
100        100         1,200,000
500        50          1,300,000
100        80          1,000,000
400        100         1,500,000

Are these data good for modeling the area’s cost?

Size (m²)  Cost (baht)
10,000     1,200,000
25,000     1,300,000
8,000      1,000,000
40,000     1,500,000

Engineer features.

They look better here.

Can we do better?
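A small sketch of this feature-engineering step: replace width and length with a single size feature, width x length. The extra cost-per-square-metre feature is only one possible answer to “Can we do better?” and is an assumption, not something the slides state.

```python
# (width_m, length_m, cost_baht) rows from the first table
plots = [
    (100, 100, 1_200_000),
    (500, 50, 1_300_000),
    (100, 80, 1_000_000),
    (400, 100, 1_500_000),
]

engineered = [
    {
        "size_m2": width * length,                  # engineered feature
        "cost_baht": cost,
        "cost_per_m2": cost / (width * length),     # hypothetical extra feature
    }
    for width, length, cost in plots
]

sizes = [row["size_m2"] for row in engineered]
print(sizes)  # [10000, 25000, 8000, 40000]
```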

Deep Learning at Microsoft’s Speech Group

Recommended Books

http://www.barnstable.k12.ma.us/domain/210


https://github.com/zkan/intro-to-machine-learning

