BLiNQ MEDIA Praneeth Vepakomma Senior Data Scientist Generalization in Supervised Machine Learning


Post on 23-Feb-2016



TRANSCRIPT

Page 1: BLiNQ  MEDIA Praneeth Vepakomma Senior Data Scientist

BLiNQ MEDIA
Praneeth Vepakomma, Senior Data Scientist

Generalization in Supervised Machine Learning

Page 2:

Hypothetical Knapsack of Coins:

Copper and Gold Coins
Total number of coins is fixed and is a large sample.
Capture-Recapture
What is the proportion of Gold coins?

Copper and Gold Coins
Total number of coins is variable and is a large sample.
Capture-Recapture
What is the proportion of Gold coins?
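A minimal Python sketch of the knapsack idea, under stated assumptions: the coin counts, sample sizes, and function names are hypothetical and not taken from the slides. It estimates the gold proportion from a random sample, and uses the Lincoln-Petersen capture-recapture estimator (N ≈ M × C / R) for the case where the total number of coins is not known.

```python
import random

def sample_proportion(coins, sample_size, rng):
    """Estimate the gold fraction from a simple random sample (no replacement)."""
    sample = rng.sample(coins, sample_size)
    return sum(1 for c in sample if c == "gold") / sample_size

def lincoln_petersen(marked_first, caught_second, recaptured):
    """Lincoln-Petersen estimate of population size: N ~ M * C / R."""
    if recaptured == 0:
        raise ValueError("no marked items recaptured; estimate is undefined")
    return marked_first * caught_second / recaptured

rng = random.Random(0)
# Hypothetical knapsack: 10,000 coins, 30% gold (numbers are assumptions).
coins = ["gold"] * 3000 + ["copper"] * 7000
gold_fraction_hat = sample_proportion(coins, 500, rng)

# Capture-recapture when the total is unknown: mark 400 coins, put them back,
# draw 400 again, and count how many of the second draw carry a mark.
ids = list(range(len(coins)))
marked = set(rng.sample(ids, 400))
second = rng.sample(ids, 400)
recaptured = sum(1 for i in second if i in marked)
total_hat = lincoln_petersen(400, 400, recaptured)
print(gold_fraction_hat, total_hat)
```

Both estimates come from a sample, so they vary from draw to draw; that sampling variability is the same issue that separates performance on seen data from performance on unseen data.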

Page 3:

BASIC ML/STAT TERMINOLOGY:

Page 4:

190 years after Gauss, the core problem of prediction remains an active problem:

Then:

Now:

Page 5:
Page 6:

190 years after Gauss, the core problem of prediction remains an active problem:

Find a mapping (an approximation) from the features to the labels.

The parameter vector is the list of parameters required to represent the function.

Page 7:

What is Supervised Learning?

[Diagram: Existing Features, Known Labels, Unavailable Features, Unknown Labels, Loss Function, Assumptions]

Page 8:

Evaluating the Learned Function:

The loss function quantifies the error in the approximation.

Learn a mapping by minimizing the loss.
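As a hedged illustration of these two statements, the sketch below assumes a squared-error loss and a one-dimensional linear mapping; the toy data and function names are assumptions, not the example from the slides. It evaluates the loss at a few parameter settings and compares them with the least-squares minimizer.

```python
def predict(x, slope, intercept):
    """Linear mapping from a single feature to a predicted label."""
    return slope * x + intercept

def mean_squared_loss(xs, ys, slope, intercept):
    """Average squared error of the linear mapping on the given pairs."""
    return sum((predict(x, slope, intercept) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def fit_least_squares(xs, ys):
    """Closed-form minimizer of the mean squared loss for a 1-D linear mapping."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy data, roughly y = 2x + 1 (values are illustrative assumptions).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

slope, intercept = fit_least_squares(xs, ys)
best_loss = mean_squared_loss(xs, ys, slope, intercept)

# Any other parameter setting gives a loss at least as large on this data.
candidates = [(1.0, 1.0), (2.0, 0.0), (3.0, 1.0)]
for s, b in candidates:
    print(s, b, mean_squared_loss(xs, ys, s, b))
print(slope, intercept, best_loss)
```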

Example:

Page 9:

Predictions with varying parameters:

Page 10:

Predictions with varying parameters:

Page 11:

How do we generalize?

Page 12:

Generalization and Predictability

Empirical Risk Minimization:

True Risk Minimization:

Empirical risk is the average loss over the seen (training) data.

True risk is the expected loss under the process generating the (X, Y) pairs.
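In standard notation, the two quantities can be written as follows. The symbols are assumptions introduced here for illustration: L is the loss function, f the learned mapping, (x_i, y_i) the n seen pairs, and P the distribution generating the (X, Y) pairs.

```latex
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(f(x_i),\, y_i\bigr)
\qquad\text{(empirical risk)}

R(f) = \mathbb{E}_{(X,Y)\sim P}\bigl[L\bigl(f(X),\, Y\bigr)\bigr]
\qquad\text{(true risk)}
```

Empirical risk minimization picks the f with the smallest first quantity; generalization concerns how close that choice comes to minimizing the second, which cannot be computed because P is unknown.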

Page 13:
Page 14:

PARAMETRIC CHARACTERIZATION OF THE MAPPING:

2D linear function: slope, intercept
Cubic spline: number of knots, location of knots
Nearest-neighbor regression: number of neighbors
Lasso / elastic net: L1 and L2 penalty weights
Support vector machines: kernel width, margin length
Random forests: resampling sample size

Page 15:
Page 16:
Page 17:
Page 18:
Page 19:

There is a long list of available supervised learning techniques.

Most of the techniques have tuning parameters.

We can minimize out-of-sample error by tuning the technique to its optimal parameters.

Tuning can be performed by cross-validation over a discrete grid of parameter combinations.
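The sketch below shows one way such a search could look, assuming k-nearest-neighbor regression on synthetic one-dimensional data with 5-fold cross-validation. The data, the grid values, and the function names are assumptions chosen for illustration. The grid here covers a single tuning parameter; with several parameters, the same loop would run over every combination.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_error(data, k, n_folds=5):
    """Mean squared error of k-NN regression, averaged over held-out folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    total, count = 0.0, 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        for x, y in held_out:
            total += (knn_predict(train, x, k) - y) ** 2
            count += 1
    return total / count

rng = random.Random(0)
# Synthetic data (an assumption for illustration): a noisy sine curve.
data = []
for _ in range(200):
    x = rng.uniform(0.0, 2.0 * math.pi)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))

grid = [1, 3, 5, 10, 20, 50, 100]  # discrete grid of the tuning parameter
scores = {k: cv_error(data, k) for k in grid}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Very small k follows the noise and very large k smooths away the signal, so the cross-validated error is typically lowest somewhere in between.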

Page 20:
Page 21:

CURSE OF DIMENSIONALITY - Flat World - 10D World:

Page 22:

CURSE OF DIMENSIONALITY - Flat World - 10D World:

Page 23:

CURSE OF DIMENSIONALITY - Flat World - 10D World:

Page 24:

CURSE OF DIMENSIONALITY - Let us validate:
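One possible validation, sketched in Python under the assumption of points drawn uniformly from the unit cube; this is not the experiment from the slides, and the point counts and function names are hypothetical. It compares a 2D ("flat") world with a 10D world in two ways: the edge length of a sub-cube needed to hold 10% of the data, and the average distance from a point to its nearest neighbor.

```python
import math
import random

def edge_length_for_fraction(fraction, dim):
    """Edge of a sub-cube holding `fraction` of uniform data in the unit cube."""
    return fraction ** (1.0 / dim)

def mean_nearest_neighbor_distance(n_points, dim, rng):
    """Average Euclidean distance from each point to its nearest neighbor."""
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_points

rng = random.Random(0)
for dim in (2, 10):
    print(dim,
          edge_length_for_fraction(0.10, dim),
          mean_nearest_neighbor_distance(300, dim, rng))
```

In 2D a sub-cube with edge of about 0.32 holds 10% of the data, while in 10D the edge must be about 0.79, so a "local" neighborhood spans most of each axis. The nearest-neighbor distances grow in the same way for a fixed number of points.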

Page 25:

Structural Risk Minimization via Regularization:
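A minimal sketch of the idea, assuming ridge (L2) regularization on a one-dimensional, no-intercept linear model; the slides may have used a different penalty, and the data and names below are hypothetical. The fitted slope minimizes the sum of squared errors plus lam times the squared slope, so a larger lam trades fit on the seen data for a simpler (smaller) parameter.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((w*x - y)^2) + lam * w^2 for a no-intercept 1-D model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Toy data, roughly y = 2x (values are illustrative assumptions).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# The slope shrinks toward zero as the penalty weight grows.
lams = (0.0, 1.0, 10.0, 100.0)
slopes = [ridge_slope(xs, ys, lam) for lam in lams]
for lam, w in zip(lams, slopes):
    print(lam, w)
```

The penalty weight lam is itself a tuning parameter and could be chosen by cross-validation over a grid.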

Page 27:

Brief Description

Technology Overview

Hiring (what we're looking for): http://blinqmedia.com/contact/job-openings/

Page 28:

Let's work with Abalone

Page 29:

Thank You!