ethem alpaydin © the mit press, 2010ethem/i2ml2e/2e_v1-0/i2ml2...nonparametric estimation...

ETHEM ALPAYDIN© The MIT Press, 2010

[email protected]://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Lecture Slides for

Nonparametric Estimation Parametric (single global model), semiparametric (small

number of local models)

Nonparametric: Similar inputs have similar outputs

Functions (pdf, discriminant, regression) change smoothly

Keep the training data;“let the data speak for itself”

Given x, find a small number of closest training instances and interpolate from these

Aka lazy/memory-based/case-based/instance-based learning

3Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Given the training set X={xt}t drawn iid from p(x)

Divide data into bins of size h

Histogram:

Naive estimator:

or

Density Estimation

Nh

xxxp

t as bin same the in #ˆ

4

Nh

hxxhxxp

t

2

#ˆ

otherwise

if

0

1211

1

uuw

h

xxw

Nhxp

N

t

t /ˆ

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Kernel Estimator

N

t

t

h

xxK

Nhxp

1

1ˆ

7

Kernel function, e.g., Gaussian kernel:

Kernel estimator (Parzen windows)

22

1 2uuK exp


Instead of fixing bin width h and counting the number of instances, fix the instances (neighbors) k and check bin width

dk(x), distance to kth closest instance to x

k-Nearest Neighbor Estimator

xNd

kxp

k2ˆ


Kernel density estimator

Multivariate Gaussian kernel

spheric

ellipsoid

Multivariate Data

N

t

t

d hK

Nhp

1

1 xxxˆ

11

uuu

uu

1

212

2

2

1

2

1

22

1

SS

T

d

d

K

K

exp

exp

//


Estimate p(x|Ci) and use Bayes’ rule

Kernel estimator

k-NN estimator

Nonparametric Classification

ti

N

t

t

diii

ii

ti

N

t

t

di

i

rh

KNh

CPCpg

N

NCPr

hK

hNCp

1

1

1

1

xxxx

xxx

ˆ|ˆ

ˆ |ˆ

12

k

k

p

CPCpCP

VN

kCp iii

iki

ii

x

xx

xx

ˆ

ˆ|ˆ|ˆ |ˆ


Condensed Nearest Neighbor

13

Time/space complexity of k-NN is O (N)

Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968)

ZZXXZ ||' EE


Condensed Nearest Neighbor Incremental algorithm: Add instance if needed


Nonparametric Regression

otherwise

withbin same the in is if

where

0

1

1

1

xxxxb

xxb

rxxbxg

tt

N

t

t

tN

t

t

,

,

,ˆ

15

Aka smoothing models

Regressogram


Running Mean/Kernel Smoother Running mean smoother

Running line smoother

Kernel smoother

where K( ) is Gaussian

Additive models (Hastie and Tibshirani, 1990)

18

otherwise

1i f

where

ˆ

0

1

1

1

uuw

h

xxw

rh

xxw

xgN

t

t

tN

t

t

N

t

t

tN

t

t

h

xxK

rh

xxK

xg

1

1

ˆ


How to Choose k or h? When k or h is small, single instances matter; bias is

small, variance is large (undersmoothing): High complexity

As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing): Low complexity

Cross-validation is used to finetune k or h.


ethem alpaydin © the mit press, 2010ethem/i2ml2e/2e_v1-0/i2ml2...nonparametric estimation...

Documents