
Page 1: Radial Basis Functions Lecture 18 Learning From Data

Learning From Data, Lecture 18

Radial Basis Functions

Non-Parametric RBF
Parametric RBF
k-RBF-Network

M. Magdon-Ismail, CSCI 4100/6100

recap: Data Condensation and Nearest Neighbor Search

(Figure: a training-set-consistent condensed set, S1 → S2, with a query point x.)

Branch and bound for finding nearest neighbors.

Lloyd's algorithm for finding a good clustering.

© AML Creator: Malik Magdon-Ismail (Radial Basis Functions: 2/31)

Radial Basis Functions (RBF)

k-Nearest Neighbor: only considers the k nearest neighbors; each neighbor has equal weight.

What about using all the data to compute g(x)?

RBF: use all the data; data further away from x have less weight.

Weighting the Data Points: αn

Test point x. αn is the weight of xn in g(x):

    αn(x) = φ( ||x − xn|| / r )

The weight is a decreasing function of the distance ||x − xn||, relative to a scale parameter r; the kernel φ determines how the weighting decreases with distance.

Most popular kernel: the Gaussian,

    φ(z) = e^(-z^2/2).

Window kernel (mimics k-NN):

    φ(z) = 1 for z ≤ 1,  0 for z > 1.

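These two kernels are one-liners in code. A minimal Python sketch (the function names are my own, and x is taken to be a scalar for simplicity):

```python
import math

def gaussian_kernel(z):
    # phi(z) = exp(-z^2 / 2): weight decays smoothly with distance
    return math.exp(-0.5 * z * z)

def window_kernel(z):
    # phi(z) = 1 for z <= 1, 0 for z > 1: a hard cutoff that mimics k-NN
    return 1.0 if z <= 1.0 else 0.0

def alpha(x, xn, r, phi=gaussian_kernel):
    # alpha_n(x) = phi(||x - xn|| / r), here for scalar x
    return phi(abs(x - xn) / r)
```

For the Gaussian, a point one scale r away from x receives weight e^(-1/2) ≈ 0.61; the window kernel instead cuts off abruptly at distance r.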


Nonparametric RBF – Regression

    αn(x) = φ( ||x − xn|| / r )

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn

A weighted average of the target values.

(Figure: the weights αn as bumps around the data points (xn, yn).)
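The nonparametric regressor above can be sketched in a few lines of Python (scalar inputs, Gaussian kernel; the names are mine):

```python
import math

def rbf_regress(x, data, r):
    """g(x): weighted average of all targets yn, with weights
    alpha_n(x) = exp(-||x - xn||^2 / (2 r^2)). data = [(xn, yn), ...]."""
    alphas = [math.exp(-0.5 * ((x - xn) / r) ** 2) for xn, _ in data]
    total = sum(alphas)
    return sum(a * yn for a, (_, yn) in zip(alphas, data)) / total
```

As r → 0 the nearest point dominates the weighted average and the rule approaches 1-NN; as r → ∞ every point gets equal weight and g tends to the global mean of the yn.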

Nonparametric RBF – Classification

    αn(x) = φ( ||x − xn|| / r )

    g(x) = sign( Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn )

The sign of the weighted average of the ±1 target values.

Nonparametric RBF – Logistic Regression

    αn(x) = φ( ||x − xn|| / r )

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · [[yn = +1]]

A weighted average of the indicator targets [[yn = +1]].
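A sketch of the classification and logistic-regression variants, reusing the same normalized weights (scalar inputs, Gaussian kernel; all names are mine):

```python
import math

def _weights(x, xs, r):
    # normalized alpha_n(x): positive weights summing to 1
    a = [math.exp(-0.5 * ((x - xn) / r) ** 2) for xn in xs]
    s = sum(a)
    return [ai / s for ai in a]

def rbf_classify(x, data, r):
    # sign of the weighted average of the +/-1 labels
    w = _weights(x, [xn for xn, _ in data], r)
    avg = sum(wi * yn for wi, (_, yn) in zip(w, data))
    return 1 if avg >= 0 else -1

def rbf_prob(x, data, r):
    # weighted fraction of +1 labels: an estimate of P[y = +1 | x]
    w = _weights(x, [xn for xn, _ in data], r)
    return sum(wi for wi, (_, yn) in zip(w, data) if yn == 1)
```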

Choice of Scale r

Nearest Neighbor (plots for k = 1, 3, 11). Choosing k: k = 3, k = √N, or CV.

Nonparametric RBF (plots for r = 0.01, 0.05, 0.5). Choosing r: r ∼ 1/(2 N^(1/d)), or CV.

Too small a scale overfits; too large a scale underfits.
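The two rules of thumb are one-liners; a sketch (d is the input dimension; the exact constants are heuristic, not tuned values):

```python
def knn_k_heuristic(N):
    # k = sqrt(N), rounded to an integer
    return round(N ** 0.5)

def rbf_r_heuristic(N, d):
    # r ~ 1 / (2 * N^(1/d)): shrink the scale as the data fills the space
    return 1.0 / (2.0 * N ** (1.0 / d))
```

Cross-validation remains the principled way to choose either parameter; these give a starting point.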

Highlights of Nonparametric RBF

1. Simple (a 'smooth' version of the k-NN rule).
2. No training.
3. Near-optimal Eout.
4. Easy to justify a classification to a customer.
5. Can do classification, multi-class, regression, logistic regression.
6. Computationally demanding.

A good method!

Scaled Bumps on Each Data Point

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn        (weighted average of the yn)

         = Σ_{n=1}^{N} [ yn / Σ_{m=1}^{N} αm(x) ] · φ( ||x − xn|| / r )

         = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

A sum of bumps at the xn, scaled by wn(x).



Parametric RBF – A Linear Model

Nonparametric RBF:

    g(x) = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

Only need to specify r. (Plots: the nonparametric fit for r = 0.1 and r = 0.3.)

Parametric RBF:

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r )

Fix r; need to determine the parameters wn: fit the data, or overfit the data?


RBF-Nonlinear Transform Depends on Data

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r ) = w^t z

    z = Φ(x) = [φ1(x), φ2(x), ..., φN(x)]^t,   where φn(x) = φ( ||x − xn|| / r )

    Z = [z1^t; z2^t; ...; zN^t] = [Φ(x1)^t; Φ(x2)^t; ...; Φ(xN)^t]

Fit the data (h(xn) = yn):

    w = Z†y = (Z^t Z)^(-1) Z^t y

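A sketch of this exact fit in Python: with one bump per data point, Z is a square N × N matrix with entries φ(||xn − xm||/r), and solving Zw = y makes h interpolate the data. The solver and names below are my own (plain Gaussian elimination; in practice you would call a linear-algebra library):

```python
import math

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def fit_parametric_rbf(xs, ys, r):
    # Z_{nm} = phi(||xn - xm|| / r); square, so w = Z^{-1} y fits exactly
    phi = lambda z: math.exp(-0.5 * z * z)
    Z = [[phi(abs(xn - xm) / r) for xm in xs] for xn in xs]
    return solve(Z, ys)

def h(x, xs, w, r):
    # h(x) = sum_n wn * phi(||x - xn|| / r)
    return sum(wn * math.exp(-0.5 * ((x - xn) / r) ** 2) for wn, xn in zip(w, xs))
```

With as many bumps as data points, the fit passes through every (xn, yn); whether that is fitting or overfitting is exactly the question raised above.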


Reducing the Number of Bumps: k-RBF-Network

Nonparametric:

    g(x) = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

Parametric:

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r )

k-RBF-Network:

    h(x) = w0 + Σ_{j=1}^{k} wj · φ( ||x − μj|| / r ) = w^t Φ(x)

    Φ(x)^t = [1, Φ1(x), ..., Φk(x)],   where Φj(x) = φ( ||x − μj|| / r )

The transform is nonlinear in the centers μj.

(Figure: the network diagram; the input x feeds k bump units φ(||x − μj||/r), which are combined with weights w0, w1, ..., wk to produce h(x).)


Fitting the Data

Before: the bumps were centered on the xn; there was no choice.

Now: we may choose the bump centers μj:
- choose them to 'cover' the data,
- as the centers of k 'clusters'.

Given the bump centers, we have a linear model to determine the wj. That's 'easy'; we know how to do that.

Fitting the Data

Fitting the RBF-network to the data (given k, r):

1: Use the inputs X to determine k centers μ1, ..., μk.

2: Compute the N × (k + 1) feature matrix Z, whose n-th row is Φ(xn)^t:

    Z = [Φ(x1)^t; Φ(x2)^t; ...; Φ(xN)^t],   Φ(x) = [1, φ1(x), ..., φk(x)]^t,   φj(x) = φ( ||x − μj|| / r )

   Each row of Z is the RBF-feature corresponding to xn (with a dummy bias coordinate 1).

3: Fit the linear model Zw to y to determine the weights w*:
   classification: PLA, pocket, linear programming, ...
   regression: pseudoinverse.
   logistic regression: gradient descent on the cross-entropy error.

Choose r using CV, or a heuristic:

    r ∼ (radius of the data) / k^(1/d)

(so your clusters 'cover' the data).
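The three steps can be sketched in pure Python for scalar inputs: Lloyd-style centers, the N × (k+1) feature matrix, and a regression fit via the normal equations. All function names are mine, and the evenly spaced seeding of the centers is a simplifying assumption:

```python
import math

def lloyd_centers(xs, k, iters=50):
    """Step 1: Lloyd's algorithm in 1-D, seeded at evenly spaced points."""
    lo, hi = min(xs), max(xs)
    mus = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda j: abs(x - mus[j]))
            buckets[j].append(x)  # assign x to its nearest center
        mus = [sum(b) / len(b) if b else m for b, m in zip(buckets, mus)]
    return mus

def feature_matrix(xs, mus, r):
    """Step 2: row n is [1, phi_1(xn), ..., phi_k(xn)] (bias coordinate first)."""
    phi = lambda z: math.exp(-0.5 * z * z)
    return [[1.0] + [phi(abs(x - m) / r) for m in mus] for x in xs]

def lstsq(Z, y):
    """Step 3 (regression): solve the normal equations (Z^t Z) w = Z^t y."""
    k1 = len(Z[0])
    A = [[sum(Z[n][i] * Z[n][j] for n in range(len(Z))) for j in range(k1)]
         for i in range(k1)]
    b = [sum(Z[n][i] * y[n] for n in range(len(Z))) for i in range(k1)]
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k1):  # Gaussian elimination with partial pivoting
        p = max(range(c, k1), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k1):
            f = M[i][c] / M[c][c]
            for j in range(c, k1 + 1):
                M[i][j] -= f * M[c][j]
    w = [0.0] * k1
    for i in range(k1 - 1, -1, -1):
        w[i] = (M[i][k1] - sum(M[i][j] * w[j] for j in range(i + 1, k1))) / M[i][i]
    return w

def h_net(x, mus, w, r):
    # h(x) = w0 + sum_j wj * phi(||x - mu_j|| / r)
    phi = lambda z: math.exp(-0.5 * z * z)
    return w[0] + sum(wj * phi(abs(x - m) / r) for wj, m in zip(w[1:], mus))
```

For classification or logistic regression, only step 3 changes (pocket/PLA or gradient descent in place of the least-squares solve).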

Our Example

(Plots: the fitted network with k = 4, r = 1/k, and with k = 10, r = 1/k.)

    w = (Z^t Z)^(-1) Z^t y

Use Regularization to Fight Overfitting

(Plots: k = 10, r = 1/k without regularization, versus k = 10, r = 1/k with regularization.)

    w = (Z^t Z + λI)^(-1) Z^t y
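Regularization changes only the linear solve: add λI to Z^t Z before inverting. A sketch (names are mine; Z is any feature matrix, such as the RBF features above):

```python
def ridge_weights(Z, y, lam):
    """w = (Z^t Z + lambda I)^(-1) Z^t y, via Gaussian elimination."""
    k1 = len(Z[0])
    A = [[sum(Z[n][i] * Z[n][j] for n in range(len(Z))) + (lam if i == j else 0.0)
          for j in range(k1)] for i in range(k1)]
    b = [sum(Z[n][i] * y[n] for n in range(len(Z))) for i in range(k1)]
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k1):
        p = max(range(c, k1), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k1):
            f = M[i][c] / M[c][c]
            for j in range(c, k1 + 1):
                M[i][j] -= f * M[c][j]
    w = [0.0] * k1
    for i in range(k1 - 1, -1, -1):
        w[i] = (M[i][k1] - sum(M[i][j] * w[j] for j in range(i + 1, k1))) / M[i][i]
    return w
```

Larger λ shrinks the weights toward zero, taming the wiggly k = 10 fit.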

Reflecting on the k-RBF-Network

1. We derived it as a 'soft' generalization of the k-NN rule. It can also be derived from regularization theory, or from noisy interpolation theory.

2. There are nonparametric and parametric versions.

3. Given the centers, it is 'easy' to learn the weights using techniques from linear models: a linear model with an adaptable nonlinear transform.

4. We used uniform bumps; the bumps can also have different shapes Σj.

5. NEXT: how to better choose the centers, via unsupervised learning.

A Peek at Unsupervised Learning

(Figure: the digits data, average intensity versus symmetry; left, the 21-NN rule with 10 classes; right, a 10-cluster clustering of the data.)