
Page 1: Radial Basis Functions Lecture 18 Learning From Data

Learning From Data, Lecture 18

Radial Basis Functions

Non-Parametric RBF
Parametric RBF
k-RBF-Network

M. Magdon-Ismail, CSCI 4100/6100

recap: Data Condensation and Nearest Neighbor Search

(Figure: a training-set-consistent condensed set, S1 → S2, with a query point x.)

Branch and bound for finding nearest neighbors.

Lloyd's algorithm for finding a good clustering.

© AML Creator: Malik Magdon-Ismail (Radial Basis Functions: 2/31)

Radial Basis Functions (RBF)

k-Nearest Neighbor: only considers the k nearest neighbors; each neighbor has equal weight.

What about using all the data to compute g(x)?

RBF: use all the data; data further away from x have less weight.

Weighting the Data Points: αn

Test point x. αn is the weight of xn in g(x):

    αn(x) = φ( ||x − xn|| / r )

The weight is a decreasing function of the distance ||x − xn||, relative to a scale parameter r; the kernel φ determines how the weighting decreases with distance.

Most popular kernel: the Gaussian,

    φ(z) = e^(-z^2/2).

Window kernel (mimics k-NN):

    φ(z) = 1 for z ≤ 1,  0 for z > 1.

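These two kernels are one-liners in code. A minimal Python sketch (the function names are my own, and x is taken to be a scalar for simplicity):

```python
import math

def gaussian_kernel(z):
    # phi(z) = exp(-z^2 / 2): weight decays smoothly with distance
    return math.exp(-0.5 * z * z)

def window_kernel(z):
    # phi(z) = 1 for z <= 1, 0 for z > 1: a hard cutoff that mimics k-NN
    return 1.0 if z <= 1.0 else 0.0

def alpha(x, xn, r, phi=gaussian_kernel):
    # alpha_n(x) = phi(||x - xn|| / r), here for scalar x
    return phi(abs(x - xn) / r)
```

For the Gaussian, a point one scale r away from x receives weight e^(-1/2) ≈ 0.61; the window kernel instead cuts off abruptly at distance r.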


Nonparametric RBF – Regression

    αn(x) = φ( ||x − xn|| / r )

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn

A weighted average of the target values.

(Figure: the weights αn as bumps around the data points (xn, yn).)
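The nonparametric regressor above can be sketched in a few lines of Python (scalar inputs, Gaussian kernel; the names are mine):

```python
import math

def rbf_regress(x, data, r):
    """g(x): weighted average of all targets yn, with weights
    alpha_n(x) = exp(-||x - xn||^2 / (2 r^2)). data = [(xn, yn), ...]."""
    alphas = [math.exp(-0.5 * ((x - xn) / r) ** 2) for xn, _ in data]
    total = sum(alphas)
    return sum(a * yn for a, (_, yn) in zip(alphas, data)) / total
```

As r → 0 the nearest point dominates the weighted average and the rule approaches 1-NN; as r → ∞ every point gets equal weight and g tends to the global mean of the yn.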

Nonparametric RBF – Classification

    αn(x) = φ( ||x − xn|| / r )

    g(x) = sign( Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn )

The sign of the weighted average of the ±1 target values.

Nonparametric RBF – Logistic Regression

    αn(x) = φ( ||x − xn|| / r )

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · [[yn = +1]]

A weighted average of the indicator targets [[yn = +1]].
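A sketch of the classification and logistic-regression variants, reusing the same normalized weights (scalar inputs, Gaussian kernel; all names are mine):

```python
import math

def _weights(x, xs, r):
    # normalized alpha_n(x): positive weights summing to 1
    a = [math.exp(-0.5 * ((x - xn) / r) ** 2) for xn in xs]
    s = sum(a)
    return [ai / s for ai in a]

def rbf_classify(x, data, r):
    # sign of the weighted average of the +/-1 labels
    w = _weights(x, [xn for xn, _ in data], r)
    avg = sum(wi * yn for wi, (_, yn) in zip(w, data))
    return 1 if avg >= 0 else -1

def rbf_prob(x, data, r):
    # weighted fraction of +1 labels: an estimate of P[y = +1 | x]
    w = _weights(x, [xn for xn, _ in data], r)
    return sum(wi for wi, (_, yn) in zip(w, data) if yn == 1)
```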

Choice of Scale r

Nearest Neighbor (plots for k = 1, 3, 11). Choosing k: k = 3, k = √N, or CV.

Nonparametric RBF (plots for r = 0.01, 0.05, 0.5). Choosing r: r ∼ 1/(2 N^(1/d)), or CV.

Too small a scale overfits; too large a scale underfits.
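The two rules of thumb are one-liners; a sketch (d is the input dimension; the exact constants are heuristic, not tuned values):

```python
def knn_k_heuristic(N):
    # k = sqrt(N), rounded to an integer
    return round(N ** 0.5)

def rbf_r_heuristic(N, d):
    # r ~ 1 / (2 * N^(1/d)): shrink the scale as the data fills the space
    return 1.0 / (2.0 * N ** (1.0 / d))
```

Cross-validation remains the principled way to choose either parameter; these give a starting point.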

Highlights of Nonparametric RBF

1. Simple (a 'smooth' version of the k-NN rule).
2. No training.
3. Near-optimal Eout.
4. Easy to justify a classification to a customer.
5. Can do classification, multi-class, regression, logistic regression.
6. Computationally demanding.

A good method!

Scaled Bumps on Each Data Point

    g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn        (weighted average of the yn)

         = Σ_{n=1}^{N} [ yn / Σ_{m=1}^{N} αm(x) ] · φ( ||x − xn|| / r )

         = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

A sum of bumps at the xn, scaled by wn(x).



Parametric RBF – A Linear Model

Nonparametric RBF:

    g(x) = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

Only need to specify r. (Plots: the nonparametric fit for r = 0.1 and r = 0.3.)

Parametric RBF:

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r )

Fix r; need to determine the parameters wn: fit the data, or overfit the data?


RBF-Nonlinear Transform Depends on Data

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r ) = w^t z

    z = Φ(x) = [φ1(x), φ2(x), ..., φN(x)]^t,   where φn(x) = φ( ||x − xn|| / r )

    Z = [z1^t; z2^t; ...; zN^t] = [Φ(x1)^t; Φ(x2)^t; ...; Φ(xN)^t]

Fit the data (h(xn) = yn):

    w = Z†y = (Z^t Z)^(-1) Z^t y

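A sketch of this exact fit in Python: with one bump per data point, Z is a square N × N matrix with entries φ(||xn − xm||/r), and solving Zw = y makes h interpolate the data. The solver and names below are my own (plain Gaussian elimination; in practice you would call a linear-algebra library):

```python
import math

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def fit_parametric_rbf(xs, ys, r):
    # Z_{nm} = phi(||xn - xm|| / r); square, so w = Z^{-1} y fits exactly
    phi = lambda z: math.exp(-0.5 * z * z)
    Z = [[phi(abs(xn - xm) / r) for xm in xs] for xn in xs]
    return solve(Z, ys)

def h(x, xs, w, r):
    # h(x) = sum_n wn * phi(||x - xn|| / r)
    return sum(wn * math.exp(-0.5 * ((x - xn) / r) ** 2) for wn, xn in zip(w, xs))
```

With as many bumps as data points, the fit passes through every (xn, yn); whether that is fitting or overfitting is exactly the question raised above.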


Reducing the Number of Bumps: k-RBF-Network

Nonparametric:

    g(x) = Σ_{n=1}^{N} wn(x) · φ( ||x − xn|| / r )

Parametric:

    h(x) = Σ_{n=1}^{N} wn · φ( ||x − xn|| / r )

k-RBF-Network:

    h(x) = w0 + Σ_{j=1}^{k} wj · φ( ||x − μj|| / r ) = w^t Φ(x)

    Φ(x)^t = [1, Φ1(x), ..., Φk(x)],   where Φj(x) = φ( ||x − μj|| / r )

The transform is nonlinear in the centers μj.

(Figure: the network diagram; the input x feeds k bump units φ(||x − μj||/r), which are combined with weights w0, w1, ..., wk to produce h(x).)


Fitting the Data

Before: the bumps were centered on the xn; there was no choice.

Now: we may choose the bump centers μj:
- choose them to 'cover' the data,
- as the centers of k 'clusters'.

Given the bump centers, we have a linear model to determine the wj. That's 'easy'; we know how to do that.

Fitting the Data

Fitting the RBF-network to the data (given k, r):

1: Use the inputs X to determine k centers μ1, ..., μk.

2: Compute the N × (k + 1) feature matrix Z, whose n-th row is Φ(xn)^t:

    Z = [Φ(x1)^t; Φ(x2)^t; ...; Φ(xN)^t],   Φ(x) = [1, φ1(x), ..., φk(x)]^t,   φj(x) = φ( ||x − μj|| / r )

   Each row of Z is the RBF-feature corresponding to xn (with a dummy bias coordinate 1).

3: Fit the linear model Zw to y to determine the weights w*:
   classification: PLA, pocket, linear programming, ...
   regression: pseudoinverse.
   logistic regression: gradient descent on the cross-entropy error.

Choose r using CV, or a heuristic:

    r ∼ (radius of the data) / k^(1/d)

(so your clusters 'cover' the data).
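The three steps can be sketched in pure Python for scalar inputs: Lloyd-style centers, the N × (k+1) feature matrix, and a regression fit via the normal equations. All function names are mine, and the evenly spaced seeding of the centers is a simplifying assumption:

```python
import math

def lloyd_centers(xs, k, iters=50):
    """Step 1: Lloyd's algorithm in 1-D, seeded at evenly spaced points."""
    lo, hi = min(xs), max(xs)
    mus = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda j: abs(x - mus[j]))
            buckets[j].append(x)  # assign x to its nearest center
        mus = [sum(b) / len(b) if b else m for b, m in zip(buckets, mus)]
    return mus

def feature_matrix(xs, mus, r):
    """Step 2: row n is [1, phi_1(xn), ..., phi_k(xn)] (bias coordinate first)."""
    phi = lambda z: math.exp(-0.5 * z * z)
    return [[1.0] + [phi(abs(x - m) / r) for m in mus] for x in xs]

def lstsq(Z, y):
    """Step 3 (regression): solve the normal equations (Z^t Z) w = Z^t y."""
    k1 = len(Z[0])
    A = [[sum(Z[n][i] * Z[n][j] for n in range(len(Z))) for j in range(k1)]
         for i in range(k1)]
    b = [sum(Z[n][i] * y[n] for n in range(len(Z))) for i in range(k1)]
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k1):  # Gaussian elimination with partial pivoting
        p = max(range(c, k1), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k1):
            f = M[i][c] / M[c][c]
            for j in range(c, k1 + 1):
                M[i][j] -= f * M[c][j]
    w = [0.0] * k1
    for i in range(k1 - 1, -1, -1):
        w[i] = (M[i][k1] - sum(M[i][j] * w[j] for j in range(i + 1, k1))) / M[i][i]
    return w

def h_net(x, mus, w, r):
    # h(x) = w0 + sum_j wj * phi(||x - mu_j|| / r)
    phi = lambda z: math.exp(-0.5 * z * z)
    return w[0] + sum(wj * phi(abs(x - m) / r) for wj, m in zip(w[1:], mus))
```

For classification or logistic regression, only step 3 changes (pocket/PLA or gradient descent in place of the least-squares solve).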

Our Example

(Plots: the fitted network with k = 4, r = 1/k, and with k = 10, r = 1/k.)

    w = (Z^t Z)^(-1) Z^t y

Use Regularization to Fight Overfitting

(Plots: k = 10, r = 1/k without regularization, versus k = 10, r = 1/k with regularization.)

    w = (Z^t Z + λI)^(-1) Z^t y
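Regularization changes only the linear solve: add λI to Z^t Z before inverting. A sketch (names are mine; Z is any feature matrix, such as the RBF features above):

```python
def ridge_weights(Z, y, lam):
    """w = (Z^t Z + lambda I)^(-1) Z^t y, via Gaussian elimination."""
    k1 = len(Z[0])
    A = [[sum(Z[n][i] * Z[n][j] for n in range(len(Z))) + (lam if i == j else 0.0)
          for j in range(k1)] for i in range(k1)]
    b = [sum(Z[n][i] * y[n] for n in range(len(Z))) for i in range(k1)]
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k1):
        p = max(range(c, k1), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k1):
            f = M[i][c] / M[c][c]
            for j in range(c, k1 + 1):
                M[i][j] -= f * M[c][j]
    w = [0.0] * k1
    for i in range(k1 - 1, -1, -1):
        w[i] = (M[i][k1] - sum(M[i][j] * w[j] for j in range(i + 1, k1))) / M[i][i]
    return w
```

Larger λ shrinks the weights toward zero, taming the wiggly k = 10 fit.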

Reflecting on the k-RBF-Network

1. We derived it as a 'soft' generalization of the k-NN rule. It can also be derived from regularization theory, or from noisy interpolation theory.

2. There are nonparametric and parametric versions.

3. Given the centers, it is 'easy' to learn the weights using techniques from linear models: a linear model with an adaptable nonlinear transform.

4. We used uniform bumps; the bumps can also have different shapes Σj.

5. NEXT: how to better choose the centers, via unsupervised learning.

A Peek at Unsupervised Learning

(Figure: the digits data, average intensity versus symmetry; left, the 21-NN rule with 10 classes; right, a 10-cluster clustering of the data.)