
Page 1: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5


Function Learning and Neural Nets

R&N: Chap. 20, Sec. 20.5

Page 2: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Function-Learning Formulation

Goal function f
Training set: (x^(i), y^(i)), i = 1, …, n, with y^(i) = f(x^(i))
Inductive inference: find a function h that fits the points well
Same Keep-It-Simple bias

[Figure: training points and a candidate fit, f(x) vs. x]

Page 3: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Least-Squares Fitting

Propose a class of functions g(x, θ) parameterized by θ
Minimize E(θ) = Σᵢ (g(x^(i), θ) − y^(i))²

[Figure: training points and fitted curve, f(x) vs. x]
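As a concrete illustration, here is a minimal sketch of this objective in Python/NumPy (our choice of tooling; the model and data names are ours, not the slides'):

```python
import numpy as np

def squared_error(theta, g, xs, ys):
    """E(theta) = sum_i (g(x_i, theta) - y_i)^2 over the training set."""
    residuals = np.array([g(x, theta) - y for x, y in zip(xs, ys)])
    return np.sum(residuals ** 2)

# Example: fit quality of a line y = theta[0]*x for three points
line = lambda x, theta: theta[0] * x
print(squared_error(np.array([2.0]), line, [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))
```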

Page 4: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Linear Least-Squares

g(x, θ) = x₁θ₁ + … + x_N θ_N
Best θ given by θ = (AᵀA)⁻¹ Aᵀ b,
where A is the matrix of the x^(i)'s and b is the vector of the y^(i)'s

[Figure: training points with linear fit g(x, θ) approximating f(x)]
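A small sketch of this normal-equation solve in NumPy (our tooling, not the slides'); in practice np.linalg.lstsq is numerically preferable to forming (AᵀA)⁻¹ explicitly:

```python
import numpy as np

# Toy data: N = 4 samples, M = 2 features per sample
A = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0]])           # rows are the x^(i)'s
b = np.array([5.0, 4.5, 7.0, 11.0])  # the y^(i)'s

# theta = (A^T A)^{-1} A^T b, exactly as on the slide
theta = np.linalg.inv(A.T @ A) @ A.T @ b

# Equivalent but better-conditioned solve
theta_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(theta, theta_lstsq)
```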

Page 5: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Constant Offset

Set x₀ = 1, so g(x, θ) = x₀θ₀ + x₁θ₁ + … + x_N θ_N
Best θ again given by θ = (AᵀA)⁻¹ Aᵀ b,
where A is the matrix of the (extended) x^(i)'s and b is the vector of the y^(i)'s

[Figure: training points with affine fit g(x, θ) approximating f(x)]
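In code, the x₀ = 1 trick is just a column of ones prepended to A; a one-line sketch (again NumPy, our assumption):

```python
import numpy as np

A = np.array([[2.0], [0.5], [1.0], [3.0]])      # original x^(i)'s
A1 = np.hstack([np.ones((A.shape[0], 1)), A])   # x0 = 1 column for the offset theta_0
b = np.array([5.0, 2.2, 3.1, 7.0])
theta, *_ = np.linalg.lstsq(A1, b, rcond=None)  # [theta_0, theta_1]
print(theta)
```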

Page 6: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Nonlinear Least-Squares

E.g. quadratic: g(x, θ) = θ₀ + xθ₁ + x²θ₂
E.g. exponential: g(x, θ) = exp(θ₀ + xθ₁)
Any combination: g(x, θ) = exp(θ₀ + xθ₁) + θ₂ + xθ₃

[Figure: linear, quadratic, and other fits to the same points, f(x) vs. x]
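A hedged sketch of fitting the exponential model with SciPy's curve_fit, a standard nonlinear least-squares routine (the slides don't prescribe a solver, so this is our substitution):

```python
import numpy as np
from scipy.optimize import curve_fit

def g(x, t0, t1):
    """Exponential model g(x, theta) = exp(theta_0 + x * theta_1)."""
    return np.exp(t0 + x * t1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 20)
y = np.exp(0.5 + 1.3 * x) + 0.05 * rng.normal(size=20)  # synthetic data

theta, _ = curve_fit(g, x, y, p0=[0.0, 1.0])  # p0: initial guess
print(theta)  # roughly [0.5, 1.3]
```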

Page 7: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Performance of Nonlinear Least-Squares

Overfitting: too many parameters
Efficient optimization?
• Often can only find a local minimum of the objective E(θ)
• Expensive with lots of data!

Page 8: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Neural Networks

Overfitting: too many parameters
Efficient optimization?
• Often can only find a local minimum of the objective E(θ)
• Expensive with lots of data!

Page 9: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Perceptron
(the goal function f is a Boolean one)

y = g(Σ_{i=1,…,n} wᵢ xᵢ)

[Figure: a unit with inputs x₁, …, x_n, weights wᵢ, activation g, and output y; in the (x₁, x₂) plane, positive and negative examples separated by the line w₁x₁ + w₂x₂ = 0]
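A sketch of this unit in Python. The slides only show the decision rule; the training loop below is the classic perceptron update, an addition of ours, not from the slides:

```python
import numpy as np

def perceptron_predict(w, x):
    """y = g(sum_i w_i * x_i) with g a hard threshold at 0."""
    return 1 if np.dot(w, x) > 0 else 0

def perceptron_train(X, y, epochs=20, lr=0.1):
    """Classic perceptron rule (our assumption): nudge w on each mistake."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w += lr * (yi - perceptron_predict(w, xi)) * xi
    return w

# Linearly separable toy data in the (x1, x2) plane, plus an x0 = 1 offset input
X = np.array([[1, 2.0, 2.0], [1, 1.5, 2.5], [1, -1.0, -2.0], [1, -2.0, -1.5]])
y = np.array([1, 1, 0, 0])
w = perceptron_train(X, y)
print(w, [perceptron_predict(w, xi) for xi in X])
```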

Page 10: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Perceptron
(the goal function f is a Boolean one)

y = g(Σ_{i=1,…,n} wᵢ xᵢ)

[Figure: the same unit, but with positive and negative examples that no single line can separate (hence the "?")]

Page 11: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Unit (Neuron)

y = g(Σ_{i=1,…,n} wᵢ xᵢ)
g(u) = 1/[1 + exp(−u)]

[Figure: a unit with inputs x₁, …, x_n, weights wᵢ, sigmoid activation g, and output y]
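A direct transcription of this unit into Python (function names are ours):

```python
import numpy as np

def sigmoid(u):
    """g(u) = 1 / (1 + exp(-u))"""
    return 1.0 / (1.0 + np.exp(-u))

def unit(w, x):
    """y = g(sum_i w_i * x_i): a single sigmoid neuron."""
    return sigmoid(np.dot(w, x))

print(unit(np.array([0.5, -1.0]), np.array([2.0, 1.0])))  # g(0.0) = 0.5
```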

Page 12: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

A Single Neuron Can Learn

A disjunction of Boolean literals: x₁ ∨ x₂ ∨ x₃
The majority function
XOR?
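A quick sanity check with hand-picked weights (ours): one threshold unit computes the disjunction and the majority function, while XOR is not linearly separable, so no single unit of this form can represent it:

```python
import numpy as np

def threshold_unit(w, x):
    """Single unit with a hard threshold; x[0] = 1 serves as a constant offset."""
    return 1 if np.dot(w, x) > 0 else 0

# Disjunction x1 v x2 v x3: fires iff at least one input is 1
w_or = np.array([-0.5, 1.0, 1.0, 1.0])
# Majority of three inputs: fires iff at least two inputs are 1
w_maj = np.array([-1.5, 1.0, 1.0, 1.0])

for bits in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    x = np.array([1, *bits])
    assert threshold_unit(w_or, x) == (1 if any(bits) else 0)
    assert threshold_unit(w_maj, x) == (1 if sum(bits) >= 2 else 0)
print("disjunction and majority OK; XOR has no such weight vector")
```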

Page 13: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Neural Network

Network of interconnected neurons

[Figure: two connected units, each computing y = g(Σᵢ wᵢ xᵢ)]

Acyclic (feed-forward) vs. recurrent networks

Page 14: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Two-Layer Feed-Forward Neural Network

[Figure: inputs feeding a hidden layer with weights w₁ⱼ, feeding an output layer with weights w₂ₖ]

Page 15: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Backpropagation (Principle)

New example: y^(k) = f(x^(k))
φ^(k) = outcome of the NN with weights w^(k−1) for inputs x^(k)
Error function: E^(k)(w^(k−1)) = ||φ^(k) − y^(k)||²
w_ij^(k) = w_ij^(k−1) − ε ∂E^(k)/∂w_ij   (i.e., w^(k) = w^(k−1) − ε∇E)
Backpropagation algorithm: update the weights of the inputs to the last layer, then the weights of the inputs to the previous layer, etc.
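A minimal sketch of this update for a two-layer sigmoid network on one example (our own small implementation of the principle; layer sizes and names are our choices):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop_step(W1, W2, x, y, eps=0.5):
    """One update w <- w - eps * dE/dw for E = ||phi - y||^2
    on a two-layer feed-forward net with sigmoid units."""
    # Forward pass
    h = sigmoid(W1 @ x)      # hidden-layer activations
    phi = sigmoid(W2 @ h)    # network output phi
    # Backward pass: last layer first, then the previous layer
    d_out = 2 * (phi - y) * phi * (1 - phi)   # dE/d(pre-activation) at output
    d_hid = (W2.T @ d_out) * h * (1 - h)      # propagated back to hidden layer
    W2 -= eps * np.outer(d_out, h)
    W1 -= eps * np.outer(d_hid, x)
    return np.sum((phi - y) ** 2)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, y = np.array([0.5, -1.0]), np.array([1.0])
for _ in range(200):
    err = backprop_step(W1, W2, x, y)
print(err)  # error shrinks toward 0
```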

Page 16: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Minimize E(θ) by Gradient Descent…

[Figure: the curve E(θ)]

Page 17: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Minimize E(θ) by Gradient Descent…

[Figure: the curve E(θ) with the gradient of E at the current θ]

Page 18: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Minimize E(θ) by Gradient Descent…

[Figure: the curve E(θ); the step taken is proportional to the gradient]
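The picture in one loop: a plain gradient-descent sketch on a 1-D objective (the example function and step size are our choices):

```python
def grad_descent(dE, theta0, eps=0.1, steps=100):
    """theta <- theta - eps * dE/dtheta, repeated."""
    theta = theta0
    for _ in range(steps):
        theta -= eps * dE(theta)
    return theta

# Example: E(theta) = (theta - 3)^2, so dE/dtheta = 2*(theta - 3)
print(grad_descent(lambda t: 2 * (t - 3.0), theta0=0.0))  # converges near 3.0
```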

Page 19: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₁]

Page 20: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₁]

Page 21: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₂]

Page 22: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₂]

Page 23: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₃]

Page 24: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Understanding Backpropagation

Example of Stochastic Gradient Descent
Minimize E(θ) = e₁(θ) + e₂(θ) + … + e_N(θ), where eᵢ = (g(x^(i), θ) − y^(i))²
Take a step to reduce eᵢ

[Figure: the curve E(θ) with the gradient of e₃]
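A sketch of this stochastic variant: instead of the gradient of the full sum E(θ), each step uses the gradient of a single eᵢ (Python, with a linear model as the stand-in example, our choice):

```python
import numpy as np

def sgd(xs, ys, theta0, eps=0.02, epochs=50):
    """Minimize E(theta) = sum_i (g(x_i, theta) - y_i)^2 one term at a time,
    with g(x, theta) = theta * x as the example model."""
    theta = theta0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            e_grad = 2 * (theta * x - y) * x   # gradient of e_i alone
            theta -= eps * e_grad              # step to reduce e_i
    return theta

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.5 * xs + 0.1 * np.random.default_rng(1).normal(size=4)
print(sgd(xs, ys, theta0=0.0))  # near 2.5
```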

Page 25: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Stochastic Gradient Descent

[Figure: parameter values over time, settling at a (local) minimum of E]

Page 26: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Stochastic Gradient Descent

[Figure: objective function values over time]

Page 27: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Caveats

Choosing a convergent "learning rate" ε can be hard in practice

[Figure: the curve E(θ)]

Page 28: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Comments and Issues

How to choose the size and structure of networks?
• If the network is too large, risk of over-fitting (data caching)
• If the network is too small, the representation may not be rich enough
Role of representation: e.g., learn the concept of an odd number
Incremental learning

Page 29: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Role of Marketing

Not a good model of a neuron
• Spiking behavior, recurrence in real NNs
No special properties above other learning techniques
Like other learning techniques, a convenient way to get results without thinking too hard

Page 30: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5


Incremental (“Online”) Function Learning

Page 31: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental ("Online") Function Learning

Data is streaming into the learner: x₁,y₁, …, x_t,y_t, with yᵢ = f(xᵢ)
Observes x_{t+1} and must make a prediction for the next time step, y_{t+1}
Brute-force approach:
• Store all data at step t
• Use your learner of choice on all data up to time t, predict for time t+1

Page 32: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Example: Mean Estimation

yᵢ = μ + error term (no x's)
Current estimate: θ_t = 1/t Σ_{i=1…t} yᵢ
θ_{t+1} = 1/(t+1) Σ_{i=1…t+1} yᵢ
        = 1/(t+1) (y_{t+1} + Σ_{i=1…t} yᵢ)
        = 1/(t+1) (y_{t+1} + t·θ_t)

[Figure: five samples and their mean θ₅]

Page 33: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Example: Mean Estimation

yᵢ = μ + error term (no x's)
Current estimate: θ_t = 1/t Σ_{i=1…t} yᵢ
θ_{t+1} = 1/(t+1) Σ_{i=1…t+1} yᵢ
        = 1/(t+1) (y_{t+1} + Σ_{i=1…t} yᵢ)
        = 1/(t+1) (y_{t+1} + t·θ_t)

[Figure: the mean θ₅ and a new sample y₆]

Page 34: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Example: Mean Estimation

yᵢ = μ + error term (no x's)
Current estimate: θ_t = 1/t Σ_{i=1…t} yᵢ
θ_{t+1} = 1/(t+1) Σ_{i=1…t+1} yᵢ
        = 1/(t+1) (y_{t+1} + Σ_{i=1…t} yᵢ)
        = 1/(t+1) (y_{t+1} + t·θ_t)

[Figure: θ₅, y₆, and the update θ₆ = 5/6·θ₅ + 1/6·y₆]

Page 35: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Example: Mean Estimation

θ_{t+1} = 1/(t+1) (y_{t+1} + t·θ_t)
Only need to store θ_t and t
Similar formulas for the standard deviation

[Figure: θ₆ = 5/6·θ₅ + 1/6·y₆]
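The recursion in code, a running-mean sketch (the streaming-update form; names are ours):

```python
def running_mean(stream):
    """Maintain theta_t = mean(y_1..y_t), storing only theta and t."""
    theta, t = 0.0, 0
    for y in stream:
        theta = (y + t * theta) / (t + 1)  # theta_{t+1} = (y_{t+1} + t*theta_t)/(t+1)
        t += 1
        yield theta

print(list(running_mean([2.0, 4.0, 6.0])))  # [2.0, 3.0, 4.0]
```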

Page 36: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Recall the least-squares estimate θ = (AᵀA)⁻¹ Aᵀ b,
where A is the matrix of the x^(i)'s (laid out in rows) and b is the vector of the y^(i)'s:

A = [ x^(1)ᵀ ; x^(2)ᵀ ; … ; x^(N)ᵀ ]   (N×M)
b = [ y^(1) ; y^(2) ; … ; y^(N) ]       (N×1)

Page 37: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Let A^(t), b^(t) be the A matrix and b vector up to time t:
θ^(t) = (A^(t)ᵀ A^(t))⁻¹ A^(t)ᵀ b^(t)

A^(t+1) = [ A^(t) ; x^(t+1)ᵀ ]   ((t+1)×M)
b^(t+1) = [ b^(t) ; y^(t+1) ]    ((t+1)×1)

Page 38: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Let A^(t), b^(t) be the A matrix and b vector up to time t:
θ^(t+1) = (A^(t+1)ᵀ A^(t+1))⁻¹ A^(t+1)ᵀ b^(t+1)

A^(t+1)ᵀ b^(t+1) = A^(t)ᵀ b^(t) + y^(t+1) x^(t+1)

A^(t+1) = [ A^(t) ; x^(t+1)ᵀ ]   ((t+1)×M)
b^(t+1) = [ b^(t) ; y^(t+1) ]    ((t+1)×1)

Page 39: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Let A^(t), b^(t) be the A matrix and b vector up to time t:
θ^(t+1) = (A^(t+1)ᵀ A^(t+1))⁻¹ A^(t+1)ᵀ b^(t+1)

A^(t+1)ᵀ b^(t+1) = A^(t)ᵀ b^(t) + y^(t+1) x^(t+1)
A^(t+1)ᵀ A^(t+1) = A^(t)ᵀ A^(t) + x^(t+1) x^(t+1)ᵀ

A^(t+1) = [ A^(t) ; x^(t+1)ᵀ ]   ((t+1)×M)
b^(t+1) = [ b^(t) ; y^(t+1) ]    ((t+1)×1)

Page 40: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Let A^(t), b^(t) be the A matrix and b vector up to time t:
θ^(t+1) = (A^(t+1)ᵀ A^(t+1))⁻¹ A^(t+1)ᵀ b^(t+1)

A^(t+1)ᵀ b^(t+1) = A^(t)ᵀ b^(t) + y^(t+1) x^(t+1)
A^(t+1)ᵀ A^(t+1) = A^(t)ᵀ A^(t) + x^(t+1) x^(t+1)ᵀ

A^(t+1) = [ A^(t) ; x^(t+1)ᵀ ]   ((t+1)×M)
b^(t+1) = [ b^(t) ; y^(t+1) ]    ((t+1)×1)

Page 41: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Let A^(t), b^(t) be the A matrix and b vector up to time t:
θ^(t+1) = (A^(t+1)ᵀ A^(t+1))⁻¹ A^(t+1)ᵀ b^(t+1)

A^(t+1)ᵀ b^(t+1) = A^(t)ᵀ b^(t) + y^(t+1) x^(t+1)
A^(t+1)ᵀ A^(t+1) = A^(t)ᵀ A^(t) + x^(t+1) x^(t+1)ᵀ

Sherman-Morrison update:
(Y + xxᵀ)⁻¹ = Y⁻¹ − Y⁻¹ xxᵀ Y⁻¹ / (1 + xᵀ Y⁻¹ x)
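The sign in the denominator is easy to misremember (the transcript had a minus; for Y + xxᵀ it is plus); a short NumPy check of the identity as written above (random Y and x, our test harness):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
Y = M @ M.T + 3 * np.eye(3)   # a well-conditioned symmetric Y
x = rng.normal(size=3)

Yinv = np.linalg.inv(Y)
lhs = np.linalg.inv(Y + np.outer(x, x))
rhs = Yinv - (Yinv @ np.outer(x, x) @ Yinv) / (1 + x @ Yinv @ x)
print(np.allclose(lhs, rhs))  # True
```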

Page 42: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5

Incremental Least Squares

Putting it all together. Store:
p^(t) = A^(t)ᵀ b^(t)
Q^(t) = (A^(t)ᵀ A^(t))⁻¹

Update:
p^(t+1) = p^(t) + y x
Q^(t+1) = Q^(t) − Q^(t) xxᵀ Q^(t) / (1 + xᵀ Q^(t) x)
θ^(t+1) = Q^(t+1) p^(t+1)
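A compact sketch of the whole recursive scheme in Python/NumPy. Seeding Q with a scaled identity is a standard regularization trick we add here, since Q = (AᵀA)⁻¹ is undefined until AᵀA becomes invertible:

```python
import numpy as np

class IncrementalLeastSquares:
    """Store p = A^T b and Q = (A^T A)^(-1); update per sample via Sherman-Morrison."""
    def __init__(self, dim, q0=1e6):
        self.p = np.zeros(dim)
        self.Q = q0 * np.eye(dim)   # large q0 ~ near-zero initial A^T A (our regularizer)
        self.theta = np.zeros(dim)

    def update(self, x, y):
        self.p += y * x                            # p <- p + y x
        Qx = self.Q @ x
        self.Q -= np.outer(Qx, Qx) / (1 + x @ Qx)  # Sherman-Morrison rank-1 update
        self.theta = self.Q @ self.p               # theta <- Q p
        return self.theta

rls = IncrementalLeastSquares(dim=2)
rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.normal(size=2)
    y = x @ np.array([2.0, -1.0]) + 0.01 * rng.normal()
    theta = rls.update(x, y)
print(theta)  # close to [2, -1]
```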

Page 43: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5


Recap

• Function learning with least squares

• Neural nets, backpropagation, and gradient descent

• Incremental learning

Page 44: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5


Reminder

• HW6 due

• HW7 available on Oncourse

Page 45: Function Learning and Neural Nets R&N: Chap. 20, Sec. 20.5


Machine Learning Classes

• CS659 (Hauser) Principles of Intelligent Robot Motion

• CS657 (Yu) Computer Vision

• STAT520 (Trosset) Introduction to Statistics

• STAT682 (Rocha) Statistical Model Selection