Machine Learning for Computer Vision
PD Dr. Rudolph Triebel, Computer Vision Group (vision.in.tum.de)

Page 1: Example with Numbers

"... lowers the probability that the door is open"

Assume we have a second sensor:

Then:

(from above)

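The numeric values on this slide were part of the graphics and are missing from this transcript. A worked update of the same shape, with assumed illustrative numbers (not necessarily those on the slide): suppose the first sensor gave P(open | z_1) = 2/3, and the second sensor has p(z_2 | open) = 0.5 and p(z_2 | not open) = 0.6. Then

P(open | z_2, z_1) = \frac{p(z_2 | open) \, P(open | z_1)}{p(z_2 | open) \, P(open | z_1) + p(z_2 | \lnot open) \, P(\lnot open | z_1)} = \frac{0.5 \cdot 2/3}{0.5 \cdot 2/3 + 0.6 \cdot 1/3} = 0.625

so the second measurement indeed lowers the probability that the door is open.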

Page 2: General Form

Measurements: z_{1:t} = \{z_1, . . . , z_t\}

Markov assumption: z_t and z_{1:t-1} are conditionally independent given the state x_t.

Recursion

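The recursion formula itself was an image on the slide and is not preserved here. In a standard form consistent with the notation above (the exact typesetting on the slide may differ), the measurement-only recursion reads

p(x | z_{1:t}) = \eta_t \, p(z_t | x) \, p(x | z_{1:t-1})

where \eta_t is a normalizing constant.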

Page 3: Example: Sensing and Acting

Now the robot senses the door state and acts (it opens or closes the door).


Page 4: State Transitions

The outcome of an action u is modeled as a random variable; in our case it describes the state after executing the action "close door".

If the door is open, the action "close door" succeeds in 90% of all cases.

State transition example:

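The transition diagram itself is missing from this transcript. A sketch of the corresponding probabilities for the action u = "close door", using the 90% success rate stated above and assuming that an already closed door stays closed:

p(closed | u, open) = 0.9,   p(open | u, open) = 0.1
p(closed | u, closed) = 1,   p(open | u, closed) = 0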

Page 5: The Outcome of Actions

For a given action u we want to know the probability p(x | u). We do this by integrating over all possible previous states x'.

If the state space is continuous:

If the state space is discrete:
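
The two formulas were rendered as images on the slide. Written out as the text above describes them (an integral or sum over all possible previous states x'):

continuous state space:   p(x | u) = \int p(x | u, x') \, p(x') \, dx'

discrete state space:     p(x | u) = \sum_{x'} p(x | u, x') \, p(x')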

Page 6: Back to the Example

Page 7: Sensor Update and Action Update

Definition 2.1: Let z_{1:t} and u_{1:t} be the sequence of sensor measurements and actions until time t. Then the belief of the current state x_t is defined as Bel(x_t) := p(x_t | z_{1:t}, u_{1:t}).

So far, we learned two different ways to update the system state:
• Sensor update: p(x | z)
• Action update: p(x | u)
• Now we want to combine both.


Page 8: Graphical Representation

We can describe the overall process using a Dynamic Bayes Network. It incorporates the following Markov assumptions:

p(x_t | x_{0:t-1}, u_{1:t}, z_{1:t-1}) = p(x_t | x_{t-1}, u_t)   (state)

p(z_t | x_{0:t}, u_{1:t}, z_{1:t-1}) = p(z_t | x_t)   (measurement)

Page 9: The Overall Bayes Filter

The derivation applies, in this order: Bayes' rule, the Markov assumption, the theorem of total probability, and the Markov assumption twice more. The resulting update is written out below.
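
The derivation itself was typeset as equations that did not survive the extraction; only the step labels remain. A standard derivation matching those labels (notation may differ slightly from the slide):

Bel(x_t) = p(x_t | z_{1:t}, u_{1:t})
         = \eta \, p(z_t | x_t, z_{1:t-1}, u_{1:t}) \, p(x_t | z_{1:t-1}, u_{1:t})                                          (Bayes)
         = \eta \, p(z_t | x_t) \, p(x_t | z_{1:t-1}, u_{1:t})                                                              (Markov)
         = \eta \, p(z_t | x_t) \int p(x_t | x_{t-1}, z_{1:t-1}, u_{1:t}) \, p(x_{t-1} | z_{1:t-1}, u_{1:t}) \, dx_{t-1}    (Tot. prob.)
         = \eta \, p(z_t | x_t) \int p(x_t | x_{t-1}, u_t) \, p(x_{t-1} | z_{1:t-1}, u_{1:t}) \, dx_{t-1}                   (Markov)
         = \eta \, p(z_t | x_t) \int p(x_t | x_{t-1}, u_t) \, Bel(x_{t-1}) \, dx_{t-1}                                      (Markov)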

Page 10: The Bayes Filter Algorithm

Algorithm Bayes_filter(Bel(x), d):
1. if d is a sensor measurement z then
2.     \eta = 0
3.     for all x do
4.         Bel'(x) = p(z | x) Bel(x)
5.         \eta = \eta + Bel'(x)
6.     for all x do  Bel'(x) = \eta^{-1} Bel'(x)
7. else if d is an action u then
8.     for all x do  Bel'(x) = \sum_{x'} p(x | u, x') Bel(x')
9. return Bel'(x)
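
A minimal, runnable sketch of this algorithm for the two-state door example, in Python/NumPy. The sensor model and transition probabilities are assumed illustrative values, not numbers taken from the slides:

import numpy as np

STATES = ["open", "closed"]

# p(z | x): index 0 = door open, index 1 = door closed (assumed values)
p_z_given_x = {
    "sense_open":   np.array([0.6, 0.3]),
    "sense_closed": np.array([0.4, 0.7]),
}

# p(x' | u = "close", x): entry [i, j] = p(new state i | action close, old state j)
p_x_given_close = np.array([
    [0.1, 0.0],   # new state: open
    [0.9, 1.0],   # new state: closed
])

def sensor_update(bel, z):
    # Bel'(x) = eta^-1 * p(z | x) * Bel(x)
    bel_new = p_z_given_x[z] * bel
    return bel_new / bel_new.sum()

def action_update(bel, u):
    # Bel'(x) = sum_{x'} p(x | u, x') Bel(x')
    assert u == "close"
    return p_x_given_close @ bel

bel = np.array([0.5, 0.5])               # uniform prior over {open, closed}
bel = sensor_update(bel, "sense_open")   # sensor measurement
bel = action_update(bel, "close")        # action "close door"
print(dict(zip(STATES, np.round(bel, 3))))

With these assumed numbers the final belief is approximately {open: 0.067, closed: 0.933}.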

Page 11: Bayes Filter Variants

The Bayes filter principle is used in
• Kalman filters
• Particle filters
• Hidden Markov models
• Dynamic Bayesian networks
• Partially Observable Markov Decision Processes (POMDPs)

Page 12: Summary

• Probabilistic reasoning is necessary to deal with uncertain information, e.g. sensor measurements
• Using Bayes rule, we can do diagnostic reasoning based on causal knowledge
• The outcome of a robot's action can be described by a state transition diagram
• Probabilistic state estimation can be done recursively using the Bayes filter, with a sensor update and a motion update
• A graphical representation for the state estimation problem is the Dynamic Bayes Network

Page 13: 2. Regression

Computer Vision Group, Prof. Daniel Cremers

Page 14: Categories of Learning (Rep.)

Learning
• Unsupervised Learning: clustering, density estimation
• Supervised Learning: learning from a training data set, inference on the test data
  • Regression: target set is continuous, e.g. Y = R
  • Classification: target set is discrete, e.g. Y = {1, . . . , C}
• Reinforcement Learning: no supervision, but a reward function

Page 15: Mathematical Formulation (Rep.)

Suppose we are given a set X of objects and a set Y of object categories (classes). In the learning task we search for a mapping f : X -> Y such that similar elements in X are mapped to similar elements in Y.

Difference between regression and classification:
• In regression, Y is continuous; in classification it is discrete
• Regression learns a function, classification usually learns class labels

For now we will treat regression.


Page 16: Basis Functions

In principle, the elements of X can be anything (e.g. real numbers, graphs, 3D objects). To be able to treat these objects mathematically we need functions that map from X to R^M. We call these the basis functions. We can also interpret the basis functions as functions that extract features from the input data. Features reflect the properties of the objects (width, height, etc.).

Page 17: Simple Example: Linear Regression

• Assume: \phi(x) = x (identity)
• Given: data points (x_1, t_1), . . . , (x_N, t_N)
• Goal: predict the value t of a new example x
• Parametric formulation: f(x, w) = w_0 + w_1 x

[Figure: five data points (x_1, t_1), . . . , (x_5, t_5) and the fitted line f(x, w^*)]

Pages 18-20: Linear Regression

To determine the function f, we need an error function, the "Sum of Squared Errors":

E(w) = \frac{1}{2} \sum_{i=1}^{N} (f(x_i, w) - t_i)^2

We search for parameters w such that E(w) is minimal:

\nabla E(w) = \sum_{i=1}^{N} (f(x_i, w) - t_i) \cdot \nabla f(x_i, w) = (0 \; 0)

f(x, w) = w_0 + w_1 x  \Rightarrow  \nabla f(x_i, w) = (1 \; x_i)

Using vector notation, x_i := (1 \; x_i)^T  \Rightarrow  f(x_i, w) = w^T x_i, and therefore

\nabla E(w) = w^T \sum_{i=1}^{N} x_i x_i^T - \sum_{i=1}^{N} t_i x_i^T = (0 \; 0)
  \Rightarrow  w^T \underbrace{\sum_{i=1}^{N} x_i x_i^T}_{=: A^T} = \underbrace{\sum_{i=1}^{N} t_i x_i^T}_{=: b^T}
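
A minimal Python/NumPy sketch of solving this normal equation A w = b for the 1-D linear model; the data values are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = 1.5 + 2.0 * x + 0.1 * rng.standard_normal(x.shape)   # noisy targets (made up)

X = np.column_stack([np.ones_like(x), x])   # rows are x_i = (1, x_i)^T
A = X.T @ X                                 # A = sum_i x_i x_i^T
b = X.T @ t                                 # b = sum_i t_i x_i
w = np.linalg.solve(A, b)                   # solve A w = b
print("w0, w1 =", w)                        # close to (1.5, 2.0)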

Page 21: Polynomial Regression

Now we have:

f(x, w) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x) = w^T \phi(x)

Given: N data points (x_1, t_1), . . . , (x_N, t_N); N is the data set size.
Assume we are given M basis functions; M determines the model complexity.

Pages 22-24: Polynomial Regression

We have defined f(x, w) = w^T \phi(x). Therefore:

E(w) = \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x_i) - t_i)^2

\nabla E(w) = w^T \left( \sum_{i=1}^{N} \phi(x_i) \phi(x_i)^T \right) - \sum_{i=1}^{N} t_i \phi(x_i)^T

Here each \phi(x_i) \phi(x_i)^T is an outer product ("Outer Product").

Page 25: Polynomial Regression

Thus, we have the "Normal Equation":

w^T \Phi^T \Phi = t^T \Phi

where \Phi is the matrix whose rows are \phi(x_i)^T.

It follows:

w = (\Phi^T \Phi)^{-1} \Phi^T t = \Phi^+ t

where \Phi^+ := (\Phi^T \Phi)^{-1} \Phi^T is the "Pseudoinverse".

Page 26: Computing the Pseudoinverse

Mathematically, a pseudoinverse \Phi^+ exists for every matrix \Phi. However: if \Phi^T \Phi is (close to) singular, the direct solution of the normal equation is numerically unstable. Therefore, the Singular Value Decomposition (SVD) is used:

\Phi = U D V^T

where
• the matrices U and V are orthogonal matrices
• D is a diagonal matrix

Then:

\Phi^+ = V D^+ U^T

where D^+ contains the reciprocal of all non-zero elements of D.
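
A minimal Python/NumPy sketch of computing Phi^+ via the SVD and comparing it with the normal-equation solution; the feature matrix and targets are random stand-ins:

import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((10, 4))   # stand-in feature matrix (N=10, M=4)
t = rng.standard_normal(10)          # stand-in targets

U, d, Vt = np.linalg.svd(Phi, full_matrices=False)    # Phi = U D V^T
d_plus = np.where(d > 1e-12, 1.0 / d, 0.0)            # reciprocals of the non-zero singular values
Phi_plus = Vt.T @ np.diag(d_plus) @ U.T               # Phi^+ = V D^+ U^T

w_svd = Phi_plus @ t
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)    # direct normal-equation solution
print(np.allclose(w_svd, w_normal))                   # True for this well-conditioned example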

Page 27: A Simple Example

Polynomial basis functions: \phi_j(x) = x^j

[Figure: polynomial fits of increasing complexity to a small example data set]

Page 28: A Simple Example

Polynomial basis functions: \phi_j(x) = x^j

[Figure: a high-complexity fit that explains all data points but oscillates; "Overfitting"]

Page 29: Varying the Sample Size

[Figure: fits of the same model for different numbers of data points]

Page 30: The Resulting Model Parameters

[Table/figure: fitted parameter values for increasing model complexity; their magnitudes grow from below 1 up to the order of 10^5]

Page 31: Observations

• The higher the model complexity grows, the better is the fit to the data
• If the model complexity is too high, all data points are explained well, but the resulting model oscillates very much. It cannot generalize well. This is called overfitting.
• By increasing the size of the data set (number of samples), we obtain a better fit of the model
• More complex models have larger parameters

Problem: How can we find a good model complexity for a given data set with a fixed size?

Page 32: Regularization

We observed that complex models yield large parameters, leading to oscillation. Idea: minimize the error function and the magnitude of the parameters simultaneously. We do this by adding a regularization term:

E(w) = \frac{1}{2} \sum_{i=1}^{N} (w^T \phi(x_i) - t_i)^2 + \frac{\lambda}{2} \|w\|^2

where \lambda controls the influence of the regularization.

Page 33: Regularization

As above, we set the derivative to zero:

\nabla E(w) = \sum_{i=1}^{N} (w^T \phi(x_i) - t_i) \phi(x_i)^T + \lambda w^T = 0^T

With regularization, we can find a complex model for a small data set. However, the problem now is to find an appropriate regularization coefficient \lambda.

Page 34: Regularized Results

[Figure: fits of the complex model with regularization]

Page 35: The Problem from a Different View Point

Assume that the target t is affected by Gaussian noise \epsilon:

t = f(x, w) + \epsilon,   where   \epsilon \sim \mathcal{N}(0, \sigma^2)

Thus, we have

p(t | x, w, \sigma) = \mathcal{N}(t; f(x, w), \sigma^2)

Page 36: Maximum Likelihood Estimation

Aim: we want to find the w that maximizes p(t | x, w, \sigma). This quantity is the likelihood of the measured data given a model. Intuitively: find parameters w that maximize the probability of measuring the already measured data t ("Maximum Likelihood Estimation").

We can think of this as fitting a model w to the data t.

Note: \sigma is also part of the model and can be estimated. For now, we assume \sigma is known.

Page 37: Maximum Likelihood Estimation

Given data points (x_1, t_1), . . . , (x_N, t_N), where t = (t_1, . . . , t_N)^T and x = (x_1, . . . , x_N). Assumption: the points are drawn independently from p:

p(t | x, w, \sigma) = \prod_{i=1}^{N} p(t_i | x_i, w, \sigma) = \prod_{i=1}^{N} \mathcal{N}(t_i; w^T \phi(x_i), \sigma^2)

Instead of maximizing p we can also maximize its logarithm (monotonicity of the logarithm).

Page 38: Maximum Likelihood Estimation

\ln p(t | x, w, \sigma) = \sum_{i=1}^{N} \ln p(t_i | x_i, w, \sigma)
  = -\frac{N}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (w^T \phi(x_i) - t_i)^2

The first term is constant for all w; the second is equal to -\frac{1}{\sigma^2} E(w).

The parameters that maximize the likelihood are exactly those that minimize the sum of squared errors.

Page 39: Maximum Likelihood Estimation

w_{ML} := \arg\max_w \ln p(t | x, w, \sigma) = \arg\min_w E(w) = (\Phi^T \Phi)^{-1} \Phi^T t

The ML solution is obtained using the Pseudoinverse.

Pages 40-41: Maximum A-Posteriori Estimation

So far, we searched for parameters w that maximize the data likelihood. Now, we assume a Gaussian prior p(w | \sigma_2). Using this, we can compute the posterior (Bayes):

p(w | x, t) \propto p(t | w, x) \, p(w)      (posterior \propto likelihood \times prior)

Maximizing this posterior is called "Maximum A-Posteriori Estimation (MAP)".

Strictly:

p(w | x, t, \sigma_1, \sigma_2) = \frac{p(t | x, w, \sigma_1) \, p(w | \sigma_2)}{\int p(t | x, w, \sigma_1) \, p(w | \sigma_2) \, dw}

but the denominator is independent of w and we want to maximize p.

Page 42: Maximum A-Posteriori Estimation

Taking the negative logarithm of the posterior gives, up to additive constants,

\frac{1}{2\sigma_1^2} \sum_{i=1}^{N} (w^T \phi(x_i) - t_i)^2 + \frac{1}{2\sigma_2^2} w^T w

This is equal to the regularized error minimization. The MAP estimate corresponds to a regularized error minimization with \lambda = (\sigma_1 / \sigma_2)^2.

Pages 43-44: Summary: MAP Estimation

To summarize, we have the following optimization problem:

J(w) = \frac{1}{2} \sum_{n=1}^{N} (w^T \phi(x_n) - t_n)^2 + \frac{\lambda}{2} w^T w,   \phi(x_n) \in \mathbb{R}^M

The same in vector notation:

J(w) = \frac{1}{2} w^T \Phi^T \Phi w - w^T \Phi^T t + \frac{1}{2} t^T t + \frac{\lambda}{2} w^T w,   t \in \mathbb{R}^N,   \Phi \in \mathbb{R}^{N \times M} ("Feature Matrix")

And the solution is

w^* = (\lambda I_M + \Phi^T \Phi)^{-1} \Phi^T t

where I_M is the identity matrix of size M by M.
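
A minimal Python/NumPy sketch of this closed-form MAP solution; the basis functions, \lambda and the data are illustrative choices, not values from the slides:

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)   # noisy targets (made up)

M, lam = 6, 1e-3
Phi = np.vander(x, M, increasing=True)                               # N x M feature matrix, phi_j(x) = x**j
w_map = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)    # w* = (lambda I_M + Phi^T Phi)^{-1} Phi^T t

phi_new = np.vander([0.5], M, increasing=True)[0]                    # features of a new input x = 0.5
print(w_map @ phi_new)                                               # prediction f(0.5, w*)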

Pages 45-46: MLE And MAP

• The benefit of MAP over MLE is that prediction is less sensitive to overfitting, i.e. even if there is only little data, the model predicts well.
• This is achieved by using prior information, i.e. model assumptions that are not based on any observations (= data).
• But: both methods only give the most likely model; there is no notion of uncertainty yet.

Idea 1: Find a distribution over model parameters ("parameter posterior")
Idea 2: Use that distribution to estimate prediction uncertainty ("predictive distribution")

Page 47: When Bayes Meets Gauß

Theorem ("Linear Gaussian Model"): If we are given

I.  p(x) = \mathcal{N}(x | \mu, \Sigma_1)
II. p(y | x) = \mathcal{N}(y | Ax + b, \Sigma_2)   (linear dependency on x)

then it follows (properties of Gaussians):

III. p(y) = \mathcal{N}(y | A\mu + b, \Sigma_2 + A \Sigma_1 A^T)
IV.  p(x | y) = \mathcal{N}(x | \Sigma (A^T \Sigma_2^{-1} (y - b) + \Sigma_1^{-1} \mu), \Sigma)

where \Sigma = (\Sigma_1^{-1} + A^T \Sigma_2^{-1} A)^{-1}.

See Bishop's book for the proof!

Page 48: When Bayes Meets Gauß

Thus: when using the Bayesian approach, we can do even more than MLE and MAP by using these formulae.

This means: if the prior and the likelihood are Gaussian, then the posterior and the normalizer are also Gaussian, and we can compute them in closed form.

This gives us a natural way to compute uncertainty!

Page 49: The Posterior Distribution

Remember Bayes rule:

p(w | x, t) \propto p(t | w, x) \, p(w)      (posterior \propto likelihood \times prior)

With our theorem, we can compute the posterior in closed form (and not just its maximum)! The posterior is also a Gaussian and its mean is the MAP solution.

Page 50: The Posterior Distribution

We have

p(t | w, x) = \mathcal{N}(t; \Phi w, \sigma_1^2 I_N)   and   p(w) = \mathcal{N}(w; 0, \sigma_2^2 I_M)

From this and IV. we get the posterior covariance:

\Sigma = (\sigma_2^{-2} I_M + \sigma_1^{-2} \Phi^T \Phi)^{-1} = \sigma_1^2 \left( \frac{\sigma_1^2}{\sigma_2^2} I_M + \Phi^T \Phi \right)^{-1}

and the mean:

\mu = \sigma_1^{-2} \Sigma \Phi^T t

So the entire posterior distribution is

p(w | t, x) = \mathcal{N}(w; \mu, \Sigma)

Note: so far we only used the training data (x, t)!
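
A minimal Python/NumPy sketch of these two formulas; the data and the values of sigma_1 and sigma_2 are made up for illustration:

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

M, sigma1, sigma2 = 6, 0.1, 1.0
Phi = np.vander(x, M, increasing=True)                                   # feature matrix (N x M)

Sigma = np.linalg.inv(np.eye(M) / sigma2**2 + Phi.T @ Phi / sigma1**2)   # posterior covariance
mu = Sigma @ Phi.T @ t / sigma1**2                                       # posterior mean (= MAP solution)
print(mu.shape, Sigma.shape)                                             # (6,) (6, 6)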

Pages 51-52: The Predictive Distribution

We obtain the predictive distribution by integrating over all possible model parameters ("inference"):

p(t^* | x^*, x, t) = \int p(t^* | x^*, w) \, p(w | x, t) \, dw

where x^*, t^* denote the test data, p(t^* | x^*, w) is the test data likelihood, and p(w | x, t) is the parameter posterior.

This distribution can be computed in closed form, because both terms on the RHS are Gaussian. From above we have

p(w | t, x) = \mathcal{N}(w; \mu, \Sigma)

where

\mu = \sigma_1^{-2} \Sigma \Phi^T t   and   \Sigma = \sigma_1^2 \left( \frac{\sigma_1^2}{\sigma_2^2} I_M + \Phi^T \Phi \right)^{-1}

Comparing with w^* = (\lambda I_M + \Phi^T \Phi)^{-1} \Phi^T t shows that the posterior mean \mu is the MAP solution.

Page 53: The Predictive Distribution

Using formula III. from above (linear Gaussian):

p(t^* | x^*, x, t) = \int \mathcal{N}(t^*; \phi(x^*)^T w, \sigma^2) \, \mathcal{N}(w; \mu, \Sigma) \, dw
  = \mathcal{N}(t^*; \phi(x^*)^T \mu, \sigma_N^2(x^*))

where

\sigma_N^2(x^*) = \sigma^2 + \phi(x^*)^T \Sigma \, \phi(x^*)
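
A minimal Python/NumPy sketch of evaluating the predictive mean and variance at a test input, reusing the posterior mu and Sigma from the previous sketch (all numbers are illustrative):

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

M, sigma1, sigma2 = 6, 0.1, 1.0
Phi = np.vander(x, M, increasing=True)
Sigma = np.linalg.inv(np.eye(M) / sigma2**2 + Phi.T @ Phi / sigma1**2)
mu = Sigma @ Phi.T @ t / sigma1**2

x_star = 0.3                                          # test input (illustrative)
phi_star = np.vander([x_star], M, increasing=True)[0]
mean = phi_star @ mu                                  # predictive mean:  phi(x*)^T mu
var = sigma1**2 + phi_star @ Sigma @ phi_star         # predictive variance:  sigma^2 + phi(x*)^T Sigma phi(x*)
print(f"t* ~ N({mean:.3f}, {var:.5f})")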

Page 54: The Predictive Distribution (2)

• Example: sinusoidal data, 9 Gaussian basis functions, 1 data point

[Figures, from C.M. Bishop: some samples from the posterior; the predictive distribution]

Page 55: Predictive Distribution (3)

• Example: sinusoidal data, 9 Gaussian basis functions, 2 data points

[Figures, from C.M. Bishop: some samples from the posterior; the predictive distribution]

Page 56: Predictive Distribution (4)

• Example: sinusoidal data, 9 Gaussian basis functions, 4 data points

[Figures, from C.M. Bishop: some samples from the posterior; the predictive distribution]

Page 57: Predictive Distribution (5)

• Example: sinusoidal data, 9 Gaussian basis functions, 25 data points

[Figures, from C.M. Bishop: some samples from the posterior; the predictive distribution]

Page 58: Summary

• Regression can be expressed as a least-squares problem
• To avoid overfitting, we need to introduce a regularisation term with an additional parameter \lambda
• Regression without regularisation is equivalent to Maximum Likelihood Estimation
• Regression with regularisation is Maximum A-Posteriori estimation
• When using Gaussian priors (and Gaussian noise), all computations can be done analytically
• This gives a closed form of the parameter posterior and the predictive distribution