
Page 1: CMSC 422: Machine Learning, Linear Regression & Kalman Filters (2018. 9. 18.)

CMSC 422: Machine Learning

Linear Regression & Kalman Filters

William Regli, Professor of Computer Science

Slide Credits: Andrew W. Moore, School of Computer Science, Carnegie Mellon University; Hal Daumé, Furong Huang, Marine Carpuat, Computer Science Department, U of Maryland. Other slides and images attributed as best one can; apologies for any errors or omissions; used either with permission or under Fair Use (https://www.copyright.gov/fls/fl102.html)

Page 2

Regression Models

• Learning a functional relationship about a real-valued number, i.e., when y is tomorrow's temperature.
• Technically, solving a regression problem is finding a conditional expectation or average value of y, since the probability that we have found exactly the right real-valued number for y is 0.
• Regression models capture the relationship between one dependent variable and explanatory variable(s)
• Use an equation to set up the relationship:
  • Numerical dependent (response) variable
  • 1 or more numerical or categorical independent (explanatory) variables
• Used mainly for prediction & estimation

Page 3

Copyright © 2001, 2003, Andrew W. Moore

Linear Regression

Linear regression assumes that the expected value of the output given an input, E[y|x], is linear.

Simplest case: Out(x) = wx for some unknown w.

Given the data, we can estimate w.

DATASET
inputs     outputs
x1 = 1     y1 = 1
x2 = 3     y2 = 2.2
x3 = 2     y3 = 2
x4 = 1.5   y4 = 1.9
x5 = 4     y5 = 3.1
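For the no-intercept model Out(x) = wx, minimizing the squared error gives the closed form w = Σᵢ xᵢyᵢ / Σᵢ xᵢ², which we can check on the dataset above:

```python
# Least-squares estimate of w for the no-intercept model Out(x) = w*x.
# Minimizing sum_i (y_i - w*x_i)^2 gives the closed form w = sum(x*y) / sum(x^2).
xs = [1, 3, 2, 1.5, 4]
ys = [1, 2.2, 2, 1.9, 3.1]

w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(round(w, 4))  # -> 0.8326, the fitted slope for the 5-point dataset above
```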


Page 4


1-parameter linear regression

Assume that the data is formed by

  yi = w xi + noisei

where:
• the noise signals are independent
• the noise has a normal distribution with mean 0 and unknown variance σ²

Then p(y|w,x) has a normal distribution with
• mean wx
• variance σ²
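This Gaussian noise assumption is why least squares is the natural loss: maximizing the log-likelihood of the data under this model is the same as minimizing the squared error. A short derivation:

```latex
\log p(y_1,\dots,y_n \mid w, x_1,\dots,x_n)
  = \sum_{i} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(y_i - w x_i)^2}{2\sigma^2}\right) \right]
  = \mathrm{const} - \frac{1}{2\sigma^2}\sum_{i} (y_i - w x_i)^2
```

so the maximum-likelihood w is exactly the least-squares w.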

Page 5

Regression examples

Page 6

Prediction of menu prices
Chahuneau, Gimpel, … and Smith, EMNLP 2012

Page 7

Types of Regression Models

Regression models split by the number of explanatory variables and by the form of the relationship:
• Simple (1 explanatory variable): linear or non-linear
• Multiple (2+ explanatory variables): linear or non-linear

Page 8

Regression Modeling Steps

• Hypothesize deterministic component
• Estimate unknown parameters
• Specify probability distribution of random error term
• Estimate standard deviation of error
• Evaluate the fitted model
• Use model for prediction & estimation

Page 9

Linear regression

• Given an input x we would like to compute an output y
• For example:
  • Predict height from age
  • Predict Google's price from Apple's price
  • Predict distance from wall from sensors
  • Predict BMI from height and weight
  • Predict papers published from age

Page 10

Linear regression

• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related by the equation

  y = wx + e

  where w is a parameter and e represents measurement or other noise

(Figure: x-y scatter; the points are observed values, y is what we are trying to predict.)

Page 11

Linear regression

• Our goal is to estimate w from training data of (xi, yi) pairs
• Optimization goal: minimize squared error (least squares):

  argmin_w Σi (yi - w xi)²

• Why least squares?
  • minimizes squared distance between measurements and the predicted line y = wx + ε
  • has a nice probabilistic interpretation

Page 12

Regression example

• Generated: w=2
• Recovered: w=2.03
• Noise: std=1

Page 13

Regression example

• Generated: w=2
• Recovered: w=2.05
• Noise: std=2

Page 14

Regression example

• Generated: w=2
• Recovered: w=2.08
• Noise: std=4
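The three experiments above can be reproduced in a few lines; the sample size and x-range here are our own choices, not from the slides:

```python
import random

# Sketch of the slides' experiment: generate y = 2x + Gaussian noise,
# then recover w with the closed-form least-squares estimate w = sum(xy)/sum(x^2).
random.seed(0)
w_true, noise_std, n = 2.0, 1.0, 1000

xs = [random.uniform(0, 10) for _ in range(n)]
ys = [w_true * x + random.gauss(0, noise_std) for x in xs]

w_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(round(w_hat, 2))  # close to the generating value w = 2
```

Rerunning with noise_std = 2 or 4 reproduces the slides' pattern: the recovered slope drifts slightly further from 2 as the noise grows.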

Page 15

Bias term

• What if the line does not pass through the origin?
• No problem, simply change the model to

  y = w0 + w1 x + e

• Can use least squares to determine w0, w1:

  w0 = Σi (yi - w1 xi) / n
  w1 = Σi xi (yi - w0) / Σi xi²

(Figure: fitted line crossing the y-axis at the intercept w0.)
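The two coupled equations above can be solved directly; an equivalent standard closed form uses the sample means. A minimal sketch (the helper name fit_line is ours, not from the slides):

```python
# Least squares with a bias term: y = w0 + w1*x + e.
# Standard closed form (equivalent to solving the slide's two coupled equations):
#   w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  w0 = y_bar - w1*x_bar
def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
    w0 = y_bar - w1 * x_bar
    return w0, w1

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # exact line y = 1 + 2x -> (1.0, 2.0)
```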

Page 16

Data points of price versus floor space of houses for sale in Berkeley, CA, in July ‘09:

9/17/18 18

Regression function hypothesis that minimizes squared error loss

Page 17

Plot of the loss function


Note: the shape is convex, so there is one global minimum

Page 18

Multivariate regression

• What if we have several inputs? E.g., stock prices for Apple, Microsoft and Amazon for the Google prediction task
• Model:

  y = w0 + w1 x1 + … + wk xk + e

  where y is Google's stock price and x1, x2, … are Apple's, Microsoft's, … stock prices

Page 19

Non-linear basis functions

• So far we only used the observed values x1, x2, … directly
• However, linear regression can be applied in the same way to functions of these values
• E.g., to add a term w x1x2, add a new variable z = x1x2 so each example becomes: x1, x2, …, z
• As long as these functions can be computed directly from the observed values, the model is still linear in the parameters and the problem remains a multivariate linear regression problem:

  y = w0 + w1 x1 + w2 x2 + … + wk xk + e

Page 20

Non-linear basis functions

• What type of functions can we use? Any function of the input values can be used; the solution for the parameters of the regression remains the same.
• A few common examples:
  • Polynomial: φj(x) = x^j for j = 0 … n
  • Gaussian: φj(x) = exp(-(x - μj)² / (2σj²))
  • Sigmoid: φj(x) = 1 / (1 + exp(-sj x))
  • Log: φj(x) = log(x + 1)

Page 21

General linear regression problem

• Using our new notation for the basis functions, linear regression can be written as

  y = Σj=0..n wj φj(x)

• where φj(x) can be either xj for multivariate regression or one of the non-linear basis functions we defined
• and φ0(x) = 1 for the intercept term
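As a sketch of how the basis-function formulation reduces to ordinary linear least squares, the following fits the weights wj by solving the normal equations (ΦᵀΦ)w = Φᵀy over a polynomial basis. All helper names here are ours, and solve() is a deliberately minimal Gaussian-elimination routine for the example, not a production solver:

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_basis(xs, ys, basis):
    """Least-squares weights for y = sum_j w_j * phi_j(x), via normal equations."""
    Phi = [[phi(x) for phi in basis] for x in xs]
    n = len(basis)
    AtA = [[sum(row[i] * row[j] for row in Phi) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(n)]
    return solve(AtA, Aty)

basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]  # phi_0, phi_1, phi_2
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]                               # exact quadratic y = x^2
print(fit_basis(xs, ys, basis))                        # weights approx [0, 0, 1]
```

The fit is still "linear regression" even though the fitted curve is a parabola: the model is linear in the weights, exactly as the slide states.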

Page 22


Introduction to Kalman Filters

Page 23

The Problem

• Why do we need Kalman filters?
• What is a Kalman filter?
• Conceptual overview
• The theory of Kalman filters
• Simple example

Page 24

Page 25

• System state cannot be measured directly
• Need to estimate it "optimally" from measurements

(Block diagram: external controls and system error sources feed the System, whose state is desired but not known; measuring devices, subject to measurement error sources, produce the observed measurements; an Estimator combines these into an optimal estimate of the system state. The System and measuring devices together form a black box.)

Page 26

What is a Kalman Filter?

• A recursive data-processing algorithm
• Generates an optimal estimate of desired quantities given the set of measurements
• Optimal?
  • For a linear system with Gaussian errors, the Kalman filter is the "best" estimate based on all previous measurements
  • For a non-linear system, optimality is 'qualified'
• Recursive?
  • Doesn't need to store all previous measurements and reprocess all data each time step

Page 27

Conceptual Overview

• Lost on a 1-dimensional line
• Position: y(t)
• Assume Gaussian-distributed measurements

Page 28

Conceptual Overview

• Sextant measurement at t1: Mean = z1 and Variance = σ²z1
• Optimal estimate of position: ŷ(t1) = z1
• Variance of error in estimate: σ²x(t1) = σ²z1
• Boat in same position at time t2; predicted position is z1

(Figure: Gaussian density of the measurement, centered at z1.)

Page 29

Conceptual Overview

• So we have the prediction ŷ⁻(t2)
• GPS measurement at t2: Mean = z2 and Variance = σ²z2
• Need to correct the prediction using the measurement to get ŷ(t2)
• Closer to the more trusted measurement; linear interpolation?

(Figure: densities of the prediction ŷ⁻(t2) and the measurement z(t2).)

Page 30

Conceptual Overview

• Corrected mean is the new optimal estimate of position
• New variance is smaller than either of the previous two variances

(Figure: prediction ŷ⁻(t2), measurement z(t2), and the narrower corrected optimal estimate ŷ(t2).)

Page 31

Conceptual Overview

Basic ideas:
• Make a prediction based on previous data: ŷ⁻, σ⁻
• Take a measurement: zk, σz
• Optimal estimate: ŷ = prediction + (Kalman gain) * (measurement - prediction)
• Variance of estimate = variance of prediction * (1 - Kalman gain)
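The two update rules above, in code, for the 1-D case (the function name correct and the example numbers are ours, chosen for illustration):

```python
# One 1-D correction step, as on the slide: blend a prediction (y_pred, var_pred)
# with a measurement (z, var_z). The Kalman gain weights them by relative variance.
def correct(y_pred, var_pred, z, var_z):
    K = var_pred / (var_pred + var_z)  # Kalman gain in [0, 1]
    y = y_pred + K * (z - y_pred)      # optimal estimate
    var = (1 - K) * var_pred           # variance of estimate shrinks
    return y, var

# Uncertain prediction (variance 4) corrected by a trusted measurement (variance 1):
y, var = correct(y_pred=10.0, var_pred=4.0, z=12.0, var_z=1.0)
print(y, var)  # estimate pulled toward the lower-variance measurement
```

Note that the resulting variance (0.8) is smaller than both input variances, exactly the "merging two Gaussians" behavior the slides describe.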

Page 32

Conceptual Overview

• At time t3, the boat moves with velocity dy/dt = u
• Naïve approach: shift the probability density to the right to predict
• This would work if we knew the velocity exactly (a perfect model)

(Figure: ŷ(t2) shifted right to the naïve prediction ŷ⁻(t3).)

Page 33

Conceptual Overview

• Better to assume an imperfect model by adding Gaussian noise: dy/dt = u + w
• The distribution for the prediction moves and spreads out

(Figure: ŷ(t2), the naïve prediction ŷ⁻(t3), and the wider prediction ŷ⁻(t3).)

Page 34

Conceptual Overview

• Now we take a measurement at t3
• Need to once again correct the prediction
• Same as before

(Figure: prediction ŷ⁻(t3), measurement z(t3), and corrected optimal estimate ŷ(t3).)

Page 35

Conceptual Overview

Summary:
• Initial conditions (ŷk-1 and σk-1)
• Prediction (ŷ⁻k, σ⁻k)
  • Use initial conditions and model (e.g., constant velocity) to make prediction
• Measurement (zk)
  • Take measurement
• Correction (ŷk, σk)
  • Use measurement to correct prediction by 'blending' prediction and residual; always a case of merging only two Gaussians
  • Optimal estimate with smaller variance

Page 36

Theoretical Basis

• Process to be estimated:

  yk = A yk-1 + B uk + wk-1    (process noise w, covariance Q)
  zk = H yk + vk               (measurement noise v, covariance R)

• Kalman filter:

  Predicted: ŷ⁻k is the estimate based on measurements at previous time steps

    ŷ⁻k = A ŷk-1 + B uk
    P⁻k = A Pk-1 Aᵀ + Q

  Corrected: ŷk has additional information, the measurement at time k

    K = P⁻k Hᵀ (H P⁻k Hᵀ + R)⁻¹
    ŷk = ŷ⁻k + K (zk - H ŷ⁻k)
    Pk = (I - K H) P⁻k
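For the scalar boat example (A = H = 1, B·u = u·dt with dt = 1), the five equations above collapse to a few lines. A minimal sketch; the noise covariances Q and R and the velocity u are our own illustrative choices:

```python
import random

# Scalar Kalman filter: A = H = 1, so y_k = y_{k-1} + u + process noise,
# observed directly with measurement noise.
def kalman_step(y_est, P, u, z, Q, R):
    # Prediction (time update)
    y_pred = y_est + u           # y_pred = A*y_est + B*u  (A = 1, B*u = u)
    P_pred = P + Q               # P_pred = A*P*A' + Q
    # Correction (measurement update)
    K = P_pred / (P_pred + R)    # K = P_pred*H' / (H*P_pred*H' + R), H = 1
    y_new = y_pred + K * (z - y_pred)
    P_new = (1 - K) * P_pred
    return y_new, P_new

random.seed(1)
Q, R, u = 0.01, 1.0, 1.0         # process noise, measurement noise, velocity
y_true, y_est, P = 0.0, 0.0, 1.0
for _ in range(50):
    y_true += u + random.gauss(0, Q ** 0.5)   # boat drifts forward
    z = y_true + random.gauss(0, R ** 0.5)    # noisy position measurement
    y_est, P = kalman_step(y_est, P, u, z, Q, R)
print(round(abs(y_est - y_true), 2), round(P, 3))  # small error, small variance
```

The variance P converges to a steady state (about 0.095 here) regardless of the measurements, which is the recursive, fixed-memory behavior the earlier slide advertises.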

Page 37

Kalman Filter Algorithm (notation abuse)

Algorithm Kalman_filter(μt-1, Σt-1, ut, zt):

Prediction:
  μ̄t = At μt-1 + Bt ut
  Σ̄t = At Σt-1 Atᵀ + Rt

Correction:
  Kt = Σ̄t Ctᵀ (Ct Σ̄t Ctᵀ + Qt)⁻¹
  μt = μ̄t + Kt (zt - Ct μ̄t)
  Σt = (I - Kt Ct) Σ̄t

Return μt, Σt

(The notation abuse: here C is the measurement matrix, R the process-noise covariance, and Q the measurement-noise covariance, swapping the roles Q and R play on the neighboring slides.)

Page 38

Theoretical Basis

Prediction (Time Update):
(1) Project the state ahead: ŷ⁻k = A ŷk-1 + B uk
(2) Project the error covariance ahead: P⁻k = A Pk-1 Aᵀ + Q

Correction (Measurement Update):
(1) Compute the Kalman gain: K = P⁻k Hᵀ (H P⁻k Hᵀ + R)⁻¹
(2) Update estimate with measurement zk: ŷk = ŷ⁻k + K (zk - H ŷ⁻k)
(3) Update error covariance: Pk = (I - K H) P⁻k

Page 39

The Prediction-Correction Cycle: Prediction

Page 40

The Prediction-Correction Cycle: Correction


Page 42

Kalman Filter Summary

• Highly efficient: polynomial in measurement dimensionality k and state dimensionality n: O(k^2.376 + n^2)
• Optimal for linear Gaussian systems!
• Most robotics systems are nonlinear!

Page 43

Page 44

Relating Regression to Kalman Filters

Kalman filter:
• estimates the state
• discrete-time process
• linear stochastic difference equation

Linear regression:
• estimates a value
• finite set of data/values
• linear equations with Gaussian noise

Page 45

Relating Regression to Kalman Filters

• The Kalman filter produces "real-time" estimates of the coefficients of a linear regression
• The Kalman filter is a linear optimal estimator: it infers parameters from indirect, inaccurate and uncertain observations
• For Gaussian noise, the Kalman filter minimizes the mean squared error of the estimated parameters
• Faulty intuition: the Kalman filter is used for prediction of future events based on past data, whereas regression (least squares) does smoothing between two end points
• This is not really true: both estimators (and almost all estimators you can think of) can do either job.

9/17/18 49

Page 46

END
